Merge - auto function?
How about a right-click option on the Duplicate Items folder - 'Merge all'?
It's just that I've got a couple of thousand items in Duplicates (how?? I can't be _that_ stupid) and I'm just click-click-clicking through them. Some are different types which need reassigning and they couldn't be automatically merged, but the others - well, that's what computers are for, isn't it?
The main reason there isn't a Merge All function at the moment is that merging can lead to data loss if the two versions of an item have different content in the fields (e.g., one has only author initials rather than full names). Such conflicts require human input to choose which version of the fields to keep. Still, it would be nice if there were at least an option to merge all truly identical items.
There are essentially two flavors this could take:
1. Merging all duplicates. The most intuitive option, but it has the particular problem of false positives, which currently can't be marked as such -- and merging items that are not actually identical can be quite the disaster, leading, e.g., to missing items in the database and potentially to the wrong item being cited in a document (because Zotero cites the merged item). So yes, very convenient, but the risk here is quite high.
2. Merging all _selected_ duplicates. You have 100 items that you _know_ are duplicates and want to merge. Being able to do so with 1 click instead of 100 seems convenient, but this is tricky from a UX point of view -- currently, clicking "Merge" with multiple items selected merges them all into one item, so this would behave differently and could be confusing. Whether it is worth the trouble may depend on how often such a function would be helpful.
I guess I'm not sure about this -- I only get manually added duplicates, and those are easy enough to manage. The target group for the auto option would seem to be mainly people doing systematic reviews etc., so bwiernik, you may have a clearer sense of what would be helpful.
(2) Would be great. Particularly for systematic review merging. For this to work, I think there would need to be a check mark column in the Merge Duplicates view that would enable users to check which duplicate sets to apply merging to (one check box per duplicate set). (Potential duplicates would also have to always be shown together, unlike now.) There should be a select all visible check box at the top row, with a pop up warning the user to verify that all selected items are truly duplicates (with an option to not show again). This would certainly save me a bunch of time.
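A per-set checkbox changes the unit of merging from "all selected items become one" to "each ticked set becomes its own item". The sketch below is a hypothetical illustration, not how Zotero works internally: each duplicate set is assumed to be a list of item dicts with a `key` and an ISO 8601 `dateAdded`, `checked` holds the indices of the ticked sets, and choosing the oldest item as master is an arbitrary assumption.

```python
from typing import Dict, List, Set

Item = Dict[str, str]

def merge_checked_sets(dup_sets: List[List[Item]], checked: Set[int]) -> Dict[str, List[str]]:
    """Merge each checked duplicate set into its own master item.

    Returns a mapping from each master item's key to the keys of the items
    merged into it. Sets are never combined with one another, and sets the
    user did not tick are left alone.
    """
    merged: Dict[str, List[str]] = {}
    for index in sorted(checked):
        group = dup_sets[index]
        # Assumption: the oldest item (earliest ISO 'dateAdded') is the master.
        master = min(group, key=lambda item: item["dateAdded"])
        merged[master["key"]] = [item["key"] for item in group if item is not master]
    return merged
```

Because each set gets its own master, ticking two sets yields two surviving items rather than one combined item.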
And a warning, with an explanation of what is about to happen, would be appropriate.
http://imglf3.nosdn.127.net/img/M3B0VGdGQVdIRGdZZ0d2NFFwSEZPdG1lTDY0UFQxMGd0ZmRnNGVzcE95TjBOZUY0cURVa29BPT0.png?imageView&thumbnail=500x0&quality=96&stripmeta=0
It is important to assess the quality of each item before you merge them.
Also, carefully read the November 5th comments by @adamsmith and @bwiernik above. Improper merging can lead to disaster.
One more thing, it is the opposite of helpful to ask the same thing in multiple threads. The developers and key volunteers read them all. I (far from a developer) answered you in one of the threads but other people with similar concerns to yours may not find my reply because they don't look at all the relevant threads.
In my experience, Zotero serves primarily as a way to organize and access papers across computers and devices. In the end, I will only cite a small fraction of the entries that go into Zotero. When they are cited, I simply review the citations for errors, which needs to be performed regardless of the purity of the database.
The daily usability of the product would be improved tremendously by automatic merging (perhaps based on a slightly different algorithm). Any edge cases identified could then be brought to the attention of the developers for possible improvement of the algorithm. There would of course continue to be some errors or inconsistencies, but only a fraction of the current number of potential duplicates.
It would mean that you have items inexplicably disappearing from people's database (because they're merged with the wrong item) and in a worst case scenario would mean that the wrong reference would end up in an article/manuscript (because the correct one was incorrectly merged to a different one). So the trade-off isn't between convenience and a pristine database, but between convenience and lost data and a potentially broken scientific record, which I hope you'll agree should be weighed quite heavily.
If I perform a systematic review search and want to add 500 references to Zotero, I have no way of knowing whether some of them are already in Zotero short of leaving the webpage, searching for each reference in Zotero individually, then returning to the webpage and adding it. That is 500 individual database searches, which is essentially prohibitive in time.
The only alternative I have at present is to bring all references into Zotero, where I typically have >10% true duplicates (i.e., >50 duplicates and >100 individual items listed as duplicates). But because I now have to review hundreds of true positives, I also have to interact with all the old false positives that remain listed as duplicates (due to the separate issue that false positives can't be marked as such). The number of false positives is still far less than 1% of the entire database, but I have to interact with them every time I add a batch of references to Zotero, which is on a daily basis. Yes, there is an issue with marking false positives, but it mainly comes up because of the underlying workflow issue, which is orders of magnitude larger in my experience.
There are alternatives to address (but not solve) the issue that provide users with greater autonomy (and thus a better user experience):
1. Allow the user to specify if they want to use auto-merge (could be a hidden setting even).
2. Allow the user to specify how they want to auto-merge by giving the user control over the fields to merge on (again, could be a hidden setting).
3. Assuming an equal weighting, allow the user to specify a threshold for matching -- e.g., more than 5/8 matching fields is the threshold for auto-merge.
4. Have a lower threshold for merging potential duplicate entries that come from the same web source or database
5. Have the Zotero connector search Zotero database to see if there is a high probability that the reference is already in the database. If there is a high probability match, display this information to the user before they add the reference to the database. (see for example Sente, Paperpile, etc.)
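Suggestions 3 and 4 could be sketched as a simple equal-weight field comparison. This is a hypothetical illustration, not Zotero's duplicate-detection algorithm: the list of eight fields, the exact-match-after-normalization rule, the `source` field and both threshold values are assumptions chosen to mirror the "more than 5/8 fields" example.

```python
from typing import Dict, Optional

# Assumed set of eight fields, each with equal weight.
FIELDS = ["title", "creators", "date", "publicationTitle",
          "volume", "pages", "DOI", "ISSN"]

def normalize(value: Optional[str]) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join((value or "").lower().split())

def matching_fields(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count fields that are non-empty in both records and equal after normalizing."""
    count = 0
    for field in FIELDS:
        left, right = normalize(a.get(field)), normalize(b.get(field))
        if left and left == right:
            count += 1
    return count

def should_auto_merge(a: Dict[str, str], b: Dict[str, str],
                      threshold: int = 5, same_source_threshold: int = 4) -> bool:
    """Auto-merge when MORE than `threshold` of the eight fields match.

    If both records came from the same web source or database (hypothetical
    'source' field), the lower `same_source_threshold` applies instead.
    """
    same_source = bool(a.get("source")) and a.get("source") == b.get("source")
    limit = same_source_threshold if same_source else threshold
    return matching_fields(a, b) > limit
```

With these defaults, a pair matching on exactly five fields is rejected, but the same pair is accepted when both records carry the same `source`.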
Thanks for any consideration you give this!
Some version of 1 makes sense, but our risk/benefit analysis differs there, likely because your focus is on the systematic review scenario.
I'd personally rather tell 1000 researchers that they'll need to do some more manual work than I'd tell 1 researcher that Zotero, automatically and unbeknownst to them, switched out a reference in one of their papers.
It was inexplicable to me that Zotero required me to manually merge identical duplicates.
When I viewed these duplicates, every single field was identical across the two records except the date-time when it was created in Zotero. (Incidentally, note that you have to already know how Zotero displays non-identical duplicates in order to tell that what you are looking at is an identical duplicate.) There is absolutely no human intelligence to be applied to the question of whether two identical records should be merged. By definition, the content in the fields is identical. If I have previously referenced one in a manuscript, then it would have been equally good if I had referenced the other. The only rational decision is to click Merge.
This case—records where every single content field is identical—is absolutely crying out for automatic merge. There is *zero* reason why a human decision should be made here. Providing there is a sufficient number of fields completed that it does represent an actual record of something, there is absolutely no case where merging such records could present a problem. If there are PDFs attached to each, and they are not identical, then just attach both—let the user decide when they view them which to keep. If necessary, change the underlying code so that Zotero supports multiple attachments, if it doesn't.
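The "every content field is identical" case is indeed easy to test mechanically. A minimal sketch, assuming records are plain field dictionaries: bookkeeping fields (the names `key`, `dateAdded` and `dateModified` are assumptions here) are ignored, and a record only qualifies if it has at least `min_fields` non-empty content fields, reflecting the proviso about "a sufficient number of fields". Attachments are out of scope.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed bookkeeping fields that differ between otherwise identical copies.
IGNORED = {"key", "dateAdded", "dateModified"}

def fingerprint(record: Dict[str, str]) -> Tuple[Tuple[str, str], ...]:
    """Sorted (field, value) pairs of all non-empty content fields."""
    return tuple(sorted((field, value) for field, value in record.items()
                        if field not in IGNORED and value))

def identical_groups(records: List[Dict[str, str]],
                     min_fields: int = 4) -> List[List[Dict[str, str]]]:
    """Group records whose content fields are exactly identical.

    Records with fewer than `min_fields` filled content fields are skipped,
    since a near-empty record is too thin to count as a real match.
    """
    groups: Dict[tuple, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        fp = fingerprint(record)
        if len(fp) >= min_fields:
            groups[fp].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

Note that this treats an empty field and a missing field as the same thing.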
There are plenty of ways a user can end up with identical records. Key among them, which is what happened in my case, is through importing multiple previous databases, in my case from other referencing software, that were project-specific but which referenced overlapping literature. In terms of UX feedback, let me tell you I was gobsmacked that Zotero imported 100% identical records and just linked them together as likely duplicates but failed to just merge them as any user would, I suggest, have expected. This has nothing to do with edge case risks of what might happen if nearly-identical records were merged. That is an irrelevant discussion to a decision about merging identical records.
The entire, defining feature of reference management software is to reduce manual drudgery around referencing for people writing academic documents. 15 minutes a human spends merging records that are algorithmically provable as identical is 15 minutes that should not have had to be wasted. My $2: automatically merging demonstrably identical records is squarely within Zotero's reason for existing.
When putting together multiple bibliographic databases we HAVE TO remove duplicates in an automatic manner.
Zotero should provide a choice for doing this automatically. Obviously there is no optimal way to do it, and so some mistakes will occur. But THIS IS THE ONLY WAY!
Zotero should provide some choices, and people will choose depending on what they want. Among the obvious choices that Zotero could provide are:
1) keep only the more recent file
2) keep the biggest file (+ PDF if attached)
3) keep the one that has more fields filled, or simply fill its empty fields from the others
etc.
It is inexplicable to me that Zotero does not provide such a tool!!!
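Options 1 and 3 from the list above map onto small, separate policies. A hedged sketch, assuming records are field dicts with an ISO 8601 `dateAdded` (the field name is an assumption) and leaving file size and PDF handling (option 2) aside: `pick_master` chooses the surviving record, and `fill_empty_fields` implements the "fill empty fields from the others" variant without ever overwriting a filled field.

```python
from typing import Dict, List

Record = Dict[str, str]

def filled(record: Record) -> int:
    """Number of non-empty fields in a record."""
    return sum(1 for value in record.values() if value)

def pick_master(group: List[Record], strategy: str) -> Record:
    """Choose which record survives a merge.

    'recent'  -> the record with the latest 'dateAdded' (ISO 8601 strings
                 compare chronologically)
    'fullest' -> the record with the most non-empty fields
    """
    if strategy == "recent":
        return max(group, key=lambda r: r.get("dateAdded", ""))
    if strategy == "fullest":
        return max(group, key=filled)
    raise ValueError(f"unknown strategy: {strategy}")

def fill_empty_fields(master: Record, others: List[Record]) -> Record:
    """Copy values from the other records into fields the master leaves empty.

    Fields already filled in the master are never overwritten, so genuinely
    conflicting values would still need a human decision.
    """
    result = dict(master)
    for other in others:
        for field, value in other.items():
            if value and not result.get(field):
                result[field] = value
    return result
```

Ties in `max` go to the first record in the list, which is another arbitrary choice a real tool would have to expose or document.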
Exporting a BibTeX file from the web app gives me:
@article{charness2016,
title = {{The effect of charitable giving on workers’ performance: Experimental evidence}},
author = {Charness, Gary and Cobo-Reyes, Ramón and Sánchez, Ángela},
journal = {Journal of Economic Behavior \& Organization},
abstract = {We investigate how ... not paying anything at all.},
volume = {131},
pages = {61--74},
issn = {0167-2681},
eissn = {1879-1751},
doi = {10.1016/j.jebo.2016.08.009},
year = {2016},
rating = {3},
keywords = {\#ExperimentalEconomics,\#BehavioralEconomics,\#RET,\#RealEffortTask,\#RealEffortExperiment,\#LabExperiment,\#Incentives,\#Charities}
}
while exporting a BibTeX file from the desktop app gives:
@article{Charness_The_2016,
author={Charness, Gary and {Cobo-Reyes,} Ram{\'o}n and S{\'a}nchez, {\'A}ngela},
pages={61-74},
abstract = {We investigate how ... not paying anything at all.},
title={The effect of charitable giving on workers’ performance: Experimental evidence},
doi={10.1016/j.jebo.2016.08.009},
issn={0167-2681},
volume={131},
year={2016},
file={/Users/waloszec/Documents/ReadCube Media/Charness et al-2016-J Econ Behav Organ.pdf}
}
To summarize:
- neither of the two include the following features of ReadCube: "flags", "color labels" and "reading status (read/unread)"
- exporting from the web app includes "rating" and "keywords", while the desktop app includes "file".
To move as much information as possible from ReadCube to Zotero, I would have to import both BibTeX versions, and I end up with a duplicate for each article. Merging both versions easily and quickly would be ideal (not to mention articles I imported several times because they are part of different compilations...).
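The two exports of the same article carry different citation keys (`charness2016` vs `Charness_The_2016`) but the same DOI, so the DOI is a natural key for pairing them before merging. A minimal sketch, assuming each export has already been parsed into field dictionaries (BibTeX parsing itself is not shown; the `ID` field name for the citation key is an assumption). DOIs are compared case-insensitively after stripping a few common resolver prefixes; entries without a DOI or without a partner are returned for manual review.

```python
from typing import Dict, List, Tuple

Entry = Dict[str, str]

def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip a few common resolver prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def combine_exports(web: List[Entry], desktop: List[Entry]) -> Tuple[List[Entry], List[Entry]]:
    """Combine web and desktop entries that share a DOI.

    Returns (combined, unmatched). For each pair, web-export values win and
    desktop-only fields such as 'file' are added.
    """
    by_doi = {normalize_doi(e["doi"]): e for e in desktop if e.get("doi")}
    combined: List[Entry] = []
    unmatched: List[Entry] = []
    used = set()  # ids of desktop entries that found a partner
    for entry in web:
        doi = normalize_doi(entry.get("doi", ""))
        partner = by_doi.get(doi) if doi else None
        if partner is None:
            unmatched.append(entry)
            continue
        used.add(id(partner))
        # Later keys win, so web values override desktop ones.
        combined.append({**partner, **entry})
    unmatched += [e for e in desktop if id(e) not in used]
    return combined, unmatched
```

Letting the web values win also happens to replace the desktop export's odd `{Cobo-Reyes,}` author string, but which side should win is a judgment call.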
The "rating" given by ReadCube appears in the "Extra" field in Zotero as
"ZSCC: NoCitationData[s0]
Citation Key: charness2016
bibtex*[eissn=1879-1751;rating=3]"
– which is not very helpful.
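The `bibtex*[...]` line can at least be parsed mechanically, e.g., to turn `rating=3` into a tag in a later cleanup step. A minimal sketch, assuming the line always has the `bibtex*[key=value;key=value]` shape shown above (values containing `;` are not handled); writing anything back into Zotero is not shown.

```python
import re
from typing import Dict

BIBTEX_LINE = re.compile(r"^bibtex\*\[(.*)\]$")

def parse_extra(extra: str) -> Dict[str, str]:
    """Collect key=value pairs from 'bibtex*[...]' lines of an Extra field."""
    fields: Dict[str, str] = {}
    for line in extra.splitlines():
        match = BIBTEX_LINE.match(line.strip())
        if not match:
            continue  # e.g. 'ZSCC: ...' or 'Citation Key: ...'
        for pair in match.group(1).split(";"):
            key, sep, value = pair.partition("=")
            if sep:
                fields[key.strip()] = value.strip()
    return fields
```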
Can anyone recommend a workaround to transfer ReadCube's "flags", "color labels" and "reading status" to Zotero, as they are not included in the BibTeX or RIS export from ReadCube?
Many thanks in advance!
Does the desktop app really export
{Cobo-Reyes,}
with the comma inside the braces? That's pretty bizarre.
This is from the desktop export as BibTeX:
author={Charness, Gary and {Cobo-Reyes,} Ram{\'o}n and S{\'a}nchez, {\'A}ngela},
so just as you said.
Without the desktop app there's not much I can do; it's possible to get the raw JSON data from the web app (although they seem to have removed the easy menu entry to do it) but I don't see a way to easily get the collections and attachments.
That's the nice thing about web apps for those who provide them. By default, your tenants can't move.
Maybe I can start over from scratch? Reimport all my PDFs, and somehow sync the tags (extracted from Mendeley) to the files without creating duplicate entries? Any thoughts or hints on this?
Programmatically, this design makes sense to me. Zotero is a great piece of software and the community is awesome!
I wish someone with a bit more JavaScript knowledge would post a hack that, e.g., presses the Merge button repeatedly until there is nothing left to merge.