Two simple improvements for Duplicate Items (#papercut)
While the current implementation of Duplicate Items probably works reasonably well for many users, it quickly becomes unusable in large libraries. In my library of more than 13,000 items, the Duplicate Items view is permanently populated by 149 items. Most of these are true non-duplicates (e.g. different editions, articles that come in multiple parts, books that come in multiple volumes, etc.), and another sizable subset are near-duplicates that are unresolvable because of item type differences. Any newly introduced simple full duplicates are almost impossible to spot in this mess.
Here are two simple improvements for Duplicate Items that would make life easier for people like me:
1. Allow users to mark items as non-duplicates. This would let users hide items from the Duplicate Items view so that it does not fill up with false positives that are not actionable. See this recent thread but also this ancient comment by @danstillman, from a time when the hope was still that the detection algorithm would improve soonish. A rough solution would be to simply prevent items marked as non-duplicates from showing up in the Duplicate Items view. (One problem could be that newly introduced duplicates of those items would then also not show up, but for such boundary cases the benefits outweigh the cost.)
2. Allow resolving near-duplicates with item type differences. Near-duplicates where only the item type differs are shown under Duplicate Items but do not permit any action because 'Merged items must be of the same type'. See this ancient message describing the problem. Note that this is increasingly common with the rise of preprint servers, where items start out as preprints and later come out as papers. It looks like the existing version resolution dialog can already display (and therefore handle) item type differences, so little extra UI work seems needed (I can't speak to what needs to be done under the hood to make this happen, of course).
Thanks for considering these papercuts!
Besides the two things I noted above, I will add a third, which is by far the quickest to implement:
3. Hide duplicate items that cannot be resolved from the Duplicate Items pane. The rationale is that the Duplicate Items pane, as it currently stands, is unusable in any library of reasonable size. As long as we can't merge near-duplicates of different types, it doesn't make sense to taunt me with their existence, and their listing crowds out the actual duplicates I do want to resolve.
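To make the proposed filtering concrete, here is a rough Python sketch of what suggestions 1 and 3 could look like as a post-processing step on the detected duplicate sets. This is not how Zotero implements anything: `Item`, `pair_key`, `visible_duplicate_sets`, and the idea of storing non-duplicate marks per pair of item IDs are all my own hypothetical names and assumptions, and the item type strings are only illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical minimal item record; a real item has many more fields.
    item_id: int
    item_type: str


def pair_key(a, b):
    """Order-independent key for a pair of items (assumed storage format)."""
    return frozenset((a.item_id, b.item_id))


def visible_duplicate_sets(candidate_sets, non_duplicate_pairs, hide_unmergeable=True):
    """Return the duplicate sets that should still be shown to the user.

    candidate_sets: lists of Items that the detector considers duplicates.
    non_duplicate_pairs: set of pair keys the user marked as non-duplicates.
    hide_unmergeable: if True, only items of the same type are grouped,
        since items of different types cannot currently be merged.
    """
    visible = []
    for candidates in candidate_sets:
        if hide_unmergeable:
            # Split the set by item type so every shown group is mergeable.
            by_type = defaultdict(list)
            for item in candidates:
                by_type[item.item_type].append(item)
            groups = list(by_type.values())
        else:
            groups = [list(candidates)]

        for group in groups:
            # Keep an item only if it has at least one partner in the group
            # that was not marked as a non-duplicate of it.
            kept = [
                item
                for item in group
                if any(
                    other.item_id != item.item_id
                    and pair_key(item, other) not in non_duplicate_pairs
                    for other in group
                )
            ]
            if len(kept) >= 2:
                visible.append(kept)
    return visible
```

Because the marks in this sketch are stored per pair rather than per item, a newly added true duplicate of an item that was previously marked would still show up, which would sidestep the boundary case I mentioned under suggestion 1. Whether pairwise marks are practical under the hood is something I can't judge.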