Zotero Mislabels Same-Title Articles as Duplicates — A CV Fix?
I've encountered a recurring issue in Zotero where articles sharing the same title—but from different issues and volumes of the same journal—are incorrectly flagged as duplicates. This can lead to accidental merging or deletion of valid records in academic libraries.
To improve accuracy, I suggest incorporating computer vision (CV) techniques to distinguish articles by analyzing their PDF content and snapshot images. A viable approach could involve applying Siamese Neural Networks or Perceptual Hashing (pHash) to compare visual or structural content rather than relying solely on metadata like titles.
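To make the idea a bit more concrete, here is a rough sketch of the comparison step. It uses a difference hash (dHash), a simpler relative of pHash that skips the DCT step, so treat it as an illustration rather than pHash proper. It assumes each PDF's first page has already been rendered and downscaled to a small grayscale grid elsewhere; the names `dhash`, `hamming`, `looks_same`, and the `max_distance` threshold are all hypothetical and not part of Zotero.

```python
from typing import List

# Assumption: each page is already a tiny grayscale grid (rows of 0-255 values).
# Rendering and downscaling the PDF page is out of scope for this sketch.
Grid = List[List[int]]


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            # Bit is 1 when brightness drops from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def looks_same(grid_a: Grid, grid_b: Grid, max_distance: int = 2) -> bool:
    """Treat two pages as the same if their hashes differ in few bits.

    max_distance is an arbitrary illustrative threshold, not a tuned value.
    """
    return hamming(dhash(grid_a), dhash(grid_b)) <= max_distance


# Toy 2x3 grids standing in for downscaled first pages.
page_a = [[10, 20, 30], [30, 20, 10]]
page_a_brighter = [[15, 25, 35], [35, 25, 15]]  # same layout, lighter scan
page_b = [[30, 20, 10], [10, 20, 30]]           # different layout
```

With these toy grids, `looks_same(page_a, page_a_brighter)` is true because a uniform brightness shift leaves the hash unchanged, while `looks_same(page_a, page_b)` is false. A real grid would be larger (for example 9x8 for a 64-bit hash), and two articles with the same title but different body text should land far apart.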
Such an enhancement would help ensure Zotero correctly identifies truly duplicate entries and avoids undermining research integrity through false positives.
Would love to hear if others have experienced this—or if any developers are working on similar features!
-
AbeJellinek: A manual "mark items as different" action is planned. We already hash attachment content to decide whether to merge PDFs, but that wouldn't help when the potential duplicates don't have any attachments, and running it on every single duplicate set would be very slow.
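For readers unfamiliar with content hashing, a minimal sketch of the general idea follows. This is not Zotero's actual code: the choice of SHA-256 and the function names `content_digest` and `same_attachment` are assumptions made for illustration only.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Hex digest of a file's raw bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def same_attachment(a: bytes, b: bytes) -> bool:
    """Two attachments count as identical only if their digests match."""
    return content_digest(a) == content_digest(b)
```

This kind of check only matches byte-identical files, and it has nothing to compare when an item has no attachment, which is the limitation described above.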