Duplicate detection and merging using hashes
Is there a tool out there that finds and merges PDFs based on hashes so as to eliminate duplicates? Has there been any discussion of creating such a tool?
We'd take a patch to automatically remove one of two identical PDFs when merging duplicates (and to scan existing sibling attachments), but it would have to somehow deal with any other metadata that might exist on the attachment item (different titles, different filenames, tags, attachment notes).
Along with the above complications, many PDFs are watermarked, so two copies of the same paper are often not byte-identical and would not match by hash. That is why this has never been a higher priority.
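For the detection half of the question, here is a minimal sketch, assuming the PDFs are plain files in a folder on disk. It groups files by their SHA-256 content hash. The helper names `file_sha256` and `find_identical_pdfs` are hypothetical and not part of any existing tool. The sketch only *finds* byte-identical files. It does not merge attachment items or reconcile titles, filenames, tags, or notes, and a watermarked copy will produce a different hash.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file's bytes, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_pdfs(directory):
    """Group *.pdf files under `directory` (recursively) by content hash.

    Returns only the groups that contain two or more byte-identical files.
    Hypothetical helper: it only reports candidates and deletes nothing.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).rglob("*.pdf")):
        if path.is_file():
            groups[file_sha256(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_identical_pdfs("/path/to/pdfs")` returns a dict that maps each shared hash to the list of matching paths. Deciding which copy to keep, and what to do with the metadata attached to the others, is left to the caller, since that is the hard part described above.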