Best way to match 200+ PDFs with metadata
Hello,
I've recently moved from another reference organiser/manager, and in the process I ended up with hundreds of PDFs that Zotero's matching magic didn't recognise. I have records with the correct title and first author name, and nothing else. The CrossRef and Google Scholar lookups do not work on them, even though when I copy-paste the title into Google Scholar, it finds the correct paper.
I do not want to match all these papers manually. Is there any way to do this automatically? I was thinking that the Zotero Storage service would have some sort of matching service built in (like Apple's iTunes Match), where messy records would be straightened out based on other records in the cloud. But I don't think it does that.
Any ideas would be appreciated...
Best
yot
You may want to consider redoing the transfer if possible, but in any case, were the papers that weren't recognized academic papers, or something else? Generally speaking, academic papers (at least modern ones) should have a pretty high recognition rate, and it should be close to 100% for ones with a DOI on the first page. Other documents that just happen to be in PDF format wouldn't be recognized beyond possibly title and author. We do use cloud data to help recognize files, but many or most academic papers are watermarked (so the same paper downloaded by different people yields different files), so we don't bother trying to match based on the exact file hash.
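To illustrate why a DOI on the first page makes recognition so reliable: once a PDF has a text layer, a DOI can be picked out with a simple pattern and then looked up directly. The sketch below is a minimal illustration, assuming the first page's text has already been extracted; the regex is a commonly used heuristic, not Zotero's actual implementation, and `find_doi` is a hypothetical helper name.

```python
import re
from typing import Optional

# Heuristic DOI pattern (an assumption, not Zotero's code):
# "10." + 4-9 digit registrant code + "/" + suffix characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def find_doi(first_page_text: str) -> Optional[str]:
    """Return the first DOI-like string in the text, or None if none is found."""
    match = DOI_PATTERN.search(first_page_text)
    if match is None:
        return None
    # Drop sentence punctuation that often trails a DOI in running text.
    return match.group(0).rstrip(".,;")


print(find_doi("Nature 499, 54-58 (2013). doi:10.1038/nature12373."))
print(find_doi("A scanned page with no identifier"))
```

A scanned PDF without OCR has no text layer at all, so there is nothing for a pattern like this to search, which is one reason such files fall back to manual matching.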
Well, I think there are two reasons for this issue, and they are not mutually exclusive: 1) I already had this many unannotated PDFs (my library is over 5,000 papers, almost all of them from academic life-science journals) and didn't realise it; 2) the unannotated papers cannot be annotated automatically. Many of them are News & Views-type articles (whose embedded titles often differ from the published ones), and many others are old PDFs: simply scanned papers with no OCR or embedded metadata.
But I do think that a non-trivial proportion of those papers had been annotated by me before. I was a (devout) Papers user for 10+ years, and I did export my entire library as BibTeX before importing it into Zotero. I think I will have to do that again for the papers that are unannotated...
As a side note, in my experience (and I am a hoarder of papers) it is not that unusual for a reference manager to fail at annotating. Zotero and Papers are quite similar in this respect.
If that problem (cloud-based metadata matching) could be solved, it would be a massive advantage for all users...
Thanks again,
Jarek
If there's a specific PDF that you think should be recognized automatically but that isn't, you can link to it here (if it's available publicly) or email it to support@zotero.org with a link to this thread.
https://www.zotero.org/support/locate
For example, use "Google Scholar - Title Only":
https://github.com/bwiernik/zotero-tools/blob/master/engines.json
This will not be automatic, but it might help speed up finding matching metadata. Running OCR software on those PDF files could also help with automatic metadata retrieval.
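If you have many records with only a title and first author, a small script can at least generate title-only Google Scholar search links in bulk, so each one can be opened and checked by hand. This is a minimal sketch, assuming Google Scholar's advanced-search URL parameters `as_epq` (exact phrase), `as_occt=title` (search titles only) and `as_sauthors` (author filter); these parameters are not a documented, stable API, and the sketch does not reproduce the exact `engines.json` entry linked above. `scholar_title_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

SCHOLAR_BASE = "https://scholar.google.com/scholar"


def scholar_title_url(title: str, first_author: str = "") -> str:
    """Build a Google Scholar search URL restricted to the article title.

    Parameter names are assumptions taken from Scholar's advanced-search form.
    """
    params = {
        "as_epq": title,     # match the title as an exact phrase
        "as_occt": "title",  # look only in the title field
    }
    if first_author:
        params["as_sauthors"] = first_author  # optional author filter
    return SCHOLAR_BASE + "?" + urlencode(params)


# Hypothetical records that have only a title and a first author.
records = [("An example title", "Smith"), ("Another example title", "")]
for title, author in records:
    print(scholar_title_url(title, author))
```

The links are meant to be opened manually; Google Scholar may block or rate-limit large numbers of automated requests, so fetching them in a loop is not recommended.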