Retrieving Metadata
Hi,
I'm just in the process of moving to Zotero from an EndNote/Papers setup. I have a library of around 1500 papers. While about half have been identified and had metadata attached, there are still around 700 papers that the locate search engines cannot identify. It will be a real pain to do this manually. In my efforts I'm also trying different locate engines to see whether they will identify and retrieve info for the PDFs. So, I have two questions:
1. Are there any other ways of batch retrieving metadata? Many of my PDFs are text-based, but they are not being identified.
2. I am having trouble adding new locate engines. I have followed the instructions, but working with the standalone version on a Mac, I cannot see an option to add a new engine via a web page and the search engine's icon.
Some help would be great.
Thanks
Can you provide an example of a PDF that's not being recognized?
Here's a link to a paper in my library that can't be identified (just as an example, not my site):
http://ewasteschools.pbworks.com/f/Law2002ObjectsandSpacesTheoryCulture&Society.pdf
Thanks for the info about the locate engines and metadata. How is Zotero retrieving metadata, then?
Thanks.
Wait a number of hours, or until tomorrow, and try a single PDF again. If that works, go in smaller batches at a time.
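Splitting a large selection into smaller batches, as suggested, can be sketched like this. It is only an illustration, assuming a plain list of PDF file names; `batches` is a hypothetical helper, not part of Zotero, and the batch size of 20 is an arbitrary choice.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    pdfs = [f"paper_{i:03}.pdf" for i in range(1, 46)]  # 45 hypothetical files
    for n, batch in enumerate(batches(pdfs, 20), start=1):
        # In practice you would select this batch in Zotero, run metadata
        # retrieval on it, and wait a while before starting the next one.
        print(n, len(batch), batch[0], batch[-1])
```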
I've come from Papers, and it uses a number of sources to retrieve data. Is Zotero the same, and is there a way to add to the sources being used?
Thanks again.
If you are coming from other bibliographic management software, there are better ways to transfer your library (along with metadata) than transferring PDFs and trying to fetch their metadata. If you're transferring from EndNote, see http://www.zotero.org/support/kb/importing_records_from_endnote
I'm not too familiar with Papers, but I believe it can export the library in either RIS or BibTeX format, which Zotero will be able to import. I'm not sure if the attachment files will be linked properly though.
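For anyone unfamiliar with RIS: it is a plain-text, line-oriented format where each line is a two-character tag, two spaces, a hyphen, a space and a value, and each record ends with an `ER  -` line. The minimal parser below only illustrates that structure; it is not how Zotero imports RIS, and it ignores details such as character encodings and multi-line values.

```python
import re
from typing import Dict, List

# One RIS line: two-character tag, two spaces, "-", optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Parse RIS text into records mapping tag -> list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        m = RIS_LINE.match(line.rstrip())
        if not m:
            continue  # skip blank or non-RIS lines in this simple sketch
        tag, value = m.group(1), m.group(2)
        if tag == "ER":
            # End of record; a record not closed by ER is ignored.
            records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records
```

Repeated tags such as `AU` (one per author) end up as a list, which is why every value is stored in a list.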
Finally, PDF metadata retrieval is not a very good (or reliable) way to import metadata. If you haven't done so yet, I would encourage you to look at http://www.zotero.org/support/quick_start_guide If you already have a large collection of PDFs with no associated metadata, then this would be your only reasonable choice, but you probably don't want to make this your standard workflow.
I've been through some of this, but I guess I have a special case: I have always kept my EndNote and Papers databases separate, i.e. I have used EndNote for references and citations only and Papers for managing my PDF files. My rationale was that neither was good at both, but each was good at either reference or PDF management.
So, the upshot is that I am trying to bring together my EndNote reference library and my Papers PDF library in Zotero. So far, this has meant cleaning my reference library in EndNote with search and replace, importing it into Zotero, then importing the PDFs separately, using Retrieve Metadata to get the best metadata for the files, and finally using Duplicate Items to identify and resolve conflicts and duplicates (I really like Duplicate Items; it works better than the equivalents in EndNote and Papers). I am now at the metadata retrieval stage, since doing this manually for hundreds of PDFs would be a lot of work.
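The duplicate check described here can be approximated, very crudely, by grouping items whose normalized titles match. This is a hypothetical sketch for illustration only; it is not Zotero's Duplicate Items algorithm, which may weigh other fields as well.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return " ".join(words.split())


def duplicate_groups(titles: Iterable[str]) -> List[List[str]]:
    """Group titles with identical normalized forms; keep groups of two or more."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for t in titles:
        groups[normalize_title(t)].append(t)
    return [g for g in groups.values() if len(g) > 1]
```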
Once this is set up, I can imagine using Zotero as a 'normal' user might, that is, importing files and reference metadata one by one, but for now batch processing is really going to save me a lot of time.
Any more advice would be warmly welcomed.
Thanks.
I seem to remember a discussion about slowing down the request rate to make the Zotero script less script-like. Did anything come of that? Sage also has limits to the number of records allowed within a time span.
Let me repeat what has been said before: Google Scholar is OK for finding articles but not quite so good for getting good metadata into Zotero. GS often omits authors (sometimes several authors), gets the order of authors wrong, provides incorrect dates and pagination, etc. GS will guide you to articles on a publisher's website. Much better metadata is available there.
DWL-SDCA: yes, I've experienced the inaccuracy of GS in Papers too, but in my case, where I want to batch match hundreds of papers, doing it manually (which would be the most accurate way) means a lot of work. I once used the British Library with EndNote, but its records were really inaccurate and inconsistent.
If, like me, others are migrating from another setup for managing references and have lots of PDFs then some way to batch process metadata is really useful and attractive.
Maybe we can get Google to cooperate with Zotero. They don't seem too fond of Google Scholar in general, though, since there is still no API for it, nor is it even included as a choice on google.com (while other specialized searches are).
I think Google Scholar used to be preferred because it was more comprehensive, but maybe things have changed?
I wouldn't be concerned about tricking GS, since Zotero doesn't actually do what they're trying to fend off (i.e. systematic scraping of their entire database). I know Simon is OK with staggering the requests.
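Staggering requests amounts to waiting a randomized interval between lookups. A minimal sketch, assuming a hypothetical `lookup` callable and arbitrary delay bounds; the thread doesn't describe Zotero's actual timing, so none of these values should be read as Zotero's.

```python
import random
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def staggered(items: Iterable[T], lookup: Callable[[T], R],
              min_delay: float = 2.0, max_delay: float = 6.0,
              rng: Optional[random.Random] = None,
              sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Call `lookup` on each item, pausing a random interval between calls."""
    if rng is None:
        rng = random.Random()
    results: List[R] = []
    for i, item in enumerate(items):
        if i > 0:
            # Uniform jitter keeps the request rate low and irregular.
            sleep(rng.uniform(min_delay, max_delay))
        results.append(lookup(item))
    return results
```

Passing a fake `sleep` (for example a list's `append` method) lets you check the chosen delays without actually waiting.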
Agree with Aurimas on the current uselessness of MS Academic for Zotero.
Either way, without full text indexing I don't see how we can use MSAS.
In short, I find that the ability to directly download metadata to Zotero from GS or MSAS is more of a nuisance than a benefit. I get grossly incomplete metadata that is also frequently wrong. If it were possible for Zotero to identify DOIs and follow them to the publishers' sites and grab metadata there, I would be really pleased.
BTW, metadata retrieval is slowing down for me. Perhaps Google is locking me out more quickly, which means I still have hundreds of PDFs to retrieve data for.
Which makes me wonder why 700/1500 articles did not have detectable DOIs in them. Are you sure these remaining articles are OCR'ed? Alternatively, these could be old articles and actually not have DOIs.
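Whether a text-based PDF has a "detectable DOI" mostly comes down to pattern matching on its extracted text. The sketch below assumes the text has already been extracted (e.g. with a tool like pdftotext) and uses the pattern Crossref has published for modern DOIs; it is an illustration, not Zotero's own recognizer, and it will miss older or oddly formatted DOIs.

```python
import re
from typing import List

# Pattern for modern DOIs: "10." + 4-9 digit registrant + "/" + suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text: str) -> List[str]:
    """Return unique DOIs in order of first appearance, trailing punctuation trimmed."""
    found: List[str] = []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip(".,;:)")
        if doi not in found:
            found.append(doi)
    return found
```

An empty result for a PDF that clearly has a text layer suggests the article simply has no DOI printed in it, which fits the "old articles" explanation.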
Edit: I didn't think my statement through. Importing metadata from the web (after Googling for them) will not attach them to an existing PDF, so that's not a solution for you either.
I am also migrating a large database and hitting the same request limit.
Could I suggest not stopping metadata retrieval when the limit is exceeded, and instead continuing with the rest of the queue?
Many PDFs seem to be found by sources other than GS, but the whole process stops because one paper is not found by DOI. I then have to re-select the entries, excluding the problematic one, and restart retrieval. Quite long and painful.
Thanks
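The behaviour requested above, skipping an item that fails and carrying on with the rest of the queue, can be sketched as follows. `retrieve` is a hypothetical stand-in for a single metadata lookup; this does not reflect how Zotero's retrieval queue is actually implemented.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)
R = TypeVar("R")


def process_queue(items: Iterable[T], retrieve: Callable[[T], R]
                  ) -> Tuple[Dict[T, R], List[T]]:
    """Run `retrieve` on every item; collect failures instead of stopping."""
    found: Dict[T, R] = {}
    failed: List[T] = []
    for item in items:
        try:
            found[item] = retrieve(item)
        except Exception:
            # One unrecognized PDF should not abort the whole batch;
            # remember it so it can be retried or handled manually later.
            failed.append(item)
    return found, failed
```

The `failed` list is exactly the set you would otherwise have to work out by hand before re-selecting and restarting.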