Error in metadata extraction
Hello,
When I retrieved the metadata for the PDF file available here http://ec.europa.eu/environment/marine/pdf/9-Task-Group-10.pdf I got the metadata of another file, available here http://ec.europa.eu/environment/marine/pdf/8-Task-Group-9.pdf.
I would like to understand which information in the PDF file led Zotero (and Google Scholar) to the wrong file.
Thank you for your help.
F
If you want to see exactly what happened, you can enable debug output while the retrieve metadata process runs:
https://www.zotero.org/support/debug_output
and look at it - there will be a fair amount of irrelevant stuff, but the Google Scholar query is easy to spot.
(Zotero first looks for a DOI (CrossRef) or an ISBN (WorldCat) but I think those are unlikely here).
- Zotero couldn't get a result from CrossRef
- Zotero didn't try anything with the ISBN (it might not have found it, even though it seems obvious in the document text)
- Zotero then generated a query string for Google Scholar. And this result led it to the wrong document.
My question is: is the text for the query string selected randomly?
In my example, it is 2 text strings selected from the document preface.
Thanks for your explanation.
F
As for the phrases used in Google Scholar, they're relatively random (though not in the technical sense: you'd get the same sentences every time you try this). To be precise, they're lines within 6 characters of the median line length that appear in the first column of the text (if the text is multi-column).
There's been talk of excluding the first x% of the document to avoid getting prefaces and the like, but I believe that has never happened.
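To make the selection rule concrete, here is a rough Python sketch of the heuristic as described above. It is not Zotero's actual code: the function name `candidate_lines` and both parameters are made up for illustration, it ignores the first-column handling for multi-column text, and `skip_fraction` only models the "exclude the first x%" idea that, as far as I know, was never implemented.

```python
from statistics import median


def candidate_lines(text, tolerance=6, skip_fraction=0.0):
    """Return lines whose length is within `tolerance` characters of the median.

    Hypothetical illustration only; column detection is not modeled.
    """
    # Drop blank lines and surrounding whitespace before measuring lengths.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Optionally skip the leading part of the document (e.g. a preface).
    start = int(len(lines) * skip_fraction)
    lines = lines[start:]
    if not lines:
        return []

    med = median(len(ln) for ln in lines)
    return [ln for ln in lines if abs(len(ln) - med) <= tolerance]


# Small made-up sample: line lengths are 7, 40, 42, 80, 38 and 5.
sample = "\n".join(["Preface", "a" * 40, "b" * 42, "c" * 80, "d" * 38, "Short"])
picked = candidate_lines(sample)
picked_after_skip = candidate_lines(sample, skip_fraction=0.5)
```

Because the rule is deterministic, the same PDF always yields the same candidate lines. That also explains why a long preface with typical line lengths can end up supplying the query strings.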