PDF metadata mismatches

Occasionally the metadata Zotero retrieves for PDF files is inaccurate.
I'll post examples here as I come across them.

A paper titled "Information and Communication Technologies, Poverty and Development" is recognized as "Non-surgical retrieval of a broken segment of steel spring guide from the right atrium and inferior vena cava".
The original file is for some reason not indexed in Google Scholar, which might indeed be the problem, but it can be found in the following locations:

http://www.sed.manchester.ac.uk/idpm/research/publications/wp/di/di_wp05.htm
http://unpan1.un.org/intradoc/groups/public/documents/NISPAcee/UNPAN015539.pdf
  • Another file not properly recognised is this:

    http://www.aspeninstitute.org/sites/default/files/content/docs/pubs/The_Rise_of_Collective_Intelligence.pdf

    It is imported with the metadata of another report from the same series: http://www.aspeninstitute.org/sites/default/files/content/docs/pubs/A_Framework_for_a_National_Broadband_Policy_0.pdf

    (however, this doesn't happen vice versa)

    (I am reporting on these issues not as a complaint, as I love Zotero - just to help with bugs)
  • Numerous articles from ACM Transactions on Graphics result in mismatched metadata.

    One quick example is:

    Modelling and rendering of realistic feathers
  • Currently the recognizer only looks at the first two pages of the PDF, so if there's no DOI information and the full-text content doesn't start until the third page or later (e.g., if there's a table of contents), it likely either won't find anything or will return mismatched metadata.

    So the first thing we need to do is to bump up the page limit.
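
    To illustrate why the page limit matters, here is a rough sketch, not the actual recognizer code. It assumes the PDF text has already been extracted into one string per page; `find_doi`, `page_limit`, the loose DOI pattern, and the sample pages are all made up for the example.

```python
import re

# Loose DOI pattern (an assumption; real DOIs can contain other characters).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(pages, page_limit=2):
    """Return the first DOI-like string in the first `page_limit` pages, else None."""
    for text in pages[:page_limit]:
        match = DOI_RE.search(text)
        if match:
            # Drop trailing punctuation picked up from the surrounding sentence.
            return match.group(0).rstrip(".,;)")
    return None


# Hypothetical document: cover page, table of contents, then the real first page.
pages = [
    "Working Paper Series\nCover page",
    "Table of Contents\n1. Introduction ... 3",
    "Introduction\ndoi:10.1234/example.5678.\nFull text starts here.",
]

print(find_doi(pages, page_limit=2))  # None
print(find_doi(pages, page_limit=5))  # 10.1234/example.5678
```

    With a limit of two pages the DOI on page three is never seen; with a higher limit it is found.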
  • edited September 3, 2009
    Could it also possibly look for JSTOR URLs on the first page and get info from there?

    This article from 2001 came up way wrong:
    http://www.jstor.org/stable/3061243
    (Identified as a 1993 paper in the same journal)
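
    Something along these lines is what I have in mind for the JSTOR lookup. It is only a sketch: it assumes the first page's text is already available as a string and that the stable ID is numeric, and `find_jstor_id` and the sample text are hypothetical, not anything Zotero provides.

```python
import re

# Matches JSTOR stable URLs with a numeric ID (an assumption about the ID format).
JSTOR_RE = re.compile(r"https?://(?:www\.)?jstor\.org/stable/(\d+)")


def find_jstor_id(first_page_text):
    """Return the JSTOR stable ID found in the text, or None if there is none."""
    match = JSTOR_RE.search(first_page_text)
    return match.group(1) if match else None


# Hypothetical first-page text containing the URL from this post.
sample = "Stable URL: http://www.jstor.org/stable/3061243\nSome other cover page text"

print(find_jstor_id(sample))  # 3061243
```

    The ID could then be used to fetch the record from JSTOR instead of guessing from the full text.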