Retrieve PDF Metadata > wrong metadata > source?

Hello,
I hope I am not wasting your time with my issue.
After Zotero retrieved the wrong metadata, I would like to understand how it retrieves metadata for a PDF.
My case: it is a freely downloadable PDF from the website of the French journal Glottopol:
http://glottopol.univ-rouen.fr/telecharger/numero_8/gpl8_04tran.pdf
If I save the PDF in Zotero and ask for metadata retrieval, it works, but the metadata I get belong to a book that is reviewed in the same journal issue:
Bertucci M-M, Houdart-Merot V. Situations de banlieues: enseignement, langues, cultures. Lyon: Institut national de recherche pédagogique; 2005.

If I save the PDF from Google Scholar, I get the correct metadata:
Tran TD. SYSTEME DE RECHERCHE D’INFORMATION MEDICALE PAR CROISEMENT DE LANGUES: VIETNAMIEN-FRANÇAIS-ANGLAIS. [cité 11 mars 2016]; Disponible sur: http://glottopol.univ-rouen.fr/telecharger/numero_8/gpl8_04tran.pdf

I thought Zotero queried the Google Scholar database.
Does this mean that the wrong metadata comes from the journal? (If so, I will inform the journal's webmaster.)

Thank you for clarifying.
  • The reason for the wrong metadata is the ISBN on the first page. Zotero assumes (usually correctly) that ISBNs on the first page or two refer to the PDF itself, not to another work. It looks those up on WorldCat.
    (And journals can affect how Zotero imports from their website, but never how Retrieve Metadata works, so no, don't report problems with that to the journal.)
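To illustrate the ISBN step described in the reply above, here is a minimal Python sketch. It is not Zotero's actual code: the regular expression and the function names `is_valid_isbn` and `find_isbns` are assumptions made for this example, and the WorldCat lookup is left out. It only shows how any valid ISBN printed on an early page, including one that belongs to a reviewed book, would be picked up as a candidate.

```python
import re

# Hypothetical loose pattern (an assumption, not Zotero's): an optional
# 978/979 prefix, then ten digits with optional hyphens or spaces between
# them; the last character may be X for an ISBN-10.
ISBN_RE = re.compile(r"\b(?:97[89][\- ]?)?(?:\d[\- ]?){9}[\dXx]\b")


def is_valid_isbn(isbn: str) -> bool:
    """Check the ISBN-13 or ISBN-10 check digit of a separator-free string."""
    if len(isbn) == 13 and isbn.isdigit():
        # ISBN-13: weights alternate 1, 3, 1, 3, ...; the sum must be 0 mod 10.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
        return total % 10 == 0
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        # ISBN-10: weights run 10 down to 1; X counts as 10; sum must be 0 mod 11.
        total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
        return total % 11 == 0
    return False


def find_isbns(page_text: str) -> list:
    """Return the distinct, checksum-valid ISBNs found in one page of text."""
    found = []
    for match in ISBN_RE.finditer(page_text):
        candidate = re.sub(r"[\- ]", "", match.group()).upper()
        if is_valid_isbn(candidate) and candidate not in found:
            found.append(candidate)
    return found
```

A checksum can only tell that a string is a well-formed ISBN, not which work it belongs to, which is why an ISBN from a reviewed or cited book can lead to the wrong record.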
  • Thank you for replying so quickly!
  • Here is an example of wrong metadata retrieval that is not related to a wrong ISBN:
    https://www.researchgate.net/publication/262198643_althlyl_alaly_llwqf_walabtda_fy_nsws_allght_alrbyt_alhdytht_walklasykyt_Automatic_Analysis_of_Phrase-Break_Prediction_for_Arabic

    This has been a recurring problem with ResearchGate.
    There is no ISBN in the metadata, but the wrong information was retrieved from Google Scholar:


    Text analysis and word pronunciation in text-to-speech synthesis
    Type Journal Article
    Author Mark Y. Liberman
    Author Kenneth W. Church
    URL https://www.researchgate.net/profile/Mark_Liberman2/publication/230876257_Text_Analysis_and_Word_Pronunciation_in_Text-to-Speech_Synthesis/links/56550e3508ae1ef9297700a4.pdf
    Pages 791–831
    Publication Advances in speech signal processing
    Date 1992
    Accessed 4/23/2017, 11:08:44 AM
    Library Catalog Google Scholar
    Date Added 4/23/2017, 11:08:44 AM
    Modified 4/23/2017, 11:08:44 AM
    Attachments
    althlyl-alaly-llwqf-walabtda-fy-nsws-allght-alrbyt-alhdytht-walklasykyt-Automatic-Analysis-of-Phrase-Break-Prediction-for-Arabic.pdf
  • The metadata were retrieved after downloading the attachment to Zotero.
  • Google Scholar openly acknowledges that their metadata is not curated and that, by the nature of how records are brought into the GS service, there will be errors. I strongly recommend that GS users follow the link to the source and import metadata from the original site. The original site should always provide metadata that is more accurate and complete than metadata direct from GS.
  • The file is downloaded from ResearchGate. Zotero automatically creates a parent item for the file and retrieves metadata from Google Scholar using information from the file, mostly the name and the ISBN if available, plus many other factors I don't know about.
    Most files available on ResearchGate can't be retrieved from their sources. Some metadata may be openly available, but not the full paper.
  • You should use the Zotero button in the browser while on the ResearchGate site. That will usually get much better metadata than through Google Scholar. Retrieve Metadata is almost never the best way to get item metadata:
    https://www.zotero.org/support/getting_stuff_into_your_library
  • > Google Scholar openly acknowledges that their metadata is not curated

    Do you have a reference for this? An authoritative statement would be extremely useful for me right now!
  • That is strange, @adamsmith. Zotero definitely retrieves metadata for the item.
    Here is another example from just now:

    https://www.researchgate.net/publication/280976086_An_Interactive_Speech_Web_Site_in_Arabic_and_English
    Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition
    Type Book
    Author Daniel Jurafsky
    Author James H. Martin
    Series Prentice Hall series in artificial intelligence
    Edition 2. ed., Pearson internat. ed
    Place Upper Saddle River, NJ
    Publisher Prentice Hall, Pearson Education Internat
    ISBN 978-0-13-504196-3
    Date 2009
    Extra OCLC: 263455133
    Library Catalog Gemeinsamer Bibliotheksverbund ISBN
    Language eng
    Short Title Speech and language processing
    # of Pages 1024
    Date Added 4/24/2017, 12:22:55 AM
    Modified 4/24/2017, 12:22:55 AM
    Tags:
    Automatic speech recognition
    Automatische Spracherkennung
    Computational Linguistics
    Computerlinguistik
    Lehrbuch
    Natural language processing (Computer science)
    Notes:
    Literaturverz. S. 945 - 994
    Attachments
    An-Interactive-Speech-Web-Site-in-Arabic-and-English.pdf
  • So in this case it's grabbing an ISBN from the bibliography. We may be able to try to prevent that -- it's only looking for ISBNs on the first 10 (I think) pages, so this wouldn't happen for regular length articles.
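One way the prevention mentioned in the reply above could work is sketched below in Python. This is purely a hypothetical illustration, not Zotero's implementation: the heading list, the regular expressions, and the function name `isbn_before_references` are all assumptions. The idea is to scan only the first `max_pages` pages and to stop as soon as a bibliography heading appears, so that ISBNs of cited works are ignored.

```python
import re

# Hypothetical patterns (assumptions for this sketch only).
ISBN_RE = re.compile(r"\b(?:97[89][\- ]?)?(?:\d[\- ]?){9}[\dXx]\b")
HEADING_RE = re.compile(
    r"^\s*(references|bibliography|works cited)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def isbn_before_references(pages, max_pages=10):
    """Return the first ISBN-like string found before any reference list.

    `pages` is a list of page texts in reading order. Returns None when no
    candidate appears within `max_pages` pages or before the bibliography.
    """
    for text in pages[:max_pages]:
        heading = HEADING_RE.search(text)
        # Only look at the part of the page that precedes the heading.
        searchable = text[: heading.start()] if heading else text
        match = ISBN_RE.search(searchable)
        if match:
            return re.sub(r"[\- ]", "", match.group())
        if heading:
            # Everything after this point is the bibliography: stop scanning.
            return None
    return None
```

For a short article whose reference list starts on page 2, this returns nothing instead of a cited book's ISBN, while a book with its ISBN on the copyright page is still found. A real heuristic would need to handle headings in other languages and numbered section titles.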
  • What I'm doing is downloading from the download link, then saving to Zotero and running Retrieve Metadata on the file.
    @bwiernik, if you use the default Save to Zotero with embedded metadata, it returns nothing: it saves the web page and downloads no file.
    If you use the "DOI" option, it retrieves metadata from Google Scholar or another library, and may save the file only if the publisher makes it publicly available.

    Downloading saves the file to Zotero and creates an item whose type is correct and whose title is correct most of the time, but there are too many mistakes in the authors and the publication date.
  • I'd avoid importing from ResearchGate whenever possible. Import the article via DOI or from the publisher, then manually attach the PDF. I'm pretty sure you'll save time and trouble in the long run.

    Zotero should have relatively few false positives using Retrieve Metadata, but it'll often get details wrong, especially as there are frequently multiple versions of (essentially) the same paper that it has no way of distinguishing.
  • Academia and ResearchGate are not normal publishers. The articles posted there are uploaded by the author 90+% of the time, which the publisher may or may not allow.
    If it is allowed, you can sometimes find the original using the DOI or Google Scholar, but sometimes you cannot, and you need to open the original site and maybe log in or create an account.
    For some reason the cooperation between Academia or ResearchGate and Zotero is close to zero. Maybe there is some competition, I don't know.
    But DOI works on ResearchGate (not on Academia), and this is good: it means the site is treated as a library and there is a dedicated translator, which may need some adjustment.
  • I understand what RG and Academia.edu are. They just don't provide much, if anything, structured enough to base a translator on, which is why none exists.
  • @adamsmith, that is correct in a way: you can't cite RG in a paper or thesis. But when you are looking for extensive knowledge to grasp a topic, and you are not paying tens of thousands of dollars every semester to a famous university that gives you access to anything you need, RG and Academia are a must. You need them to find enough knowledge; referencing is then the easy part.
    Normally, when you write a paper or thesis, you need at least 5-10 times as much material as you are going to reference in your dissertation.
  • No, that's not what I mean. I really do understand the scholarly publication landscape. What I am saying is that RG and Academia don't provide enough information on their pages for Zotero to construct a reasonably well-working translator to import it.

    Almost all publishers allow you to import the item's information in front of their paywall. What I'm advising you to do is to import that and then attach the PDF from wherever you get it -- RG, Academia, unpaywall button, or SciHub.
  • I tried to retrieve metadata for very old PDFs I downloaded ages ago. Zotero managed to retrieve metadata for 26 out of 47 PDFs. I haven't checked the accuracy of the metadata yet, but I noticed that the file name has a huge effect on the procedure:
    duplicate files with different names get different results.
    This is not an important issue for Zotero, but it implies that some adjustments could improve the procedure. For some PDFs with short names, Zotero managed to retrieve the full article title.
  • That's just not possible. The code that runs for Retrieve Metadata never so much as loads the filename into memory. It only looks inside the file. Something else is going on.