Source of BibTeX data in a PDF file.
I linked a PDF file into Zotero. It subsequently displayed BibTeX data for the article. How do I know whether that data came from within the PDF or from web sources? (I don't want the web sources to creep in.) I realize the data is rarely in a PDF, but I need a way of telling.
Regards,
Lou.
It attempts to identify the PDF and retrieve high-quality, canonical metadata from the publisher. I don't know what you're trying to imply by "web sources", but if you're using Zotero you presumably want high-quality data for your items.
https://www.zotero.org/support/retrieve_pdf_metadata
I link a PDF to a library. Zotero first displays, in the rightmost pane, a 'Note' frame containing the title, filename, number of pages, modified date, Indexed indicator, and Related [click here] and Tags [click here]. I will call this metadata since it is quite limited.
Then, seconds later, much more information is displayed in the rightmost pane: Item type, Title, Authors, Abstract, Publication, and later in the list "DOI.org", which I assume is the source of all the above data. This is its own 'bib data', as I choose to call it now.
That bib data may exist in the PDF file, and Zotero may have spent a few seconds retrieving it from the PDF file and formatting it for display, or Zotero may have retrieved it from the web somewhere, perhaps from DOI.org. (Documentation states that Zotero does go out to the web and somehow finds this data.)
I need to know unambiguously where this data comes/came from. How would I know which of these scenarios happened?
Thanks.
Lou.
But that begs the question: Isn't the XMP data the same as what is often called metadata; not the same values, but the same kinds of information?
Practically, XMP tags in scholarly PDFs are so frequently useless and even misleading that it's not worth it trying to parse them in the (imo correct) judgment of Zotero's developers.
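If you want to see for yourself what a given PDF carries internally, independent of anything Zotero displays, you can look for an embedded XMP packet in the raw file. The following is only a rough sketch under stated assumptions: the function names are made up for illustration, it scans the file bytes instead of parsing the PDF properly, and it will miss metadata stored in compressed or encrypted streams. A result of "nothing found" is therefore suggestive, not conclusive.

```python
import re
from typing import Optional

XMP_OPEN = b"<x:xmpmeta"
XMP_CLOSE = b"</x:xmpmeta>"


def find_xmp_packet(pdf_bytes: bytes) -> Optional[str]:
    """Return the first uncompressed XMP packet found in the file, if any."""
    start = pdf_bytes.find(XMP_OPEN)
    if start == -1:
        return None
    end = pdf_bytes.find(XMP_CLOSE, start)
    if end == -1:
        return None
    packet = pdf_bytes[start:end + len(XMP_CLOSE)]
    return packet.decode("utf-8", errors="replace")


def xmp_title(xmp: str) -> Optional[str]:
    """Pull the first dc:title entry out of an XMP packet (crude regex match)."""
    match = re.search(r"<dc:title>.*?<rdf:li[^>]*>(.*?)</rdf:li>", xmp, re.DOTALL)
    return match.group(1).strip() if match else None


def embedded_metadata_report(pdf_bytes: bytes) -> dict:
    """Summarize which kinds of embedded metadata appear to be present."""
    xmp = find_xmp_packet(pdf_bytes)
    return {
        "has_xmp_packet": xmp is not None,
        "xmp_title": xmp_title(xmp) if xmp else None,
        # A trailer entry such as "/Info 3 0 R" points at the classic
        # document information dictionary (Title, Author, ...).
        "references_info_dict": re.search(rb"/Info\s+\d+\s+\d+\s+R", pdf_bytes) is not None,
    }
```

To try it, read a file in binary mode and pass the bytes in, for example `embedded_metadata_report(open("paper.pdf", "rb").read())`, where `paper.pdf` is a placeholder name. Comparing the reported title with what Zotero shows gives a hint about whether the displayed values could have come from inside the file at all.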
All this goes to long-term research: re-using PDFs in later papers, and being able to regenerate, in the future, old published papers with their exact submission content, and yet use the old PDFs, without the old metadata, for a new paper. See a web page about using Docear: "Sustainable Research...Part II" by Saul Albert. Later in the document, there is a paragraph numbered "3" which discusses a way to structure information for multi-project use.
https://saulalbert.net/blog/sustainable-research-literature-management-with-docearii/
If you're asking how to ensure the long-term usability of entries in Zotero, that's a question about the most meaningful unit. For Zotero, that's not a PDF, but the metadata entry for an item: items can have attachments in different file formats, or no attachments at all. The way Zotero ensures long-term usability is by making sure the metadata entries are easily exportable (along with attached files) in a large number of standard file formats.
What am I to believe? You haven't answered the question. Someone says the PDF has no metadata, yet you say it has, but it's poor. What I want to know is: How can I tell what Zotero is displaying, the internal data or the current web-sourced data?
Thanks for your help.
Don't let that little COVID thing getchya. :).