Retrieve PDF Metadata: a suggestion
Hi,
In the old days, when one used the "Retrieve PDF Metadata" function on a PDF that did not contain a DOI, Zotero would tell you it could not retrieve anything and would leave the PDF in the central column. So this function would basically only work with PDFs of (recent) articles.
At some point in the past, this function was modified.
Now, if Zotero does not find an embedded identifier in the PDF file, it will nonetheless create a hollow and flawed Zotero item with just the title and the number of pages.
This is all the trickier if you have left this function in automatic mode in the "Preferences" panel (which is the default setting).
Indeed, as soon as you drop a new PDF file in Zotero, an item is created and you might think that the job has been (well) done and forget about it. And if you don't carefully check the item that was produced, you may end up with a great number of flawed items in your library.
Students who are not seasoned Zotero users will easily fall into that trap and realize that something's wrong only much later on, after having Zotero produce a first bibliography.
This problem occurs especially when retrieving metadata from theses available online in open archive repositories. When I realized that, I decided to explicitly advise students to switch off automatic retrieval of PDF metadata.
So I think this modification is totally counter-productive.
I advise you to go back to the initial behavior: instead of creating a flawed item for the PDF, Zotero would be much wiser to tell the user that it could not retrieve the metadata.
Thanks for your help,
Best regards,
JH Morneau
The better thing to teach students would be to rely on Retrieve Metadata as little as possible and to import things like theses via the repository (landing) page for the thesis rather than importing the PDF.
Importing a PDF directly will almost always yield a poorer record (even when a DOI is recognized, the abstract will typically be missing).
Indeed, if students read and followed the recommended method, everything would be fine. But many of them don't and will go for the easiest and least time-consuming method for creating references.
And the fact that metadata retrieval from PDF is so quick makes it harder for the user to spot the problems.
Maybe a solution could involve some kind of feedback from Zotero in case of parent item creation not based on an identifier?
Any type of feedback would help raise the user's awareness and prompt them to check the quality of the reference Zotero produced...
1) If Zotero can't detect anything at all, it will still leave a standalone PDF. The difference is that — almost five years ago at this point — we added a completely new PDF recognition system that, if it can't find an identifier, still tries to pull out at least some basic metadata from the item, including title, authors, and page number, to minimize the amount you have to type manually. Zotero is designed to save you time, so of course it's going to do that if it can.
2) Teaching students to turn off metadata retrieval is awful, misguided advice, and I'd implore you to stop doing that. Even if we always recommend saving from an article page when possible, PDF metadata retrieval is an extremely useful feature for PDFs that do have identifiers, which is the vast majority of current academic PDFs.
The most basic thing to teach students who will be creating citations is that they need to check the metadata for each item they import, no matter what tool they use and no matter how they import the data. This is a universal rule of reference managers and is in no way specific to Zotero.
From a thesis PDF, the feature creates an item of type "Journal Article" with only a title and a page count.
Therefore, my opinion is that this feature is currently totally unsatisfactory.
And to teach students to turn off this function is, to my mind, a no-brainer.
I advise the students to only use it manually on article PDFs and carefully check the result. I also advise them to systematically create thesis items from our national higher education catalogue.
Problem solved.
You may not agree with me, but let's agree to disagree and refrain from making derogatory remarks.
I understand the point that one should review all metadata (and it's definitely the thing to teach students), but honestly, CrossRef metadata and Zotero web translator metadata are almost always good enough for references in, e.g., an initial submission of an article. So I suspect many people don't routinely review it, and flagging items where a review definitely needs to happen would be quite helpful (I know I can do this with the library catalog field and a saved search).
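The flagging idea can also be approximated outside Zotero. Here is a minimal Python sketch, assuming items are available as plain dictionaries whose keys follow Zotero's field names (`DOI`, `ISBN`, `libraryCatalog`), and assuming that an item with no identifier and an empty Library Catalog field was built only from text extracted from the PDF. `needs_review` and `flag_for_review` are hypothetical helper names, not Zotero functions, and the sample items are made up for illustration.

```python
# Fields treated as identifiers in this sketch (an assumption, not a Zotero rule).
IDENTIFIER_FIELDS = ("DOI", "ISBN")


def needs_review(item: dict) -> bool:
    """Return True when the item has neither an identifier nor a library catalog."""
    has_identifier = any(
        str(item.get(field) or "").strip() for field in IDENTIFIER_FIELDS
    )
    has_catalog = bool(str(item.get("libraryCatalog") or "").strip())
    return not (has_identifier or has_catalog)


def flag_for_review(items: list) -> list:
    """Return the titles of items that should be checked by hand."""
    return [item.get("title", "(untitled)") for item in items if needs_review(item)]


# Made-up sample data: one bare item and one item with a DOI and a catalog.
items = [
    {"itemType": "journalArticle", "title": "Thesis saved from a PDF", "pages": "312"},
    {
        "itemType": "journalArticle",
        "title": "Article with DOI",
        "DOI": "10.1000/example",
        "libraryCatalog": "Crossref",
    },
]

print(flag_for_review(items))
```

The check is deliberately coarse: it only says which items deserve a closer look, not which ones are wrong.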
If you don't like the auto-created item and think there's a better source, you can right-click and choose Undo Retrieve Metadata (until Zotero restart). A future version of Zotero will also make it possible to update the metadata for an existing item by adding a DOI or other identifier, for cases where that's relevant.
When there actually is high-quality metadata available somewhere, teaching them to save from there is exactly what we recommend, but then why also have them waste time by manually running metadata retrieval on other PDFs? This is a feature that retrieves high-quality metadata for the vast majority of academic PDFs and that is frequently described as "magic" and one of Zotero's best features. Teaching them to turn that off in favor of a manual process just seems like a real disservice. (Honestly, this is only a configurable setting at all for privacy reasons, since it involves a request to Zotero servers.)
@adamsmith: We can certainly consider that, but I don't think it's particularly specific to the basic items discussed here; it relates more to the general point that people should always review metadata. As far as I know, Mendeley did it for all new items, and I assume we would do the same. (This is also perhaps related to the idea of being able to auto-tag new items.)
It would signal the need for the item to be reviewed by the user.
As for the context: many of our students use web search engines and find PDF versions of theses on open archive repositories. They save the files on their computer and that's it. They don't pay attention to the source of the file or the related metadata.
When we teach them about Zotero, they want Zotero to create the items automatically out of the PDFs they have already downloaded, because it saves them precious time. Unfortunately, as it is, the results are almost always unsatisfactory, as I explained before.
If you want to give it a try, I suggest you download a random PDF thesis from https://theses.fr/ and/or https://dumas.ccsd.cnrs.fr/ and see the result of PDF metadata retrieval.
For example:
https://theses.fr/2021CYUN1084/document
https://dumas.ccsd.cnrs.fr/dumas-01644833/document
As we do have a national online higher education union catalogue (http://www.sudoc.abes.fr/) where the theses are properly described, the best way to proceed is to create the item from that source.