Retrieve metadata from ProQuest PDFs
Many ProQuest databases can export full texts as PDF, TXT or RTF. All of these include metadata, but only at the very end of the document. For example, in a PDF it looks like this:
https://s15.postimg.cc/gpx1kkf8r/proquest.png
The problem here is that Zotero can't retrieve this metadata automatically. It's a shame, as it is really well structured and complete. Just in case: I couldn't find an option to move it to the beginning of the document. The first page does have a small header with the author, title and date of publication, but that's all there is to it. So:
1. Isn't there some way to get metadata from ProQuest PDFs? Perhaps I'm missing something? Maybe there's a workaround?
2. In case there is, another thing. PQ can also save articles in batches of 20, 50 and 100, but they are saved as a single file, with the metadata table shown at the end of each respective text. Is there a way to read one PDF and get multiple items? I, of course, doubt it. Fortunately, the PDFs have bookmarks and can be easily split, but it'd still be so cool to just load one PDF and get multiple entries...
You won’t get PDFs initially, but you can still add them automatically. The feature is currently in the Zotero beta, so install that. Then select a small batch of items without PDFs, right-click, and choose Find Available PDFs. Zotero will then download and attach the PDFs that you have access to at the item DOI/URL.
See:
https://www.zotero.org/support/getting_stuff_into_your_library#large-scale_imports_from_databases
The issue here is that most full texts I run across are HTML pages, and while the Connector can save them one at a time, it still can't do batches. I'm not talking about many downloads in succession (whether quick or slow-paced): this was my first batch of 20 or so in two days (in fact, the first I've tried in about half a year).
So I thought that PQ's built-in export feature, as you suggested, might do. BUT it can't download both the texts and a Zotero-readable collection file (e.g. in RIS) simultaneously. The latter has no attached texts (only linked web pages). The text files DO have metadata inside them, but Zotero can't read it.
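For what it's worth, here is a rough sketch of how the metadata at the end of a TXT export could be pulled out and turned into RIS by hand. Everything about the layout is an assumption on my part, not something ProQuest documents: that records in a multi-article file are separated by a line of underscores, that metadata lines have the form "Label: value", and that the labels are the ones listed in the hypothetical `RIS_TAGS` mapping. The item type is left generic (`GEN`) because the real type isn't known here.

```python
import re

# Assumed layout: records separated by a line of 10+ underscores,
# metadata lines written as "Label: value".
RECORD_SEPARATOR = re.compile(r"^_{10,}\s*$", re.MULTILINE)
FIELD_LINE = re.compile(r"^([A-Z][A-Za-z ]{1,40}): (.+)$")

# Hypothetical mapping from export labels to RIS tags; adjust to the real file.
RIS_TAGS = {
    "Title": "TI",
    "Author": "AU",
    "Publication title": "T2",
    "Publication date": "DA",
}


def split_records(text):
    """Split a multi-article export into one chunk per article."""
    return [chunk.strip() for chunk in RECORD_SEPARATOR.split(text) if chunk.strip()]


def parse_fields(record):
    """Collect known 'Label: value' lines; other lines (body text) are ignored."""
    fields = {}
    for line in record.splitlines():
        match = FIELD_LINE.match(line.strip())
        if match and match.group(1) in RIS_TAGS:
            # Keep the first occurrence of each label.
            fields.setdefault(match.group(1), match.group(2).strip())
    return fields


def to_ris(fields):
    """Render one record as a minimal RIS entry of generic type."""
    lines = ["TY  - GEN"]
    for label, tag in RIS_TAGS.items():
        if label in fields:
            lines.append(f"{tag}  - {fields[label]}")
    lines.append("ER  - ")
    return "\n".join(lines)
```

Joining the `to_ris` output for every record with blank lines in between should give a file that can be imported like any other RIS export, though without the attached texts.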
I've tried a 'middle ground' approach: exporting both and then syncing them. I found out that Mendeley (not Zotero :( ) is capable of merging items in a collection with multiple files, provided the files have names similar to the respective entries. After several hours of struggle, though, I was not exactly successful: most items did merge properly, but about 1/3 didn't. So this is sadly not an option.
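To illustrate the kind of name matching involved (this is not how Mendeley does it, just a minimal sketch using Python's standard `difflib`), the idea is to normalize file names and entry titles and pair each file with its closest title. The function names and the `cutoff` value of 0.6 are my own choices for illustration; files with no title above the cutoff are left unmatched rather than guessed.

```python
import difflib
import re
from pathlib import Path


def normalize(name):
    """Lowercase and collapse anything that is not a letter or digit to a space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def match_files_to_titles(filenames, titles, cutoff=0.6):
    """Map each file name to the most similar title, or None if nothing is close."""
    normalized_titles = {normalize(title): title for title in titles}
    matches = {}
    for filename in filenames:
        stem = normalize(Path(filename).stem)
        best = difflib.get_close_matches(
            stem, list(normalized_titles), n=1, cutoff=cutoff
        )
        matches[filename] = normalized_titles[best[0]] if best else None
    return matches
```

Truncated or heavily abbreviated file names will fall below the cutoff, which is probably the same reason a chunk of items fails to merge.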
But if ProQuest does have an HTML but not a PDF full text, Zotero should try to get that. Do you have an example?
I guess I could send you several PDFs if you wanted to take a look.