That's not the article I retrieved metadata for

tbroman · June 7, 2011

I'm having a rather strange issue: twice now, with two completely unrelated PDFs, I have loaded these PDFs into Zotero and then used the Retrieve PDF Metadata function to look them up on Google Scholar, but instead of getting the correct information or a "no matches" result, the function has found a match in the article "The Jukes in 1915" by A. H Estabrook from the College of Law Faculty Publications 2009. Both of these PDFs were of books published in the 1800s. I'm really stumped--how do two different PDFs both result in the same wrong result?

adamsmith · June 7, 2011

Could be lots of things (something on the title page, a number that is interpreted as a DOI, etc.). You'd have to let us know which PDFs in particular this is about.

tbroman · June 7, 2011

For example, this PDF http://www.jstor.org/pss/30079830, downloaded from JSTOR (not using Zotero), gets identified as something called "Prefrontal cortical efferents in the rat synapse on unlabeled neuronal targets of catecholamine terminals in the nucleus accumbens septi and on dopamine neurons in the ventral tegmental area" by S. R Sesack and V. M Pickel, 1992, instead of the 1849 article it actually is. I would give another example, but the Retrieve Metadata function is suddenly disabled and I have no idea why.

tbroman · June 7, 2011

http://www.ncbi.nlm.nih.gov/pubmed/1377716 is the actual Sesack and Pickel article.

adamsmith · June 7, 2011

OK, no obvious reason this shouldn't work, but I'll poke around. (I assume you're aware that normally you shouldn't/needn't use Retrieve Metadata for JSTOR articles.)

tbroman · June 7, 2011

Yeah, I just started using Zotero, and I'm organizing some formerly disparate libraries, so there are some PDFs separated from citations and whatnot. I didn't actually download it, it was in one of the folders I'm supposed to add into the library. But the front page of the PDF that I have gives that link, so it must have come from JSTOR.

noksagt · June 7, 2011

I get no record for my copy of the JSTOR PDF. JSTOR customizes the first page of your PDF, though. In my case, there's nothing that resembles a DOI. Zotero is probably using Google Scholar and finds a similar phrase in the two papers. If you enable debugging, you'd see "RecognizePDF: Query string" followed by this line, which is probably similar in the two cases.

sixdeaftaxis · June 14, 2011

I have seen the same issue today while trying to import my library of PDFs. The Jukes in 1915 shows up as metadata for just about every PDF that I downloaded from Google Books.

sixdeaftaxis · June 14, 2011

And the frustrating thing is that once you have looked up the metadata for an item in your unfiled list, the only way to get that metadata removed is to drag it to a collection, then open the item and drag the pdf out of the item, then delete the item from the library, then remove the pdf from the collection to get it back in Unfiled Items.

dstillman · June 14, 2011

We need example URLs.

dstillman · June 14, 2011

And you don't need to move the parent item to a collection. Just drag the child out of the parent in My Library.

sixdeaftaxis · June 14, 2011

I will try to find URLs for the PDFs I downloaded, but I'll be guessing since I've downloaded these over the past few years.

I am unable to drag the child out of the parent if it is still in Unfiled Items.

I have all of my PDFs in Unfiled Items. I right-click on one and choose "Retrieve metadata for PDF". If it finds the correct metadata, I drag the result to a collection. If it finds the wrong metadata, it will not let me drag the child out of the parent in Unfiled Items. If I move the parent to a collection first, then I can drag the child out of it.

dstillman · June 14, 2011

I'm saying to drag it out from within My Library. Unfiled Items is a virtual folder, so dragging something into it (even from under a parent item) wouldn't make sense. You can sort by Date Modified in My Library if you want.

This is getting off-topic, though.

sixdeaftaxis · June 14, 2011

Firefox just hung while trying to retrieve metadata for several dozen very large PDFs. After I restarted Firefox, Zotero now correctly reports "No matching references found." for those Google Books PDFs that were all coming back as The Jukes in 1915 a few minutes ago. My old-fart programming instincts sense a memory leak somewhere that only causes problems after running for a while and retrieving metadata for many large PDFs. I'm working on about 150 PDFs totalling about a gigabyte. I initially dragged them into a "to be done" collection (which took going out to buy lunch and coming back to finish), then realized it was a pain to have to keep removing them from that collection once I had filed them, so I removed them all from that collection, thus sending them to the Unfiled Items folder.

In any case, if The Jukes in 1915 starts showing up again, I'll post URLs. Consider it an intermittent problem at worst for now.

tbroman · June 14, 2011

Okay, I have posted the file I referred to earlier on a filesharing website (since it seems re-downloading the article from JSTOR does not duplicate the problem). As I come across PDFs that go to The Jukes I will upload them too.

http://www.2shared.com/document/HByyNRI5/allman_homology_tunicata_polyz.html

BTW sixdeaftaxis, it might be a bad idea to retrieve the metadata for "several dozen" PDFs at once, because Google Scholar might kick you out if it thinks you're making automated requests. Although, I only use the function on one PDF at a time and I still get the Jukes error.

adamsmith · June 14, 2011

OK, I'm also getting the Sesack article from that, and I do think this could be a more fundamental issue. What happens is that Zotero first looks on Google Scholar for:

"the lawsof" "assumes the heart as indicating the dorsal aspect of the" "cephalic ganglion, however, in those inferior members of the animal" "in which the"

and doesn't find a match (because the space in "the lawsof" gets lost; otherwise this would work and be the only hit). So far that's too bad, but OK. But then it looks for

"and ventral"

which of course gets a gazillion hits, and the Sesack piece is first. Whatever makes Zotero pick that second attempted string is probably not a good idea.
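
To illustrate the kind of check that would catch this, here is a minimal sketch of a guard that refuses to send a fallback query that is too generic. It is not Zotero's code: the function names and both thresholds are assumptions made up for this example, and the only real data in it are the two query strings quoted above.

```python
import re

# Both thresholds are assumptions for this sketch, not values used by Zotero.
MIN_WORDS_IN_LONGEST_PHRASE = 4
MIN_TOTAL_WORDS = 8


def quoted_phrases(query):
    """Return the double-quoted phrases contained in a search query."""
    return re.findall(r'"([^"]*)"', query)


def is_specific_enough(query):
    """Heuristic: accept a query only if it has enough words to be distinctive."""
    word_counts = [len(phrase.split()) for phrase in quoted_phrases(query)]
    if not word_counts:
        return False
    return (max(word_counts) >= MIN_WORDS_IN_LONGEST_PHRASE
            and sum(word_counts) >= MIN_TOTAL_WORDS)


# The two queries reported in this thread.
first = ('"the lawsof" '
         '"assumes the heart as indicating the dorsal aspect of the" '
         '"cephalic ganglion, however, in those inferior members of the animal" '
         '"in which the"')
second = '"and ventral"'

print(is_specific_enough(first))   # True
print(is_specific_enough(second))  # False
```

With a guard like this, the second attempt would be dropped and the user would see "no matches" rather than the first of a gazillion hits. The thresholds would need tuning against real PDFs.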

tbroman · June 14, 2011

http://www.2shared.com/document/Se2mbZpW/Eschscholtz_System_der_Acaleph.html
OCR on this text was performed with Acrobat's built-in OCR tool, with German as the language setting. Once it was loaded into Zotero, I indexed it and then ran Retrieve PDF Metadata and got The Jukes in 1915.

http://www.2shared.com/document/n0VLz9fw/gore_1825_blumenbach.html
Very similar story here, but English (US) as the language this time. (This file is very large; ~383MB.)

adamsmith · June 14, 2011

Ah OK, there's a simple answer to this:
The first page of text in all Google Books PDFs is the Google Books statement, and the words that Zotero picks from that statement just happen to bring up The Jukes in 1915 as the first hit.

Nothing to be done about that right now, but I think it's worth adding an exemption in the code for that page, given the prominence of Google Books and the fact that it's always the same page.
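
A rough sketch of what such an exemption could look like, assuming the extracted text is available page by page. Everything here is hypothetical: the marker phrases are illustrative guesses at wording on the statement page and should be checked against a real Google Books PDF, the sample pages are made up, and none of the function names come from Zotero.

```python
# Marker phrases assumed to appear on the statement page. Illustrative only;
# verify against the actual wording before relying on them.
BOILERPLATE_MARKERS = (
    "digital copy of a book",
    "scanned by google",
)


def normalize(text):
    """Lowercase and collapse whitespace so line breaks in the text layer do not matter."""
    return " ".join(text.lower().split())


def is_boilerplate(page_text, markers=BOILERPLATE_MARKERS):
    """Treat a page as boilerplate only if every marker phrase occurs on it."""
    normalized = normalize(page_text)
    return all(marker in normalized for marker in markers)


def first_searchable_page(pages, markers=BOILERPLATE_MARKERS):
    """Return the index of the first non-empty, non-boilerplate page, or None."""
    for index, page in enumerate(pages):
        if page.strip() and not is_boilerplate(page, markers):
            return index
    return None


# Made-up sample pages standing in for extracted PDF text.
pages = [
    "This is a digital copy of a book that was\nscanned by Google for a library project.",
    "",
    "Title page text of the scanned book",
]

print(first_searchable_page(pages))  # 2
```

Requiring all markers to match keeps the check conservative, so an ordinary page that merely mentions Google is not skipped. The query words would then be picked from the page at the returned index instead of from the statement page.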

tbroman · June 14, 2011

Oh, that's interesting. I think I got the Leach hit once or twice as well. Another page that would be nice to avoid searching on, if possible, is the one that says what library owns the book. (In The Jukes, it's page 4.) A few universities (Harvard among them) have provided a large number of the hardcopies for Google to digitize, so searching on those pages could get erroneous returns as well. I understand that would be harder to implement, though, since those pages aren't all the same.