Error getting metadata
It began as not being able to drag. Now I can drag after installing (and later removing) the standalone version, but not settings. I noticed something got copied, so that is probably still there.
Now... local files seem never to be able to pick up any metadata. And at the moment I have no luck finding an online file that can.
I just get an "oops, something happened" ... I'm not certain how to translate the Danish error message properly back into English.
Which Zotero version exactly? (Check under "About Zotero" in the gears menu.)
And as I requested in the other thread, ideally a PDF you're trying this with.
Messages are different: http://www.uvm.dk/~/media/Publikationer/2009/Folke/Faelles%20Maal/Filer/Faghaefter/matematik_31.pdf returns: Fandt ikke referencer, der matchede (roughly: "No matching references found")
http://www.michaelfullan.ca/wp-content/uploads/2014/04/14_Spring_Maximizing-Impact-Handout.pdf returns: Der opstod en uventet fejl (roughly: "An unexpected error occurred")
This one did the same: http://www.reflexen.learning.aau.dk/digitalAssets/66/66266_en_paedagogisk-didaktisk_praesentation.pdf
And this: 118280219-Berinderjeet-Kaur-Yeap-Ban-Har-Manu-Kapur-Mathematical-Problem-Solving-Yearbook-2009-AME-Association-of-Mathematics-Educators-World-Scient.pdf (local file)
While another local file returns: PDF inderholder ikke tekst genkendt ved OCR (roughly: "PDF does not contain text recognized by OCR")
but it does. Here is a copy from a random spot: School profiles
I think that might fix the unexpected error (der opstod...)
The first and third errors mean what they say: Zotero doesn't find metadata for the first one, and the last one, I assume, is a scan or a read-protected PDF, so Zotero doesn't find any text.
One of the files from this test, the topmost one, could be read and referenced before. I had it as a PDF from the internet (I have downloaded it several times today), and although it is the same file I previously fetched metadata from, it will not fetch metadata this time.
Never mind, I am going to delete that one, but something looks strange. And I still have a lot of PDFs that presumably can't find metadata, but I get the feeling that on another day, at another time, they will.
Looks like something unusual might be going on there that could use another look.
For metadata extraction, Zotero takes text from the first 7 pages of a PDF (in the case of that PDF, this covers the ToC and the preface). It then looks for a DOI in the first 80 lines of that text (the PDF in question doesn't contain a DOI). Then it looks for an ISBN (again in the first 80 lines). In this case, there is an ISBN, 978-87-7958-796-0, which Zotero detects correctly, but it is unable to find it registered in the Library of Congress, WorldCat, or Lulu.
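To make the order of those checks concrete, here is a rough Python sketch. It is not Zotero's code: the function name `find_identifier`, the constant `MAX_LINES`, and both regular expressions are assumptions made for illustration, and real DOI/ISBN matching is more involved.

```python
import re

MAX_LINES = 80  # "first 80 lines" from the description; constant name is hypothetical

# Simplified, assumed patterns (not the ones Zotero uses).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ISBN_RE = re.compile(r"\b97[89][-\s]?(?:\d[-\s]?){9}\d\b")


def find_identifier(text):
    """Return ("doi", value), ("isbn", value) or None, trying the DOI first."""
    # Only the first MAX_LINES lines of the extracted text are searched.
    head = "\n".join(text.splitlines()[:MAX_LINES])

    match = DOI_RE.search(head)
    if match:
        return ("doi", match.group(0))

    match = ISBN_RE.search(head)
    if match:
        # Drop hyphens and spaces so the ISBN can be sent to a catalogue lookup.
        return ("isbn", re.sub(r"[-\s]", "", match.group(0)))

    return None
```

With the ISBN from this PDF, `find_identifier("ISBN 978-87-7958-796-0")` yields the ISBN with hyphens removed. Finding the identifier is only half of the step; as described above, the lookup can still fail if no catalogue has that ISBN registered.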
At this point, Zotero proceeds to look for lines of text that are suitable for a full-text search via Google Scholar. It does some magic taking only text in the first column (i.e. not preceded by a tab) of the line that is longer than 3 words (the column selection is a good idea for journal articles that tend to be printed in multi-column format). If Zotero finds at least 20 such lines it proceeds to query Google Scholar with some of them. The PDF in question contains only 19 lines that pass the cleanup, so Zotero assumes that the PDF is not OCR'ed and those lines are just some junk, like "This is a digital copy of a book that was preserved for generations on library shelves before it was carefully scanned by Google as part of a project", which could increase false-positive hits. Though maybe the language for the error could be changed to something like "Zotero could not find enough usable text to retrieve metadata. Make sure the PDF has been OCR'ed."
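The line cleanup can be illustrated with a small Python sketch. Again, this is an assumption-laden approximation and not Zotero's implementation: the names `usable_lines`, `has_enough_text`, `MIN_WORDS`, and `MIN_LINES` are made up here, and "first column" is interpreted as the text before the first tab on a line.

```python
MIN_WORDS = 4   # "longer than 3 words"; constant name is hypothetical
MIN_LINES = 20  # lines required before Google Scholar is queried


def usable_lines(text):
    """Keep the first-column text of every line that has more than 3 words."""
    kept = []
    for line in text.splitlines():
        # Assumed reading of "first column": everything before the first tab.
        first_col = line.split("\t", 1)[0].strip()
        if len(first_col.split()) >= MIN_WORDS:
            kept.append(first_col)
    return kept


def has_enough_text(text):
    """True if enough lines survive the cleanup to attempt a full-text query."""
    return len(usable_lines(text)) >= MIN_LINES
```

Under this reading, a ToC line such as `"12\tIntroduction to problem solving"` contributes nothing, because its first column is just the number. A text that yields 19 usable lines, as this PDF does, falls one short of the threshold.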
Generally, this logic works OK, because having extracted text from 7 pages of a PDF, you would hope to have quite a few lines of text with more than 3 words. The ToC in this PDF kills a lot of lines, since the first column on each line is just a number. I think what would help in general is increasing the number of PDF pages extracted. I can't think of much penalty in terms of false-positive detection, but it does mean a bit more processing. Increasing the number of pages to, say, 15 or 20 shouldn't be a big issue though.
Again, though, this is not a general issue that's applicable to PDFs of journal articles, which is a major target of metadata retrieval.
I would like us to do as well as possible on reports and working papers, not least because those are what I use the function for mainly (journal articles are imported via URL bar icon). ToCs, title pages, impressums and the like aren't uncommon there. I think going up to 15 pages would be worth at least a test.
Anyway, it does not look as if it points towards my local file, and I would also like to keep that reference.