- CommentAuthorIddo
- CommentTimeOct 31st 2008
Hi Again,
Strange problem - I watched the retrieve pdf metadata screencast and tried it on a number of PDFs.
The first time I dragged and dropped 3 PDFs, I got the option "retrieve metadata from PDF" when I right-clicked. I tried it but it didn't work (it opened a window with a small rotating circle which never stopped). I canceled and tried again - still nothing. I then tried it with a few other PDFs I imported, and to my surprise I no longer had the option to retrieve PDF metadata at all when I right-clicked.
Any ideas what is going on?
Iddo -
- CommentAuthorIddo
- CommentTimeNov 1st 2008
Any ideas? -
- CommentAuthorrmjanjua
- CommentTimeNov 3rd 2008
Iddo,
I posted a comment a while ago but never got a response from anyone. It has been mentioned in the past but the problem remains. The metadata retrieved is also incorrect and incomplete. The only engine used to retrieve is Google Scholar.
I also anxiously await the ability to drag my PDF's into Z without having them duplicated into the Z files. -
- CommentAuthorIddo
- CommentTimeNov 3rd 2008
Hmm... this is something Zotero supposedly supports, so we are not talking about a future feature but rather about an existing one which doesn't work (for me at least). -
- CommentAuthorTjowens
- CommentTimeNov 3rd 2008 edited
Which version of the preview are you both using? I just tried a few different PDFs in the newest version, Sync3.2, and the metadata came in fine for me. The best way for the community to refine the tool is to mention specific PDFs that fail and where one can find them in a database. -
- CommentAuthorrmjanjua
- CommentTimeNov 3rd 2008
I am using the most current one, but the problem was there before I updated it a few days ago. It always fails to retrieve the journal and assigns an incorrect author. For example:
Anatomical variations in the origin of the human ophthalmic artery with special reference to the cavernous sinus and surrounding meninges.
Matsumura Y, Nagashima M.
When I dragged the PDF of this article to Z and tried to retrieve the metadata, the author I got was "Organs C.T.", without the journal and pages, but it did get the volume and issue.
In the meantime, a window keeps appearing at the top of my screen asking me to enter a URL.
Thanks for your help and look forward to your suggestions. -
- CommentAuthorIddo
- CommentTimeNov 3rd 2008
I am using version 1.5sync 3.2.
I just don't have the option to get the meta-data.
On my desktop I never had the option and on my laptop I had it and now I don't. -
- CommentAuthorSimon
- CommentTimeNov 3rd 2008
Iddo, you need to install the PDF indexer for this feature to work. Go to "Search" in the Zotero preferences and click the "Check for installer" button. This should enable the option for you. -
- CommentAuthorIddo
- CommentTimeNov 3rd 2008
Hi Simon,
Interesting - why is this not installed as a default? I would have never guessed I need to do that.
O.K. now I have the option - I tried two PDF files and it did not find any metadata. Can you direct me to a free PDF I can download which you know for certain has retrievable metadata, so I can try it and see if it actually works?
Thanks,
Iddo -
- CommentAuthornoksagt
- CommentTimeNov 3rd 2008
Interesting - why is this not installed as a default? I would have never guessed I need to do that.
It uses platform-dependent binaries that are distributed under a different license. -
- CommentAuthorTjowens
- CommentTimeNov 4th 2008
-
- CommentAuthorIddo
- CommentTimeNov 4th 2008
yup - it works!
So apparently many PDFs from JSTOR don't have metadata - what a shame :(
So basically what you are saying is that most PDFs I will try to import this way will not have metadata? Is there any conceivable way around it, apart of course from typing all the data myself, which is something I don't really want to do for the hundreds of PDF files I already have? -
- CommentAuthorsean
- CommentTimeNov 4th 2008
Until recently, JSTOR did not include a text layer in its PDFs. Without that layer, the PDF is effectively just an image, and there's no way for Zotero (or any other tool) to read or recognize anything. If the PDFs are from JSTOR, you could always go to JSTOR and reimport those resources. -
- CommentAuthorIddo
- CommentTimeNov 5th 2008
True - I just thought that JSTOR, of all places, would be more organized on this point. Apparently not, so I guess I can't expect more from smaller places.
Just a thought - why not use an OCR application and build an algorithm that tries to extract the title, author, year of publication, etc. from the front page of a PDF?
It won't be 100% (probably not even 70%) but it might be better than typing everything by hand. -
- CommentAuthornoksagt
- CommentTimeNov 5th 2008
Ideas for more intelligent parsing of PDFs have been brought up before. A regex search for identifiers (PMID, arXiv, DOI, etc.) would be useful. For recognition of titles, etc., a search against a large pool of data (as the Zotero server may one day have) could benefit heuristic identification.
I don't know if OCR has been discussed before. It seems too heavy and platform-dependent to be a reasonable dependency to me, but the idea of allowing end users to plug in command-line apps for indexing has been discussed. If custom commands could be run on file attachment or for indexing, you could insert your favorite OCR app into the chain before pdftotext. -
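To illustrate the identifier search noksagt mentions, here is a minimal Python sketch that scans text already extracted from a PDF (for example by pdftotext) for a DOI, PMID, or arXiv ID. The patterns are simplified assumptions and the function name `find_identifiers` is hypothetical; this is not Zotero's code.

```python
import re

# Simplified, illustrative patterns (assumptions); real identifiers have more edge cases.
DOI_RE = re.compile(r'\b(10\.\d{4,9}/[^\s"<>]+)')
PMID_RE = re.compile(r'\bPMID:?\s*(\d{1,8})\b', re.IGNORECASE)
ARXIV_RE = re.compile(r'\barXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)', re.IGNORECASE)

def find_identifiers(text):
    """Return a dict with the first DOI, PMID, and arXiv ID found in text."""
    found = {}
    for name, pattern in (("doi", DOI_RE), ("pmid", PMID_RE), ("arxiv", ARXIV_RE)):
        match = pattern.search(text)
        if match:
            # Drop punctuation that commonly trails an identifier in running text.
            found[name] = match.group(1).rstrip(".,;)")
    return found
```

Any identifier found this way could be looked up directly, which avoids guessing from a text match. A scanned PDF without a text layer would simply yield an empty result.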
- CommentAuthorIddo
- CommentTimeNov 5th 2008
sounds interesting.
You talk about "a large pool of data" - is this something you are currently actively looking into, or are we talking about the distant future? -
- CommentAuthornoksagt
- CommentTimeNov 5th 2008
I'm not a Zotero developer, but the Zotero server is under active development. Recommendations have been touted as a planned feature & a recommendation system may benefit similarly from a large data pool. There have been no announced plans for heuristics to be used to add metadata, so that would probably be a more distant feature. -
- CommentAuthorIddo
- CommentTimeNov 5th 2008
O.K. thanks :) -
- CommentAuthorenozkan
- CommentTimeNov 7th 2008
I have to second rmjanjua on his comment. The metadata retrieved is very inaccurate when Google Scholar is used as the repository. Can we set the repository for data retrieval to something else? Honestly, as a biochemist, >99.5% of the articles I read are accurately recorded by PubMed (for which we have to thank the American government), and on the rare occasion PubMed is used as the repository, the bibliography retrieved is always accurate.
Engin -
- CommentAuthorrmjanjua
- CommentTimeNov 9th 2008
Engin,
One of the options that I would like to see is the ability to pick your repository. I agree that PubMed is great, and I have been happy with all of my records taken directly from it. How do you save your PDFs? I have been saving them all in YEP with tags. This allows me to pull up all PDFs related to a subject and view them at a glance. Although having PDFs associated with a citation in Z is functional, it does not have the visual feature of YEP and ends up duplicating the file on your drive.
Rashid -
- CommentAuthorSimon
- CommentTimeNov 9th 2008
PubMed stores only abstracts, not full-text, and thus would probably not work too well. Our algorithm currently tries to extract a random text snippet and searches for that; even if we tried to extract only the abstract, I'd imagine our success rate with any kind of algorithm would be very low. PubMed Central would probably work properly as a repository for the articles in it, but has far less content than PubMed or Google Scholar. -
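For readers wondering what searching for a random text snippet could look like, here is a minimal Python sketch of the idea Simon describes. It assumes a snippet is a run of consecutive words from the extracted text; the function names and the snippet length are hypothetical, and this is not Zotero's actual implementation.

```python
import random

def pick_snippet(text, n_words=8, rng=None):
    """Pick a random run of n_words consecutive words from the extracted text."""
    rng = rng or random.Random()
    words = text.split()
    if len(words) <= n_words:
        return " ".join(words)
    start = rng.randrange(len(words) - n_words + 1)
    return " ".join(words[start:start + n_words])

def as_phrase_query(snippet):
    """Wrap the snippet in double quotes so a search engine treats it as an exact phrase."""
    cleaned = " ".join(snippet.replace('"', " ").split())
    return '"' + cleaned + '"'
```

This also shows why an abstract-only index is a poor target: a snippet drawn from the body of the article will usually not appear in the abstract at all.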
- CommentAuthorenozkan
- CommentTimeNov 14th 2008
rmjanjua,
I do not save my PDFs any specific way any more, now that I'm using zotero. But I see your point.
Simon,
It does make me wonder, however, how other software manages accuracy in metadata retrieval (I am specifically thinking of "Papers", which is available only on Macs). I guess their algorithm is different, and if so, what are the chances of improving the current Zotero algorithm? -
- CommentAuthorSimon
- CommentTimeNov 14th 2008
As far as I can tell, Papers takes a similar approach to ours, grabbing metadata from Google Scholar. They try the DOI first, which might be worth looking into. After that, it seems like they make you select text from the PDF, which we aren't going to do. Are there PDFs that Papers does a better job with than Zotero? -
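The "try the DOI first" approach can be sketched as a simple fallback chain. This is a minimal Python illustration under stated assumptions: `lookup_by_doi` and `lookup_by_snippet` are hypothetical callables supplied by the caller (no real service API is implied), and the DOI pattern is simplified.

```python
import re

# Simplified DOI pattern (an assumption, not a complete grammar).
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')

def retrieve_metadata(text, lookup_by_doi, lookup_by_snippet):
    """Try an exact DOI lookup first; fall back to a text-snippet search.

    Both lookups are hypothetical callables that return a record or None.
    Returns (record, method), or (None, None) if nothing matched.
    """
    match = DOI_RE.search(text)
    if match:
        doi = match.group(0).rstrip(".,;)")
        record = lookup_by_doi(doi)
        if record is not None:
            return record, "doi"
    record = lookup_by_snippet(text)
    if record is not None:
        return record, "snippet"
    return None, None
```

The ordering matters because a DOI identifies exactly one article, whereas a snippet search can return a plausible but wrong match.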
- CommentAuthorRintze
- CommentTimeNov 16th 2008 edited
PubMed stores only abstracts, not full-text, and thus would probably not work too well.
Would it be feasible to first use Google Scholar to identify PDF papers by using random text snippets, and when that gives a positive hit, to send the Google Scholar metadata of the identified paper (e.g. author names, title, DOI) to PubMed? -
- CommentAuthorMadigania
- CommentTimeDec 3rd 2008
Back to the original problem, solved for Iddo above but not for me:
"I just don't have the option to to get the meta-data." when right clicking on a pdf in the Zotero library.
I followed Simon's instructions:
"Iddo, you need to install the PDF indexer for this feature to work. Go to "Search" in the Zotero preferences and click the "Check for installer" button. This should enable the option for you."
Confirmed that I do have PDF indexing (version 3.02) but I wonder if that is the problem? I'm using Zotero 1.07, updater says that is the most recent version (for Apple Os X.5), but Iddo stated 1.5 syncing with 3.2.
So this new feature only works on the Zotero 1.5 sync preview? -
- CommentAuthorDan Stillman
- CommentTimeDec 3rd 2008
So this new feature only works on the Zotero 1.5 sync preview?
That is correct. It's a new feature that will be in 1.5. -
- CommentAuthorhamptondan
- CommentTimeFeb 27th 2009
I'm having problems with accuracy, for instance a pdf of "Management of Large Segmental Tibial Defects Using a Cylindrical Mesh Cage" is being recognized as "Tumoral calcinosis in infants: a report of three cases and review of the literature". They have the correct journal but the wrong year -
- CommentAuthorgary.pajer
- CommentTimeMar 2nd 2009
I haven't had any success at all with this feature. WinXP, Zotero 1.5b1, and yes, the two necessary plug-ins are installed. I drag the PDF into the center pane, making sure that it's an entry by itself (not attached to an entry). Right-click, and the context menu entry for Retrieve PDF Metadata is there. The info window (with progress bar) opens, the wheel spins, then "No matching references found". Every time. Every PDF, including the one suggested above by Tjowens on Nov 4 2008. If I go to Google Scholar by hand, the article(s) are found easily, the Zotero icon appears in the address bar, I click on it, and an entry is created and its fields properly populated.
Any hints? -
- CommentAuthorTjowens
- CommentTimeMar 3rd 2009
Can you confirm that your full text search plugins are functioning? Try doing some searches for terms that appear inside your PDFs and see if Zotero is searching through their full text.
I just tried this PDF again and it worked quite nicely, so it looks like this is not a general issue but something specific to your configuration. -
- CommentAuthorgary.pajer
- CommentTimeMar 4th 2009
Evidently ... I have three zotero installations, and it works fine on two of them. When I get a chance I'll try to uninstall/reinstall the plug-ins and, if necessary, zotero. Unless you have a better suggestion.
Thanks,
Gary -
- CommentAuthorbauct
- CommentTimeMar 31st 2009
I tried the above article, and the text is indexed correctly, but it cannot retrieve metadata! pdf indexers are up to date. Running 1.5b2.
Tried another article that is definitely found in google scholar, also did not find metadata.
I am using NitroPDF not Adobe reader, any connection? PDF document properties do show the right title and author... -
- CommentAuthorbauct
- CommentTimeMar 31st 2009
Uninstall/reinstall of zotero did not solve metadata problem... -
- CommentAuthorsean
- CommentTimeMar 31st 2009
Could you provide debug output? Please include output from the moment you download Tjowens's PDF (you can just right-click and Save Link As Zotero Snapshot) through your attempt to retrieve metadata (right-click PDF in Zotero and Retrieve Metadata for PDF). -
- CommentAuthorDan Stillman
- CommentTimeMar 31st 2009
bauct: You're getting a permission denied error, which generally indicates an extension or application conflict. See Step 5 on the troubleshooting translator issues page, and start a new thread if you're still having trouble. -
- CommentAuthorbauct
- CommentTimeMar 31st 2009
Disabling and then re-enabling Roboform 6.92 solved the problem!
Thanks! -
- CommentAuthorjasonjjay
- CommentTimeMay 18th 2009
I just switched to Zotero from EndNote for the specific purpose of being able to retrieve PDF metadata, but this feature doesn't seem to work. I have the same problem as others on this forum where the Progress window comes up, it says "Retrieving Metadata" and just spins its wheel without getting anything. I have tried a number of PDFs from a variety of sources and tried the PDF that Tjowens linked, and that one doesn't work either.
I am on Windows XP, running a fresh clean copy of Firefox 3.0.1 with no other extensions except the Microsoft .NET Framework Assistant that came installed. I am using Zotero 2.0b3 with the pdftotext 3.02 installed. The PDFs are definitely getting indexed - they are searchable and I can see the number of PDFs and words accumulating in the Search tab of the preferences window. But no metadata. Would love help on this, since again this is the main reason I switched to Zotero. -
- CommentAuthorBionatsci
- CommentTimeMay 18th 2009
Hello jasonjjay
I think you are just a victim of unfortunate timing: retrieving PDF metadata happens to be broken in the current 2.0 build (see the known issues), but it has been fixed in the trunk, and a new build with this fix should be out very soon ("in the next few days" was mentioned in a 3-day-old post). -
- CommentAuthorjasonjjay
- CommentTimeMay 18th 2009
Got it. Thanks! I poked my head into Trac and saw the modification to trunk on its way. https://www.zotero.org/trac/changeset/4498 I hope this works. I'll look forward to the new build. -
- CommentAuthorrhett
- CommentTimeOct 31st 2009
When I drag a PDF into my library it usually does not drop at all. If I take it to "My Library" it zaps back to the desktop; if it is in the middle column it just hangs there. I did manage to import a couple once, but then right-clicking did not give me the option of retrieving metadata.
I have installed the add ons and tried some of the "standard" pdfs mentioned above. Any one got any ideas what might be wrong?
Incidentally, I also have Mendeley installed, as I was experimenting with both after using EndNote for years. Generally Zotero is much better for me, but the PDF OCR functions in Mendeley are very nice. -
- CommentAuthoradamsmith
- CommentTimeOct 31st 2009
what OS are you running?
You can also get a pdf to your library by
a) opening it in your browser with a plugin and clicking the "Create new item from page" button and
b) using the "store copy of file" function. -
- CommentAuthorrhett
- CommentTimeNov 1st 2009
Thanks for the tip.
With the "Standard PDF" I got everything except the abstract (I presume this is not included in the standard retrieval). With my own PDFs it was a bit variable - often the publication name, volume, and pages were missing (in the Mendeley PDF viewer one can highlight text and copy it, so correcting mistakes in the automatic retrieval is very quick and easy).
I am using Ubuntu Linux 9.04. -
- CommentAuthoradamsmith
- CommentTimeNov 1st 2009 edited
I believe the metadata comes from Google Scholar; that's not always great, but usually a good start.
The adobe FF plugin for linux allows you to highlight and copy text, too. Essentially the same function as the built in pdf reader in Mendeley.
Drag&drop is broken in linux. -
- CommentAuthorrhett
- CommentTimeNov 1st 2009
Again thanks for the tip.
I will follow up on the Drag&Drop in the Ubuntu forums. -
- CommentAuthoradamsmith
- CommentTimeNov 1st 2009
no, it's a FF on linux issue - word from Dan is it might be fixed for FF 3.6 -
- CommentAuthorrhett
- CommentTimeNov 1st 2009
I also found it was possible when I switched my PDF viewer from Evince to Okular.