Can get PDF metadata when I "Store a Copy..." but not from file links

timtak · March 7, 2012

The title says it all.

I linked to some pdf files on a flash drive.

I installed two pieces of software for reading PDFs (pdftotext and pdfinfo) and, lo, the right-hand pane for most of the links now says that the file is indexed and shows the number of pages in the file.
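The page count shown in the right-hand pane presumably comes from pdfinfo. As a rough illustration of that step (not Zotero's actual code), here is a minimal Python sketch that reads the page count from pdfinfo's text output, assuming the standard `Pages:` line that the Xpdf/Poppler pdfinfo tool prints. The function name `parse_page_count` is hypothetical.

```python
import re
from typing import Optional


def parse_page_count(pdfinfo_output: str) -> Optional[int]:
    """Return the page count from pdfinfo's output, or None if no 'Pages:' line exists.

    Assumes pdfinfo's usual 'Key:   value' layout, one field per line.
    """
    # MULTILINE lets ^ and $ match at each line; \s* also absorbs a trailing \r
    # when the output uses Windows line endings.
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, re.MULTILINE)
    return int(match.group(1)) if match else None
```

In practice the input would be the captured stdout of running `pdfinfo` on the file; if that text has no `Pages:` line (for example because pdfinfo could not open the file), the function returns `None` rather than guessing.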

However, when I click on "Retrieve Metadata for PDF", it says "Could not read text from PDF." I spent a long time reading arcane posts in these forums and tried deleting four pdf* files, although I don't think that they are in my store.

I then tried storing a copy of the file in the Zotero database and it *indexed and retrieved the metadata no problem*.

So I can just copy pdfs into Zotero's database.

However, with about 10 links and two PDFs (one a short review downloaded from Muse, the other the copied file), I am already using 5 MB of storage (according to Explorer; Zotero says 3.6 MB). My free 100 MB for syncing will not last long.

I could put my Zotero storage on a flash drive and sync that way. But moving from Explorer to Zotero for file storage feels a bit scary, or at least strange, to me now; I would have to store PDF files in the Zotero database in order to have them read.

I have put my storage on my flash drive, and PDF files can be read if they are copied to the Zotero store on the flash drive, so my problem with file links is not caused by their being on a flash drive.

I imported some of my citations from MS Word (via Ritesh Agrawal's Memento) and can see the beauty of being able to get citations from web pages such as Muse, JSTOR and Google Scholar with a click. Memento is just not up there with Zotero.

[As a side issue, what with the link to "upgrade storage" being on every page, I wonder about the extent to which Zotero is a commercial enterprise that focuses on storage sale, and whether bugs that can be fixed by increasing the size of the storage might be given short, or shorter shrift. A standalone server is available but not supported. All this would stand to reason on a commercial venture. I also think that 100 USD a year would be enough for me for quite a while, and I wonder why I am such a cheapskate. I pay for Flickr/server yearlies....]

Zotero is cool. It would be even cooler if I could link to pdfs and then get their metadata rather than having to copy them to my store.

I wonder if it can find and create links for me to the pdfs on my hard disk if I import the bibliographic references first.

adamsmith · March 7, 2012

For your actual problem - produce a report ID immediately after a failed retrieve-metadata attempt of a linked file.
http://www.zotero.org/support/reporting_bugs#provide_a_report_id
Make sure the file shows that it is indexed. Ideally, make sure that retrieve metadata works for the same file when you store it in Zotero.

[As a side issue, what with the link to "upgrade storage" being on every page, I wonder about the extent to which Zotero is a commercial enterprise that focuses on storage sale, and whether bugs that can be fixed by increasing the size of the storage might be given short, or shorter shrift. A standalone server is available but not supported. All this would stand to reason on a commercial venture. I also think that 100 USD a year would be enough for me for quite a while, and I wonder why I am such a cheapskate. I pay for Flickr/server yearlies....]

Zotero is not a commercial enterprise, period. The financial aspects of file storage are handled by the Corporation for Digital Scholarship, a not-for-profit corporation whose board consists of a bunch of profs from various universities. Zotero supports WebDAV servers for syncing; it is relatively easy to find free WebDAV hosting up to about 2 GB, and relatively cheap for larger amounts.
Also, you don't need to use file syncing at all. Zotero is fully functional without it.
(Generally I find it odd to include completely baseless speculations/accusations like that in a request for support for a free software program. Do you realize how rude that is?)

dstillman · March 7, 2012

Metadata retrieval works fine for me for linked files.

We'll probably need a Debug ID (rather than a Report ID) for a metadata retrieval attempt for a linked file. But first, as adamsmith says, do make sure the same file works when stored.

As a side issue, what with the link to "upgrade storage" being on every page, I wonder about the extent to which Zotero is a commercial enterprise that focuses on storage sale, and whether bugs that can be fixed by increasing the size of the storage might be given short, or shorter shrift.

There are no bugs that can be fixed by paying for a Zotero File Storage plan. You can store as many files within the data directory as you want without paying for storage. If you go over your online quota, you'll get a warning when you try to sync. If you don't want to see the warning and don't want to pay for Zotero File Storage (which both covers our very real costs for file storage/transfer and, yes, helps fund Zotero development and support), turn off file syncing or use WebDAV. Your data will continue to sync in any case.

A standalone server is available but not supported. All this would stand to reason on a commercial venture.

One has nothing to do with the other. Read the countless threads, here and on the dev list, if you're curious as to the (resource/technical/support/brand) reasons the server code isn't supported.

dstillman · March 7, 2012

But it is a bit scary or strange to me now to move from explorer to Zotero for file storage; I would have to store pdf files in the zotero database in order to have them read.

And this is wrong, by the way. Files aren't stored in the database. They're stored in the 'storage' subdirectory of the Zotero data directory. You can open them directly, including by creating a smart folder on your OS that shows all PDFs within the directory.
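For anyone who prefers a plain script to an OS smart folder, the same "show me every PDF in storage" view can be produced directly. This is a minimal Python sketch, assuming the layout described in this thread (one subfolder per attachment, named with a key such as `7V3PWETZ`, inside the `storage` directory). The function name `list_storage_pdfs` is hypothetical, and the script only reads the folder; it does not touch the Zotero database.

```python
from pathlib import Path
from typing import List, Union


def list_storage_pdfs(storage_dir: Union[str, Path]) -> List[Path]:
    """Return every PDF found in the per-item subfolders of a Zotero 'storage' folder.

    Assumes the layout storage/<ITEMKEY>/<file>; files deeper than that are ignored.
    """
    root = Path(storage_dir)
    return sorted(
        p
        for p in root.glob("*/*")  # storage/<ITEMKEY>/<file>
        if p.is_file() and p.suffix.lower() == ".pdf"  # matches .pdf and .PDF
    )
```

Calling `list_storage_pdfs` with the path to your own `storage` folder and printing `pdf.parent.name` next to `pdf.name` gives a list that maps each cryptic folder name to the human-readable file name inside it.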

timtak · March 7, 2012

Thank you for your suggestions.

The debug information is here.
http://nihonbunka.com/temp/id.txt

My guess is that it's the Japanese characters in the file path. Zotero itself can cope and copy the file across to Zotero, but perhaps whatever is being used to retrieve the metadata from links cannot.

Yes, I can confirm that I can import from links to directories with no Japanese characters in their paths.

I wonder if the multilingual version of Zotero copes better with two-byte characters in file paths.

I apologize for being rude.

I did have a look at the forums regarding the things I mentioned.

When I said that files were in the database, I meant that I will have to access files copied to Zotero via the Zotero GUI, because they are in folders with non-human-memorable names (such as 7V3PWETZ) in the storage folder, folders which are named that way because they are accessed by the database, or some such. I am sure you understand the issue better than I can put it.

I am having a great time with Zotero. It is *amazing*! I may even become a researcher thanks to Zotero (its makers and community).

adamsmith · March 7, 2012

You should use the suggested procedure for submitting Zotero debug output via a Debug ID rather than posting it publicly (that's for your privacy and computer safety). It also seems like you didn't let this run long enough: the log has the attempt to index the PDF, but as far as I can tell it doesn't include an attempt to retrieve metadata.

Intuitively, I'd guess that you might be right about the Japanese characters, particularly on Windows.

adamsmith · March 7, 2012

When I said that files were in the database, I meant that I will have to access files copied to Zotero via the Zotero GUI, because they are in folders with non-human-memorable names

As Dan says, though, you can easily have all PDFs in the Zotero storage folder displayed in a single folder via a saved search (search the web for "saved search" plus your Windows version to see how to create one).

timtak · March 7, 2012

Thanks again for your rapid support.

It was quite a few seconds after I attempted to retrieve the metadata that I copied the debug information.

I had a look through the file and could see nothing secret about the contents.
http://nihonbunka.com/temp/id.txt

I have updated the file; it now covers two attempts to retrieve the metadata from the link.

Here is a screenshot showing that I could get the metadata from the PDF when it was copied into Zotero, but not from the linked file, even though it was indexed and the number of pages is shown.
http://nihonbunka.com/temp/shot.jpg

The saved search idea is interesting, but in the past I have religiously avoided the virtual folders provided in Windows 7 (even using hacks to remove them from the Explorer GUI and the save-file dialogues), preferring to use real folders (in so far as any folders are real) because of the complications that virtual folders create with backing things up and with finding the physical location of a file, for other reasons, on other machines. I keep all my data on a large 64 GB flash drive. I can see, however, how in this case it might be a compromise that lets me access files from both Zotero and Explorer. In any event, I am not leaving Zotero.

dstillman · March 7, 2012

My guess is that it's the Japanese characters in the file path. Zotero itself can cope and copy the file across to Zotero, but perhaps whatever is being used to retrieve the metadata from links cannot.

It was the Japanese characters. Earlier versions of Firefox didn't support running external programs (e.g., pdftotext) with extended characters on the command line. Newer versions do, but Zotero had to be updated to take advantage of that. That had been done for full-text indexing but not for metadata retrieval. Fixed for 3.0.4. Thanks.
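The general point is that the file path has to reach pdftotext intact, whatever characters it contains. As an illustration only (Zotero's own code ran inside Firefox, not Python), here is a minimal sketch that hands a path containing Japanese characters to pdftotext as its own argument, without going through a shell. It assumes pdftotext from Xpdf or Poppler is on the PATH, and that only the first few pages are needed for a metadata lookup; the page limit of 3 and the function names `build_pdftotext_args` and `extract_first_pages` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Union


def build_pdftotext_args(pdf_path: Union[str, Path], last_page: int = 3) -> List[str]:
    # Each argument is a separate list element, so the (possibly Japanese)
    # file name is passed to the program as-is rather than re-quoted by a shell.
    return [
        "pdftotext",
        "-enc", "UTF-8",        # encoding of the extracted text
        "-l", str(last_page),   # stop after this page (3 is an assumed default)
        str(pdf_path),
        "-",                    # write the text to stdout instead of a file
    ]


def extract_first_pages(pdf_path: Union[str, Path], last_page: int = 3) -> str:
    # Requires pdftotext on the PATH; raises CalledProcessError if it fails.
    result = subprocess.run(
        build_pdftotext_args(pdf_path, last_page),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Building the argument list in its own function keeps the path handling testable without pdftotext installed; only `extract_first_pages` actually launches the external program.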

timtak · March 7, 2012

Thank you very much indeed for fixing this. I look forward to 3.0.4.