Subset of PDFs not recognized as PDF
(apologies if this double posts - preview apparently ate my original post)
Hi all, I’m having a problem with PDF file recognition that I’m hoping somebody has seen before.
A subset of PDFs in my library are showing up with the snapshot icon, not the little Adobe PDF icon, although they have the correct extension. This would be fine if it were purely cosmetic. But because Zotero doesn’t recognize them as PDFs, I can neither extract metadata nor search for them by file type. Both Firefox and Preview open them just fine as PDFs.
Some of the PDFs are attachments; others do not have a parent record. What they have in common is that they were imported from an export of another Zotero library with about 1200 items.
If I show the file on the drive, it’s fine. When I drag it in, Zotero recognizes it as a PDF and imports it as normal. So, as a workaround, I can show the file, drop it in to create a new record, and then delete the old version when it’s finished. But there are a lot (!) of them.
Does anybody know what’s going on? And, more importantly, is there a way to fix it without going through the “show in Finder, drag and drop, extract metadata, delete old one” routine for every one?
Thanks!
They're PDFs, but with the snapshot icon.
It's weird. I'm wondering if something in the database got corrupted for that library import, because I can't think of anything else.
There may be an import/export bug that resulted in the file type info being lost. Can you provide a Debug ID that shows you opening a few of the affected files?
Yes, they go to the actual storage directory on the drive. ls -alF shows an actual file, not a link:
One that doesn’t work (attached to a parent record)
-rw-r--r--@ 1 cmt staff 765744 Feb 4 22:45 /Users/cmt/Dropbox/zotero_storage/RSAUH4FU/Howlett-2012-The_lessons_of_failure.pdf
For reference, here’s one that works just fine (not attached to a parent record)
-rw-r--r--@ 1 cmt staff 1389914 Feb 5 15:45 /Users/cmt/Dropbox/zotero_storage/WMSTCECJ/8f1f5d7e868d849da0_gam6b4eab.pdf
And another that doesn’t (one not attached to a parent record)
-rw-r--r--@ 1 cat staff 674532 Feb 4 12:31 /Users/cmt/Dropbox/zotero_storage/MMQHPZWD/63ecbf1f40aa29b527ef4f61ca90d5e9.pdf
As for the Debug ID, I don't even get the *option* for Retrieve Metadata from PDFs on the PDFs labeled as snapshots, so I'm not sure how to generate a useful action log. Suggestions?
Also, we very, very strongly discourage storing your Zotero database on Dropbox, and it's certainly possible that that's the cause of your problems. I'd start by undoing that.
(These are stored files, but you're still misunderstanding linked vs. stored files, btw. That's linked vs. stored from Zotero's perspective, not your file system's. Both are regular files for the OS.)
Debug ID was for opening, not for Retrieve Metadata. But it seems like the content type was lost for all the attachments in a given import. We'll have to check the Zotero RDF translator to see if we can reproduce this.
As for a fix, if you're comfortable at the command line, this should fix it for you:
1) Close Zotero/Firefox.
2) In Terminal, cd to the Zotero data directory.
3) Make a backup of zotero.sqlite.
4) sqlite3 zotero.sqlite
5) Run these two commands:
UPDATE items SET clientDateModified=CURRENT_TIMESTAMP WHERE itemID IN (SELECT itemID FROM itemAttachments WHERE path LIKE '%.pdf' AND (mimeType IS NULL OR mimeType = ''));
UPDATE itemAttachments SET mimeType='application/pdf' WHERE itemID IN (SELECT itemID FROM itemAttachments WHERE path LIKE '%.pdf' AND (mimeType IS NULL OR mimeType = ''));
Note for posterity: If you came across this thread in search results, don't run these commands! They should work for taylor_caroline at the current time, but they could cause problems in other situations and won't work in future versions. We'll come up with a proper fix for this.
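A side note on the condition used in those two commands: it combines AND and OR, and in SQL AND binds more tightly than OR, so an ungrouped `a AND b OR c` is read as `(a AND b) OR c`. The sketch below compares the two readings on an in-memory toy table. It does not touch a real Zotero database; the three-column table, the sample rows, and the names `matching_ids`, `UNGROUPED`, and `GROUPED` are assumptions made up for illustration, and the real schema has more columns.

```python
import sqlite3

# Toy stand-in for itemAttachments (assumed, simplified schema); in-memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itemAttachments (itemID INTEGER PRIMARY KEY, path TEXT, mimeType TEXT)"
)
conn.executemany(
    "INSERT INTO itemAttachments VALUES (?, ?, ?)",
    [
        (1, "storage:paper.pdf", None),             # PDF with NULL type
        (2, "storage:paper2.pdf", ""),              # PDF with empty type
        (3, "storage:ok.pdf", "application/pdf"),   # PDF with correct type
        (4, "storage:page.html", ""),               # non-PDF with empty type
        (5, "storage:page2.html", None),            # non-PDF with NULL type
    ],
)

# The same condition, without and with explicit grouping of the OR.
UNGROUPED = "path LIKE '%.pdf' AND mimeType IS NULL OR mimeType = ''"
GROUPED = "path LIKE '%.pdf' AND (mimeType IS NULL OR mimeType = '')"


def matching_ids(where):
    """Return the itemIDs selected by a WHERE clause, in ascending order."""
    rows = conn.execute(
        f"SELECT itemID FROM itemAttachments WHERE {where} ORDER BY itemID"
    )
    return [row[0] for row in rows]


ungrouped_ids = matching_ids(UNGROUPED)
grouped_ids = matching_ids(GROUPED)
print("ungrouped:", ungrouped_ids)
print("grouped:  ", grouped_ids)
```

On this toy data the ungrouped reading also selects item 4, the HTML attachment with an empty type, while the grouped reading selects only the two PDFs whose type is missing. Running a SELECT like this first is a cheap way to see which rows an UPDATE would touch before changing anything.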
All the files imported since have been fine (so far). And I appear to have deleted the original export once it was imported.
I do, actually, understand the distinction within Zotero between linked and attached. I should, perhaps, have been more explicit in my description but was trying for brevity. Sorry.
And my comment about Dropbox clearly crossed yours -- I very deliberately didn't put the database there! Your warnings are very explicit!
I'll report back shortly on the database modifications. Thanks!
Thank you so very much!
It's possible I have the export (RDF) on my backup drive. Do you want it somewhere if I find it?
-Caroline
I just generated a small RDF export from an old Zotero library to pull into my current one, and 3 of the 6 attachments came in right, while the other 3 came in as pseudo-snapshots (all files were correct in the originating library). Is that RDF file of any use to you for debugging? If so, let me know and I'll post it.
-Caroline