Subset of PDFs not recognized as PDF

(apologies if this double posts - preview apparently ate my original post)

Hi all, I’m having a problem with pdf file recognition that I’m hoping somebody has seen before.

A subset of PDFs in my library are showing up with the snapshot icon, not the little Adobe PDF icon, although they have the correct extension. This would be fine if it were purely cosmetic. But because Zotero doesn’t recognize them as PDFs, I can neither extract metadata nor search for them by file type. Both Firefox and Preview open them just fine as PDFs.

Some of the PDFs are attachments, others do not have a parent record. What they have in common is that they were imported from an export of another Zotero library with about 1200 items.

If I show the file on the drive, it’s fine. When I drag it in, Zotero recognizes it as a PDF and imports it as normal. So as a workaround I can show the file, drop it in to create a new record, and then delete the old version once the import finishes. But there are a lot (!) of them.

Does anybody know what’s going on? And, more importantly, is there a way to fix it without going through the “show in finder-drag and drop-extract metadata-delete old one” routine for every one?

Thanks!
  • Never seen this. Are these attached or linked? You can tell either by the link icon, which all linked files have regardless of item type, or by the path of the folder that opens with Show File.
  • All attachments! They all (all the ones that I've checked anyway) go to the file on the drive when I show the file.

    They're PDFs, but with the snapshot icon.

    It's weird. I'm wondering if something in the database got corrupted for that library import, because I can't think of anything else.
  • Going to the hard drive doesn't mean they're not linked files — the question is if they go to a directory within 'storage' within the Zotero data directory or elsewhere on your hard drive.

    There may be an import/export bug that resulted in the file type info being lost. Can you provide a Debug ID that shows you opening a few of the affected files?
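
The path check described in this reply can be sketched in a few lines of Python. This is only a heuristic based on file location, under the assumption that stored files live somewhere under a `storage` folder inside the data directory; the function name `is_stored_attachment` and the sample folder names are made up for illustration, and Zotero itself decides linked vs. stored from its database, not from this test.

```python
import tempfile
from pathlib import Path


def is_stored_attachment(file_path, data_dir):
    """Return True if file_path resolves to a location under data_dir/storage.

    Hypothetical helper: resolve() follows symlinks, so a storage folder
    that is symlinked to another location is still handled.
    """
    storage = (Path(data_dir) / "storage").resolve()
    target = Path(file_path).resolve()
    try:
        target.relative_to(storage)
        return True
    except ValueError:
        # target is not located under the storage directory
        return False


# Small demonstration with throwaway directories (all names are invented).
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "zotero"
    stored = data_dir / "storage" / "ABCD1234" / "paper.pdf"
    stored.parent.mkdir(parents=True)
    stored.write_bytes(b"%PDF-")

    elsewhere = Path(tmp) / "Documents" / "paper.pdf"
    elsewhere.parent.mkdir(parents=True)
    elsewhere.write_bytes(b"%PDF-")

    stored_result = is_stored_attachment(stored, data_dir)
    linked_result = is_stored_attachment(elsewhere, data_dir)

print(stored_result, linked_result)
```

A file that fails this check is probably a linked file; one that passes is probably a stored copy.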
  • When and in what format did you export the data originally?
  • Thank you both for looking at this.

    Yes, they go to the actual storage directory on the drive. ls -alF shows actual file, not link:

    One that doesn’t work (attached to a parent record)
    -rw-r--r--@ 1 cmt staff 765744 Feb 4 22:45 /Users/cmt/Dropbox/zotero_storage/RSAUH4FU/Howlett-2012-The_lessons_of_failure.pdf

    For reference, here’s one that works just fine (not attached to a parent record)
    -rw-r--r--@ 1 cmt staff 1389914 Feb 5 15:45 /Users/cmt/Dropbox/zotero_storage/WMSTCECJ/8f1f5d7e868d849da0_gam6b4eab.pdf

    And another that doesn’t (one not attached to a parent record)
    -rw-r--r--@ 1 cat staff 674532 Feb 4 12:31 /Users/cmt/Dropbox/zotero_storage/MMQHPZWD/63ecbf1f40aa29b527ef4f61ca90d5e9.pdf


    As for the Debug ID, I don't even get the *option* for Retrieve Metadata from PDFs on the PDFs labeled as snapshots, so I'm not sure how to generate a useful action log. Suggestions?
  • From zotero, Export library to Zotero RDF, with both files and notes ticked.
  • Dan is asking for debug for _opening_ them, not for retrieve metadata.

    Also, we very, very strongly discourage storing your zotero database on Dropbox, and it's certainly possible that that's the cause of your problems. I'd start by undoing that.

    (these are stored files, but you're still misunderstanding linked vs. stored files, btw. That's linked vs. stored from Zotero's perspective, not your file system's. Both are regular files for the OS).
  • edited February 6, 2016
    (Note that we were talking about linked files in Zotero, not filesystem symlinks, but it doesn't matter — seems like these are attached, not linked, files. But see File Copies and Links to understand the difference.)

    Debug ID was for opening, not for Retrieve Metadata. But it seems like the content type was lost for all the attachments in a given import. We'll have to check the Zotero RDF translator to see if we can reproduce this.

    As for a fix, if you're comfortable at the command line, this should fix it for you:

    1) Close Zotero/Firefox.

    2) In Terminal, cd to the Zotero data directory.

    3) Make a backup of zotero.sqlite.

    4) sqlite3 zotero.sqlite

    5) Run these two commands:

    UPDATE items SET clientDateModified=CURRENT_TIMESTAMP WHERE itemID IN (SELECT itemID FROM itemAttachments WHERE path LIKE '%.pdf' AND (mimeType IS NULL OR mimeType = ''));

    UPDATE itemAttachments SET mimeType='application/pdf' WHERE itemID IN (SELECT itemID FROM itemAttachments WHERE path LIKE '%.pdf' AND (mimeType IS NULL OR mimeType = ''));

    Note for posterity: If you came across this thread in search results, don't run these commands! They should work for taylor_caroline at the current time, but they could cause problems in other situations and won't work in future versions. We'll come up with a proper fix for this.
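
When a WHERE clause mixes AND and OR, as the subqueries above do, grouping matters: SQL evaluates AND before OR, so `A AND B OR C` is read as `(A AND B) OR C`. The sketch below illustrates this on an in-memory SQLite table. The table is a simplified, hypothetical stand-in that models only the three columns named in the thread, and the sample rows are invented; it does not touch a real Zotero database.

```python
import sqlite3

# Hypothetical, simplified stand-in for the itemAttachments table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE itemAttachments ("
    "itemID INTEGER PRIMARY KEY, path TEXT, mimeType TEXT)"
)
conn.executemany(
    "INSERT INTO itemAttachments VALUES (?, ?, ?)",
    [
        (1, "paper.pdf", None),             # PDF, content type missing (NULL)
        (2, "scan.pdf", ""),                # PDF, content type empty
        (3, "page.html", ""),               # not a PDF, content type empty
        (4, "ok.pdf", "application/pdf"),   # PDF, already correct
    ],
)

# Same condition, without and with explicit grouping of the OR.
UNGROUPED = "path LIKE '%.pdf' AND mimeType IS NULL OR mimeType = ''"
GROUPED = "path LIKE '%.pdf' AND (mimeType IS NULL OR mimeType = '')"


def matching_ids(condition):
    """Return the itemIDs selected by the given WHERE condition."""
    rows = conn.execute(
        f"SELECT itemID FROM itemAttachments WHERE {condition} ORDER BY itemID"
    )
    return [row[0] for row in rows]


ungrouped_ids = matching_ids(UNGROUPED)
grouped_ids = matching_ids(GROUPED)
print(ungrouped_ids)  # [1, 2, 3]: the HTML row with an empty type is included
print(grouped_ids)    # [1, 2]: only PDFs with a missing or empty type
```

Running a SELECT like this against a backup copy first is a cheap way to see which rows an UPDATE would touch.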
  • I should say: only the storage directory is in Dropbox, with a soft link pointing to it from the zotero folder in Application Support.

    All the files imported since have been fine. (so far). And I appear to have deleted the original export once it was imported.
  • Yeah, Dropbox probably isn't the problem, since you don't have the database in there.
  • Thanks, Dan and Adam.

    I do, actually, understand the distinction within zotero between linked and attached. I should, perhaps, have been more explicit in my description but was trying for brevity. Sorry.

    And my comment about Dropbox clearly crossed yours -- I very deliberately didn't put the database there! Your warnings are very explicit!

    I'll report back shortly on the database modifications. Thanks!
  • Phew, that fixed it - once I remembered how to exit :) !

    Thank you so very much!

    It's possible I have the export (rdf) on my backup drive - do you want it somewhere if I find it?

    -Caroline
  • no need, thanks, I can replicate this. Thanks for reporting.
  • ticket here: https://github.com/zotero/zotero/issues/906
  • Hi guys,

    I just generated a small rdf export from an old zotero library to pull into my current one, and 3 of the 6 attachments came in right, the other 3 as a pseudo-snapshot (all files were correct in the originating library). Is that rdf file of any use to you for debugging? If so, let me know and I'll post it.

    -Caroline
  • I think we got this covered (i.e. we know what's going on), but keep it around just in case. Thanks.
