New item from PDF

If I add a PDF to my library and run "Retrieve Metadata" on it, and no metadata is detected, I would like to have the option to directly create a New Item (Book, Document, ...) with the PDF as an attachment.

As it is now, when no metadata is detected I have to create an empty New Item, fill in the title and so on manually, select to add an attachment (either "Stored copy.." or "Link to file.."), and then delete the previously added PDF.

It would be simpler to be able to Ctrl-click on the PDF and have the option in the popup menu to "Add as Attachment to New Item...".
  • This has been discussed, and you're certainly right that it should happen. Falling back to making a new parent item (perhaps of type "Document") is a very reasonable behavior, and I hope that it happens for Zotero 2.1.
  • I'm sorry that I didn't find that it had already been discussed. In any event I am using 2.1b2 and the functionality isn't present.
  • Yes -- it's not in the 2.1 betas, but it could conceivably still happen before the final release of 2.1.
  • I'm not sure when it appeared, but a similar function is already in place for non-PDF items-- Zotero gives the option of making them an attachment to a new item.

    I just wrote up a little patch to add this to Zotero -- I don't know if it will make it into 2.1, but I don't see why not. I've also created a ticket for this.
  • awesome! Thanks for doing that.
  • Thanks to ajlyon for the patch. It works.

    I think a final solution will need more than this, though. If we're tying this into metadata retrieval, we might want to confirm with the user that parent items should be created, prompting for all failed items at the end of the metadata retrieval process. Otherwise 1) it's hard to notice what happened and 2) a temporarily failing lookup would create parents for all items (and there's no way, at least at present, to re-run metadata retrieval on child attachments).
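
The batch confirmation described above could look roughly like the following. This is only a sketch of the idea, not Zotero's code: `Attachment`, `retrieve_batch`, `lookup`, and `confirm` are hypothetical names, and the lookup and the prompt are passed in as plain functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attachment:
    """Hypothetical stand-in for a standalone PDF attachment."""
    filename: str
    parent_title: Optional[str] = None  # None means no parent item yet


def retrieve_batch(
    attachments: List[Attachment],
    lookup: Callable[[Attachment], Optional[str]],
    confirm: Callable[[List[Attachment]], bool],
) -> List[Attachment]:
    """Look up every attachment, then ask once about all the failures."""
    failed = []
    for att in attachments:
        title = lookup(att)  # returns None when no metadata is found
        if title is not None:
            att.parent_title = title
        else:
            failed.append(att)

    # Prompt a single time, after the whole run, so a temporarily failing
    # lookup does not silently create parents for every item.
    if failed and confirm(failed):
        for att in failed:
            att.parent_title = att.filename  # placeholder parent item

    return failed
```

Because the failed items are returned either way, a caller could also list them for the user or retry them later instead of creating parents.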
  • I agree that the patch isn't good enough to actually land yet, but I was pleasantly surprised to find that it was really quite easy to do. I still have almost zero experience tweaking XUL, so I just changed the back-end behavior.

    The easiest fix that wouldn't require any real UI change would be to add PDFs to the file types supported by the existing "Create Parent Item from Selected Item" code.

    As noted in the ticket and in the commit message for r4701, a real solution would combine Create Parent Item... and PDF detection. I'll see about mocking up something like that.
  • I'm happy to do the XUL work if we can agree on the best way this should work (though I'd of course more than welcome a patch).
  • I appreciate the efforts on this functionality.

    I would like to say that as a new user trying to organize a large collection of PDF articles and the like, the batch handling of failed items would be very welcome.

    It seemed to me that perhaps when parent items are created in such cases they could be partially or approximately populated from the information that was extracted when searching online for proper metadata. It seemed to me also that as a fallback it might be possible to use the title of the PDF file - in my case I use a naming convention of author lastname(s) followed by " - " followed by the title.

    A related consideration is that it would be very helpful when adding many PDFs to have the option of adding by link to file rather than ingesting the PDF into the Zotero DB. This would be similar to the iTunes option that allows the user to specify whether the files are copied into the iTunes media folder or not when adding them to iTunes. I have a very large collection of PDFs (40 GB) and don't really want to duplicate the storage. Such an option would, it seems, imply that "retrieve metadata for PDF" would have to follow the link.

    Thanks for such a helpful tool
  • A related consideration is that it would be very helpful when adding many PDFs to have the option of adding by link to file rather than ingesting the PDF into the Zotero DB.
    That's already possible. Use "Link to File" under the green plus sign (add new item) menu in Zotero; you can select multiple files with Shift and/or Ctrl.
    Retrieve Metadata will work for linked items.
  • Wonderful! Thank you.

    I was trying "Link to File" from within a Document, Book, etc., and then didn't see the Retrieve Metadata option. "Link to File" under the green plus sign works fine, along with Retrieve Metadata.
  • It seemed to me that perhaps when parent items are created in such cases they could be partially or approximately populated from the information that was extracted when searching online for proper metadata. It seemed to me also that as a fallback it might be possible to use the title of the PDF file - in my case I use a naming convention of author lastname(s) followed by " - " followed by the title.
    My test implementation would fall back to the filename as the title of the created parent item.

    In the way that Zotero currently tries to identify PDFs, there is no such thing as approximate metadata-- if it finds a DOI, it creates an item using that. Otherwise, it uses a phrase from the article and searches Google Scholar-- if there's a match, it creates an item using that. Both of these are binary propositions -- we never have partial DOI-provided data (well, we do sometimes, but Zotero doesn't know that it's partial and happily creates an item anyway), and we don't have partial item data from Google (well, we do, but again Zotero doesn't know that and creates the item anyway). Zotero doesn't even try to guess what might be the journal name, article name, etc.
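
To make the all-or-nothing flow and the filename fallback concrete, here is a rough sketch. It is not how Zotero implements this: the DOI pattern is a loose illustrative assumption, all function names are hypothetical, the Google Scholar step is left out, and the "lastname(s) - title" split simply follows the naming convention mentioned earlier in the thread.

```python
import re
from pathlib import PurePath

# Loose DOI pattern, assumed for illustration only (not Zotero's pattern).
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(text):
    """Return the first DOI-like string in the text, or None."""
    match = DOI_RE.search(text)
    # Trim punctuation that often trails a DOI at the end of a sentence.
    return match.group(0).rstrip(".,;)") if match else None


def fields_from_filename(path):
    """Fallback: derive parent item fields from the file name alone."""
    stem = PurePath(path).stem
    authors, sep, title = stem.partition(" - ")
    if sep and authors.strip() and title.strip():
        # File follows the "lastname(s) - title" convention.
        return {"creators": authors.strip(), "title": title.strip()}
    # Otherwise use the whole name (minus extension) as the title.
    return {"title": stem}


def identify(first_page_text, path):
    """Either a DOI is found, or we fall back to the filename."""
    doi = find_doi(first_page_text)
    if doi:
        return {"DOI": doi}  # a real lookup would resolve this to metadata
    return fields_from_filename(path)
```

Note that finding a DOI is only half the job: as the rest of the thread shows, the lookup for a correctly detected DOI can still fail.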
  • Using the filename works for me. I assumed that Retrieve Metadata was actually extracting the DOI, title, and author based on heuristics. I suppose that's non-trivial depending on how the PDF is generated. I have a PDF that has a DOI near the beginning on page one, but Retrieve Metadata doesn't seem to find it.
  • If you can, post the PDF in question online and I can try to see why Zotero can't find the DOI.
  • Sorry. Here's the link:

    http://michel.bitbol.pagesperso-orange.fr/Autopoiesis.pdf
  • I haven't quite figured out why yet, but Zotero can't manage to import from that DOI at all. It is trying to look it up, but then it chokes on the output. This is pointing to a problem in DOI lookups-- it finds the identifier right away.

    Since this points to an issue in the core DOI code, I'll be looking into this. Fortunately, it doesn't indicate any limitations in the PDF handling.
  • Dan points to the cause of this in another thread. Suffice it to say that we'll fix this rather soon.
  • I have spent several hours trying to add simple PDF files to my group library. These are documents that I created. I have dragged the PDF file from My Documents on my computer, and it shows up in the proper folder in the editing panel of Zotero. When I click on it, it opens instantly. However, when I go to my Group Library in Zotero (as any member of the public would do, or another member of the group), there is no way to get the PDF file to open so it is visible. Doesn't matter whether I am logged into the group or not. What am I doing wrong?
  • Could you describe what you're seeing online?
    I'm not sure whether individual PDFs that aren't attached to any metadata actually sync. But provided they do:

    Are you using File Syncing?
    http://www.zotero.org/support/sync#file_syncing
    If not - that's the problem.
    If you are, are you using Zotero File Storage? If not, that's the problem - WebDAV sync doesn't allow for public file access.
  • I have now spent the entire day unsuccessfully trying to add a simple PDF file to my group library. I am using file syncing and zotero file storage. I attempted to create a new document by hand and then attach a PDF file. I tried to drag a PDF file directly into the library folder. I tried multi-page PDF documents and single-page documents. I tried random PDF files from my computer. In all cases where I used a PDF that was created from a document, rather than coming from a database, it doesn't work--either it will not sync at all, or it shows up in the group library, but when you try to open the file, it just goes to a description of the file, but does not actually open the PDF. Yet if I put in an article from the New York Times from 1898 that I downloaded from an online database, that does show up. Is it not possible to put newly created PDF documents into zotero? If not, how can people use this to collaborate on work, share edited manuscripts, etc.?
  • Correction to previous comment. I have now succeeded in getting one page of a PDF created de novo to open in the group library, although I am not sure how I did that. It is not clear whether a multipage PDF document, from which this one page came, will actually work the same way.
  • Kentwood: Start a new thread, provide exact steps to reproduce the problem, and be sure to differentiate between the Zotero client and zotero.org. A "group library" can refer to the group library in Zotero or the group library on the website. If you can post example files that work or don't work to a free file-sharing service, that might be helpful, since nobody has reported any problems like yours (to the extent that I understand it).
  • I have figured out that if I scan the documents as PDFs at lower resolution, then I can put them into my library files and make them appear in the group library on the website and they open. Previously I was working with PDF files created on the computer in high resolution to print the documents, and those did not work, for reasons that are not clear. It may be that the file size was too large and gummed up the works. In any case, I now have a system that is giving me what I want.
  • I finally solved this problem--I recreated the original documents in PDF using much lower resolution, and now everything works, so no need for further comments on this issue.
  • Sorry, but problem not solved. When I attach a PDF document that I created in the group library, it opens perfectly when I am logged in, but when I log out (as if I were a member of the public viewing the site), the link to the attachment disappears and it is not possible to open and view the PDF file. Even though my preferences are set to sync to the group library with Zotero, it appears it is actually using WebDAV--how do I disable this function so it will sync properly and make these PDF documents visible to anyone?
  • Update on previous comment: When I log into Zotero using my wife's username and password, then it is possible to open and view the PDF attachments. So presumably this means they are visible to any member of the group, but not visible to the public at large. Why would this be? Is it possible to make original documents visible to everyone?
  • Zotero doesn't allow file sharing to non-users.

    If you have further questions, please start a new thread—this thread has nothing to do with your issue.
  • This has been my key problem. I have scanned many books to PDF and run OCR on them. When I add these PDFs to Zotero and try "Retrieve Metadata" from the PDF, no metadata is detected. There does not yet appear to be any way to create a New Item (Book, Document, ...) with the PDF as attachment (as suggested by xristy at the top of this discussion).

    I was not able to figure out the patch created by ajlyon (Dec 25th 2010) that is supposed to enable this behavior. I just installed the latest Zotero, 2.1.1, and found it does not include this behavior ("Create new item with PDF as attachment").

    Is there a solution?

    I'm quite stunned that this problem/request is not much much more common.
  • No, this hasn't progressed. Yes, it does get requested a lot. Read the rest of the thread for some of the issues in implementing this, and chime in.

    You probably don't want to apply a patch to Zotero yourself. It's not that hard, but it involves decompressing the .jar, applying the patch, and recompiling it, so unless you feel inclined to tinker you probably don't want to get involved with that.
  • This didn't happen because, well, we didn't lay out precisely how it should work.