ZotFile - Advanced PDF management for Zotero

  • Thanks, it's working again in 4.2.6.

    But there is another problem I'm having trouble with.

    I use a lot of PDFs that I have to OCR before I can work with them (300x300 dpi scans, normal modern typefaces). Zotfile seems to have a hard time extracting the correct annotations from these PDFs; at line breaks especially, the add-on produces a lot of errors.

    Is this caused by bad OCR software (I use FineReader) or by the Zotfile code? Will it get any better in the future?

    best regards, E.
  • My bet would be on the OCR, but it's hard to say without the actual file. If you can share a sample (either upload it to Dropbox or similar, or share it via the Zotfile Zotero group as described here: http://zotfile.com/#extract-pdf-annotations), Joscha or someone else can take a look.
  • https://www.dropbox.com/s/2y90wm3ke4ip5ss/tartarenblut%20weinhauer.2000.pdf?dl=0

    The OCR was done with ABBYY FineReader Professional 12 (standard settings, "picture" to "PDF"); the annotations were made with the drawboard.pro app on a Surface Book.

    I would be very grateful for your help and advice on getting the annotated paragraphs extracted correctly.

    best regards, E.
  • edited April 5, 2016
    I can see the problem. Copy and paste from Preview works pretty well, but Zotfile scrambles the order. I think the problem is related to the order of the quadrilaterals that make up each annotation. Zotfile could re-sort them based on their position on the page, but I don't think that is going to happen anytime soon.

    Zotfile: "Obwohl angesichts all dieser Turbulenzen die Bedrohung der Men­ Zeitgenossen anders empfunden haben schen durch klassische Unsicherheitsfaktoren wie Kriminalität im Rück­ blick eher marginal erscheinen mag, ist zu konstatieren, daß dies viele"

    Preview copy & paste "Obwohl angesichts all dieser Turbulenzen die Bedrohung der Men schen durch klassische Unsicherheitsfaktoren wie Kriminalität im Rück blick eher marginal erscheinen mag, ist zu konstatieren, daß dies viele Zeitgenossen anders empfunden haben"

    Edit: Here is a ticket for it: https://github.com/jlegewie/zotfile/issues/221
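A minimal sketch of the re-sorting Joscha describes, in Python, assuming the text under each quadrilateral has already been pulled out of the PDF together with the quad's left edge and top edge (PDF y coordinates grow upward). The `Fragment` type, both function names, and the 2-point `line_tolerance` are hypothetical illustrations, not Zotfile code. The sketch also removes the soft hyphens (U+00AD) that the OCR appears to have left at the line breaks, which would turn "Men schen" into "Menschen".

```python
# Hypothetical sketch, not Zotfile's implementation: put the text fragments
# of one highlight back into reading order using each quad's position, then
# repair soft-hyphen line breaks left by OCR.
from dataclasses import dataclass

SOFT_HYPHEN = "\u00ad"


@dataclass
class Fragment:
    """Text covered by one quadrilateral of a highlight (PDF user space)."""
    text: str
    x: float      # left edge of the quad
    y_top: float  # top edge; PDF y grows upward, so larger means higher on the page


def reading_order(fragments, line_tolerance=2.0):
    """Sort fragments top-to-bottom, then left-to-right within a line.

    Quads whose top edges differ by less than `line_tolerance` points are
    treated as one line (an assumed value; OCR'd scans may need more slack).
    """
    lines = []
    for frag in sorted(fragments, key=lambda f: -f.y_top):
        # Compare against the first quad of the current line.
        if lines and abs(lines[-1][0].y_top - frag.y_top) < line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    return [f for line in lines for f in sorted(line, key=lambda f: f.x)]


def join_fragments(fragments):
    """Join fragments with spaces, then drop each soft hyphen and the space after it."""
    text = " ".join(f.text.strip() for f in fragments)
    return text.replace(SOFT_HYPHEN + " ", "").replace(SOFT_HYPHEN, "")
```

With the four scrambled lines from the example above, sorting by descending top edge restores Preview's order. Grouping with a tolerance rather than exact y equality matters for OCR'd scans, where quads on the same line rarely share an identical coordinate.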
  • edited April 5, 2016
    Hello Joscha,

    thanks for your time and for the ticket you opened on GitHub.

    It would be a milestone for my workflow and for a lot of people I know working in the humanities over here in Europe (where professionally edited and especially OCR'ed papers are still rare).

    Do you know any other method to get the annotations out of the PDF and into Zotero (other than copy and paste)?

    best regards from Germany,
  • edited April 5, 2016
    @Gracherkonsequent: see this tutorial:

    It may be very interesting for you :) (specifically from 7:25 in the video onward)
  • edited April 15, 2016
    Hi. Thanks for the excellent add-on for Zotero.

    I have used Zotfile to extract the highlights from a PDF annotated in Foxit Reader, and it worked as advertised.

    The problem is that it doesn't work for this PDF, even though it is in ASCII and I'm using the same built-in Foxit annotation tool. What is the problem?
    Link to the PDF
    The article is free to download. Just click on the blue download link on the right of the web page.
    Thank you in advance
  • Hi!

    I'm struggling with the same problem zurpher and mbruffey described earlier this year:

    "zurpher Jan 22nd 2016

    I have scanned a text and OCR'ed it with Adobe Acrobat Pro DC v2015.006.30033. I use Firefox (43.0.4), ZotFile (4.1.6), Zotero Standalone, Windows 7. When trying to extract my annotations, I get a notice “Zotfile: Extracting Annotations…” but then the circle stops after about a quarter, the notice disappears, and no extractions are imported into a Zotero note. I can copy and paste from that document, so it should work. I had a similar issue previously that Joscha was able to fix. Any ideas how to fix this one? I have posted the document online in the Zotfile group library."

    mbruffey Feb 4th 2016

    Cannot Extract Annotations

    It has been a long while (at least a couple of years) since I used the Zotfile extraction feature. I can't extract notes (from a PDFXChange'd file). I tried on several items tonight, in both Juris-M and plain old Zotero. Extraction seems to begin properly, but the circle never completes its compass, halting at about one o'clock, at which point the popup window disappears. I'm on Ubuntu 14.04 with its latest Firefox and the latest Zotfile. For good measure, I attempted the operations in a brand-new profile with only Zotero and Zotfile as add-ons. Thanks, M

    How can we fix this? In my case, I was able to extract annotations from a file. But I've continued to add annotations to it, and now I cannot extract them. I should add that I've been using a lot of colors. Could any of this have broken the extraction?


  • OK, searching the forums hasn't revealed an answer to the following question:

    How do I *remove* PDFs from my tablet once I've finished reading them? There is no "Remove from tablet" option in the "Manage Attachments" section of the context menu.

    I suppose I can remove the _tablet tag from the item and delete it from the tablet sync folder, although my attempts to do this so far have resulted in Zotfile reporting "missing files". Is there a better way?
  • @PooyaM, the pdf works for me without a problem...

    @thiagoafdoria, are you using version 4.2.6?

    @livingthingdan, "Manage Attachments -> Get from tablet". Don't remove the _tablet tag manually. That screws things up (as described in the documentation).
  • Yes, Joscha. Version 4.2.6.
  • Joscha, it seems that the PDF file has a problem with its XREF table. I made the annotations with GoodReader on the iPad, then tried to extract them with Zotero Standalone for Windows. I have a folder of PDFs in Dropbox that is connected to GoodReader, and that's where I pick up the files to extract their annotations. Somehow, when the newest version is uploaded from GoodReader, the file seems to get this problem with the XREF table. If I copy and paste those annotations into a new PDF file, the extraction works.
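If you want to test the XREF suspicion, here is a hedged Python sketch (not part of Zotfile or GoodReader) that checks only one thing: whether the byte offset after the file's last `startxref` keyword points at a cross-reference section, either a classic `xref` table or an `N G obj` cross-reference stream. Apps that save annotations as incremental updates append a new section each time, so a bad offset there would be consistent with the symptom. A full check would also follow each `/Prev` entry in the trailer, which this sketch does not do.

```python
# Hypothetical diagnostic: does the last "startxref" offset in a PDF point at
# a cross-reference section? Only the final section is checked.
import re
import sys


def startxref_is_valid(data: bytes) -> bool:
    pos = data.rfind(b"startxref")
    if pos == -1:
        return False  # no startxref keyword at all
    match = re.match(rb"startxref\s+(\d+)", data[pos:])
    if not match:
        return False  # keyword present but no numeric offset after it
    offset = int(match.group(1))
    if offset >= len(data):
        return False  # offset points past the end of the file
    target = data[offset:offset + 32]
    # A classic table starts with "xref"; PDF 1.5+ may use an xref stream object.
    return target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target) is not None


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as fh:
        print("startxref OK" if startxref_is_valid(fh.read()) else "startxref looks broken")
```

Run it as `python check_xref.py annotated.pdf` (both file names are hypothetical). A "broken" result would also fit with why copying the annotations into a fresh PDF, which is written with a new table, makes extraction work.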
  • edited April 19, 2016
    This particular topic is now 28 pages long, so somewhere my question may have been asked already. If so, I apologize in advance. But a search for relevant terms comes up empty, so hopefully I'm not reinventing the wheel here.

    I'm using Zotfile to catalog a library of pdfs. Today I ran into this problem. I'm storing a working paper I wrote myself. It has three parts: the original paper, a set of illustrations (graphs), and a set of statistical tables. Zotfile's naming mechanism changes their names to "metadata", "metadata_2", and "metadata_3". Years from now, I will never know what the differences between the files are. Instead, I'd like names like "metadata_text", "metadata_graphs", and "metadata_tables".

    I realize Zotfile has the hidden option .disable_renaming. This would allow me to customize the names as desired. But changing and resetting hidden options is cumbersome for just one item, or for a number of items encountered on an ad hoc basis.

    So is there an easy way to customize the name of an individual attachment when using Zotfile?

    Would it work to go into the Zotfile library (directory hierarchy), change the names manually, and then link to them without using Zotfile's Rename Attachments feature?

    (Suggestion for enhancement: add a "Custom Rename" option to the Zotfile menu, which will do two things. (1) It will allow the user to customize the attachment's name. (2) It will flag such a renamed individual attachment to be exempt from future renaming according to the general renaming rules.)
  • marsh -- you can just rename in Zotero (by clicking on the bold title at the top right). Only downside is that it won't persist if you auto-rename again with ZotFile in the future, but that should be rare, no?
  • Joscha, thanks for the Zotfile plug-in. I'm happy with it; it lets me work with the iPad.
    My question: in the file of extracted annotations, I would like to have the references in APA style (Joscha, 2016, p. 23) instead of (Joscha, 2016:23). How can I do this?
  • edited April 19, 2016
    Thanks, Adam! I didn't even realize that sucker was there, much less that you can click on it and change the file name.

    Still, I do think my suggestion would be a good enhancement to Zotfile. Not only might it be more intuitively obvious than clicking the bold title, it would also keep fools like me from accidentally resetting things with Zotfile's auto-rename.

    In general, I'm a big fan of the software tools/Unix philosophy of "do one thing well." But in some cases, it's warranted to have multiple ways of doing one thing. I think this is one of them.
  • Hi. Thanks for the excellent add-on for Zotero.
    My question: in the file of extracted annotations, I would like to have the references in APA style (Joscha, 2016, p. 23) instead of (Joscha, 2016:23). How can I do this?
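Zotfile may offer a setting for this format, but I can't confirm the option name here, so as a hedged workaround: a small Python post-processor that rewrites citations of the form `(Author, 2016:23)` in already-extracted note text into APA style `(Author, 2016, p. 23)`, using `pp.` for page ranges. The function name `to_apa` is hypothetical, and the regex assumes the author part contains no parentheses and the year is four digits, optionally followed by a letter such as `2016a`.

```python
# Hypothetical post-processing of extracted note text; it does not change any
# Zotfile setting. Converts "(Author, 2016:23)" to "(Author, 2016, p. 23)".
import re

CITE = re.compile(r"\(([^()]+?),\s*(\d{4}[a-z]?):\s*(\d+(?:[-–]\d+)?)\)")


def to_apa(text: str) -> str:
    def repl(m):
        author, year, pages = m.groups()
        # APA uses "pp." for a page range and "p." for a single page.
        prefix = "pp." if re.search(r"[-–]", pages) else "p."
        return f"({author}, {year}, {prefix} {pages})"

    return CITE.sub(repl, text)
```

For example, `to_apa('"A quote" (Joscha, 2016:23)')` returns `'"A quote" (Joscha, 2016, p. 23)'`. Text without a matching citation is returned unchanged.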
  • Can I ask about importing hypothes.is annotations: is it supported, or is it something that would be possible? I haven't tried it yet, and it's a new application still under development, so maybe this is too early to ask, but it would be a fantastic ability. My understanding is that there will be public, private, and limited-group annotations, so being able to selectively import one or more of these into your notes would be a necessary feature. It seems an exciting project for the presumably predominantly academically minded Zotero community, ultimately being cross-browser and cross-platform, with pdf.js and the upcoming epub.js integration. Could this be feasible in the hopefully near future?
  • ghavamikia -- it's not possible currently, though some degree of it might be in the near future, but also this is the wrong thread. This almost certainly wouldn't happen as part of Zotfile.
  • adamsmith, thanks for the reply! What application would this function be part of, or potentially be part of, if there is one? Or am I on the wrong site entirely, and would it be a non-Zotero extension?
  • Either Zotero proper or a separate add-on. Zot_File_, as the name suggests, is all about handling files. The whole point of Open Annotations/Hypothes.is is to separate annotations from the file.
  • That's very helpful, thank you!
  • edited May 19, 2016
    I'm running Zotero v4.0.29.10, Zotfile v4.2.6, and Zotero Connector v4.0.29.1 (also active is Zutilo v2.0.1), under elementary OS Freya (based on Ubuntu 14.04, 64 bit) with Google Chrome v50.0.2661.102.

    I have Zotero set to use a base directory, so I can sync with my laptop via Dropbox.

    I have Zotfile set so that after adding an article, it will rename it following certain rules, and store it in a sub-folder (within the base folder) named after the year of publication.

    I've found two issues:

    1- After adding an article (example: http://arxiv.org/abs/1605.05330), it is properly renamed and stored by Zotfile. If I then remove this article from my library, first moving it into the trash and then *removing it from there too*, the entry in Zotero is gone as it should be, but the PDF file remains orphaned in the sub-folder.

    2- The following arXiv entry is not being automatically renamed and moved to its sub-folder by Zotfile and I don't know why: http://arxiv.org/abs/1605.05700. I have to manually force the renaming, and then it gets moved properly.

    The first issue is a serious one, since it means Zotfile leaves behind files that one is then unable to trace. The second issue is weird but not a show-stopper (as I believe the first one is).

  • edited May 19, 2016
    1. isn't actually a bug, but due to the fact that these are _links_ to files rather than attachments. So what you delete in Zotero is a link and deleting a link absolutely should not delete the item the link points to.
    For this to work the way you want, ZotFile would either have to interface with how Zotero treats file links in a much more heavy handed way or Zotero would have to treat links to files in a completely non-standard way. I don't really see either of those happening any time soon (the former I think is conceptually fine, but a mess to implement; the latter I don't think should happen ever).
  • Well, I understand what you say but I definitely see this as non-expected behavior (at least I didn't expect it).

    Is there any way to track down these orphaned files manually? I hate leaving trash behind like this.
  • Someone wrote a script to search for orphaned files in Zotero's storage directory. That's obviously a simpler case than this one, but you could look at it and see whether it can be adapted. No idea if that's possible here (the script is in Perl, I believe?)
  • edited May 19, 2016
    Thanks Adam, do you have a link or the script file itself?

    As I can only code in Python, I'm not sure I could come up with an entire plugin. Perhaps a simple script that scans the Zotero database and compares it with the PDFs stored by Zotfile would do. How can I access my full database? Which file should I look into?

    Also, what do you think of issue 2?
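On the database question: Zotero keeps its library in `zotero.sqlite` in the Zotero data directory. Below is a minimal Python sketch (not the Perl script Adam mentions) that lists files under the base directory that no linked-file attachment points to. It assumes the schema details noted in the comments (the `itemAttachments` table, `linkMode` 2 for linked files, and the `attachments:` prefix for paths relative to the base directory); check them against your own copy of the database before trusting the result, and run it on a copy with Zotero closed. The function names are hypothetical.

```python
# Hypothetical sketch: list files under the Zotero base directory that no
# linked-file attachment in zotero.sqlite points to.
# Assumptions to verify against your own database: attachments are in the
# itemAttachments table, linked files have linkMode 2, and paths relative to
# the base directory are stored with an "attachments:" prefix.
import os
import sqlite3
import sys

LINK_MODE_LINKED_FILE = 2
PREFIX = "attachments:"


def linked_paths(db_path, base_dir):
    """Return normalized absolute paths of all linked-file attachments."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT path FROM itemAttachments WHERE linkMode = ? AND path IS NOT NULL",
            (LINK_MODE_LINKED_FILE,),
        ).fetchall()
    finally:
        con.close()
    paths = set()
    for (path,) in rows:
        if path.startswith(PREFIX):
            # Relative to the base directory: resolve against base_dir.
            path = os.path.join(base_dir, path[len(PREFIX):])
        paths.add(os.path.normcase(os.path.abspath(path)))
    return paths


def orphaned_files(db_path, base_dir, extensions=(".pdf",)):
    """Walk base_dir and return files that no linked attachment references."""
    known = linked_paths(db_path, base_dir)
    orphans = []
    for root, _dirs, files in os.walk(base_dir):
        for name in files:
            if name.lower().endswith(extensions):
                full = os.path.normcase(os.path.abspath(os.path.join(root, name)))
                if full not in known:
                    orphans.append(os.path.join(root, name))
    return sorted(orphans)


if __name__ == "__main__" and len(sys.argv) == 3:
    for orphan in orphaned_files(sys.argv[1], sys.argv[2]):
        print(orphan)
```

Something like `python find_orphans.py /path/to/copy/zotero.sqlite /path/to/base/directory` (hypothetical file names) prints candidate orphans. Review them by hand before deleting anything, since files linked from another profile or put there by other tools would show up too.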
  • I don't have a link, but it's on this forum, so it should be googleable.

    No idea on 2. If you can replicate it reliably, report it via GitHub.
  • Thanks Adam, I'll report it over there.

  • For some time now, whenever I extract annotations in Zotero, the highlighted text ends up shifting about half a line to a full line up in the PDF, so the wrong text is selected (and extracted). When I then reopen the PDF, the highlights have moved. This does not seem to happen if I don't extract annotations, so I suspect the problem is with the Zotfile extension. It occurs with almost all PDFs that I have highlighted and tried to extract annotations from, and I can't seem to resolve it. I've tried switching the extraction to Poppler, but this did not make any difference. Any guidance would be much appreciated.
