linking / importing to PART of a pdf

mheim · October 5, 2010

Hi,

I have dozens of books among my pdfs; they often consist of articles / chapters written by different scholars.
Currently, I create a Zotero entry for the entire book, to which I link the pdf. I then create separate entries for those "book sections" that I'm particularly interested in, and create a "related"-entry to the main book.
With essay collections it makes much more sense to organize the sorting on the level of chapters, and this is no problem in Zotero.

However, if I use the search engine to find text from within the pdf, obviously the entire book (and its entry) pops up. I often find this limiting: I get many hits that I may not be interested in, and since it is impossible to tell in which section of the pdf Zotero has found the search term, I have to open the file and search it again to find out.

Wouldn't it be possible to link to one or more page spans of a pdf file (which would be indexed separately)? Clicking on that link would also bring up the pdf on the respective page.

I think this should be rather easy to implement, and I'm sure I wouldn't be the only one to welcome this.

Thoughts? Opinions?

Cheers,

Matthias

ajlyon · October 5, 2010

"I think this should be rather easy to implement, and I'm sure I wouldn't be the only one to welcome this."

Since the PDF readers are outside of Zotero's control, and there is no standard way to refer to a section of a PDF so that all PDF readers will handle it correctly, there's no easy way to do this from within Zotero.

You can, however, use an external utility to split your book-length PDFs into chapters, which you can then attach to your Zotero items.

mheim · October 5, 2010

I beg to differ.

pdftotext, the utility Zotero uses to extract the text from pdfs, has command line options to specify a page range:

-f number
Specifies the first page to convert.

-l number
Specifies the last page to convert.

So, now that there is an easy way to do this from within Zotero... ;-)
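For illustration, here is a minimal Python sketch of what limiting the text extraction to a page span could look like. It only uses the -f and -l options quoted above, plus "-" as the output file so that pdftotext writes to stdout. The function names are made up for this example, and this is not how Zotero itself calls pdftotext.

```python
import subprocess


def build_pdftotext_command(pdf_path, first_page, last_page):
    """Build a pdftotext call limited to a span of PDF pages (1-based)."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    # -f / -l are the first-page / last-page options quoted above;
    # "-" as the output file asks pdftotext to write to stdout.
    return ["pdftotext", "-f", str(first_page), "-l", str(last_page), pdf_path, "-"]


def extract_page_range(pdf_path, first_page, last_page):
    """Run pdftotext (must be installed and on PATH) and return the text."""
    cmd = build_pdftotext_command(pdf_path, first_page, last_page)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling extract_page_range("book.pdf", 10, 25) would return only the text of PDF pages 10 to 25, which is the part an indexer would then see.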

ajlyon · October 5, 2010

I still don't see a reliable way to do this, since PDF page ranges and citation page ranges frequently are quite different. You're welcome to explore the code and suggest improvements or submit patches.

mheim · October 11, 2010

I wouldn't have the time to write patches (I also think this suggestion is a frequent dead end for feature requests in open-source projects).

I am aware that PDF page ranges are not consistent with the "pages"-field of "book section"-entries in Zotero. However, what I would like to see is a "Right Click on Item->Add Attachment->Attach Link to part of a pdf file"-option. A pop-up menu would then ask for which pages (of the pdf) to link to. Maybe the page-range from the "pages"-field could be the default.
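A rough Python sketch of how such a default could be derived: parse the "pages" field and shift it by a fixed offset between printed and PDF page numbers. The function names and the offset parameter are hypothetical, and the constant-offset assumption is exactly what breaks for roman-numbered front matter or unnumbered plates, so the pop-up would still need to let the user correct the result.

```python
import re


def parse_pages_field(pages):
    """Parse a field like '123-145' or '12' into (first, last), else None."""
    # Accept a plain hyphen or an en dash (U+2013) between the two numbers.
    match = re.match(r"^\s*(\d+)\s*(?:[-\u2013]\s*(\d+))?\s*$", pages)
    if match is None:
        return None  # e.g. roman numerals such as 'xii-xx'
    first = int(match.group(1))
    last = int(match.group(2)) if match.group(2) else first
    if last < first:
        return None  # abbreviated ranges like '123-45' are not handled here
    return first, last


def default_pdf_span(pages, offset):
    """Suggest a PDF page span, assuming printed page + offset = PDF page."""
    parsed = parse_pages_field(pages)
    if parsed is None:
        return None
    first, last = parsed
    return first + offset, last + offset
```

For a book whose printed page 1 is PDF page 15, default_pdf_span("123-145", 14) would suggest the span (137, 159).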

ajlyon · October 11, 2010

The options for pdftotext that you noted are sufficient to limit full-text indexing to only part of the PDF, but we still would be attaching the entire PDF. No way of linking to a specific page in a PDF (as when double-clicking on the PDF) or of splitting the PDF from within Zotero has been explored or coded. Just limiting the indexing, while sufficient perhaps for some purposes, would make for a confusing and inconsistent feature, because it's only a small part of the real solution.

Yes, Zotero should work more tightly with software to read its attachments. Unfortunately, there's no way that development in this direction is going to happen soon, unless a new grant or new developers show up. I know that this is a common answer in the open source world, but it's better than saying that such a feature is undesired or impossible.

fbennett · October 11, 2010

Matthias,

So the idea would be to have a single item in Zotero for a book-length work, and then to have a series of attached links which address particular page spans within the work, and (a) only those spans get indexed, and (b) clicking on the link calls up the PDF at the starting page registered in the link ... ?

The main problem for someone implementing something along those lines would be (I think) that Zotero doesn't do much internally to handle PDF links; it just passes the call through to the environment (Firefox), which invokes some arbitrary PDF viewer. Calling a PDF viewer on a file is straightforward and uniform for all applications. But passing a page-to-jump-to is going to be application specific (i.e. different applications will use different options). That means that invoking the right command with the right parameters would become the direct responsibility of Zotero, and you'd need to implement the infrastructure to support that (config menus, error handling), multilingualize it, debug it, and maintain it.
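To make the "application specific" point concrete, here is a small Python sketch of the kind of per-viewer table that would have to be maintained. The option spellings for Evince, Okular and SumatraPDF are given as I recall them from each viewer's command-line help and should be checked against the installed version; the table and function names are invented for this example and nothing like this exists in Zotero.

```python
# Per-viewer option builders: each maps a page number to extra arguments.
# Assumption: spellings recalled from the viewers' help output, unverified here.
PAGE_OPTIONS = {
    "evince": lambda page: ["--page-index=" + str(page)],
    "okular": lambda page: ["-p", str(page)],
    "SumatraPDF": lambda page: ["-page", str(page)],
}


def build_viewer_command(viewer, pdf_path, page):
    """Return the argument list that opens pdf_path at page in viewer."""
    make_option = PAGE_OPTIONS.get(viewer)
    if make_option is None:
        # Unknown viewer: fall back to the uniform "viewer file" call,
        # which opens the document at its first page.
        return [viewer, pdf_path]
    return [viewer, *make_option(page), pdf_path]
```

Every viewer missing from the table silently loses the page jump, which is where the config menus and error handling mentioned above come in.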

It's certainly possible, and it sounds like it would be handy. It's just that it would take a significant effort to put in place and maintain. The "can submit patches" thing is kind of a test for whether it's worth that level of effort. In time, it might prove to be. Meanwhile, one can dice up the PDF itself and attach or link the fragments to the item (as Avram suggests), which does pretty nearly the same thing.

mheim · October 11, 2010

ok, ok, I'm caving in... ;-)

I would have thought that it is as simple as adding a "file.pdf#page=10" parameter. However, this isn't part of the pdf specification, so pdf readers other than Acrobat might not support it (although at least PDF-XChange does support it as well).
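As a sketch of that minimal variant, this Python snippet builds a file link with the "#page=" fragment appended. The function name is made up for the example, and whether the fragment is honoured depends entirely on whichever viewer or browser plugin ends up opening the link.

```python
from pathlib import Path


def pdf_page_link(pdf_path, page):
    """Return a file:// URI for pdf_path with a '#page=N' fragment."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # resolve() makes the path absolute, which as_uri() requires;
    # as_uri() also percent-encodes spaces and other special characters.
    return Path(pdf_path).resolve().as_uri() + "#page=" + str(page)
```

A link built this way still points at the whole, unsplit file, so annotations stay in one place; only the jump target differs per chapter.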

I can see that a complete support for this request would require a lot of work (ideally, as Frank suggested, the 'chapters' would be listed as separate entries for one larger text).
Minimal support (just linking to the file rather than attaching the part, indexing only that part, and adding a parameter to the link field), however, wouldn't require a lot of work, but then it would obviously be an incomplete solution...

@Frank:
You basically got it, except that I wouldn't want the "one single item" for the entire book, just chapter items (as they already exist), with the option to link to and index only spans.

I realize that splitting the pdf is the suggested solution, but for me it really isn't one:
a) The parts of a book usually make more sense together.
b) You lose the ability to use an index.
c) Endnotes often cover all chapters and cannot really be split.
Similarly, splitting the file and then keeping an unsplit copy for the book entry is no solution either, since I (like many others) annotate my pdfs and don't want two distinct files.

@Avram:
I can see that 'just limiting the indexing' would produce an inconsistent solution. BUT, how about adding a small button next to the re-index button which would query for a page span? This wouldn't add an awful lot of code, while it would do the trick for those people who want to use it (and it would be perfectly consistent).