Index end of long PDF files

Hello Zotero devs,
I wanted to recommend a change to the default behaviour of Zotero's PDF indexing.

For long files (those exceeding the default 50k words/100 page indexing limits), the program should use the final (e.g.) 10k words or 20 pages to index the end of the file. I think that way the program would, by default, capture the index of books stored in the library - arguably the most important thing to index in a textbook.
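
To make the suggestion concrete, here is a minimal sketch in Python of the selection rule I have in mind. It is not Zotero's actual indexing code: the function name `words_to_index` and its parameters are hypothetical, the 50k/10k numbers are just the example values from above, and it assumes the text has already been extracted into a list of words.

```python
def words_to_index(words, head_limit=50_000, tail_limit=10_000):
    """Pick the words to index: the first head_limit words, plus the
    final tail_limit words when the file is longer than head_limit.

    Hypothetical helper for illustration only, not part of Zotero.
    """
    if len(words) <= head_limit:
        # Short file: everything fits under the normal limit.
        return list(words)

    head = words[:head_limit]
    # Start the tail no earlier than the end of the head,
    # so that no word is indexed twice.
    tail_start = max(head_limit, len(words) - tail_limit)
    return head + words[tail_start:]
```

With a 200k-word book this would keep the first 50k words and the last 10k, which is where a back-of-book index normally sits. The same idea would apply to page counts instead of word counts.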

  • might be tricky, though, because that would also mean it would index the bibliography, especially for working papers that don't have an index -
    I'm not sure that's desirable?
  • Hi Adam,
    This is possibly true, but in my experience documents of such long length are more often terminated with an index than a bibliography.
  • edited June 8, 2011
    What what? An index is an index. You generate that from keywords. It has nothing to do with a bib, which has to do with your sources. Books often have both. [I deleted this.]
  • The future of America is bleak.
    if that's indeed the case that's because people feel compelled to comment on conversations even though they clearly don't understand what they're about...

    @sethmg - I'm not sure your experience is generalizable - in the non-English world indexes are a lot less common, for example - I have a bunch of long PDFs in Spanish and German ending with a bib. Also, most documents by international organizations - many of them above 100 pages - don't have indexes.
  • @amphioxus: Ouch! That was harsh. We are talking about indexing an index, as appended to the back of a textbook, generally created by editors and others who are informed about the book's purpose and content. You are thinking of indexing an index, as created by an uninformed indexing algorithm. I was relying on those reading these posts to make the distinction based on context.

    @adamsmith: Sounds reasonable, I can certainly see that. With the current method the table of contents is captured, which is likely enough.

  • edited June 8, 2011
    Yeah, my apologies. I was cranky and improperly harsh. And I think we're mostly saying the same thing, with some confusion obscuring agreement. Nope, I was talking about indexes of subject matter, which Zotero does not do.
    But, re. "the non-English speaking world:" 'Bibliography' and 'Index' are Latin words, meaning the same in German, Spanish, Russian, and adopted by the rest of the world. One means 'Books' (etc.), while the other means, 'Let your index finger do the walking in a printed topical search database.' (And yes, you're correct--many documents, in any language, don't contain an index. Most, in any language, do contain either citations, a bibliography, or both.)

    American Heritage Dictionary says, of Index: ETYMOLOGY:
    Middle English, forefinger, from Latin; see deik- in Indo-European roots

    So it probably dates to ancient Sanskrit database engineers, who didn't even have palmtops, so how could they know anything? Good luck.