Questions about full-text syncing

First, a general full-text question. In thread https://forums.zotero.org/discussion/33126/what-does-sync-fulltext-content-do/ it says full text is "The text content of indexed attachments, i.e. PDFs and html files."

The Zotero preference pane has sections for "full-text cache", with a maximum size listed in characters, and "PDF indexing", with a maximum size in pages. Based on the above answer, it seems that in an indexed PDF, the indexed text is considered full-text. So is it the case that if the PDF index is bigger than the max full-text cache character limit, the PDF index is truncated? Or is the extra text just not included in the Zotero full-text index that is searched?

Now, my sync questions:

How does the sync deal with different indexing parameters across computers? Does the sync always replace the older file with the newer one?

Say I have two computers with Zotero stand-alone. Assume I'm dealing with PDF attachments. On computer A the index max is set at 100 pages, on B it goes to 200. If I have an attachment file, and I index it on computer B, sync it, and then re-index it on computer A, will the file index now stop at page 100?

Also, I usually set the index at zero pages to speed up adding documents. If I want to index them later (overnight, say), and then reset the index back to zero pages, will that erase the index? What if I sync to another computer?

Do full-text contents sync to the Zotero servers if I do not store attachments on those servers?

Thanks!
  • So is it the case that if the PDF index is bigger than the max full-text cache character limit, the PDF index is truncated? Or is the extra text just not included in the Zotero full-text index that is searched?
    The character limit isn't used when adding PDF content to the index, just the page range. So if the max pages setting is set to 100, it will index all 100 pages, regardless of length, and it should sync the full-text content of those 100 pages as well. (So yes, the preferences are maybe a bit misleading in that sense.)

    Based on my reading of the current code, the one exception may be that, for phrase searches, it won't search past the character limit, even for PDFs. So it might be possible to find PDF results from word searches that then don't show up if you search for the same words in a phrase. We should fix that.
    How does the sync deal with different indexing parameters across computers? Does the sync always replace the older file with the newer one?
    If by "file" you mean the full-text content, then yes.
    Say I have two computers with Zotero stand-alone. Assume I'm dealing with PDF attachments. On computer A the index max is set at 100 pages, on B it goes to 200. If I have an attachment file, and I index it on computer B, sync it, and then re-index it on computer A, will the file index now stop at page 100?
    Yes. Clicking Reindex will mark that item's full-text content as unsynced, and so on the next sync the shorter content would be sent up to the server and to other computers from there.
    Also, I usually set the index at zero pages to speed up adding documents. If I want to index them later (overnight, say), and then reset the index back to zero pages, will that erase the index? What if I sync to another computer?
    Simply changing the setting in the prefs without taking any other action won't have any effect. Syncing also doesn't use the current max-characters setting, instead checking how many characters were actually indexed previously for each given file. So that means that if you index 200K of a file, set the pref to 0, and then sync, it will sync 200K, not 0.
    Do full-text contents sync to the Zotero servers if I do not store attachments on those servers?
    Yes, meaning that WebDAV users can benefit from the feature as well. But that's the reason we added a prompt on the first sync in the new version (which obviously we don't normally do for new features).
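The behavior described in the reply above can be pictured with a toy model. This is only a sketch under stated assumptions, not Zotero's actual code: the `Client` and `FullText` classes, the `ATTACH1`/`ATTACH2` keys, and the dict standing in for the server are all hypothetical. It shows three points from the answers: indexing is cut off by the page limit only, Reindex marks the content as unsynced so the newer (possibly shorter) content replaces the older one, and syncing sends what was actually indexed rather than what the current preference says.

```python
from dataclasses import dataclass, field


@dataclass
class FullText:
    """Full-text content recorded for one attachment item (hypothetical model)."""
    text: str
    indexed_pages: int
    synced: bool = False


@dataclass
class Client:
    """One computer; max_pages stands in for the PDF indexing page limit."""
    max_pages: int
    fulltext: dict = field(default_factory=dict)  # attachment key -> FullText

    def reindex(self, key, pages):
        # Only the page limit is applied here; no character limit is checked.
        kept = pages[: self.max_pages]
        # Reindexing marks the content as unsynced.
        self.fulltext[key] = FullText("\n".join(kept), len(kept), synced=False)

    def sync(self, server):
        # Upload whatever was actually indexed and is still unsynced.
        # The current max_pages setting is not consulted at sync time.
        for key, ft in self.fulltext.items():
            if not ft.synced:
                server[key] = (ft.text, ft.indexed_pages)
                ft.synced = True
        # Download: adopt the server copy wherever the local copy differs.
        for key, (text, n) in server.items():
            local = self.fulltext.get(key)
            if local is None or local.text != text:
                self.fulltext[key] = FullText(text, n, synced=True)


pages = [f"page {i}" for i in range(1, 301)]  # a made-up 300-page PDF
server = {}                                   # stand-in for the sync server
a, b = Client(max_pages=100), Client(max_pages=200)

# Index on B (200 pages), sync, then reindex on A (100 pages) and sync.
b.reindex("ATTACH1", pages)
b.sync(server)
a.sync(server)
a.reindex("ATTACH1", pages)
a.sync(server)
b.sync(server)
print(server["ATTACH1"][1], b.fulltext["ATTACH1"].indexed_pages)

# Index with a 200-page limit, set the limit to 0, then sync.
c = Client(max_pages=200)
c.reindex("ATTACH2", pages)
c.max_pages = 0
c.sync(server)
print(server["ATTACH2"][1])
```

In this model the first scenario ends with 100 pages on the server and on B, and the second still uploads the 200 pages that were indexed before the limit was changed.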
  • One more question:

    For PDFs, is the full-text content tied to the PDF itself, such that if a PDF attachment is deleted from an item, the full-text index is also deleted from the item? I am wondering based on your comment that the full-text contents sync to the Zotero server even if the attachment doesn't.
  • For PDFs, is the full-text content tied to the PDF itself, such that if a PDF attachment is deleted from an item, the full-text index is also deleted from the item?
    Yes. What matters is the attachment item, not the file itself.
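One way to read that last answer is that the full-text content is keyed by the attachment item rather than by the file on disk. The sketch below illustrates that reading only; the `Library` class and its method names are hypothetical and are not part of Zotero.

```python
class Library:
    """Hypothetical store in which full-text content is keyed by attachment item."""

    def __init__(self):
        self.attachments = {}  # attachment key -> file path, or None if no file is stored
        self.fulltext = {}     # attachment key -> indexed text

    def add_attachment(self, key, path, text):
        self.attachments[key] = path
        self.fulltext[key] = text

    def delete_attachment(self, key):
        # Deleting the attachment item removes its full-text content as well.
        self.attachments.pop(key, None)
        self.fulltext.pop(key, None)


lib = Library()
# The attachment item exists, but its file is not stored here (path is None).
lib.add_attachment("ATTACH1", None, "indexed text of the PDF")
has_text_without_file = "ATTACH1" in lib.fulltext

lib.delete_attachment("ATTACH1")
has_text_after_delete = "ATTACH1" in lib.fulltext
```

Under this assumption, the full text is available whenever the attachment item exists, whether or not the file itself is stored in the same place, and it goes away when the attachment item is deleted.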