Are HTML snapshots indexed (and thus searchable) through Zotero?

PS: I let Zotero install the indexing plugin (for Linux) automatically; some PDF files are indexed, others not yet. How long does indexing take to complete? What does it depend on? Idle CPU? Are there any ways to speed up indexing? Does it also index the HTML snapshots?

thx!
  • HTML snapshots are indexed.
    PDFs should be indexed on import. If they aren't, you can try indexing them manually by clicking the green arrow next to "Indexed: No". Some files just won't index, most often because they don't have OCRed text, but sometimes there are other problems (permissions, etc.).
  • thx Adam!

    I think what did the trick (for me) was to click on the black down-arrow in the standard search box and select "Everything"...
  • I have the same problem:
    Snapshots of web pages (for instance, this page) are correctly marked as indexed but are not searchable through Zotero.

    Any suggestions?
  • edited September 25, 2012
    What do you mean by "not searchable through Zotero"? Zotero includes the text content in its index, so if you search for a phrase or keyword that appears on the page using a search mode that includes attachments (i.e., "Everything" in the quick search bar or "Attachment Content" in Advanced Search), the snapshot will come up.
  • edited September 25, 2012
    However, searching in "Everything" mode won't, for example, return a match for "adam" (with this page saved in Zotero and indexed).

    EDIT: Using Advanced Search, however, works.

    I reinstalled the standalone and the connector to make sure that I have the current versions.
  • I have done some further research, in case it is of interest to anybody out there...

    A full-text search using the quick search bar (set to "Everything") only works if the search phrase is put in quotes. This automatically starts the advanced search.

    Since I also work with the Zotero code, I found a fix for this problem. If this is simply how Zotero is designed, and hence a feature rather than a bug, please disregard my suggestion.

    For Zotero to also search the contents of indexed attachments from the quick search bar, the following simple modification is needed:

    Add the line "this.addCondition('fulltextContent', operator, split, false);"
    to zotero.jar/content/zotero/xpcom/search.js, in addCondition, after line 434.

    Currently, the quick search condition is specified only by "this.addCondition('fulltextWord', operator, split, false);", which for some reason does not work. To be honest, I don't fully understand why, but it has to do with the table this condition searches, which in turn does not contain all indexed words.

    Hope I did not confuse anyone.
  • I believe that's a bug. If you open a pull request with a patch to Zotero, I'd expect it to get fixed pretty quickly.
  • I'll do that! Thanks!
  • edited September 26, 2012
    This change shouldn't be necessary: fulltextWords should be sufficient for a keyword search, and it works for me. Can you provide exact steps to reproduce, with the specific words you're using and other details about the item you're searching for (ideally with an example URL)?
  • Of course. Right after your comment, I saved this page to Zotero using "Save Zotero snapshot from current page". The snapshot is indexed.

    Then I searched using the quick search bar, set to "Everything". As I understand it, searching for the string 'Dan' without quotes should return a hit for this page. However, it doesn't.

    How are keywords defined here?
    I suppose it isn't simply 'non-stop words' in the information-retrieval sense?
  • edited September 26, 2012
    Interesting... I just tried some other pages to reproduce this.

    I added http://www.zeit.de/gesellschaft/schule/2012-09/waldorfschulen-studie to Zotero using the corresponding translator (sorry, it is a German webpage).

    Interestingly, searching for 'Barz' (a name) or 'Bildungsforscher' (a noun, "education researcher") works, but searching for 'gesund' ("healthy", part of the adjective 'gesundheitliche') doesn't. Searching for 'Problem' (again a noun) doesn't work either.

    Does that help to reproduce my search problem?
  • All of your examples, including my name on this page, work for me.

    Provide a Debug ID (from Standalone) that covers 1) saving a snapshot and 2) searching for it in Everything mode.
  • Since 2012, is it the case that HTML snapshots are indexed, and their content readable, once the Zotero indexer is installed/activated? Even if it is imperfect for German?
  • This worked back in 2012 (at least it did for Dan, and we never received debug output showing it not working), and it works now. It doesn't actually require any installation; those tools are only necessary for PDFs.
  • As a follow-up, and to clarify: the HTML snapshot is searched in full text? Also, is there any way to highlight the searched-for term in the snapshot or PDF, for those who are more visually inclined?
  • Yes; and not currently, though it is generally planned.
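Much of this thread turns on how a quick-search term is matched against an indexed attachment: through a table of extracted words (the 'fulltextWord' condition) or against the stored text itself (the 'fulltextContent' condition proposed above). The thread never settles how Zotero's word table treats partial words such as 'gesund' inside 'gesundheitliche', and Dan reports that those searches work for him. So the JavaScript below is only a hypothetical, generic sketch of the two matching strategies, not Zotero's search.js code; the function names (tokenize, buildWordIndex, searchWords, searchContent) and the tokenizing rule are assumptions made for illustration.

```javascript
// Hypothetical illustration only: NOT Zotero's search.js code.
// Contrasts matching a search term against a table of extracted words
// (similar in spirit to a word-index condition) with matching it against
// the stored attachment text itself (similar in spirit to a content condition).

// Assumed tokenizing rule: lowercase runs of Unicode letters/digits.
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

// Build a word index mapping each attachment id to its set of distinct words.
function buildWordIndex(docs) {
  const index = new Map();
  for (const [id, text] of Object.entries(docs)) {
    index.set(id, new Set(tokenize(text)));
  }
  return index;
}

// Word-table search: an attachment matches if an indexed word equals the
// term (prefix: false) or starts with it (prefix: true, the default).
function searchWords(index, term, { prefix = true } = {}) {
  const t = term.toLowerCase();
  const hits = [];
  for (const [id, words] of index) {
    for (const w of words) {
      if (prefix ? w.startsWith(t) : w === t) {
        hits.push(id);
        break; // one matching word is enough for this attachment
      }
    }
  }
  return hits;
}

// Content search: an attachment matches if the term occurs anywhere in its text.
function searchContent(docs, term) {
  const t = term.toLowerCase();
  return Object.keys(docs).filter((id) => docs[id].toLowerCase().includes(t));
}

// Usage with a made-up snippet echoing the German example in the thread.
const docs = { snapshot: "Bildungsforscher Barz: gesundheitliche Probleme" };
const index = buildWordIndex(docs);
console.log(searchWords(index, "gesund", { prefix: false })); // → []
console.log(searchWords(index, "gesund"));                    // → ["snapshot"]
console.log(searchContent(docs, "gesund"));                   // → ["snapshot"]
```

In this sketch, whether a word-table search finds 'gesund' depends entirely on whether it matches whole words or word prefixes, while a content search finds any substring at the cost of scanning the full text of every attachment.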