Fulltext search in Web Library yields no additional hits

edited November 14, 2022
In my web library there is no difference in number of hits when switching from "Title-Creator-Year" to "Title-Creator-Year + Fulltext Content".
(search for "reliability": 17 hits, same as in the desktop client, while the latter gives 74 hits with "all fields + tags" (of which 6 are tagged), and 473 hits with "everything")

My setup is Zotero 6.0.18 with data syncing (fulltext syncing is checked and was successful), but without file syncing (I'm using linked attachments).
Is this an expected limitation of my setup that I have missed or a bug?

Edit:
Did some experiments, and it looks as though this might be a reindexing problem: The fulltext search is ok for a newly imported subcollection, and when (in the original subcollection) I manually reindex one of the "missing" items, it yields a hit in the web search.

Does anyone have an idea what causes this, so I can avoid these problems in the future?
  • Update: Rebuilding the index did not solve the issue, though the web library search now gives some more hits.

    Possibly related: There is one article found by searching "mikrogl" but not "mikrog" or "mikro" (whereas the desktop client has no problems finding it by any of the three phrases).

    It looks as though the web search does not use the fulltext index in the same way as the desktop client does.
  • There was a temporary problem with full-text indexing in the online library. We're currently reprocessing the items that didn't get indexed, so this should be resolved automatically within a few days.

    Thanks for reporting.
  • Thanks for your reply! Then I'll wait and give an update about the resolution.

    (Out of curiosity: This means the index from linked attachments (from the desktop client) is not used as is (after sync to the zotero server), but has to be processed/re-indexed itself prior to being used by the web search?)
  • edited November 18, 2022
    Hi @dstillman, there are no changes to the observed problems yet, neither the differing number of search results nor the strange behaviour when searching for part of a word.

    (I wonder if the problems might be related to https://forums.zotero.org/discussion/101222/partial-library-sync-between-devices#latest or https://forums.zotero.org/discussion/101049/different-search-results-on-different-computers#latest - perhaps this helps to track the error down?)
  • edited November 18, 2022
    No, it's what I said. Those threads aren't related.
  • edited November 18, 2022
    This means the index from linked attachments (from the desktop client) is not used as is (after sync to the zotero server), but has to be processed/re-indexed itself prior to being used by the web search?
    Yes, it just transfers the raw content. It's indexed by a totally different, server-side technology, and the results may be different.

    The backlog is still processing, but it looks like all but two of your PDFs have been indexed. (Those two were uploaded a couple days earlier, and we haven't processed them yet.)

    Note that you should test with full words. It's a different kind of search engine, and it won't necessarily match prefixes in the same way as the desktop app, though we may be able to improve that.
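
To illustrate why full words matter, here is a minimal sketch contrasting a whole-token index with a plain substring scan. It is a toy model only: the thread does not say which technology the server or the desktop app uses, and the function names and sample documents are made up for illustration.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def build_token_index(docs):
    """Map each whole token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index

def token_search(index, term):
    """Match only complete tokens (assumed model of a token-based engine)."""
    return index.get(term.lower(), set())

def substring_search(docs, term):
    """Match the term anywhere in the text (assumed model of prefix-friendly search)."""
    needle = term.lower()
    return {doc_id for doc_id, text in docs.items() if needle in text.lower()}

# Hypothetical sample documents, not taken from any real library.
docs = {"A": "Test reliability was high", "B": "A reliable instrument"}
index = build_token_index(docs)

full_word_hits = token_search(index, "reliability")   # complete token
prefix_hits_token = token_search(index, "reliab")     # prefix of a token
prefix_hits_scan = substring_search(docs, "reliab")   # same prefix, substring scan
```

In this toy model the prefix "reliab" finds nothing in the token index but matches both documents in the substring scan, which is the kind of difference described above.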
  • Many thanks for your quick reply and the additional explanation. I really do appreciate your hard work in answering those tons of forum requests!

    Sorry - I'm still a bit confused by your wording "PDFs have been indexed", so for clarity's sake: We are not speaking about the PDF itself that is getting indexed on the server side (since I'm using linked attachments without file syncing), but the transferred index (belonging to a PDF, or to a note) that has yet to be processed, right?
    (There are also notes that are not found in the web search.)

    To avoid spamming you with unnecessary updates of mine: could you give me a hint when at the earliest I should test again in case the problem (with search for full words) should persist?
  • edited November 18, 2022
    As I say, all but two of your PDFs have already been indexed. So this issue is basically resolved for you. The search should be working normally for you.

    (And yes, I'm using "PDFs" loosely. The desktop app uploads the raw extracted text and that gets indexed, but the text is associated with a given PDF attachment.)
  • The search should be working normally for you
    Sadly it doesn't. There are still a good many items - despite reindexing on the desktop side and resyncing - which are not retrieved by the web library search.
  • We would need examples.
  • Glad to provide those - which information do you need? IDs of the missing search hits? Examples are ID 12518, with attachment 12519; or ID 3287, with attachment 12539.
    (If necessary, I can send sample PDFs via email to support@zotero.org?)
  • We need the search term you're using and the web library URLs (or just the 8-character item keys) of the parent items you're expecting that aren't showing up in the results.
  • Search term for the 2 examples: reliability [without quotes or spaces]
    The corresponding browser URL ends with "...search/reliability/everything"

    The item keys for the 2 example parent items are HH8HC3YR and HVT4VLUK
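
For anyone who wants to reproduce such a report outside the browser, the sketch below builds the corresponding request URLs for the Zotero Web API, assuming its `q` and `qmode` search parameters. The user ID `123456` is a placeholder, the helper names are made up, and a private library would additionally need an API key. Whether the API returns exactly what the web library shows is not confirmed in this thread.

```python
from urllib.parse import urlencode

API_BASE = "https://api.zotero.org"

def search_url(user_id, term, qmode="everything"):
    """Build an item search URL; qmode 'everything' is assumed to include full-text content."""
    query = urlencode({"q": term, "qmode": qmode})
    return f"{API_BASE}/users/{user_id}/items?{query}"

def item_url(user_id, item_key):
    """Build the URL of a single item, addressed by its 8-character item key."""
    return f"{API_BASE}/users/{user_id}/items/{item_key}"

# Placeholder user ID; the search term and item keys are the ones from the post above.
USER_ID = "123456"
url_search = search_url(USER_ID, "reliability")
url_items = [item_url(USER_ID, key) for key in ("HH8HC3YR", "HVT4VLUK")]
```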
  • It's working normally. There's just a 150-item limit for full-text searches for performance reasons, so if you use a term that appears in hundreds of items, you might not find your result.

    But for now I've increased that to 300 items, and we can see how that performs. That will cover all results for you for this search.

    (The current system isn't ideal in that it does the full-text search independently of other fields, so other search terms don't affect the full-text limit, but that's just how this works at the moment.)
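
A rough way to picture the effect of such a limit: if the full-text matches are capped first and other conditions are applied afterwards, an expected item can fall outside the cap. The sketch below is an assumption about the mechanism based only on the description above, not the server's actual code; the item keys and the `search` helper are hypothetical.

```python
def search(fulltext_matches, extra_filter, limit):
    """Cap the full-text matches first, then apply the remaining search conditions."""
    capped = fulltext_matches[:limit]
    return [key for key in capped if extra_filter(key)]

# 400 hypothetical items contain the search term in their full text.
fulltext_matches = [f"ITEM{n:04d}" for n in range(400)]

# Only one of them also satisfies a second condition (e.g. another search term).
wanted = "ITEM0200"

def only_wanted(key):
    return key == wanted

results_150 = search(fulltext_matches, only_wanted, limit=150)  # item lies beyond the cap
results_300 = search(fulltext_matches, only_wanted, limit=300)  # item lies within the cap
```

With the cap at 150 the wanted item is never considered; at 300 it is found. A term that appears in more items than the cap allows can still lose results.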
  • Ok, didn't know there was a limit.
    Thank you very much for checking and clarifying (and increasing the limit)!

    My results for the desktop app and the web library are very similar now (269 vs. 270 items) - the remaining differences are probably caused by the different behaviour of the web version you mentioned, which cannot search for prefixes (e.g. search for "achievement" -> 85 items, "achievements" -> 24 items; both merged manually -> 99; desktop "achievement" -> 98 items).

    One final suggestion: As long as the web library has limited search capabilities, how about listing those restrictions on https://www.zotero.org/support/searching to avoid confusion about different search results?
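
The manual merge of the two result lists above is a set union, so the counts imply how many items matched both spellings: 85 + 24 - 99 = 10. A small sketch with synthetic item ids (not the real library contents) reproduces the arithmetic:

```python
def merged_count(hits_a, hits_b):
    """Number of distinct items after merging two result lists."""
    return len(set(hits_a) | set(hits_b))

def overlap_from_counts(count_a, count_b, count_merged):
    """Items present in both lists, from |A| + |B| - |A ∪ B|."""
    return count_a + count_b - count_merged

# Counts reported in the post: "achievement" 85, "achievements" 24, merged 99.
overlap = overlap_from_counts(85, 24, 99)

# Synthetic ids chosen so that the sizes match the reported counts.
hits_singular = range(0, 85)    # 85 items
hits_plural = range(75, 99)     # 24 items, 10 of them shared with the first list
merged = merged_count(hits_singular, hits_plural)
```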