Zotero search duds + Gibberish characters

Hi, I have had two issues with Zotero for Apple. Issue (1) prevents me from using Zotero as a linked database.

(1) The first concerns searching my library in the downloaded version of the program. I search, even for a single word, using the search box at the top middle of the Zotero window. Zotero returns hits in some of the "Notes" which are associated with a single bibliographic entry. That's OK, because I would also expect the hits to be in the "Notes". BUT: there are no actual hits there, also within the key words. It will find "real hits", but those are in the minority.

(2) The second issue concerns gibberish character streams, such as â€ or â€œ or â€™ or â€™ or â€“ or â€¦ ...etc. I had exported the results from a stored search as rdf-format, and then reloaded it into a new Group Library. The re-imported file contains those characters randomly imbedded into the text of notes and titles (yes, the Euro symbol is a favorite).

Of possible relevance to (1) is my observation that the native Apple search engine also finds bogus words throughout e.g. pdf files.

What's going on here?

Thanks!
  • BUT: there are no actual hits there, also within the key words. It will find "real hits", but those are in the minority.
    I'm not sure what you mean by that. You're saying it finds matches that shouldn't be matches? To be clear, you understand the difference between the black and gray results? That the child items will always appear, regardless of whether they match the search, but will appear in gray?

    If you think you're getting matches you shouldn't, could you share the URL for one of the items you think shouldn't match when you view it in your web library, along with the search term and search mode (e.g., "Everything") you're using?
    Of possible relevance to (1) is my observation that the native Apple search engine also finds bogus words throughout e.g. pdf files.
    Well, are you talking about notes or are you talking about PDF files? Those would be totally different issues. If the OS search finds words within the PDF files, that's likely just due to bad OCR for the file, particularly if these are older papers that were scanned. Zotero will just search the hidden text layer in the document.
    I had exported the results from a stored search as rdf-format, and then reloaded it into a new Group Library.
    First, to copy items to a group library, you can just drag — there's no need to export and import. But exporting and importing shouldn't affect those characters. Have you checked to confirm that you don't have the same thing in the original library? Can you export a single item this is happening for, upload it somewhere, and provide a link here?
  • edited August 24, 2020
    Hi,

    Thanks for your quick response. Here are my imbedded answers:

    BUT: there are no actual hits there, also within the key words. It will find "real hits", but those are in the minority.
    -I'm not sure what you mean by that. You're saying it finds matches that shouldn't be matches? To be clear, you understand the difference between the black and gray results? That the child items will always appear, regardless of whether they match the search, but will appear in gray?
    --Yes, it’s finding matches in the child items which shouldn’t be matches. Specifically, the notes for a bibliographic entry are incorrectly colored black

    -If you think you're getting matches you shouldn't, could you share the URL for one of the items you think shouldn't match when you view it in your web library, along with the search term and search mode (e.g., "Everything") you're using?
    --I searched for a term under “Everything”. I then checked the child entries which appeared black, using the search function which works within the child entries, and thereby determined whether each hit was correct.
    For example, I searched for the term “qua” in the library which is named “Temp Video”. Here are the first six bogus hits from that search, together with my comments:
    1. Entry: Energy Efficiency 2019; child: “p 37-1 Figure 2.14. …” fake hit
    2. Entry: “The carbon footprint of streaming video: fact-checking…”; child “(5) Although the carbon footprint…” fake hit (has pasted graphic file)
    3. Entry: “Semiconductor device, display system and electronic device” child: “Not N-L-O relevant”
    4. Entry: “Chip-to-chip interconnect with embedded electro-optical bridge structures”; child: “FIG. 3 is an exploded view….” fake hit (has pasted graphic file)
    5. Entry: “Energy 101: Energy Efficient Data Centers”; child: “Apparently posted on 5/25/2015”. This child has an imbedded url. That url address does indeed contain the character string which I was using to test.
    6. Title: “What is Fiber Optic Transceiver | Optcore.net”; child “Definition of Fiber Optic Transceiver”….fake hit (has pasted graphic file)

    Of possible relevance to (1) is my observation that the native Apple search engine also finds bogus words throughout e.g. pdf files.
    -Well, are you talking about notes or are you talking about PDF files? Those would be totally different issues. If the OS search finds words within the PDF files, that's likely just due to bad OCR for the file, particularly if these are older papers that were scanned. Zotero will just search the hidden text layer in the document.
    --I don’t save separate copies of indexed pdf files in the Zotero registry. That would become far, far too large. Instead, I just paste the name of the relevant pdf file into the individual Zotero entry. So it isn’t possible to search in the manner which you have just described.

    I had exported the results from a stored search as rdf-format, and then reloaded it into a new Group Library.
    -First, to copy items to a group library, you can just drag — there's no need to export and import. But exporting and importing shouldn't affect those characters. Have you checked to confirm that you don't have the same thing in the original library? Can you export a single item this is happening for, upload it somewhere, and provide a link here?
    --Initially I had just tried to “drag-and-drop”. But for whatever reason, that didn’t work as intended. But now, yes it did. So this problem appears to have defaulted.

    Thanks so far.....
  • edited August 24, 2020
    You didn't provide the web library URLs I asked for, and the "Temp Video" library online only has 7 items in it, 6 of them notes. So I don't know what items you're referring to. If you want us to look at them, they need to be uploaded, and you should provide the URL from the web library.

    Note, though, that you shouldn't be pasting graphics into notes, and if you do so 1) you could easily get false matches, because image data will contain lots of random characters and 2) you likely won't be able to sync those notes. We'll support embedding images in notes in the future, but for now it's not supported and shouldn't be done.

    I still don't really know what you mean re: PDFs. The fact that macOS finds words in PDF files is only relevant if you've added those PDF files to your Zotero library.
    Initially I had just tried to “drag-and-drop”. But for whatever reason, that didn’t work as intended. But now, yes it did. So this problem appears to have defaulted.
    I don't know what you mean by "defaulted" here. But exporting to RDF still shouldn't result in corrupted characters that weren't in the original library.
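For readers who run into similar character streams: sequences built around "â" and the Euro symbol are the typical result of UTF-8 text being decoded as Windows-1252 somewhere along the way. The thread does not establish where (or whether) that happened here, so the following is only a minimal sketch of the general mechanism. The function names `garble` and `repair` are made up for illustration and are not part of Zotero.

```python
def garble(text):
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text):
    """Undo garble(): re-encode as Windows-1252 and decode as UTF-8.

    Only works if the text went through exactly one wrong decoding step
    and no bytes were lost in between.
    """
    return text.encode("cp1252").decode("utf-8")


# Typographic characters that commonly appear in titles and notes:
# right single quote, left double quote, en dash, ellipsis.
for ch in ["\u2019", "\u201c", "\u2013", "\u2026"]:
    print(repr(ch), "->", repr(garble(ch)))

sample = "It\u2019s a test"
assert repair(garble(sample)) == sample
```

Each of these characters is three bytes in UTF-8, and the second byte (0x80) is the Euro symbol in Windows-1252, which would explain why the Euro symbol shows up so often in this kind of corruption.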
  • Hi, this explanation is exhaustive-exhaustive:
    The library which is named "Temp Video" has 38 entries.

    For the online version of “Temp Video”:
    Searching for “qua” gives hits in five different entries.
    There is no way to distinguish "dark" and "light" colored children notes for a hit (or hits) in a given entry. So I don't see how I can send you an url.

    For the local version of “Temp Video”:
    When I carry out exactly the same search on the local version (on my Mac) of that same library, I get back 33 hits (not five).
    Now I can see which children are colored dark or light. Hence, now I can see the fake hits.
    After this message, I can try to send you the URL to one, but from the online Zotero, not the local one, even though you won’t see any dark color on the child notes at all.
    Yes, I had also considered that the graphics which I had pasted into the child-notes were somehow responsible for the 33 dud hits. Here is what I observed for the first six of the 33 hits:
    #2, #4 and #5 had pasted graphics
    #5 had an imbedded URL which contained a string with "qua"
    #3 had no distinguishing characteristics
    Please recall that the entries in “Temp Video” were copied out of a saved search. That initial saved search is called “z_Any video, Parent and Child”
    If I search “z_Any video, Parent and Child” for “qua”, then TWO hits are returned. Neither of them has any dark children or main entries.

    Thanks

    Fenton
    PS: I used the term “defaulted” to mean that the issue of "drag-and-drop" had been solved.
  • OK, you hadn't actually fully synced the group before and only had a few items online. You now appear to have synced the rest, so we can look at it.
    There is no way to distinguish "dark" and "light" colored children notes for a hit (or hits) in a given entry. So I don't see how I can send you an url.
    I'm just asking for the URL after clicking the parent item online so we can quickly tell what item you're referring to. You don't need to perform the search online.
  • One of the entries for which a hit is locally returned but not on the Internet would be the parent starting with e.g. “44th European Conference”. Those appear to be real/correct hits. The URL of the online version of this hit is: https://www.zotero.org/groups/2549006/temp_video/search/44th/titleCreatorYear/items/JTHTU5PB/item-list

    Another parent has the title starting with “Energy 101: Energy Efficient Data Centers…”, where the child has “qua” at the child starting with “Apparently posted on..”. In this case, “qua” is in the lengthy URL which is imbedded into the text. The online version of this entry, itself, has the URL:
    https://www.zotero.org/groups/2549006/temp_video/search/Energy 101/titleCreatorYear/items/KEWLFR54/item-list

    The search on the Internet version had five hits, even though it wasn't possible to identify the children which produced those hits. Those same hits are also found on the local version.

    Summary: my observations from today appear to be the _opposite_ of those from yesterday. Yesterday I was finding bogus hits on the local version. Now the search appears to give only authentic ones. However, the online version misses most of those.
  • The web library simply doesn't currently have two of the search modes that the app has, so the searches won't be identical. It doesn't have "All Fields & Tags" or "Everything", only "Title, Creator, Year" and "Title, Creator, Year + Full-Text Content".

    This all appears to be working properly. You were just seeing matches from embedded image data and links, and hadn't yet fully synced your items with the web library.
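To illustrate why embedded image data and links can produce matches that are invisible when reading a note, here is a minimal sketch. It assumes a note is stored as HTML and that the search is a case-insensitive substring test over the whole stored string. This is not Zotero's actual search code, and the note bodies, the shortened image data, and the example.com URL are all made up.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only the text a reader sees, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(note_html):
    parser = VisibleText()
    parser.feed(note_html)
    parser.close()
    return "".join(parser.chunks)


def matches_raw(note_html, term):
    """Assumed behaviour: substring match over the stored note, markup included."""
    return term.lower() in note_html.lower()


def matches_visible(note_html, term):
    """What a reader expects: match only against the displayed text."""
    return term.lower() in visible_text(note_html).lower()


# Hypothetical note with a pasted graphic (base64 data shortened and invented).
NOTE_WITH_IMAGE = (
    "<p>FIG. 3 is an exploded view</p>"
    '<img src="data:image/png;base64,iVBORw0KGgoquaAAANSUhEUg">'
)
# Hypothetical note with a link whose address contains the search term.
NOTE_WITH_LINK = (
    '<p>Apparently posted on <a href="https://example.com/watch?quality=hd">'
    "this page</a></p>"
)

for name, note in [("image", NOTE_WITH_IMAGE), ("link", NOTE_WITH_LINK)]:
    print(name, matches_raw(note, "qua"), matches_visible(note, "qua"))
```

In both hypothetical notes the term appears only inside an attribute value, so a search over the stored text matches while the displayed text does not contain it. Short search terms are especially likely to occur by chance in base64 image data.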
  • The search which I just did was for the character stream "ZB". On the local Zotero, it gave six hits, of which all were bogus. Five were for children with graphics. #6 had a strange imbedded link which contained "ZB". I erased it and used the chain symbol to add a "test link" containing simply "ZB". The child-search engine still found it. This suggests that the native link function still needs to prevent this from happening.

    Fortunately, the Internet version gave no bogus hits.

    Does the native link function allow linking to a local, non-url file, especially just a graphic one?

    Thanks!
  • I don't know what you mean by "strange embedded link" — you'd have to provide an example. But any text in the note will match, visible or otherwise, so again, everything is working properly here. There's no way for the search to match a note that doesn't contain the search string in some way.
    Does the native link function allow linking to a local, non-url file, especially just a graphic one?
    https://www.zotero.org/support/attaching_files#stored_files_and_linked_files