How to test for an invalid Language field?

edited October 13, 2020
I want to make a saved search called "Language Field Invalid" that will give me a list of all records with a Language field that is neither blank, nor one of the ISO codes I use, viz: EN, FR, DE, IT, SP, and ZXX.

I set up a search that tests for ALL the conditions. Each condition is 'Language' ... 'is not' . I repeat all the codes in lower case and use % to recognize a blank field.

Because I am testing for ALL in a list of 'is not' conditions, I expect the valid codes (true conditions) to be excluded from the results. Instead, the search appears to return every item in the database. What am I doing wrong?

If I do the opposite and test for ANY against a list of 'is' conditions, I correctly get a list of all those that have a valid code. What I want is everything that is NOT in that list, so I tried also making the Valid list, then making a list of what is not in that 'Collection'. Same result. If I change it to a list of what is in the collection the expected result ensues.

BTW, the reason I am not sure it is the whole library is that all the sub-items are exposed in this search, but not in the ones that works. That's odd too!
  • It returns every item because every item matches one of the conditions (e.g., a FR item is not EN).

    If you change to ALL and switch the % to just a blank cell, it should work.
  • edited October 13, 2020
    I am using ALL, as I said:
    "I set up a search that tests for ALL the conditions. Each condition is 'Language' ... 'is not'". If it is not the first code and not the second one, and so on for the whole list then it is not any allowed code. However, the records with allowed codes are being listed as well as the invalid ones. That's the problem.
    I can send a screen shot if you like?
  • edited October 13, 2020
    BTW, the reason I am not sure it is the whole library is that all the sub-items are exposed in this search, but not in the ones that works. That's odd too!
    It looks like "is Not" searches include items that don't have the field in question, which would be all notes and attachments. That might not be the best choice, but I think you should get the results you want by checking the "Show only top level items" box.
  • P.S. the search on blank field does not work. I tested it separately and it matches nothing at all. Neither does %. % works as a wildcard (for any number of characters) if used with letters, but not on its own, apparently.
    Is there any way of matching a blank field or one that is non-blank with any value?
    i am using Zotero 5.0.91.
  • Yes, 'show only top level items' works. Thanks.
  • Once I removed the match-blank-field row from the search (with blank or %), the search works as expected. I now have a saved search that correctly shows invalid codes AND blank fields. With 'show top level only' I can now see that the number listed is somewhat less than the total in the library and sampling show only invalid or blank field.

    So, the blank field search is the problem. I still want to get rid of the blank fields because they are not invalid (simply unspecified). How can I do that??
  • Going quickly on the blank item so haven't tested in detail, but what I think works is to
    1. Create a Saved Search with Language does not contain and a blank field and the only top level item field selected. That will find all items with nothing in the language field

    2. Include Collection -- is not -- the saved search from 1. in the condition
  • ... It seems to be field-dependent too! % works as wildcard in the Title field but not in Language field (!)
  • edited October 13, 2020
    *I think I cracked it*

    Neither blank or % works with an 'is' or 'is not' match, but either works with a 'contains'
    or 'does not contain' match. So both 'Language ... does not contain ... %' and 'Language ... does not contain ... ' return a list of records with a blank Language field. That means a blank value in the condition matches to "anything", not "nothing" as one might expect — the same as %, right?

    I therefore added the row 'Language ... contains ... %' to my search instead of 'Language ... is ... %' and now get the wanted list of codes that are not blank but not valid either.

    A Zotero change request to add 'is blank' and 'is not blank' as operators in the search would be welcome to make this simpler. I would also prefer the wildcard to work consistently in the same way for any operator and any field. You could use an escape character like \ to search for an actual % symbol.
  • It looks like "is Not" searches include items that don't have the field in question, which would be all notes and attachments. That might not be the best choice
    Should [Field] [is not] ["value"] find only items that 1) have the specified field but 2) have it set to a different value?

    There are some cases where the current behavior seems pretty clearly undesirable — e.g., [Attachment File Type] [is not] [PDF], which matches all regular items and notes in addition to non-PDF attachments. [Attachment Content] [is not], on the other hand, only matches attachments, which seems clearly correct.
    I would also prefer the wildcard to work consistently in the same way for any operator and any field.
    The % is really just a leaky abstraction from SQLite and not something we officially support. The reason it works in 'contains' is because that's a LIKE statement. An 'is' condition is an = statement, and wildcards don't work for those.

    _ and % are hardly expected end-user characters, so any change here would need to abstract these further to ? and *. If we supported wildcards in 'is' (which I think is sort of debatable, because then 'contains' is really just the same as putting wildcards at the beginning and end of 'is' and therefore sort of pointless), we'd need to translate those into = and LIKE queries depending on whether wildcards were present.
Sign In or Register to comment.