Odd result when importing PMIDs with Magic Wand

edited September 11, 2020
I've noticed this problem several times over the past 8 to 10 days. The number of records imported is 1, 2, or 3 items fewer than the number of PMIDs submitted. However, if I request debug logging, the number of imported items matches the number of items submitted.

This has happened at least 10 or 12 times. It doesn't happen when I import only 5 or 6 PMIDs, but it consistently happens when I import a substantial number. A couple of days ago I "wanded" 148 PMIDs, but only 143 records were imported (until I activated debug logging). The omitted PMIDs are not at the very beginning or the very end of the list.

Could this be related to the debug process slowing things down a little, allowing time for every item to be imported? Is there any harm in routinely activating debug logging each time I import PMIDs via the Magic Wand tool?

Zotero beta: 5.0.90-beta.3+e9473caa1
Mac 10.15.6 (19G73)
Firefox 79.0 (64-bit)
My upload speed (per speedtest) averages 11.1 Mbps from Southern California to servers in Western Europe, and a little faster to in-state servers (not slow, but not blazing). I don't know whether my connection speed is relevant.

Each day I conduct a very sensitive PubMed search to find records meeting my criteria during the previous 00:00:01 (midnight) to 23:59:59 CRDT interval. This produces 600 to 800 PubMed items. I go through the listing, tick the relevant records and export the results to a PMID list. Typically, I have 80 to 120 relevant items to import into Zotero each day.
  • I'm not 100% sure I follow: does the same PMID list yield different results (i.e., sometimes all are imported, sometimes some are missing)?

    I don't see a good reason for this to happen, no. Having debug output enabled during import is no problem, though I'd be rather surprised if it turned out to be the cause rather than a coincidence. There's a tiny chance that a slowdown could matter, but if so, it would matter on the PubMed end, which might rate-limit you for making too many requests. I thought Zotero sent the search-by-identifier lookup as one single request; if we don't, I wouldn't be surprised if 150 quasi-simultaneous requests triggered rate limiting.
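To make the one-request vs. many-requests distinction concrete, here is a minimal Python sketch. It assumes the NCBI E-utilities `efetch` endpoint, which accepts a comma-separated list of PMIDs in its `id` parameter; whether Zotero's PubMed lookup uses this endpoint, or batches its lookups at all, is not established in this thread, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities efetch endpoint. Zotero's PubMed lookup
# may use a different URL or parameters.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def single_request_url(pmids):
    """Build ONE efetch URL that carries every PMID in a comma-separated id list."""
    query = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    # safe="," keeps the commas readable instead of encoding them as %2C
    return EFETCH_URL + "?" + urlencode(query, safe=",")


def per_item_urls(pmids):
    """Build one URL per PMID: 150 PMIDs become 150 separate requests."""
    return [single_request_url([pmid]) for pmid in pmids]


if __name__ == "__main__":
    pmids = ["11111111", "22222222", "33333333"]  # placeholder IDs, not real records
    print(single_request_url(pmids))  # a single request for all three
    print(len(per_item_urls(pmids)))  # three separate requests
```

With a batched URL, a server-side rate limit is charged once per request rather than once per PMID; very long lists would still need splitting into chunks.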
  • edited September 4, 2020
    Thanks, @adamsmith. I'll try to explain more clearly.

    edit: One more thing. I am not syncing these records (syncing is off).

    This afternoon I had a list of 94 PMIDs, each on its own line in my text editor (BBEdit). I pasted these into the Magic Wand tool. The first try imported only 90 records into Zotero; I deleted these. The second attempt (after copying the 94 PMIDs from my text editor into the Magic Wand tool again) imported 92 records.

    I then started a debug log, and all 94 records were imported into my working collection.

    This has happened several times over the past couple of weeks. Today, after editing the Zotero records and exporting them to a MODS file, I deleted the imported PubMed records from Zotero as usual, so that I'd have an empty collection for tomorrow's work. Just to test, I pasted the same 94 PMIDs into the Magic Wand again and got only 91 records (without debug logging). I then started a debug log, imported the records again, and ended up with all 94 PubMed records.

    I can only guess that running a debug log somehow alters (slows down a bit?) the rapid import into Zotero. It is only a guess, because I really have no knowledge of how Zotero obtains full PubMed records from NLM by submitting PMIDs.

    I have a work-around so this isn't a serious problem for _me_ but the issue is so peculiar that I feel it is worth reporting. My primary use of Zotero is 1) as a really great editor of imported records from PubMed and publisher websites and 2) as a way to export the records in MODS format so that I can import them into my online database.

    Might it be helpful to provide a debug log of a successful import after a partial import? I don't see how that might be useful but I can do that.

    Aside: I regularly poll the NLM API with PMIDs of articles in Medline-indexed journals to obtain MeSH terms (I find that the indexers have usually finished assigning terms after 5 or 6 months; often sooner, but certainly by then). I send the requests in batches of about 200 items and haven't had an issue with rate limits. I send a similar request for ePub items to get volume, issue, and pagination metadata; I do this 18 months after the CRDT (record creation date). I run the ePub/metadata fetch script about every 6 weeks, and it sends 600 or more PMID queries. I have had no rate-limiting issues.

    Thanks again

    edit: For many years I had a license to bulk capture NLM data, and my requests were accompanied by the license number string. A few years ago I received notice that my license was no longer needed and that the files would be publicly accessible. I seem to recall a mention of monitoring the IP addresses making requests, and I've used the same 2 or 3 IP addresses from the very beginning.

    edit 2:
    I keep close track of the total number of PubMed records returned from my query as well as the true and false positives so that I can follow closely the precision and recall stats. This close monitoring has been especially important after PubMed not only changed the interface but also changed the term mapping. It is important to keep testing my search string so that I limit the irrelevant items but capture all relevant ones. Thus, I know how many records I expect to import into Zotero.
  • Might it be helpful to provide a debug log of a successful import after a partial import? I don't see how that might be useful but I can do that.
    No, we'd really need a debug log that didn't include all items.

    While it's not impossible that there's some heisenbug situation here, it may just be the result of repeated requests to the PubMed API, with a couple of items timing out the first time or something like that. If that's the case, logging the first import attempt might still demonstrate the problem.
  • The PMID list included 37 items, but Zotero imported 36. After I ran the import with debug logging active and received the reported result, I ran the import again without debug logging as a test and imported only 35 items. FWIW, the second run omitted PMID 32906614.

    The debug ID is: D1606513612
    I hope that this is useful.

    Thank you

  • 429 means "Too Many Requests". It's just rate-limiting by PubMed.

    In theory we could automatically slow down and retry when that happened, but because of how the translator architecture currently works it might take a lot of work to implement that. I've created an issue to track it.
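The "slow down and retry" idea can be sketched generically in Python as exponential backoff on HTTP 429. This is not Zotero's translator code: `fetch` is a hypothetical callable returning `(status, body)`, and the attempt count and delays are arbitrary assumptions. `sleep` is injectable so the logic can be exercised without actually waiting.

```python
import time


def fetch_with_retry(fetch, url, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body); on HTTP 429, wait and try again.

    `fetch` is a hypothetical stand-in for whatever HTTP client is in use.
    The wait doubles after each 429 response (exponential backoff).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body  # success, or an error that retrying won't fix
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

A real client might also honor a `Retry-After` response header when the server sends one, instead of relying only on its own delay schedule.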
  • Thanks. I wonder how many is "too many". It won't be much effort for me to limit my Magic Wand queries to, say, 20 at a time, with a pause between batches.

    Are these requests to NLM coming from my local Zotero client (my IP), or is the Zotero system making the queries and then returning the results to me (your IP)?
  • They're direct from your IP.
  • I have had success getting complete imports by copying 3 PMIDs at a time into the Magic Wand tool. The small delay from my copy/paste process is sufficient, even though I can do the cycle very quickly. I've imported 250+ items this way. Note that doing this copy/paste cycle with 4 PMIDs at a time eventually triggers the rate limiting, and instead of 4 items I capture only 2 or 3.

    Lesson: Importing PMIDs in bulk works best in batches of 3 and can be done fairly quickly; each batch imports almost instantaneously.
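The batches-of-3 workaround amounts to client-side throttling. Here is a minimal Python sketch of it, where `import_batch` is a hypothetical stand-in for one Magic Wand paste that returns how many records it actually imported, and the 1-second pause is an assumed value (the poster relied on the natural delay of copy/paste).

```python
import time


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def import_in_batches(pmids, import_batch, size=3, pause=1.0, sleep=time.sleep):
    """Pass PMIDs to `import_batch` a few at a time, pausing between batches.

    `import_batch` is hypothetical: it receives one chunk of PMIDs and returns
    the number of records it actually imported. Returns the total imported.
    """
    chunks = batches(pmids, size)
    imported = 0
    for n, chunk in enumerate(chunks):
        imported += import_batch(chunk)
        if n < len(chunks) - 1:
            sleep(pause)  # no pause needed after the final batch
    return imported
```

Comparing the returned total with `len(pmids)` reproduces the poster's manual check for missing records. For context, NCBI's E-utilities guidelines ask clients without an API key to stay under 3 requests per second; that may be related to why batches of 3 succeed and batches of 4 eventually fail, though nothing in this thread confirms it.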