Will Add by identifier gobble up anything and output random new items?

I was working on improving the Glossa style, so I copied this reference to retrieve it: "Milewski, Tadeusz. 1951. The conception of the word in languages of North American natives. Lingua Posnaniensis 3. 248–268."

I inadvertently pasted it into Add Item by Identifier. It didn't warn me that it couldn't parse the text; instead it duly delivered four new items to my library, all from 1975 and none having anything to do with the source reference. Here they are:

* Schulz, J. et al. 1975. “Affinity Elution of Pyruvate Kinase from Phosphocellulose.” Acta Biologica Et Medica Germanica 34 (8): 1321–32.
* Moore, G., G. Burford, and K. Lederis. 1975. “Properties of Urophysial Proteins (Urophysins) from the White Sucker, Catostomus Commersoni.” Molecular and Cellular Endocrinology 3 (4): 297–307.
* Smith, R. J., and R. G. Bryant. 1975. “Metal Substitutions Incarbonic Anhydrase: A Halide Ion Probe Study.” Biochemical and Biophysical Research Communications 66 (4): 1281–86.
* Wood, S. 1975. “The Effect of Environmental PH upon Acid Hydrolase Activities of Cultured Human Diploid Fibroblasts.” Experimental Cell Research 96 (2): 317–20.

I think I have occasionally noticed something like this with incomplete DOIs or ISBNs as well: it doesn't stop with an error but instead serves up something apparently unrelated.
  • What is happening is that it is recognizing short numbers as PubMed IDs. What exactly are you pasting in Add by Identifier?
  • He pasted the citation above, and these are indeed the items with PMIDs 1951, 3, 248 and 268, respectively.
    We should probably make PMID extraction much stricter. While pasting the above was obviously a mistake, I'm not sure it's worth the false positives to extract PMIDs unless they're bounded on both sides by whitespace or the beginning/end of the string.
  • Or at least not do so unless the numbers are sufficiently long. " 12345678." might be reasonable, but " 3." and " 1951." are not.
  • I think being strict makes sense. Requiring the number to be bounded by whitespace, newlines, or the beginning/end of the string seems right to me.
  • Thanks! And interesting, I hadn't realised it could be PMIDs.
  • Add by Identifier also gets tripped up for me when I paste a DOI with the / characters escaped
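
For illustration, here is a rough Python sketch of the two stricter PMID rules suggested in the replies above. It is not Zotero's actual code. The function names are hypothetical, and the 8-digit upper bound and the 7-digit minimum are assumptions chosen for the example.

```python
import re

# Assumption: PMIDs are at most 8 digits long.
# Rule 1: the number must be bounded on both sides by whitespace
# or by the beginning/end of the string.
STRICT_PMID = re.compile(r"(?<!\S)\d{1,8}(?!\S)")


def extract_pmids_strict(text: str) -> list[str]:
    """Return only digit runs that stand alone between whitespace."""
    return STRICT_PMID.findall(text)


def extract_pmids_min_length(text: str, min_digits: int = 7) -> list[str]:
    """Rule 2 (looser): allow one trailing punctuation mark,
    but only accept sufficiently long numbers."""
    pattern = re.compile(
        rf"(?<!\S)\d{{{min_digits},8}}(?=[.,;]?(?:\s|$))"
    )
    return pattern.findall(text)


citation = (
    "Milewski, Tadeusz. 1951. The conception of the word in languages "
    "of North American natives. Lingua Posnaniensis 3. 248–268."
)

print(extract_pmids_strict(citation))          # no standalone numbers
print(extract_pmids_strict("12345678\n3"))     # one identifier per line
print(extract_pmids_min_length(" 12345678."))  # long number, trailing period
print(extract_pmids_min_length(" 1951."))      # too short to accept
```

Under either rule the pasted citation yields no PMIDs, while a deliberately pasted list of identifiers separated by spaces or newlines is still picked up.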
