Better Metadata Resolution via "rebiber"

Rebiber is a tool for normalizing BibTeX entries with official publication info: https://github.com/yuchenlin/rebiber
It works surprisingly well, much better than the lookup engines.

An example input entry with the arXiv information (from Google Scholar or somewhere):
@article{lin2020birds,
title={Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models},
author={Lin, Bill Yuchen and Lee, Seyeon and Khanna, Rahul and Ren, Xiang},
journal={arXiv preprint arXiv:2005.00683},
year={2020}
}



An example normalized output entry with the official information:
@inproceedings{lin2020birds,
title = "{B}irds have four legs?! {N}umer{S}ense: {P}robing {N}umerical {C}ommonsense {K}nowledge of {P}re-{T}rained {L}anguage {M}odels",
author = "Lin, Bill Yuchen and
Lee, Seyeon and
Khanna, Rahul and
Ren, Xiang",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.emnlp-main.557",
doi = "10.18653/v1/2020.emnlp-main.557",
pages = "6862--6868",
}



It would be great to have it integrated in Zotero. Would it be possible to add it as a lookup engine?
  • No, that doesn't make any sense. Zotero can already retrieve metadata from countless services and websites — including via arXiv ID, for the example above. It just needs metadata updating. That's been implemented for a while and is just waiting for some unrelated technical changes before it will be merged.
  • edited April 13, 2023
    Also, the change that you show above does things it really shouldn't be doing.

    title={Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models}

    is perfectly fine bibtex, and must not be "normalized" to

    title = "{B}irds have four legs?! {N}umer{S}ense: {P}robing {N}umerical {C}ommonsense {K}nowledge of {P}re-{T}rained {L}anguage {M}odels"

    as this means something different to bibtex, and will cause styles that demand sentence-case output to incorrectly render it as forced-titlecase.
  • edited April 13, 2023
    I see, thank you for your fast and kind answers!

    The proposal comes from the observation that in my community there is a big problem with "wrong" citations: people cite the preprints instead of the published versions of papers (because preprints are often easier to find).
    People use rebiber not for formatting/normalization but for choosing the correct, official citation among the many (valid) available citations for a paper.

    In my research group, for example, it is common to always run citations through rebiber to ensure this does not happen. My idea was to replace the (more error-prone) process of updating and exporting from Zotero and then running rebiber with some sort of integration.


    -- Anyway, if I understood correctly what you are saying, there is already a merged feature that does this. That is great :smile:
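To make the preprint problem concrete, here is a minimal Python sketch that flags entries in a BibTeX file that still cite an arXiv preprint, so they can be checked against a published version. It is not part of rebiber or Zotero: the function name `find_arxiv_preprints` and both regular expressions are assumptions for illustration, and they only recognize the `journal={arXiv preprint arXiv:...}` form shown in the example entry above.

```python
import re

# Assumption: preprints are exported with a journal field such as
# "arXiv preprint arXiv:2005.00683", as in the example entry above.
ARXIV_JOURNAL = re.compile(
    r"journal\s*=\s*[{\"]\s*arXiv preprint arXiv:(\d{4}\.\d{4,5})",
    re.IGNORECASE,
)
# Captures the citation key from an entry header such as "@article{lin2020birds,".
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def find_arxiv_preprints(bibtex_text):
    """Return (citation_key, arxiv_id) pairs for entries that cite an arXiv preprint."""
    # Treat every "@" at the start of a line as the beginning of an entry.
    starts = [m.start() for m in re.finditer(r"(?m)^@", bibtex_text)]
    results = []
    for begin, end in zip(starts, starts[1:] + [len(bibtex_text)]):
        chunk = bibtex_text[begin:end]
        key = ENTRY_KEY.match(chunk)
        arxiv = ARXIV_JOURNAL.search(chunk)
        if key and arxiv:
            results.append((key.group(1), arxiv.group(1)))
    return results
```

Entries returned by this helper are only candidates: whether a published version exists still has to be looked up separately.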
  • People use rebiber not for formatting/normalization but for choosing the correct, official citations among the many (valid) available citations for a paper.
    Which is fine, but I'd recommend talking to the rebiber creators about their mangling of titles.

    The first bit of the title should be title-cased to Birds Have Four Legs?!, and the rendering should be left to the bib(la)tex sentence-caser, which bibtex knows when to apply. If the target style is title-case, those extra braces don't do anything (in which case they are useless clutter), and if the target style is sentence-case, those braces prevent the style from doing its work (meaning they are harmful).
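The effect of the extra braces can be illustrated with a small Python sketch of a sentence-caser. This is a deliberately simplified stand-in, not the actual BibTeX or biblatex implementation: the function name `sentence_case` is hypothetical, and it ignores details such as the capital a real style retains after a colon. It only models the rule that matters here, namely that text inside braces is protected from lowercasing.

```python
def sentence_case(title):
    """Lowercase all letters except the first one and any brace-protected text.

    Simplified model of a sentence-casing style; braces are dropped from the output.
    """
    out = []
    depth = 0            # current brace nesting level
    seen_letter = False  # the first letter of the title keeps its case
    for ch in title:
        if ch == "{":
            depth += 1
            continue
        if ch == "}":
            depth -= 1
            continue
        if ch.isalpha():
            if depth == 0 and seen_letter:
                ch = ch.lower()
            seen_letter = True
        out.append(ch)
    return "".join(out)


plain = "Birds Have Four Legs?! NumerSense: Probing Numerical Commonsense Knowledge"
braced = "{B}irds have four legs?! {N}umer{S}ense: {P}robing {N}umerical {C}ommonsense {K}nowledge"

print(sentence_case(plain))   # the style is free to lowercase the title words
print(sentence_case(braced))  # every braced capital survives, forcing title case
```

In this model the braced title keeps every protected capital, which is the forced title case described above, while the unbraced title-cased input leaves the decision to the style.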