Exclude some metadata from import via connector

jandavid · August 27, 2025

When importing an item from a browser via the Zotero Connector, I often get lots of metadata that I don't need. Is there a way to exclude specific fields?

For example, for a journal article, it frequently puts "Publisher" in the Extra field. I don't want that there.
Or, for an item that already has a DOI in the DOI field, I still get an additional "_eprint: https://doi.org/10.1080/..." entry in Extra.

Also, I will never need to know the "Library Catalog" so I'd love to prevent Zotero from filling it automatically.

Is there a place where I could list metadata fields to be excluded from (browser) import?

AbeJellinek · August 27, 2025

There's currently no way to exclude certain fields, no. You (or ChatGPT) could use Zotero's JavaScript API to batch-modify items after import.

For example, for a journal article, it frequently puts "Publisher" in the Extra field. I don't want that there.

This will be migrated to the standard Publisher field once that's added to Journal Article items in an upcoming release.

Or, for an item that already has a DOI in the DOI field, I still get an additional "_eprint: https://doi.org/10.1080/..." entry in Extra.

Yeah, this is intentional when importing directly from BibTeX, but we should do a better job of cleaning it up when importing from a site that just happens to use BibTeX as its export format. What site/translator (Library Catalog field) are you getting these from?
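Until translators clean this up, a redundant `_eprint` line can be detected by comparing it with the DOI field. A minimal sketch: `dropRedundantEprint` is a hypothetical helper that takes the Extra text and the DOI field value (for example from `item.getField('DOI')`), removes `_eprint:` lines whose URL is a doi.org link to that same DOI (compared case-insensitively, since DOIs are case-insensitive), and keeps every other line. It assumes the DOI is stored without a resolver prefix and does not decode percent-encoded URLs.

```javascript
// Hypothetical helper: remove "_eprint: <url>" lines from Extra when the URL is
// only a doi.org link to the DOI already stored in the item's DOI field.
function dropRedundantEprint(extra, doi) {
  const text = extra || '';
  if (!doi) return text.trim(); // nothing to compare against, keep every line
  const wanted = doi.trim().toLowerCase(); // DOIs are compared case-insensitively
  return text
    .split('\n')
    .filter((line) => {
      const match = line.match(/^\s*_eprint\s*:\s*(\S+)\s*$/i);
      if (!match) return true; // keep lines that are not _eprint entries
      // Strip an http(s)://doi.org/ or http(s)://dx.doi.org/ resolver prefix.
      const target = match[1].replace(/^https?:\/\/(dx\.)?doi\.org\//i, '').toLowerCase();
      return target !== wanted; // keep _eprint links that point somewhere else
    })
    .join('\n')
    .trim();
}

// Example with a made-up DOI: the _eprint line is dropped, Publisher is kept.
console.log(dropRedundantEprint('_eprint: https://doi.org/10.1234/Example\nPublisher: P', '10.1234/example'));
// prints "Publisher: P"
```

An `_eprint` that points to a preprint server or any other non-DOI URL is left alone, since that may be real information.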

jandavid · August 27, 2025

Thanks very much for getting back to me!
The example comes from www.berghahnjournals.com.

Try for instance
https://www.berghahnjournals.com/view/journals/ajec/34/1/ajec340102.xml
It puts "www.berghahnjournals.com" into Library Catalog and adds Publisher and Section in the Extra field (Section is just a duplicate of the journal name from the Publication field.)

I've followed your suggestion and asked ChatGPT to write me a script that cleans items up on import. I then added it as an action in the Actions & Tags plugin (Event = Create Item; Operation = Script). But nothing happens. Can you spot what's wrong? (Sorry, I don't have any scripting experience.)


// This script is intended to be run in the Zotero JavaScript API environment

// Function to modify an item after it is added
function modifyItem(item) {
    // Check if the item has the "Library Catalog" field and delete it
    if (item.getField('libraryCatalog')) {
        item.setField('libraryCatalog', null);
    }

    // Get the current "Extra" field content
    let extraField = item.getField('extra') || '';

    // Check if "Publisher" or "Section" fields are present
    const publisher = item.getField('publisher');
    const section = item.getField('section');

    // If "Publisher" is present, remove it from the "Extra" field
    if (publisher) {
        extraField = extraField.replace(/Publisher:.*?\n/g, '');
    }

    // If "Section" is present, remove it from the "Extra" field
    if (section) {
        extraField = extraField.replace(/Section:.*?\n/g, '');
    }

    // Update the "Extra" field
    item.setField('extra', extraField.trim());

    // Save the modified item
    item.save();
}

// Event listener for when an item is added
Zotero.Events.on('itemAdded', (event) => {
    const item = event.item; // Get the newly added item
    modifyItem(item); // Modify the item
});

// Notify the user that the script is active
Zotero.notify('Item modification script is active. Modifications will occur on item addition.');
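A few things in the script above look like likely culprits, as far as I can tell: `Zotero.Events.on` and `Zotero.notify` don't appear to be part of Zotero's API, so the script probably throws before doing anything; the checks read the regular Publisher and Section fields, but on these items the values sit in Extra, so the conditions are never true; the regexes require a trailing newline and so would miss the last line of Extra; and `item.save()` is asynchronous and, as far as I know, expects to run inside a database transaction, which `item.saveTx()` provides. Below is a sketch of the same clean-up as an Actions & Tags script. It assumes the plugin hands the newly created item to the script as a variable named `item` (worth checking against the plugin's README); `cleanNewItem` is a hypothetical name.

```javascript
// Hypothetical clean-up for a newly created item. Returns true if anything changed.
// Uses only item.getField / item.setField, so it can be tried on a mock object.
function cleanNewItem(zoteroItem) {
  let changed = false;

  // Clear Library Catalog with an empty string (delete this block to keep the field).
  if (zoteroItem.getField('libraryCatalog')) {
    zoteroItem.setField('libraryCatalog', '');
    changed = true;
  }

  // Publisher and Section are stored in Extra as "Key: value" lines on these items,
  // so filter the Extra text itself instead of checking the regular fields.
  const extra = zoteroItem.getField('extra') || '';
  const cleaned = extra
    .split('\n')
    .filter((line) => !/^\s*(Publisher|Section)\s*:/i.test(line))
    .join('\n')
    .trim();
  if (cleaned !== extra) {
    zoteroItem.setField('extra', cleaned);
    changed = true;
  }
  return changed;
}

// Assumption: Actions & Tags exposes the triggering item as `item`.
// saveTx() wraps the save in its own database transaction and returns a promise.
if (typeof item !== 'undefined' && item.isRegularItem() && cleanNewItem(item)) {
  item.saveTx();
}
```

Only items that actually changed are saved, so running it on an already clean item does nothing.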

adamsmith · August 27, 2025

FWIW, I'd very much keep LibraryCatalog -- that tells you where an item came from and it's also very valuable for any troubleshooting (and doesn't have any real costs as it appears very rarely in citation styles)

jandavid · August 29, 2025

Point taken. LibraryCatalog may give valuable information – although shouldn't this information also be available through the URL? In all the cases I have come across, the LibraryCatalog field is populated either with the base domain of the URL or with a name that can usually be inferred from the URL.

If my URL is https://www.jstor.org/stable/...
then LibraryCatalog simply says JSTOR.
So it doesn't give me any additional information I don't already have. And for troubleshooting the URL should be much more valuable.

To keep things free of clutter, I'd love to eliminate it, or perhaps reserve it only for cases where an item comes from a physical location/archive/repository that doesn't have a URL, but I think that's extremely rare.

In any case, I would still love to be able to learn how to control other fields, i.e., how to filter out data in general.

AbeJellinek · August 29, 2025

The URL field is used in citations, so it's meant to be filled in with the location of the item's full-text content on the web, not just the URL of the page you saved. If the page you saved doesn't contain full-text content (or a link to a PDF), translators are supposed to leave the URL field blank.

What's your concern about clutter? Especially now that you can collapse the Info section and see a citation preview in the item pane header (right-click -> View As -> Bibliography Entry), it doesn't seem like a little extra info way down at the bottom of Info is that big of a deal.

jandavid · August 30, 2025

Yes, sorry, I wasn't clear: the above was just meant as an example. Of course the URL field keeps the full URL. My point is that the first part of the URL always reveals where an item came from, which covers the troubleshooting purposes adamsmith mentioned. Sure, it doesn't really bother me since it won't be exported, but it's one more piece of text that draws my attention when I'm checking metadata. Over time I'll probably learn to ignore it if it's always there, but I'd rather have it cleared automatically.

I prefer to keep the Info section always expanded, in addition to the citation preview in the header, so I retain a full overview of all existing metadata. In my view, having it always right there is one of the best design decisions in Zotero (compared with BibDesk, for instance, or some other tools). It makes it much easier for me to spot missing fields or errors right away.

There are also things the citation preview just doesn't show. Capitalization, for instance: if my preview is in Chicago style, titles are automatically rendered in Title Case, so I'll never know whether words that should always be capitalized are stored in lowercase, which would be a problem for other styles. The citation preview also doesn't tell me whether the Language field is filled in, which is crucial for BibTeX export later. And so on. So the only section in the right pane I always keep expanded is the Info section. It would therefore be nice to automate some of the cleaning that I would otherwise have to do manually.
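Checks like the Language one lend themselves to a script as well. A minimal sketch, assuming it is pasted into Zotero's Tools → Developer → Run JavaScript window: `findEmptyFields` is a hypothetical helper that lists regular items (skipping notes and attachments) whose given fields are empty, using the item methods `isRegularItem()`, `getField()` and `getDisplayTitle()`.

```javascript
// Hypothetical helper: for each regular item, list which of the given fields are empty.
// Returns lines like "Some title: language"; notes and attachments are skipped.
function findEmptyFields(items, fieldNames) {
  const report = [];
  for (const it of items) {
    if (!it.isRegularItem()) continue;
    const missing = fieldNames.filter((name) => !it.getField(name));
    if (missing.length > 0) {
      report.push(`${it.getDisplayTitle()}: ${missing.join(', ')}`);
    }
  }
  return report;
}
```

With "Run as async function" checked, ending the script with `return findEmptyFields(ZoteroPane.getSelectedItems(), ['language']).join('\n');` should show the report for the selected items. It only reports; nothing is modified.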

adamsmith · August 30, 2025

Some examples where the URL and the Library Catalog don't store similar information: anything you import via Add by Identifier using a DOI or ISBN (including when you drag a PDF into Zotero directly), and almost anything saved from a regular library catalog.

Anyway, this doesn't exist, and given how import is set up it wouldn't be simple to add, but you can fairly easily write scripts that remove info from fields:
https://www.zotero.org/support/dev/client_coding/javascript_api#batch_editing
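Along the lines of the batch-editing examples on that page, here is a sketch that cleans Extra on items you have already imported, intended for Tools → Developer → Run JavaScript with "Run as async function" checked. `stripExtraKeys` and `batchCleanExtra` are hypothetical names; the Zotero calls (`isRegularItem`, `getField`, `setField`, `saveTx`, `ZoteroPane.getSelectedItems`) are ones the Zotero JavaScript API provides as far as I know, but treat this as an untested starting point.

```javascript
// Remove "Key: value" lines whose key is in `keys` (case-insensitive) from an Extra string.
function stripExtraKeys(extra, keys) {
  const unwanted = new Set(keys.map((key) => key.toLowerCase()));
  return (extra || '')
    .split('\n')
    .filter((line) => {
      const match = line.match(/^\s*([^:\n]+?)\s*:/); // text before the first colon
      return !(match && unwanted.has(match[1].toLowerCase()));
    })
    .join('\n')
    .trim();
}

// Clean Extra on each regular item and save only the items that changed.
// Returns the number of items modified.
async function batchCleanExtra(items, keys) {
  let modified = 0;
  for (const it of items) {
    if (!it.isRegularItem()) continue; // skip notes and attachments
    const before = it.getField('extra') || '';
    const after = stripExtraKeys(before, keys);
    if (after === before) continue;
    it.setField('extra', after);
    await it.saveTx(); // saveTx() runs the save inside its own transaction
    modified++;
  }
  return modified;
}
```

For example, with the affected items selected, ending the script with `return await batchCleanExtra(ZoteroPane.getSelectedItems(), ['Publisher', 'Section']);` would remove those two lines from Extra on each selected item and report how many items changed. Items whose Extra doesn't change are skipped.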