Unable to bulk import a list of urls

I'm on the Zotero Linux desktop client and I'm pulling my hair out trying to make this work. All I want to do is import a list of URLs into Zotero and have the metadata pulled from each page and a snapshot saved. If I drag and drop a bookmark from firefox this works perfectly.

The problem is that when I drag multiple items, weird things start happening: all the bookmarks get the same title, or only one is imported and it has no title, or nothing happens and drag and drop is disabled until I exit the application.

I tried saving my URL as a .html file and importing it, but Zotero didn't treat it as a webpage. I built a RIS file by hand and imported that, but no metadata was pulled and no snapshot was created. I looked for the "Add Webpage" option in the menu but there's no entry.

Is there any way to pull in a list of URLs?
  • This isn't a supported option, for a number of reasons.

    You're really meant to add URLs from the browser, since those use your existing cookie state, including web-based proxy access and other site logins. While Zotero has the ability to save directly from URLs (which would be roughly equivalent to what you'd get when you save on zotero.org, barring IP-based subscription access), the results might not be as good, so it's not something we encourage when you have the Zotero Connector available.

    This is the reason you can't, say, paste a list of URLs into Add Item by Identifier the way you can paste in a list of DOIs (which itself has the downside of not using any web-based proxy you have to download PDFs, though it will use an IP-based proxy). We could consider adding that for cases like yours, but I'd be concerned that some people might start to add individual URLs that way instead of loading them in the browser, which wouldn't be ideal.
    > If I drag and drop a bookmark from firefox this works perfectly.
    It doesn't, actually. Dragging a URL will create a webpage item, but it won't run translation on the page to grab metadata. I'll fix that, but for the reasons above, it's still not really a recommended workflow.

    That aside, dragging multiple URLs just seems to be poorly implemented across various browsers and applications. It's possible we could make that work from some browsers, but I'm not sure we could make it work reliably.
    > I looked for the "Add Webpage" option in the menu but there's no entry.
    Because you're not meant to add webpages by hand. See the note here:

    https://www.zotero.org/support/getting_stuff_into_your_library#manually_adding_items

    That said, if you really want to do this, you can save a newline-separated list of URLs to a text file, open Tools → Developer → Run JavaScript, and run this code, after adjusting the path on the first line:

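    // Path to a text file containing one URL per line
    var path = '/home/username/Desktop/urls.txt';
    // Trim each line and skip blank ones (e.g., from a trailing newline)
    var urls = Zotero.File.getContents(path)
        .split('\n')
        .map(url => url.trim())
        .filter(url => url);
    await Zotero.HTTP.processDocuments(
        urls,
        async function (doc) {
            // Look for a translator that can handle this page
            var translate = new Zotero.Translate.Web();
            translate.setDocument(doc);
            var translators = await translate.getTranslators();
            if (translators.length) {
                translate.setTranslator(translators[0]);
                try {
                    await translate.translate();
                    return;
                }
                catch (e) {
                    // Translation failed; fall through to the basic save below
                }
            }
            // No translator, or translation failed: save a basic webpage item
            await ZoteroPane.addItemFromDocument(doc);
        }
    );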


    This won't load JavaScript on the page before trying to save, so some pages might not save properly. (There's a way to do that, but it's more complicated and much slower.)
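
A text file exported from bookmarks can contain blank lines, duplicates, or entries that aren't web addresses. Below is a minimal sketch of a stricter cleanup step that could run before the list is handed to processDocuments(). It assumes the standard URL constructor is available where the code runs, and parseUrlList is a hypothetical helper name, not a Zotero API.

```javascript
// Hypothetical helper (not part of Zotero): turn the raw text of a URL list
// into a clean array of unique, absolute http(s) URLs.
function parseUrlList(text) {
    const seen = new Set();
    const urls = [];
    for (const line of text.split(/\r?\n/)) {
        const candidate = line.trim();
        if (!candidate) continue; // skip blank lines
        let parsed;
        try {
            parsed = new URL(candidate);
        } catch (e) {
            continue; // not an absolute URL
        }
        // Only web pages are of interest here
        if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') continue;
        // Deduplicate on the normalized form, but keep the text as written
        if (seen.has(parsed.href)) continue;
        seen.add(parsed.href);
        urls.push(candidate);
    }
    return urls;
}
```

The returned array could be used in place of `urls` in the code above.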
  • Thanks for the detailed explanation of what's happening here. My use case is that I have about 3000 web pages that I want to store in Zotero. None of them are paywalled or anything so I just want to save the raw page itself. It seems like this would be a pretty common workflow but maybe I'm missing something? I can't go to each of the 3000 pages individually in Firefox and manually save them.

    I'm very comfortable with Javascript so I'll try the solution you posted. Could you point me to documentation of how to get it to run Javascript too? I think I can figure it out if I know where to look.
  • > It seems like this would be a pretty common workflow but maybe I'm missing something?
    It's just not, I'm afraid — you're the first person I can recall ever asking for this. Most people either have existing reference libraries that they import into Zotero from another program or just start saving items from the browser.
    > Could you point me to documentation of how to get it to run Javascript too? I think I can figure it out if I know where to look.
    @bwiernik: I think @camjohnson was asking about the alternative approach I mentioned above where it loads JS on the page.

    JavaScript knowledge isn't particularly relevant here — this is all Zotero-specific code. But you'd want to use Zotero.HTTP.loadDocuments() instead of Zotero.HTTP.processDocuments(), and then, for safety, re-parse the document using DOMParser:

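    // Path to a text file containing one URL per line
    var path = '/home/username/Desktop/urls.txt';
    // Trim each line and skip blank ones (e.g., from a trailing newline)
    var urls = Zotero.File.getContents(path)
        .split('\n')
        .map(url => url.trim())
        .filter(url => url);
    await Zotero.HTTP.loadDocuments(
        urls,
        async function (doc) {
            // Re-parse the loaded page into a fresh document for safety
            var parser = new DOMParser();
            var safeDoc = Zotero.HTTP.wrapDocument(
                parser.parseFromString(doc.documentElement.outerHTML, 'text/html'),
                doc.location.href
            );
            // Look for a translator that can handle this page
            var translate = new Zotero.Translate.Web();
            translate.setDocument(safeDoc);
            var translators = await translate.getTranslators();
            if (translators.length) {
                translate.setTranslator(translators[0]);
                try {
                    await translate.translate();
                    return;
                }
                catch (e) {
                    // Translation failed; fall through to the basic save below
                }
            }
            // No translator, or translation failed: save a basic webpage item
            await ZoteroPane.addItemFromDocument(safeDoc);
        }
    );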


    But I'd strongly recommend using the previous one first and only using this if there are pages that aren't working right. Note that loadDocuments() loads pages in parallel, so it isn't meant for running with many URLs at once and will likely crash Zotero and/or your computer if you do so.
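
Because loadDocuments() loads everything it is given at once, one way to keep the load bounded is to split the list into small batches and hand them over one at a time. A minimal sketch follows. Here chunk is a hypothetical helper, and the batch size of 5 in the usage note is an arbitrary assumption, not a tested limit.

```javascript
// Hypothetical helper (not part of Zotero): split an array into consecutive
// batches of at most `size` items each.
function chunk(items, size) {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError('size must be a positive integer');
    }
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}
```

In the Run JavaScript window, each batch from chunk(urls, 5) could then be passed to Zotero.HTTP.loadDocuments() in a for...of loop, assuming (as the code above does) that awaiting the call waits for that batch to finish.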
  • Sorry for the misunderstanding.
  • Thanks for that. If I get some time I'll go through it, but you're right that processDocuments seems to be doing fine. I've ended up with this code, which is working well. I made all the calls synchronous and added some error catching.

    ```
    var path = '~/Desktop/temp/Exported Items.txt';
    // Trim each line and skip blank ones
    var urls = Zotero.File.getContents(path)
        .split('\n')
        .map(url => url.trim())
        .filter(url => url);

    // Running log of what happened to each URL
    var rv = "";

    for (let url of urls) {
        try {
            await Zotero.HTTP.processDocuments(
                url,
                async function (doc) {
                    rv += "\n" + url + ": ";
                    if (!doc) {
                        rv += "doc is null";
                        return;
                    }
                    var translate = new Zotero.Translate.Web();
                    translate.setDocument(doc);
                    var translators = await translate.getTranslators();

                    if (translators.length) {
                        translate.setTranslator(translators[0]);
                        try {
                            await translate.translate();
                            rv += "translated successfully";
                            return;
                        } catch (e) {
                            rv += "translation failed with " + e + "; ";
                        }
                    }
                    // No translator, or translation failed: save a basic webpage item
                    await ZoteroPane.addItemFromDocument(doc);
                    rv += "saved as webpage";
                }
            );
        } catch (e) {
            rv += "\n" + url + ": download failed (" + e + ")";
        }
    }

    return rv + "\nAll documents processed";
    ```

    For the record, I'm moving from Firefox bookmarks to Zotero for managing this content. I've looked at a bunch of tools, including Polarized, Notion, Mendeley, Papers app, Pinboard, Hypothes.is, Diigo, etc., but none of them have all the features I need. Zotero is the closest, though, so if a few of these kinks get ironed out, this use case might be a source of new users.
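
The bookkeeping in the loop above can also be kept separate from the Zotero calls, which makes it easy to collect the URLs that failed and retry only those with the slower loadDocuments() approach. This is a minimal sketch. The name runSerially and the shape of its report object are assumptions for illustration, and handler stands in for whatever saves a single URL.

```javascript
// Hypothetical helper (not part of Zotero): call `handler` for each URL,
// one at a time, and record which URLs succeeded and which threw.
async function runSerially(urls, handler) {
    const report = { succeeded: [], failed: [] };
    for (const url of urls) {
        try {
            // Wait for this URL to finish before starting the next one
            await handler(url);
            report.succeeded.push(url);
        } catch (e) {
            report.failed.push({ url, error: String(e) });
        }
    }
    return report;
}
```

The `failed` entries keep the URL alongside the error text, so `report.failed.map(f => f.url)` gives a list that could be saved to a new text file for a second pass.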
  • > I made all the calls synchronous
    FWIW, processDocuments() already loads URLs serially, so the original code I gave you should work even with many URLs. It's only loadDocuments() that loads URLs in parallel.
  • Oh, good to know, thanks. Unfortunately I can't remember the issue I was having with it, but I think my final code didn't change much.