Unable to bulk import a list of urls

I'm on the Zotero Linux desktop client and I'm pulling my hair out trying to make this work. All I want to do is import a list of URLs into Zotero and have the metadata pulled from each page and a snapshot saved. If I drag and drop a bookmark from Firefox this works perfectly.

The problem is that if I drag multiple items, weird things start happening: all the bookmarks get the same title, or only one is imported and it has no title, or nothing happens and drag and drop is disabled until I exit the application.

I tried saving my URL as an .html file and importing it, but Zotero didn't treat it as a webpage. I built an RIS file by hand and imported that, but no metadata was pulled and no snapshot was created. I looked for the "Add Webpage" option in the menu but there's no entry.

Is there any way to pull in a list of urls?
  • edited August 13, 2019
    This isn't a supported option, for a number of reasons.

    You're really meant to add URLs from the browser, since those use your existing cookie state, including web-based proxy access and other site logins. While Zotero has the ability to save directly from URLs (which would be roughly equivalent to what you'd get when you save on zotero.org, barring IP-based subscription access), the results might not be as good, so it's not something we encourage when you have the Zotero Connector available.

    This is the reason you can't, say, paste a list of URLs into Add Item by Identifier the way you can paste in a list of DOIs (which itself has the downside of not using any web-based proxy you have to download PDFs, though it will use an IP-based proxy). We could consider adding that for cases like yours, but I'd be concerned that some people might start to add individual URLs that way instead of loading them in the browser, which wouldn't be ideal.
    "If I drag and drop a bookmark from Firefox this works perfectly."
    It doesn't, actually. Dragging a URL will create a webpage item, but it won't run translation on the page to grab metadata. I'll fix that, but for the reasons above, it's still not really a recommended workflow.

    That aside, dragging multiple URLs just seems to be poorly implemented across various browsers and applications. It's possible we could make that work from some browsers, but I'm not sure we could make it work reliably.
    "I looked for the 'Add Webpage' option in the menu but there's no entry."
    Because you're not meant to add webpages by hand. See the note here:

    https://www.zotero.org/support/getting_stuff_into_your_library#manually_adding_items

    That said, if you really want to do this, you can save a newline-separated list of URLs to a text file, open Tools → Developer → Run JavaScript, and run this code, after adjusting the path on the first line:

    var path = '/home/username/Desktop/urls.txt';
    var urls = Zotero.File.getContents(path).split('\n').map(url => url);
    await Zotero.HTTP.processDocuments(
        urls,
        async function (doc) {
            // Look for web translators that recognize this page
            var translate = new Zotero.Translate.Web();
            translate.setDocument(doc);
            var translators = await translate.getTranslators();
            if (translators.length) {
                // Use the first detected translator
                translate.setTranslator(translators[0]);
                try {
                    await translate.translate();
                    return;
                }
                catch (e) {}
            }
            // No translator matched, or translation failed: save a plain webpage item
            await ZoteroPane.addItemFromDocument(doc);
        }
    )


    This won't load JavaScript on the page before trying to save, so some pages might not save properly. (There's a way to do that, but it's more complicated and much slower.)
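
    A note on the URL list itself: `.map(url => url)` in the snippet above returns each line unchanged, so a trailing newline in the file leaves an empty string in `urls`, and a file saved with Windows line endings leaves a `\r` at the end of every line. A minimal sketch of a more forgiving parser follows. `parseUrlList` is a hypothetical helper name, not part of Zotero.

    ```javascript
    // Hypothetical helper (not a Zotero API): turn the raw text of a URL list
    // into a clean array of URLs.
    function parseUrlList(text) {
        return text
            .split(/\r?\n/)               // accept both \n and \r\n line endings
            .map(line => line.trim())     // drop stray spaces and tabs
            .filter(line => line !== ''); // skip blank lines, e.g. after a trailing newline
    }
    ```

    In the Run JavaScript window it would stand in for the second line of the snippet above, as `var urls = parseUrlList(Zotero.File.getContents(path));`.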
  • Thanks for the detailed explanation of what's happening here. My use case is that I have about 3000 web pages that I want to store in Zotero. None of them are paywalled or anything so I just want to save the raw page itself. It seems like this would be a pretty common workflow but maybe I'm missing something? I can't go to each of the 3000 pages individually in Firefox and manually save them.

    I'm very comfortable with Javascript so I'll try the solution you posted. Could you point me to documentation of how to get it to run Javascript too? I think I can figure it out if I know where to look.
  • "It seems like this would be a pretty common workflow but maybe I'm missing something?"
    It's just not, I'm afraid — you're the first person I can recall ever asking for this. Most people either have existing reference libraries that they import into Zotero from another program or just start saving items from the browser.
    "Could you point me to documentation of how to get it to run Javascript too? I think I can figure it out if I know where to look."
    @bwiernik: I think @camjohnson was asking about the alternative approach I mentioned above where it loads JS on the page.

    JavaScript knowledge isn't particularly relevant here — this is all Zotero-specific code. But you'd want to use Zotero.HTTP.loadDocuments() instead of Zotero.HTTP.processDocuments(), and then, for safety, re-parse the document using DOMParser:

    var path = '/home/username/Desktop/urls.txt';
    var urls = Zotero.File.getContents(path).split('\n').map(url => url);
    await Zotero.HTTP.loadDocuments(
        urls,
        async function (doc) {
            // Re-parse the loaded page's HTML into a fresh document for safety
            var parser = new DOMParser();
            var safeDoc = Zotero.HTTP.wrapDocument(
                parser.parseFromString(doc.documentElement.outerHTML, 'text/html'),
                doc.location.href
            );
            var translate = new Zotero.Translate.Web();
            translate.setDocument(safeDoc);
            var translators = await translate.getTranslators();
            if (translators.length) {
                // Use the first detected translator
                translate.setTranslator(translators[0]);
                try {
                    await translate.translate();
                    return;
                }
                catch (e) {}
            }
            // No translator matched, or translation failed: save a plain webpage item
            await ZoteroPane.addItemFromDocument(safeDoc);
        }
    )


    But I'd strongly recommend using the previous one first and only using this if there are pages that aren't working right. Note that loadDocuments() loads pages in parallel, so it isn't meant for running with many URLs at once and will likely crash Zotero and/or your computer if you do so.
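
    Since loadDocuments() loads everything it is given in parallel, one way to keep a longer list manageable is to hand it only a few URLs at a time. The sketch below has not been tested against Zotero itself. `chunk` and `runInBatches` are hypothetical helpers, and any batch size is an arbitrary assumption.

    ```javascript
    // Hypothetical helper: split an array into consecutive slices of at most
    // `size` items.
    function chunk(items, size) {
        if (!Number.isInteger(size) || size < 1) {
            throw new RangeError('size must be a positive integer');
        }
        var batches = [];
        for (var i = 0; i < items.length; i += size) {
            batches.push(items.slice(i, i + size));
        }
        return batches;
    }

    // Hypothetical helper: run an async `handleBatch` on one batch at a time,
    // waiting for each batch to finish before starting the next.
    async function runInBatches(items, size, handleBatch) {
        for (var batch of chunk(items, size)) {
            await handleBatch(batch);
        }
    }
    ```

    Inside Zotero this might be called as `await runInBatches(urls, 5, batch => Zotero.HTTP.loadDocuments(batch, processor))`, where `processor` stands for the per-document function from the snippet above. This assumes the promise returned by loadDocuments() settles only after every document in the batch has been processed, as the `await` in the original code suggests.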
  • Sorry for the misunderstanding.
  • Thanks for that. If I get some time I'll go through it, but you're right that processDocuments() seems to be doing fine. I've ended up with this code, which is working well. I made all the calls synchronous and added some error catching.

    ```
    var path = '~/Desktop/temp/Exported Items.txt';
    var urls = Zotero.File.getContents(path).split('\n').map(url => url);

    // Accumulates one log entry per URL
    var rv = "";

    for (var url of urls) {
        try {
            await Zotero.HTTP.processDocuments(
                url,
                async function (doc) {
                    rv += "\n" + url + ": ";
                    if (!doc) {
                        rv += "doc is null";
                        return;
                    }
                    var translate = new Zotero.Translate.Web();
                    translate.setDocument(doc);
                    var translators = await translate.getTranslators();

                    if (translators.length) {
                        translate.setTranslator(translators[0]);
                        try {
                            await translate.translate();
                            rv += "translated successfully ";
                            return;
                        } catch (e) {
                            rv += "translation failed with " + e;
                        }
                    }
                    // Fall back to saving a plain webpage item
                    await ZoteroPane.addItemFromDocument(doc);
                    rv += "Succeeded";
                }
            );
        } catch (e) {
            rv += "\n" + url + " Failed Download";
        }
    }

    return rv + "\nAll Documents Processed";
    ```

    For the record I'm moving from Firefox bookmarks to Zotero for managing this content. I've looked at a bunch of tools including Polarized, Notion, Mendeley, Papers app, pinboard, Hypothes.is, Diigo, etc. but none of them have all the features I need. Zotero is the closest though so if a few of these kinks get ironed out this use case might be a source of new users.
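
    With a few thousand URLs, the string log built in the loop above gets hard to scan. A small variation, sketched here with hypothetical names (`recordResult`, `failedUrls`), is to keep one record per URL so the failures can be written back out and retried.

    ```javascript
    // Hypothetical helpers, not part of Zotero: keep one record per URL
    // instead of appending to a string.
    function recordResult(results, url, status, error) {
        results.push({ url: url, status: status, error: error ? String(error) : null });
        return results;
    }

    // Return the URLs whose status is anything other than 'ok', in input order.
    function failedUrls(results) {
        return results.filter(r => r.status !== 'ok').map(r => r.url);
    }
    ```

    In the loop above, the `rv += ...` lines would become calls such as `recordResult(results, url, 'ok')` or `recordResult(results, url, 'download-failed', e)`, and the script could end with `return failedUrls(results).join('\n')`. The status strings are placeholders.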
  • edited August 13, 2019
    "I made all the calls synchronous"
    FWIW, processDocuments() already loads URLs serially, so the original code I gave you should work even with many URLs. It's only loadDocuments() that loads URLs in parallel.
  • Oh, good to know, thanks. Unfortunately I can't remember the issue I was having with it, but I think my final code didn't change much.
  • I have a similar use case in which I would like to pull in large numbers of URLs to Zotero-- hundreds or more at a time. All I care about in this case is getting the metadata from each webpage. I'm using Zotero for Windows and this JavaScript isn't working for me. I'm getting an error about an unrecognized file path. Is different code required for Zotero in a Windows environment?
  • var path = '~/Desktop/temp/Exported Items.txt';
    needs to be the filepath to your file with URLs.
  • I'm using Zotero for Windows and having a problem running this JavaScript. I'm getting "SyntaxError: malformed Unicode character escape sequence". Is there something I need to change for Windows?
  • Sorry for posting again, but I'm new to Zotero and not familiar with JavaScript. I am also trying to extract meta-data from a large number of URLs at once. I've tried using the code posted by dstillman (the first one, not the second) and the code posted by camjohnson, both with changing the var path to my designated file. In both cases, I get "SyntaxError: malformed Unicode character escape sequence" and no messages in the error console. The only difference I can think of is that I am using Windows instead of Linux. I've been trying to find the error but am having difficulties. Can someone at least point me in the direction of how to debug this?
  • This isn't really about JavaScript — it just sounds like your file isn't valid UTF-8, either because it was exported in a different character set or because there are invalid characters. You'll need to fix the file.
  • edited October 16, 2019
    (If you know what character set it is (e.g., 'windows-1252'), you can also pass it as the second parameter to getContents().)
  • Thanks for directing me towards the file. I've tried resaving the file as UTF-8 with a limited number of PubMed URLs. I've also tried adding UTF-8 as the second parameter to getContents() in the following way: "var urls = Zotero.File.getContents(path, 'UTF-8').split('\n').map(url => url)". I am still getting the same error message. Is there something I am doing wrong?
  • It defaults to UTF-8 — you don't need to pass that. The point is that you can pass non-UTF-8 charsets if necessary.

    As I say, it could also be due to invalid characters in the file, in which case the charset won't matter — and that's fairly likely, because nearly all URLs would generally be ASCII anyway, so other charsets generally shouldn't come into play. If you just paste a couple URLs into a new file, I think you'll find it works fine.
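
    To check a URL list for the kind of stray characters described above before running the import, something like the following could be pasted into any JavaScript console. It is only a sketch: `findSuspectLines` is a hypothetical name, and treating every character outside printable ASCII as suspect is an assumption that will also flag legitimate internationalized URLs.

    ```javascript
    // Hypothetical helper: report the first character on each line that is not
    // printable ASCII (0x20-0x7E), such as a byte-order mark, a smart quote,
    // or a control character.
    function findSuspectLines(text) {
        var suspects = [];
        text.split(/\r?\n/).forEach(function (line, index) {
            var match = /[^\x20-\x7E]/.exec(line);
            if (match) {
                var hex = match[0].codePointAt(0).toString(16).toUpperCase();
                suspects.push({
                    lineNumber: index + 1,
                    column: match.index + 1,
                    codePoint: 'U+' + hex.padStart(4, '0')
                });
            }
        });
        return suspects;
    }
    ```

    An empty result means every line is plain ASCII, in which case the character set of the file is unlikely to be the problem.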
  • edited October 16, 2019
    I've tried pasting a few URLs into a new file (UTF-8 encoded) and am still getting the same error. It seems like it should be a simple fix, but I'm at a loss as to what I'm doing wrong.
  • I'm pretty sure these are actually Windows line endings messing with you/the script. Try replacing split('\n') with split('\r\n')
  • Thanks, adamsmith, that is a good idea. I just tried changing it but am still getting the same error.
  • For anyone interested, I was able to work around the error by pasting the URLs into JavaScript instead of using a text file.

    The final code looks like this (with actual URLs pasted into var path)

    var path = "URL1,URL2,URL3";
    var urls = path.split(',');
    await Zotero.HTTP.processDocuments(
        urls,
        async function (doc) {
            var translate = new Zotero.Translate.Web();
            translate.setDocument(doc);
            var translators = await translate.getTranslators();
            if (translators.length) {
                translate.setTranslator(translators[0]);
                try {
                    await translate.translate();
                    return;
                }
                catch (e) {}
            }
            await ZoteroPane.addItemFromDocument(doc);
        }
    )
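
    One possible explanation for the "SyntaxError: malformed Unicode character escape sequence" reported earlier in this thread, which nobody in the thread confirmed: inside a JavaScript string literal a backslash starts an escape sequence, and `\u` must be followed by hexadecimal digits. A Windows path written with single backslashes, such as the hypothetical `'C:\Users\username\Desktop\urls.txt'`, contains `\u` in `\username` and so fails to parse before any file is read. That would also fit the observation that pasting the URLs directly into the script made the error go away. The sketch below shows two spellings of the same (assumed) path that do parse.

    ```javascript
    // The path is a made-up example; substitute the real location of the file.
    // Doubling each backslash makes it a literal backslash in the string.
    var escapedPath = 'C:\\Users\\username\\Desktop\\urls.txt';

    // String.raw leaves backslashes untouched, so the path can be pasted as-is.
    var rawPath = String.raw`C:\Users\username\Desktop\urls.txt`;

    // Demonstration only: check whether a piece of source text compiles.
    function compiles(source) {
        try {
            new Function(source);
            return true;
        }
        catch (e) {
            if (e instanceof SyntaxError) {
                return false;
            }
            throw e;
        }
    }

    // This string holds the single-backslash spelling as source text. Compiling
    // it fails with a SyntaxError (the exact message differs between engines).
    var brokenSource = "var path = 'C:\\Users\\username\\Desktop\\urls.txt';";
    ```

    Here `escapedPath` and `rawPath` hold the same string, while `compiles(brokenSource)` returns false.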