Unable to bulk import a list of urls

camjohnson · August 11, 2019

I'm on the Zotero linux desktop client and I'm pulling my hair out trying to make this work. Basically all I want to do is import a list of urls into Zotero and have the metadata pulled from the page and a snapshot saved. If I drag and drop a bookmark from firefox this works perfectly.

The problem is that if I drag multiple items, weird things start happening: all the bookmarks get the same title, or only one is imported and it has no title, or nothing happens and drag and drop is disabled until I exit the application.

I tried saving my url as a .html file and importing it, but Zotero didn't treat it as a webpage. I built an ris file by hand and imported that, but no metadata was pulled and no snapshot created. I looked for the "Add Webpage" option in the menu but there's no entry.

Is there any way to pull in a list of urls?

dstillman · August 11, 2019

This isn't a supported option, for a number of reasons.

You're really meant to add URLs from the browser, since those use your existing cookie state, including web-based proxy access and other site logins. While Zotero has the ability to save directly from URLs (which would be roughly equivalent to what you'd get when you save on zotero.org, barring IP-based subscription access), the results might not be as good, so it's not something we encourage when you have the Zotero Connector available.

This is the reason you can't, say, paste a list of URLs into Add Item by Identifier the way you can paste in a list of DOIs (which itself has the downside of not using any web-based proxy you have to download PDFs, though it will use an IP-based proxy). We could consider adding that for cases like yours, but I'd be concerned that some people might start to add individual URLs that way instead of loading them in the browser, which wouldn't be ideal.

> If I drag and drop a bookmark from firefox this works perfectly.

It doesn't, actually. Dragging a URL will create a webpage item, but it won't run translation on the page to grab metadata. I'll fix that, but for the reasons above, it's still not really a recommended workflow.

That aside, dragging multiple URLs just seems to be poorly implemented across various browsers and applications. It's possible we could make that work from some browsers, but I'm not sure we could make it work reliably.

> I looked for the "Add Webpage" option in the menu but there's no entry.

Because you're not meant to add webpages by hand. See the note here:

https://www.zotero.org/support/getting_stuff_into_your_library#manually_adding_items

That said, if you really want to do this, you can save a newline-separated list of URLs to a text file, open Tools → Developer → Run JavaScript, and run this code, after adjusting the path on the first line:

var path = '/home/username/Desktop/urls.txt';
var urls = Zotero.File.getContents(path).split('\n').map(url => url);
await Zotero.HTTP.processDocuments(
	urls,
	async function (doc) {
		var translate = new Zotero.Translate.Web();
		translate.setDocument(doc);
		var translators = await translate.getTranslators();
		if (translators.length) {
			translate.setTranslator(translators[0]);
			try {
				await translate.translate();
				return;
			}
			catch (e) {}
		}
		await ZoteroPane.addItemFromDocument(doc);
	}
)

This won't load JavaScript on the page before trying to save, so some pages might not save properly. (There's a way to do that, but it's more complicated and much slower.)

camjohnson · August 11, 2019

Thanks for the detailed explanation of what's happening here. My use case is that I have about 3000 web pages that I want to store in Zotero. None of them are paywalled or anything so I just want to save the raw page itself. It seems like this would be a pretty common workflow but maybe I'm missing something? I can't go to each of the 3000 pages individually in Firefox and manually save them.

I'm very comfortable with Javascript so I'll try the solution you posted. Could you point me to documentation of how to get it to run Javascript too? I think I can figure it out if I know where to look.

bwiernik · August 11, 2019

See here https://www.zotero.org/support/dev/client_coding/javascript_api#running_ad_hoc_javascript_in_zotero

dstillman · August 11, 2019

> It seems like this would be a pretty common workflow but maybe I'm missing something?

It's just not, I'm afraid — you're the first person I can recall ever asking for this. Most people either have existing reference libraries that they import into Zotero from another program or just start saving items from the browser.

> Could you point me to documentation of how to get it to run Javascript too? I think I can figure it out if I know where to look.

@bwiernik: I think @camjohnson was asking about the alternative approach I mentioned above where it loads JS on the page.

JavaScript knowledge isn't particularly relevant here — this is all Zotero-specific code. But you'd want to use Zotero.HTTP.loadDocuments() instead of Zotero.HTTP.processDocuments(), and then, for safety, re-parse the document using DOMParser:

var path = '/home/username/Desktop/urls.txt';
var urls = Zotero.File.getContents(path).split('\n').map(url => url);
await Zotero.HTTP.loadDocuments(
	urls,
	async function (doc) {
		var parser = new DOMParser();
		var safeDoc = Zotero.HTTP.wrapDocument(
			parser.parseFromString(doc.documentElement.outerHTML, 'text/html'),
			doc.location.href
		);
		var translate = new Zotero.Translate.Web();
		translate.setDocument(safeDoc);
		var translators = await translate.getTranslators();
		if (translators.length) {
			translate.setTranslator(translators[0]);
			try {
				await translate.translate();
				return;
			}
			catch (e) {}
		}
		await ZoteroPane.addItemFromDocument(safeDoc);
	}
)

But I'd strongly recommend using the previous one first and only using this if there are pages that aren't working right. Note that loadDocuments() loads pages in parallel, so it isn't meant for running with many URLs at once and will likely crash Zotero and/or your computer if you do so.
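
Since loadDocuments() loads pages in parallel, one way to keep the load bounded might be to split the URL list into small batches and hand over one batch at a time. The sketch below shows only the batching step. `chunk` is a hypothetical helper, not part of Zotero, and the batch size of 3 is an arbitrary value for illustration. It also assumes that awaiting loadDocuments() resolves only once a batch has finished, which the usage above suggests but which should be checked before relying on it.

```javascript
// Hypothetical helper (not part of Zotero): split an array into
// consecutive batches of at most `size` items.
function chunk(items, size) {
	var batches = [];
	for (var i = 0; i < items.length; i += size) {
		batches.push(items.slice(i, i + size));
	}
	return batches;
}

// Example: seven placeholder URLs split into batches of three
var exampleUrls = [
	'https://example.com/1', 'https://example.com/2', 'https://example.com/3',
	'https://example.com/4', 'https://example.com/5', 'https://example.com/6',
	'https://example.com/7'
];
var batches = chunk(exampleUrls, 3);
```

In the Run JavaScript window, each batch could then be passed in turn to `await Zotero.HTTP.loadDocuments(batch, ...)` with the same processor function as above.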

camjohnson · August 12, 2019

Thanks for that, if I get some time I'll go through it but you're right that processDocuments seems to be doing fine. I've ended up with this code which is working well, I made all the calls synchronous and added some error catching.

```
var path = '~/Desktop/temp/Exported Items.txt';
var urls = Zotero.File.getContents(path).split('\n').map(url => url);

// Running log of what happened to each URL
var rv = "";

// Process one URL at a time so each result is logged against its own URL
for (var url of urls) {
	try {
		await Zotero.HTTP.processDocuments(
			url,
			async function (doc) {
				rv += "\n" + url + ": ";
				if (!doc) {
					rv += "doc is null";
					return;
				}
				var translate = new Zotero.Translate.Web();
				translate.setDocument(doc);
				var translators = await translate.getTranslators();

				if (translators.length) {
					translate.setTranslator(translators[0]);
					try {
						await translate.translate();
						rv += "translated successfully ";
						return;
					}
					catch (e) {
						rv += "translation failed with " + e;
					}
				}
				// No translator found, or translation failed: save as a generic webpage
				await ZoteroPane.addItemFromDocument(doc);
				rv += "Succeeded";
			}
		);
	}
	catch (e) {
		rv += "\n" + url + " Failed Download";
	}
}

return rv + "\nAll Documents Processed";
```

For the record I'm moving from Firefox bookmarks to Zotero for managing this content. I've looked at a bunch of tools including Polarized, Notion, Mendeley, Papers app, pinboard, Hypothes.is, Diigo, etc. but none of them have all the features I need. Zotero is the closest though so if a few of these kinks get ironed out this use case might be a source of new users.

dstillman · August 13, 2019

> I made all the calls synchronous

FWIW, processDocuments() already loads URLs serially, so the original code I gave you should work even with many URLs. It's only loadDocuments() that loads URLs in parallel.

camjohnson · August 13, 2019

Oh, good to know, thanks. Unfortunately I can't remember the issue I was having with it, but I think my final code didn't change much.

janeckelly · September 3, 2019

I have a similar use case in which I would like to pull large numbers of URLs into Zotero, hundreds or more at a time. All I care about in this case is getting the metadata from each webpage. I'm using Zotero for Windows and this JavaScript isn't working for me. I'm getting an error about an unrecognized file path. Is different code required for Zotero in a Windows environment?

adamsmith · September 3, 2019

var path = '~/Desktop/temp/Exported Items.txt';
needs to be the filepath to your file with URLs.

jtondt · October 13, 2019

I'm using Zotero for Windows and having a problem running this JavaScript. I'm getting "SyntaxError: malformed Unicode character escape sequence". Is there something I need to change for Windows?

jtondt · October 16, 2019

Sorry for posting again, but I'm new to Zotero and not familiar with JavaScript. I am also trying to extract meta-data from a large number of URLs at once. I've tried using the code posted by dstillman (the first one, not the second) and the code posted by camjohnson, both with changing the var path to my designated file. In both cases, I get "SyntaxError: malformed Unicode character escape sequence" and no messages in the error console. The only difference I can think of is that I am using Windows instead of Linux. I've been trying to find the error but am having difficulties. Can someone at least point me in the direction of how to debug this?

dstillman · October 16, 2019

This isn't really about JavaScript — it just sounds like your file isn't valid UTF-8, either because it was exported in a different character set or because there are invalid characters. You'll need to fix the file.

dstillman · October 16, 2019

(If you know what character set it is (e.g., 'windows-1252'), you can also pass it as the second parameter to getContents().)

jtondt · October 16, 2019

Thanks for directing me towards the file. I've tried resaving the file as UTF-8 with a limited number of PubMed URLs. I've also tried adding UTF-8 as the second parameter to getContents in the following way: "var urls = Zotero.File.getContents(path, 'UTF-8').split('\n').map(url => url)". I am still getting the same error message. Is there something I am doing wrong?

dstillman · October 16, 2019

It defaults to UTF-8 — you don't need to pass that. The point is that you can pass non-UTF-8 charsets if necessary.

As I say, it could also be due to invalid characters in the file, in which case the charset won't matter — and that's fairly likely, because nearly all URLs would generally be ASCII anyway, so other charsets generally shouldn't come into play. If you just paste a couple URLs into a new file, I think you'll find it works fine.

jtondt · October 16, 2019

I've tried pasting a few URLs into a new file (UTF-8 encoded) and am still getting the same error. It seems like it should be a simple fix, but I'm at a loss as to what I'm doing wrong.

adamsmith · October 16, 2019

I'm pretty sure these are actually Windows line endings messing with you/the script. Try replacing split('\n') with split('\r\n')
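
Whether or not line endings turn out to be the cause here, a split that tolerates both conventions avoids having to know how the file was saved. The following is a minimal sketch: splitting on `/\r?\n/` handles both LF and CRLF, and trimming plus dropping empty lines keeps a blank entry (for example from a trailing newline) from being passed along as a URL. `parseUrlList` is a hypothetical helper name, not part of Zotero. In the scripts above, its result would take the place of the `split('\n').map(url => url)` expression, whose `map` call leaves every line unchanged.

```javascript
// Hypothetical helper (not part of Zotero): turn the raw text of a URL list
// into a clean array of URLs, whatever line endings the file uses.
function parseUrlList(text) {
	return text
		.split(/\r?\n/)               // accept both LF and CRLF line endings
		.map(line => line.trim())     // strip surrounding whitespace
		.filter(line => line !== ''); // drop blank lines, e.g. from a trailing newline
}

// Example input with mixed line endings and a trailing blank line
var sample = 'https://example.com/a\r\nhttps://example.com/b\n\n';
var urls = parseUrlList(sample);
```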

jtondt · October 16, 2019

Thanks, adamsmith, that is a good idea. I just tried changing it but am still getting the same error.

jtondt · October 18, 2019

For anyone interested, I was able to work around the error by pasting the URLs into JavaScript instead of using a text file.

The final code looks like this (with actual URLs pasted into var path)

var path = "URL1,URL2,URL3";
var urls = path.split(',');
await Zotero.HTTP.processDocuments(
	urls,
	async function (doc) {
		var translate = new Zotero.Translate.Web();
		translate.setDocument(doc);
		var translators = await translate.getTranslators();
		if (translators.length) {
			translate.setTranslator(translators[0]);
			try {
				await translate.translate();
				return;
			}
			catch (e) {}
		}
		await ZoteroPane.addItemFromDocument(doc);
	}
)

eldakka · December 31, 2019

@dstillman August 11, 2019

>> It seems like this would be a pretty common workflow but maybe I'm missing something?
> It's just not, I'm afraid — you're the first person I can recall ever asking for this.

I would expect this request to become more common.

A little over a year ago, Firefox removed a bookmark 'notes' feature, where one could add useful notes/info to bookmarks stored in Firefox.

In searching for a fix to restore this ability, or an alternative, I came across a recommendation for using Zotero as the bookmark library instead. Therefore I am looking at moving my existing browser bookmarks gathered over the years to Zotero.

So while I have no issue using the suggested javascript supplied here, I am just noting why you may get more interest for such a feature than previously.

stefanct · May 26, 2020

Since non-raised voices can't be heard, here is one ;)

I would also like to import a list of URLs, or to be more precise, I want Zotero to visit each URL and mimic the behavior of pushing the "Save to Zotero" button. My overall workflow is to feed a JS file to Zotero via BBT's debug bridge that then reads the URLs from a file just as discussed above. I've been trying to use both "main" variants shown above but was only successful with processDocuments and not loadDocuments (yet).

My main question though is why all variants above have a return after translate.translate(). This does not make any sense to me because then the later ZoteroPane.addItemFromDocument is never executed (and thus nothing added to the database)?

I am nevertheless almost satisfied with the additional error handling camjohnson's version provides. One small thing I would like to check is that none of the docs is of type web page (as this is most likely to be an error in my use case). Is there a way to do that (before inserting the result into the db, i.e. within the processDocuments processor)?

Also, is there any way to debug this code better? I have been using the Zotero.debug() and the debug output window so far but this is rather ineffective.

dstillman · May 26, 2020

> My main question though is why all variants above have a return after translate.translate(). This does not make any sense to me because then the later ZoteroPane.addItemFromDocument is never executed (and thus nothing added to the database)?

They're different methods. One uses the translation architecture and one saves the page as a generic webpage. If the former succeeds, there's no need to use the latter.

dstillman · May 26, 2020

> Also, is there any way to debug this code better? I have been using the Zotero.debug() and the debug output window so far but this is rather ineffective.

Zotero.debug() is the thing to use, but you can use a terminal window instead of the debug output window.

stefanct · May 26, 2020

Ah, that explanation regarding translation makes a lot of sense, thanks. The debugging via stdout is a small step in the right direction too, thank you. Not a game changer though... :)

ChillyWolf · July 9, 2020

I can't get this script to work:
var path = '/home/username/Desktop/urls.txt';
var urls = Zotero.File.getContents(path).split('\n').map(url => url);
await Zotero.HTTP.processDocuments(
	urls,
	async function (doc) {
		var translate = new Zotero.Translate.Web();
		translate.setDocument(doc);
		var translators = await translate.getTranslators();
		if (translators.length) {
			translate.setTranslator(translators[0]);
			try {
				await translate.translate();
				return;
			}
			catch (e) {}
		}
		await ZoteroPane.addItemFromDocument(doc);
	}
)
I'm getting "Zotero is not defined". I'm trying to import multiple URLs. I also tried importing from Chrome history but didn't succeed.

dstillman · July 9, 2020

@ChillyWolf: Where exactly are you trying to run this? This is code that runs in the Run JavaScript window in Zotero.

ChillyWolf · July 12, 2020

@dstillman My bad, I was trying to run it from Chrome's console. Now when I run it from Zotero's Run JavaScript window, it returns this error:
[Exception... "Component returned failure code: 0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH) [nsIFile.initWithPath]" nsresult: "0x80520001 (NS_ERROR_FILE_UNRECOGNIZED_PATH)" location: "JS frame :: chrome://zotero/content/xpcom/file.js :: Zotero.File</this.getContents :: line 159" data: no]

snipespy · July 20, 2020

@dstillman

I am also getting a very similar error:

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIFileInputStream.init]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://zotero/content/xpcom/file.js :: Zotero.File</this.getContents :: line 167" data: no]

and I implemented the same code.

adamsmith · July 20, 2020

People who are getting a FileInputStream error -- you are adjusting the filepath in the script to a file that exists, yes?

snipespy · July 23, 2020

@adamsmith

Yes I have tried many different ways to fix the file path error and I even included an absolute path to the text file, but I was still unable to fix the problem.

Is there a different way to indicate the filepath with Windows?

This is what is going into path url right now: "C:\Users\my_user_name\Desktop\text_file.txt"

The rest of the code is the same.
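
One thing that may be worth checking with a path written like this (an assumption; nothing in the thread confirms it as the cause): inside an ordinary JavaScript string literal, a backslash starts an escape sequence. In the path above, `\t` becomes a tab character and the other single backslashes are silently dropped, so the resulting string is no longer the intended Windows path. A lowercase `\u` that is not followed by hex digits is rejected outright as a malformed Unicode escape. A minimal sketch of two ways to keep the backslashes, using the example path from the post:

```javascript
// Single backslashes are read as escapes: "\U", "\m" and "\D" lose their
// backslash and "\t" turns into a tab, so this is not the intended path.
var broken = "C:\Users\my_user_name\Desktop\text_file.txt";

// Option 1: double every backslash.
var escaped = "C:\\Users\\my_user_name\\Desktop\\text_file.txt";

// Option 2: String.raw keeps the backslashes exactly as typed.
var raw = String.raw`C:\Users\my_user_name\Desktop\text_file.txt`;
```

Either `escaped` or `raw` could then be used as the value of `path` in the scripts above.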