Converting linked items to URL field?

ttamm · April 13, 2014

I am in the process of migrating to Zotero and have been able to successfully import most of my data. However, I have encountered one problem that I'm finding difficult to solve (except for manually, but this is not an option due to the size of the library).

Namely, most of the URLs to journal articles have carried across to Zotero as linked files (containing only the URL), while the Zotero URL field for these records remains empty. I have tried to search for a solution on the forums but haven't been able to find one specific to this problem. So my question is:

Would there be any way to import the contents of these linked items to the Zotero URL field?

Considering that the only data contained in these linked items is the URL, no prior data conversion would be necessary. If it is not possible in Zotero directly, but would be possible using a grep-capable text editor on an export, or an sqlite front-end on the Zotero DB, for example, I'd be comfortable using this approach as well. As long as someone could help me to figure out exactly what I need to do :)

I have also already found a way to delete all of these linked items afterwards (using a saved search in Zotero). Indeed, I could remove these attachments without the conversion, as they are not essential for generating bibliography entries. But I would much prefer to keep this data if possible.

I would be most grateful for any recommendations.

fbennett · April 13, 2014

Others may have something to say about your use case, but as far as batch editing of the Zotero records goes, this is probably what you are looking for:

https://www.zotero.org/support/dev/client_coding/javascript_api#examplebatch_editing

You'll want to explore the Zotero functions that control attachments and field content further. It should be perfectly possible to transfer the link attachment URL to the item URL field and delete the attachment in one go for each item, if that's the effect that you want.

ttamm · April 13, 2014

fbennett, I'm most grateful for your helpful and swift response!

Yes, "transfer the link attachment URL to the item URL field and delete the attachment in one go for each item" would be exactly the effect I am after. And judging by the support link you provided above, I concur that this indeed looks most likely to be possible. There seem to be very powerful ways to access and modify the Zotero DB using that method. Whether I can figure it out with my (very limited) coding skills is a different question though :)

I will carefully review and start experimenting based on the linked material. However, if anyone on this forum happens to have a code snippet or an alternative ("for dummies") approach they have used to solve a similar scenario, I would be most grateful if you could post it here.

adamsmith · April 13, 2014

While the javascript API is certainly the most sophisticated way to go, it's not exactly low tech.
My take would be that doing this is easiest on import - so if you haven't done much work with your Zotero library, I'd edit the file you imported.
Which format did you use and could you post that for a single item that imported with a link (rather than a URL)?

ttamm · April 14, 2014

A "low tech" solution would indeed be great. Unfortunately though, I have already done quite a bit of data cleanup in Zotero and added quite a few new entries, so going back to the original import file is not an option.

However, might it be possible to export from Zotero, do the cleanup, and then re-import? If so, which of the various export formats (RDF, RIS, BIB, EndNote XML, etc.) might be best to use? My guess is that Zotero RDF would be best at preserving data, but it seems doing the URL conversions could be tricky using this format as they seem to be deeply embedded within Attachment tags, e.g.:

<z:Attachment rdf:about="#item_1466">
<z:itemType>attachment</z:itemType>
<dc:title>isre.3.1</dc:title>
<dcterms:dateSubmitted>2014-04-07 04:05:41</dcterms:dateSubmitted>
<dc:identifier>
<dcterms:URI>
<rdf:value>http://isr.journal.informs.org/cgi/doi/10.1287/isre.3.1.60</rdf:value>
</dcterms:URI>
</dc:identifier>
<link:type>text/html</link:type>
</z:Attachment>
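To illustrate the nesting, here is a minimal Python sketch (not from the thread) that pulls the URL out of each attachment element like the one above. The wrapping rdf:RDF element and the namespace URIs are assumptions based on what a Zotero RDF export typically declares, so check them against the header of your own file. The sketch only lists the URLs; linking each one back to its parent record is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; compare with the xmlns declarations in your export.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "z": "http://www.zotero.org/namespaces/export#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "link": "http://purl.org/rss/1.0/modules/link/",
}

# The attachment from the post, wrapped in an assumed rdf:RDF root element.
SAMPLE = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:z="http://www.zotero.org/namespaces/export#"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:link="http://purl.org/rss/1.0/modules/link/">
 <z:Attachment rdf:about="#item_1466">
  <z:itemType>attachment</z:itemType>
  <dc:title>isre.3.1</dc:title>
  <dc:identifier>
   <dcterms:URI>
    <rdf:value>http://isr.journal.informs.org/cgi/doi/10.1287/isre.3.1.60</rdf:value>
   </dcterms:URI>
  </dc:identifier>
  <link:type>text/html</link:type>
 </z:Attachment>
</rdf:RDF>"""


def attachment_urls(rdf_text):
    """Map each attachment's rdf:about id to the URL nested inside it."""
    root = ET.fromstring(rdf_text)
    about_attr = "{%s}about" % NS["rdf"]
    urls = {}
    for attachment in root.findall("z:Attachment", NS):
        value = attachment.find("dc:identifier/dcterms:URI/rdf:value", NS)
        about = attachment.get(about_attr)
        # Skip attachments that carry no URL (for example stored files).
        if about and value is not None and value.text:
            urls[about] = value.text.strip()
    return urls


urls = attachment_urls(SAMPLE)
print(urls)
```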

adamsmith · April 14, 2014

Zotero RDF would indeed be quite hard to do automatically without coding (you'd essentially need to write an XSLT conversion; search & replace won't do).
The easiest would be RIS. It's not entirely lossless depending on the format you choose, but unless you have a lot of very unusual formats it should work nicely. There the link appears after "L2 - "; simply replace that with "UR - " and re-import. Start with a small batch to make sure you're happy with the results.

ttamm · April 16, 2014

Thank you for this excellent suggestion! After tinkering around with the RIS export file in TextWrangler for a bit, I was able to not only achieve the above-discussed but also to do quite a bit of additional data cleanup.

It would be absolutely great if grep-like global find & replace functionality were built into Zotero itself, but working with the export works well as an alternative. The only thing I managed to lose were collections. I probably should have added keywords for those, but I discovered this too late and didn't want to redo all the data conversion.

Just as a thought, perhaps it would even be worthwhile adding an automated way of exporting collections to the export in a future release of Zotero? The simplest way would seem to be introducing a third checkbox to the export window (in addition to notes and attachments), which could trigger auto-tagging the entries with keywords with a special prefix, e.g., coll_collectionname. (At least this is the approach I would have probably used for doing this manually - if I had realised the need on time.)

Many thanks again! I have now got Zotero very nicely up and running, and am really enjoying using the application (in combination with ZotFile and ZoteroQuicklook).

adamsmith · April 16, 2014

Some version of search & replace is planned. I'm not sure if it's going to allow regex, but that's a topic for another day.

Collection export is always a problem - too many checkboxes aren't good and there are already a bunch on export (bibtex has 3 or 4 I think) and we'd prefer not to mix tags and collections on export. In Zotero RDF, collections are exported.

ffsammak · May 15, 2014

The RIS format export is a good one, but how to do it right? I exported all 1000 refs in RIS format with notes and attachments, then created a new collection to be the target, did import after doing the replacement, but ended up with duplicates. I think L2 if pdf attachment then would be lost in the process, so how to do it efficiently?
I hope that someone could provide a low-tech solution, a snippet code how to do it in batch processing: URL link attachments to the URL field.

aurimas · May 15, 2014

> then created a new collection to be the target

If you're doing this correctly, the import itself will create a new collection. If you're able to import directly into a collection you created, you're doing something wrong.

> did import after doing the replacement, but ended up with duplicates

Did you delete your old data? Obviously you should make a database backup before starting such a task in general.

> I think L2 if pdf attachment then would be lost in the process

Yes, L2 is not for PDF attachments. That's for web links/HTML attachments (including snapshots). You want to keep PDFs as L1 (they should be exported that way to start with).

> a snippet code how to do it in batch processing: URL link attachments to the URL field.

It's really as simple as replacing L2 with UR. The only catch is that if you have HTML attachments, like snapshots, that you want to preserve, you need to differentiate them. So using a text editor that supports regexp (e.g. EditPad) you could search for ^L2(\s*-\s*http://) and replace with UR$1
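If you'd rather script it than use a text editor, the same replacement can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the sample record and its file paths are made up, and the pattern uses https? instead of plain http so that secure links are converted too, which goes slightly beyond the expression above.

```python
import re

# Same idea as the editor regex: only L2 lines that hold a web URL are matched.
# "https?" is an added assumption so that https links are converted as well.
L2_LINK = re.compile(r"^L2(\s*-\s*https?://)", re.MULTILINE)


def l2_links_to_ur(ris_text):
    """Rewrite 'L2 - http...' lines as 'UR - http...', leaving other lines alone."""
    return L2_LINK.sub(r"UR\1", ris_text)


# Hypothetical RIS record: one PDF (L1), one web link (L2), one snapshot (L2).
SAMPLE = (
    "TY  - JOUR\n"
    "TI  - Example article\n"
    "L1  - files/12/example.pdf\n"
    "L2  - http://isr.journal.informs.org/cgi/doi/10.1287/isre.3.1.60\n"
    "L2  - files/13/snapshot.html\n"
    "ER  - \n"
)

converted = l2_links_to_ur(SAMPLE)
print(converted)
```

To use it on a real export, read the file into a string, pass it through l2_links_to_ur, and write the result to a new file, so the original export stays untouched. As with the manual approach, try it on a small batch first and back up the database before deleting anything.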

ffsammak · May 15, 2014

The problem with RIS export/import is that it does not preserve the HTML tags in the title, which is a major issue. Is there any way to preserve the HTML tags?

thibaultdepremo · May 19, 2021

This code snippet can copy the URL stored in a web attachment back into the item's URL field:

https://github.com/retorquere/zotero-better-bibtex/issues/1765#issuecomment-839600482