CiteULike personal PDF download

Marcin Zalewski · October 16, 2008

I have just tried Zotero and I think it is great! I have a large collection of articles on CiteULike that I would like to import and it seems to work quite OK except that my PDFs are not downloaded. I have tens of PDFs and it would be a major pain to have to download each manually and then attach it to the library item imported from CiteULike. Is there some way to get PDFs imported?

arggem · October 16, 2008

Is "Automatically attach associated PDFs and other files when saving items" checked in the Preferences window?

To find out, click the gear icon and then the "General" tab.

Marcin Zalewski · October 17, 2008

Yes, the checkbox is checked. When I uncheck it, the links attached to items on CiteULike are not saved in Zotero. After rechecking the box I get the links again, but the PDFs are still not imported.

Marcin Zalewski · October 17, 2008

I also noticed another glitch. I was trying to import an entry from CiteULike that has the following bibtex record:

@phdthesis{Tracz:1997:PHD,
author = {Tracz, William J. },
citeulike-article-id = {3385689},
keywords = {algebraic-specification, architecture},
month = {March},
posted-at = {2008-10-08 10:04:23},
priority = {2},
school = {Stanford University},
title = {Parameterized Programming in LILEANNA},
year = {1997}
}

The import worked fine except that the "school" key was not imported: the Zotero entry shows a blank University field.

Marcin Zalewski · October 17, 2008

In response to myself, I think that the problem is with CiteULike's RIS record that does not list school for this thesis. Apparently, Zotero performs the import from the RIS record and it does not get the school value since it is simply not there.

I think that a different translator is necessary for CiteULike. It would be best to import the bibtex since it seems more complete than CiteULike's RIS. CiteULike makes you go through a form to get a bibtex file, so I guess it is a bit harder than getting the Endnote record. But to make the import complete, the HTML page of the CiteULike article itself needs to be examined, and any extra URLs and the PDF files have to be imported from there. I would implement it in a jiffy if I were fluent in JavaScript and Zotero. As it is, I hope that someone else can have a look.
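To illustrate the point about the bibtex being more complete, here is a minimal Python sketch (not a Zotero translator, and not how Zotero itself imports) that reads the simple "key = {value}" fields of the record quoted above and shows that "school" is present there. The regular expression and the function name bibtex_fields are made up for this example and only handle one-line fields without nested braces.

```python
import re

# The BibTeX record quoted earlier in this thread.
BIBTEX = """@phdthesis{Tracz:1997:PHD,
author = {Tracz, William J. },
citeulike-article-id = {3385689},
keywords = {algebraic-specification, architecture},
month = {March},
posted-at = {2008-10-08 10:04:23},
priority = {2},
school = {Stanford University},
title = {Parameterized Programming in LILEANNA},
year = {1997}
}"""

# Matches simple one-line "key = {value}" fields only.
# Assumption: no nested braces, no quoted values, no multi-line values.
FIELD = re.compile(r"^\s*([\w-]+)\s*=\s*\{(.*?)\}\s*,?\s*$", re.MULTILINE)


def bibtex_fields(record):
    """Return a dict of field name -> value for one simple BibTeX record."""
    return {key.lower(): value.strip() for key, value in FIELD.findall(record)}


fields = bibtex_fields(BIBTEX)
# The school is available in the BibTeX export, so an importer
# working from this record could fill in the university.
print(fields.get("school"))
```

A real importer would need a proper BibTeX parser; this only shows that the information is in the export.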

rdiaz02 · July 23, 2010

Has there been any progress on these issues? I also have > 1500 entries in CiteULike, and I thought that RIS would be a last resort for getting the PDFs into Zotero, but that did not help either.

Marcin Zalewski · July 23, 2010

http://forums.zotero.org/discussion/4469/improved-citeulike-import/

This is something I did a long time ago, and, as far as I can remember, it mostly worked. I still have plenty of entries from CiteULike with the original PDFs. At this point, I can't walk you through the process (I don't remember anything), but you can look up how to modify imports in Zotero, and see if my patch still applies.

rdiaz02 · July 24, 2010

Thanks for the answer. In fact, the first thing I tried was the built-in import, and then your fix, but I could not get it to work. I'll try again (and report).

rdiaz02 · August 9, 2010

I tried again, with no luck: when I use your code in the CiteULike.js importer, I get the usual "An error occurred while saving this item ...". That means I cannot import either one by one or by using the folder icon. I guess it must be something in the JS, but I know nothing about JavaScript. I'll see if I can find a workaround.

Marcin Zalewski · August 9, 2010

One thing to try would be to install an old version of Zotero, the one that was around about the time I posted the code. If my patch works with the old version, you could import your entries and then upgrade. I am not sure, though, how an upgrade from an ancient version would work.

rdiaz02 · August 9, 2010

Aha. OK, that could be a good way to go. Before I get going, and even if you do not remember the details, do you recall having to go page by page in CiteULike and clicking the "folder icon"? Or did you import everything in one go? I am assuming you did not go entry by entry.

dstillman · August 9, 2010

Translators are individual files in SVN/Trac. There's certainly no reason to install an old version of Zotero to get an earlier version of a translator, and it's not something you would want to do.

ajlyon · August 9, 2010

I think that the idea here is that the translator only worked with an older version of Zotero. That said, it'd be better to modify the translator to work with today's Zotero. Not that much has changed, so the modifications should be minimal.

In fact, I don't see why any of the changes in Zotero in the last several years should prevent that version from working. Install Scaffold 2.0 in your up-to-date Zotero installation and give the modified translator a try. If it doesn't work, it's more likely because of a change in CiteULike behavior than in Zotero behavior.

rdiaz02 · August 11, 2010

Dan: yes, ajlyon is correct about the intentions.

ajlyon: I see your suggestion makes a lot of sense. However, I know close to no JavaScript, nor anything about CiteULike's detailed behavior. Learning JS just for this was not an appealing idea to me. So, in the meantime, I found a solution which is a very ugly kludge, but it worked. In case it could help anybody, I am posting it here.

The basic idea is to use SyncUThink (http://www.andrewberman.org/projects/sync/), which, among other things, downloads all the pdfs one has in CiteULike. So all that we need to do is modify the bibtex file from CiteULike so that it includes the full path to the local pdf file downloaded. For this modification we need to find a way to easily match the citeulike bibtex records to the pdf file names (without digging in the Java code for SyncUThink nor learning about CiteULike's structures). So this is what I did. Again, be warned this is an ugly kludge.

- Use the standalone version of SyncUThink (download from link above).

- Run it from an Emacs (shell) buffer: the output there makes the correspondence between CiteULike IDs and pdfs clear. Save that buffer and call it "emacs.buffer.txt".

- Leave only the lines with interesting stuff from that buffer (one per record)

grep Downloading emacs.buffer.txt > tmp1

- The CiteULike ID is the third from last field (separator is "/"). Get just the ID and the pdf file name, so we can see the correspondence between CiteULike's ID and file name in a two-column file

awk -F"/" '{print $9, $11}' tmp1 > id.and.pdf

- Now we only have to match the ID in the bibtex file with the pdf and, for the sake of simplicity, substitute the "citeulike-article-id " by "pdf ", so we can import the bibtex file in Zotero. I do it in R.

### What follows is all done from within R

id.pdf <- read.table(file = "id.and.pdf", header = FALSE)
common.dir <- "/home/ramon/CUL3-pdf/" ## where the pdfs live
bib <- readLines("rdiaz.bib") ## the file where SyncUThink stored your bib
pos.to.check <- grep("citeulike-article-id", bib) ## lines that have the IDs

## build the "pdf = {...}," line for one ID line; NULL if there is no pdf for that ID
get.the.pdf.path <- function(z, id.pdf, common.dir) {
    id <- as.numeric(strsplit(strsplit(z, "\\{")[[1]][2], "\\}")[[1]][1])
    pos.id <- match(id, id.pdf[, 1])
    if(!is.na(pos.id))
        return(paste("pdf = {", common.dir, id.pdf[pos.id, 2], "},", sep = ""))
    else
        return(NULL)
}

## find the correspondence ID -> pdf and, if existing, substitute the field
for(i in pos.to.check) {
    tpdf <- get.the.pdf.path(bib[i], id.pdf, common.dir)
    if(!is.null(tpdf))
        bib[i] <- tpdf
}
## the file "bibwithpdfs.bib" can be imported into Zotero
writeLines(bib, "bibwithpdfs.bib")

########## We are done #######
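For anyone who would rather avoid the grep + awk + R combination, the same steps can be sketched in a single Python script. This is only a rough equivalent under assumptions: it takes fields 9 and 11 of each "Downloading" line (split on "/"), exactly as the awk call does, and the sample log line below is made up to have that shape, since the real SyncUThink output is not reproduced here. The function names are hypothetical.

```python
import re


def id_and_pdf(log_line, id_field=9, pdf_field=11):
    """Split a log line on '/' and return (ID, pdf name), or None.

    Field numbers are 1-based, mirroring awk -F"/" '{print $9, $11}'.
    """
    parts = log_line.rstrip("\n").split("/")
    if len(parts) < max(id_field, pdf_field):
        return None
    return parts[id_field - 1], parts[pdf_field - 1]


def build_id_map(log_lines):
    """Map CiteULike article ID -> pdf file name (the grep + awk steps)."""
    mapping = {}
    for line in log_lines:
        if "Downloading" not in line:
            continue
        pair = id_and_pdf(line)
        if pair is not None:
            mapping[pair[0]] = pair[1]
    return mapping


# Lines such as: citeulike-article-id = {3385689},
ID_LINE = re.compile(r"citeulike-article-id\s*=\s*\{(\d+)\}")


def add_pdf_paths(bib_lines, id_map, common_dir):
    """Replace each ID line by a 'pdf = {...},' line when a pdf is known."""
    out = []
    for line in bib_lines:
        m = ID_LINE.search(line)
        if m and m.group(1) in id_map:
            out.append("pdf = {" + common_dir + id_map[m.group(1)] + "},")
        else:
            out.append(line)
    return out


# Made-up log line with the ID in field 9 and the pdf name in field 11;
# the real SyncUThink lines will look different.
sample_log = ["Downloading /f2/f3/f4/f5/f6/f7/f8/3385689/f10/tracz.pdf"]
sample_bib = ["citeulike-article-id = {3385689},", "year = {1997}"]

id_map = build_id_map(sample_log)
for line in add_pdf_paths(sample_bib, id_map, "/home/ramon/CUL3-pdf/"):
    print(line)
```

To use it on real files, read "emacs.buffer.txt" and the bib file with open(...).read().splitlines() and write the result back out, as the R version does with readLines and writeLines. IDs without a matching pdf are left untouched, as in the R loop.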

ajlyon · August 11, 2010

It's always easier to work from an environment that's familiar, so I'm glad you found an efficient way to do this in R + awk. Probably not the first thing that comes to mind for general scripting -- but it works!

It would be nice to improve CiteULike support, and perhaps it will happen one day.