What makes automatic downloading of PDFs possible?

One useful feature of Zotero is that, for some databases (e.g. JSTOR), when we save the citation, the PDF file is downloaded automatically. I am curious how the magic works behind the scenes. How come it works for some databases but not for others? I am wondering if it has anything to do with the proprietary nature of some of these databases or with how the metadata of the files in the database is structured.

Currently, I am looking into the Open Access Movement, where people are trying to make more research literature freely accessible on the web. In particular, they are trying to promote the interoperability of the content for more efficient dissemination: http://www.openarchives.org/ Is there any connection between automatic PDF downloading and the Open Archives Initiative?

Many thanks~~
Simon
  • This is only somewhat connected to open access:
    The main reason this sometimes works and sometimes doesn't is the structure of the database and the nature of the translator.
    However, as some commercial database vendors have become increasingly protective of their PDFs, translators that download PDFs are harder, and at times impossible, to write (I believe Cambridge University Press is such an example).

    That's obviously not the case where articles are openly available anyway - for example, articles linked directly from Google Scholar entries are already downloaded automatically.
    Any open access site/database that wants its PDFs downloaded into Zotero can do so, even without a dedicated translator, by using unAPI:
    http://www.zotero.org/support/dev/make_your_site_zotero_ready
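As a rough sketch of how the unAPI protocol mentioned above works: a site advertises an unAPI endpoint, and a client asks it for a record by identifier, first discovering which formats are available and then requesting one. The server URL and identifier below are hypothetical; this is an illustration of the request pattern, not Zotero's actual code.

```python
# Sketch of an unAPI client request builder. A real site advertises its
# endpoint in the page head; the base URL used here is made up.
from urllib.parse import urlencode

def unapi_url(server, record_id, fmt=None):
    """Build an unAPI request URL.

    Per the unAPI spec, a request with only ?id=... asks the server
    which formats it can supply for that record; adding &format=...
    asks for the record itself in that format.
    """
    params = {"id": record_id}
    if fmt:
        params["format"] = fmt
    return server + "?" + urlencode(params)

# First ask which formats are available, then fetch (say) MODS metadata:
formats_url = unapi_url("https://example.org/unapi", "doi:10.1000/xyz123")
record_url = unapi_url("https://example.org/unapi", "doi:10.1000/xyz123", "mods")
```

A client like Zotero follows exactly this two-step pattern, which is why a site only needs to expose one standard endpoint rather than ship a custom translator.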
  • Somewhat relatedly, one of Zotero's directors has recently blogged about the general connection between Zotero and Open Access:
    http://quintessenceofham.org/2010/11/12/zotero-and-open-access/
  • PDF downloading depends on the site translator that Zotero uses. We try to write translators to save full text in all cases when we can reliably fetch it from the site. This doesn't actually depend so much on whether the site is proprietary -- it has more to do with the structure of the site.

    Sites that provide article data in well-defined standard formats and use standard ways of referring to full text are going to be in better shape for any data sharing, interoperability initiative. I know very little about the Open Archives movement, but standards help. Sites that use their own ways of describing content are bad players for Zotero and bad players for other initiatives. It is important to note, however, that being proprietary is not the core problem-- open and free databases can have poorly structured data that doesn't conform to standards, and proprietary databases can have high-quality data in useful standard formats.
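To make the point above concrete, here is a minimal sketch (not Zotero's actual code) of why standard embedded metadata helps: pages that carry Highwire-style `citation_*` meta tags, including `citation_pdf_url`, tell any client where the full text lives, with no site-specific scraping needed. The sample page below is invented for illustration.

```python
# Collect Highwire-style citation_* meta tags from an article page.
# Any client that understands this convention can locate the PDF.
from html.parser import HTMLParser

class CitationMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = a.get("name", "")
            if name.startswith("citation_"):
                self.meta[name] = a.get("content", "")

# Hypothetical article page exposing standard embedded metadata:
page = """<html><head>
<meta name="citation_title" content="An Example Article">
<meta name="citation_pdf_url" content="https://example.org/article.pdf">
</head></html>"""

parser = CitationMetaParser()
parser.feed(page)
# parser.meta now holds the title and a direct link to the full-text PDF
```

A site that instead buries the PDF link in site-specific JavaScript forces every consumer, Zotero included, to write and maintain custom code, which is the "bad player" problem described above.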