Zotero Connector not saving PDFs from HeinOnline

Kitsunegari · May 7, 2021

When I try to save PDF articles from HeinOnline, the connector doesn't get the full-text PDF and just shows a red X next to "Full Text PDF". I can download the PDF separately and drag it into Zotero, but when I click the connector button in Chrome, it thinks for a moment and then the text turns red with the red X icon. Debug output is D1770913059. This seems like the same problem that user "hbwhbwhbw" posted about a couple of days ago. It has been happening for me for about the same amount of time.

adamsmith · May 7, 2021

Could we get a sample URL where that happens exactly as you see it?

Kitsunegari · May 7, 2021

Hey - thanks for getting back to me. Sure thing - it has been happening on every article on Hein that I try to save through the connector for the last few days - here's an example:

https://heinonline-org.ezproxy.lib.ucalgary.ca/HOL/Page?public=true&handle=hein.journals/crmcj12&div=23&start_page=239&collection=journals&set_as_cursor=0&men_tab=srchresults

I access it through my university's library, but that's how I've always used Hein, and it worked until a couple of days ago.

The permalink to the above page is:
https://heinonline.org/HOL/P?h=hein.journals/crmcj12&i=226&a=dWNhbGdhcnkuY2E

Let me know if there's anything else I can do to help troubleshoot.

adamjtramp · May 13, 2021

Hello,
I'm with HeinOnline. I can reproduce the problem, but I'm not sure whether anything changed on our end to cause this issue. If there is any insight Zotero can provide that I can pass along to our dev team, I'll be happy to do so.

adamsmith · May 13, 2021

Thanks for being in touch. This is due to a change on your end, yes. The PDF download button leads to an intermediate page (invisible to the user), which contains the actual URL in a meta tag that triggers a refresh to that URL (which then serves the PDF):

<META HTTP-EQUIV="Refresh" CONTENT="0; URL=" PDFsearchable?handle=hein.journals/crmcj12&collection=journals&section=23&id=&print=section&sectioncount=1&ext=.pdf&nocover=&display=0">

The space at the beginning of the URL is breaking Zotero import of the PDF. We can work around this easily enough, but I'm also wondering why it's there?
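
To illustrate the effect of that leading space, here is a minimal sketch that assumes the translator pulls the target out of the page with a pattern like the one quoted later in this thread. The base URL is a made-up placeholder, and the trim-then-resolve step is only one possible workaround, not necessarily what the translator does.

```javascript
// The intermediate page's tag as quoted above (note the space right after URL=").
const pdfPage = '<META HTTP-EQUIV="Refresh" CONTENT="0; URL=" PDFsearchable?handle=hein.journals/crmcj12&collection=journals&section=23&id=&print=section&sectioncount=1&ext=.pdf&nocover=&display=0">';

// Capture everything between URL=" and the next double quote.
const m = pdfPage.match(/<META.*URL="([^"]+)/);
const rawTarget = m[1];          // begins with the stray space
const target = rawTarget.trim(); // workaround: strip surrounding whitespace

// Hypothetical address of the intermediate page, used only to resolve the relative link.
const base = "https://heinonline.org/HOL/Print";
const pdfUrl = new URL(target, base).href;

console.log(JSON.stringify(rawTarget.slice(0, 14)));
console.log(pdfUrl);
```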

adamjtramp · May 14, 2021

Thanks, our dev team is looking into this.

adamjtramp · May 26, 2021

We were able to remove the space, but now it seems the PDF isn't downloading at all; instead, we receive a 'print.html' file. Any advice you can offer would be greatly appreciated.

<html>

	<head>


	<title>Redirecting...</title>

	


	<script>function sleep(millis,callback){setTimeout(function(){callback();},millis);}function foobar_cont(){window.close();};sleep(25000,foobar_cont);</script>
	<script type="text/javascript">window.location="PDFsearchable?handle=hein.aallar/spectrum0025&collection=journals&section=18&id=&print=section&sectioncount=1&ext=.pdf&nocover=&display=0";</script>
	<META HTTP-EQUIV="Refresh" CONTENT="0; URL='PDFsearchable?handle=hein.aallar/spectrum0025&collection=journals&section=18&id=&print=section&sectioncount=1&ext=.pdf&nocover=&display=0'">



</head>
	<body>
		Please wait while your request is being processed.  Due to the size of the requested file, the download may take a few minutes to complete.
	<br><br>
	<span id="newlink" name="newlink"></span>
	</body>
	</html>

kavana · May 31, 2021

I'm also having this problem while using the Juris-M connector. It attaches a link to print the document, rather than the document itself. Thanks for looking into this.

ninasch · June 1, 2021

I'm still not able to grab PDFs from HeinOnline.

adamjtramp · June 1, 2021

This should be resolved now.

dstillman · June 1, 2021

@adamjtramp: What did you change? It looks like the translator is looking for a double-quote after the URL=:

var m = pdfPage.match(/<META.*URL="([^"]+)/);

We can change that to accept any of single, double, or no quotes, if you prefer. (No quotes seems to be the standard recommendation these days.) Otherwise, I would think you'd need single quotes around the CONTENT value and double quotes around the URL (which may be what you did).
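
A pattern that accepts single, double, or no quotes could look roughly like the sketch below. The helper name `extractRefreshTarget` and the shortened sample tags are made up for illustration; this is not the actual translator code.

```javascript
// Hypothetical helper: pull the refresh target out of a <meta> tag,
// whether the URL is double-quoted, single-quoted, or unquoted.
function extractRefreshTarget(html) {
  const m = html.match(/<meta\b[^>]*\bURL\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/i);
  if (!m) return null;
  // Exactly one of the three groups is set; trim to tolerate stray spaces.
  const target = (m[1] || m[2] || m[3] || "").trim();
  return target || null;
}

// Shortened sample tags (assumed forms, modeled on the ones in this thread).
const samples = [
  `<META HTTP-EQUIV="Refresh" CONTENT="0; URL='PDFsearchable?handle=a&ext=.pdf'">`, // single quotes
  `<META HTTP-EQUIV="Refresh" CONTENT="0; URL=PDFsearchable?handle=a&ext=.pdf">`,   // no quotes
  `<META HTTP-EQUIV="Refresh" CONTENT="0; URL=" PDFsearchable?handle=a&ext=.pdf">`, // double quote plus leading space
];

for (const tag of samples) {
  console.log(extractRefreshTarget(tag));
}
```

Trimming the captured value also covers the earlier leading-space case, so one pattern handles every variant seen so far in this thread.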

dstillman · June 2, 2021

<META HTTP-EQUIV="Refresh" CONTENT="0" URL="PDFsearchable?handle=hein.journals/alterlj18&collection=journals&section=22&id=&print=section&sectioncount=3&ext=.pdf&nocover=&display=0">

@adamjtramp: It looks like this line is actually now incorrect HTML.

URL is a parameter for the CONTENT attribute. It's not an attribute itself. The only reason the redirect would be working at all now is because there's also a JS redirect on the page. Otherwise it would just reload the current page, due to the absence of a URL.

I assume you changed this for compatibility with the translator, which was looking for double-quotes after URL=, but I believe that was for a previous version of the site that used single-quotes for URL. The proper fix here is for you to move the URL back into CONTENT and just remove the quotes altogether, as per spec, and we'll update the translator to properly use the URL parameter.
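
As a rough illustration of why URL belongs inside CONTENT, here is a simplified sketch of how a CONTENT value can be split into a delay and a target. The function name `parseRefreshContent` is hypothetical, and the logic is a loose approximation of what browsers do, not the full algorithm from the HTML spec.

```javascript
// Hypothetical, simplified parser for the CONTENT value of a refresh meta tag.
// Returns { delay, url }, where url is null if no target is given
// (in which case a browser would just reload the current page).
function parseRefreshContent(content) {
  const m = content.match(/^\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$/i);
  if (!m) return null;
  let url = (m[2] || "").trim();
  // Strip one pair of surrounding quotes, if present.
  const quote = url[0];
  if (quote === '"' || quote === "'") {
    const end = url.indexOf(quote, 1);
    url = end === -1 ? url.slice(1) : url.slice(1, end);
  }
  return { delay: Number(m[1]), url: url || null };
}

// CONTENT="0" with URL as a separate attribute: no target is found.
console.log(parseRefreshContent("0"));
// URL inside CONTENT, unquoted: the target is found.
console.log(parseRefreshContent("0; URL=PDFsearchable?handle=a&ext=.pdf"));
```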

dstillman · June 2, 2021

@adamjtramp: We've updated the translator to handle both the current form and the correct form, so you can update this whenever you want and the translator should continue to work.