Ugly PDF automatically downloaded instead of correct one

goatwriter · November 1, 2020

Is there a way to get the right PDF to download? I used Zotero on this page:
https://www.sciencedirect.com/science/article/abs/pii/S1877343517301264
(I have access to the PDF because of my university proxy).
Zotero automatically downloaded a long, weirdly formatted PDF which had the same text but lost the layout and correct page numbering.
So I opened the proper PDF and saved that to Zotero, then did a merge. But Zotero chose to keep the ugly PDF!
I'm on Linux Mint. Any help appreciated. Thank you.

dstillman · November 1, 2020

"Zotero automatically downloaded a long, weirdly formatted PDF which had the same text but lost the layout and correct page numbering."

Are you sure you're not just seeing a snapshot of the ScienceDirect webpage? The PDF saves fine for me on that page.

Can you provide a Debug ID from Zotero for a save attempt that produces the "long, weirdly formatted PDF"?

"So I opened the proper PDF and saved that to Zotero, then did a merge. But Zotero chose to keep the ugly PDF!"

This is some sort of misunderstanding. If you merge two items, Zotero doesn't delete any attachments. If you don't want one of the remaining attachments, you need to delete it manually.

goatwriter · November 1, 2020

Thanks. It's ID D746775073.
Two thoughts:
I notice it says "open-access PDF" in Zotero.
Although I can see the PDF download button fine on that page because of my proxy, do I also have to manually add the proxy info to Zotero?

dstillman · November 1, 2020

Ah, OK, so you're not getting the ScienceDirect PDF, so it's downloading an OA PDF instead.

To be clear, you're viewing the page at the exact URL above? No proxy info in the URL? And you can click the download button and view the PDF successfully?

Can you provide a Debug ID from the Zotero Connector for reloading the page and trying to save?

goatwriter · November 12, 2020

Finally had some time to play with this.

It's fine if I use this link from Google Scholar: https://www-sciencedirect-com.ezproxy.otago.ac.nz/science/article/pii/S1877343517301264 and the page mentions my university by name.

But sometimes Google Scholar offers the link below instead. That page still has the link to the proper PDF, but it doesn't mention my university, and when I use Zotero on it, Zotero downloads the open-access PDF.
https://www.sciencedirect.com/science/article/pii/S1877343517301264?casa_token=7TKo9A1kUUUAAAAA:Rr4Oh6ZhtozZ7CFscd3jEgKrDr9mwJGZv5bBzsw8v1GF9My05PYUeGAz83Zp9X2smbvdv8dRUNmd

dstillman · November 12, 2020

Are you on campus?

It doesn't have anything to do with Google Scholar. The first of those is using your university's web-based proxy and the second is not. Unless you're on campus or using a VPN, I'm not sure how you'd have access to the PDF when not using the proxy — maybe some sort of cached session from previously connecting via the proxy.

Zotero should automatically redirect you through the proxy, though. If you go to the Proxies tab of the Zotero Connector preferences, do you see an entry for your university's proxy (similar to the first URL above), and do you see an entry for "www.sciencedirect.com" in the host section below?

goatwriter · November 12, 2020

I don't have any configured proxies in that tab. Should I set one up?

I'm off campus but I've logged in using https://ezproxy.otago.ac.nz/login?url=http://scholar.google.com. This is like a session, so I don't have to log in every time I search.

I can see the proper PDF using either link, but Zotero needs the first one to get the right PDF. Not sure why. I've used the same process for other sites and it's fine.

dstillman · November 12, 2020

You shouldn't need to set anything up — it should be detected automatically and prompt you when you log into the proxy. Is "Automatically detect new proxies" enabled in that pane?

dstillman · November 12, 2020

Without a proxy set up, Zotero won't be able to detect many sites that you access through the proxy, and you won't get high-quality metadata or a PDF. (Zotero's ScienceDirect detection is written to work even when the proxy isn't properly detected, but that's an exception.)

goatwriter · November 12, 2020

Yes. The first three boxes are ticked.

dstillman · November 12, 2020

You should be able to set up a proxy manually as %h.ezproxy.otago.ac.nz/%p with "Automatically associate new hosts" and "Automatically convert between dots and hyphens in proxied hostnames" checked, and then enter "www.sciencedirect.com" below. Not sure why auto-detection isn't working for you, though.
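
To illustrate what that scheme describes, here is a minimal Python sketch of the URL mapping, assuming %h stands for the original hostname (with dots converted to hyphens) and %p for the rest of the path. The function name to_proxied is hypothetical and this is not Zotero's actual code, only a way to check that the scheme matches the proxied URLs seen in the browser.

```python
from urllib.parse import urlsplit


def to_proxied(url: str, scheme: str = "%h.ezproxy.otago.ac.nz/%p") -> str:
    """Rewrite a direct URL into its proxied form (illustrative only).

    Assumption: %h is the original hostname with dots replaced by
    hyphens, and %p is the path (plus query string) without the
    leading slash.
    """
    parts = urlsplit(url)
    # www.sciencedirect.com -> www-sciencedirect-com
    host = (parts.hostname or "").replace(".", "-")
    path = parts.path.lstrip("/")
    if parts.query:
        path += "?" + parts.query
    # Substitute the path last so its contents are never re-scanned.
    proxied = scheme.replace("%h", host).replace("%p", path)
    return f"{parts.scheme}://{proxied}"


print(to_proxied(
    "https://www.sciencedirect.com/science/article/pii/S1877343517301264"
))
```

If the scheme is right, the printed URL should have the same form as the proxied ScienceDirect link from Google Scholar quoted earlier in the thread.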

goatwriter · November 12, 2020

That seems to be working and I get a notification at the top of the page too.
Thanks for your help!

goatwriter · November 12, 2020

Not sure if this is the same problem, but Zotero won't download the PDF from this JSTOR page even though I can access it manually.

https://www-jstor-org.ezproxy.otago.ac.nz/stable/27556807?casa_token=fHa05rtjFVEAAAAA:hBsxz8TuO3-v1pZP_HQMwGNJT4OulSgRAFVD6J6aAItO27BY8ZHURR58pvEFHZPALutagJxU-0pQH-ynnbDG2MRBDJyfcDJrUijuLgRu9E0cWBIKY7stdQ&seq=1#metadata_info_tab_contents

I've added www.jstor.org to the hostnames but do I also need to add www-jstor-org.ezproxy.otago.ac.nz ?

Or could it be the acceptance pop-up for the PDF download? In that case I'd have to make a JSTOR account ...

dstillman · November 12, 2020

You really shouldn't need to be entering hosts manually. If you're running other browser extensions, you might want to try disabling them in case they're somehow interfering with Zotero's detection and auto-association of proxy and hosts.

In any case, only www.jstor.org would need to be in the hosts list.
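
As a rough illustration of why the proxied hostname doesn't need its own entry, the sketch below recovers the original host from a proxied URL. It assumes the proxy suffix is .ezproxy.otago.ac.nz and that hyphens in the proxied hostname stand for dots; original_host is a hypothetical helper, not part of Zotero.

```python
from urllib.parse import urlsplit
from typing import Optional

# Assumption: suffix taken from the proxied URLs in this thread.
PROXY_SUFFIX = ".ezproxy.otago.ac.nz"


def original_host(url: str, suffix: str = PROXY_SUFFIX) -> Optional[str]:
    """Return the un-proxied hostname, or None if the URL is not proxied.

    Caveat: converting every hyphen back to a dot is ambiguous for sites
    whose real hostname contains a hyphen, so treat this as a sketch.
    """
    host = urlsplit(url).hostname or ""
    if not host.endswith(suffix):
        return None
    return host[: -len(suffix)].replace("-", ".")


print(original_host("https://www-jstor-org.ezproxy.otago.ac.nz/stable/27556807"))
print(original_host("https://www.jstor.org/stable/27556807"))
```

The proxied form is derived from the original host plus the proxy scheme, so the hosts list only needs the original name.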

You can tell if Zotero is properly detecting a site by hovering over the save button and looking at what it says in the tooltip.

On JSTOR, completely separate from proxy stuff, you need to view a PDF manually once and accept the terms and conditions. That doesn't require an account.

goatwriter · November 12, 2020

I see. It was just the acceptance thing. Having an account would stop me having to do that every day I'm researching online. Thanks.