Retrieving PDFs via Zotero Connector works, but Find Available PDF in Zotero Standalone rarely does

squarebottle · November 23, 2019

When I visit the website of an academic journal, Zotero Connector recognizes it and automatically logs me into my school's proxy before downloading the PDFs. It's absolutely fantastic.

Zotero Standalone, however, doesn't seem to be so intelligent. Find Available PDF only works if the paper is available via open access or if Google Scholar finds a direct download link. This is quite rare, in my experience.

I feel like I'm overlooking something obvious. I thought that the key would be to figure out my school's OpenURL resolver, which took me forever to do, but I finally managed:

> https://getit.library.nyu.edu/resolve/

Unfortunately, that doesn't seem to make a difference for retrieving the PDFs. Imagine my disappointment. So, onward I trudged.
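
For readers following along, a resolver base URL like the one above is normally combined with a query string that identifies the item. The sketch below builds such a link for a DOI using OpenURL 1.0 (Z39.88-2004) key/value fields. It is only an illustration: whether this particular resolver accepts exactly these fields is an assumption, the function name `openurl_for_doi` is hypothetical, and as noted above the resolver setting does not affect PDF retrieval.

```python
from urllib.parse import urlencode


def openurl_for_doi(resolver_base: str, doi: str) -> str:
    """Build an OpenURL-style lookup link for a DOI (hypothetical helper)."""
    # url_ver and rft_id are OpenURL 1.0 (Z39.88-2004) key/value fields.
    # Assumption: the resolver accepts a bare DOI lookup in this form.
    query = urlencode({"url_ver": "Z39.88-2004", "rft_id": "info:doi/" + doi})
    return resolver_base + "?" + query


link = openurl_for_doi(
    "https://getit.library.nyu.edu/resolve/",
    "10.1162/074793601750357196",
)
print(link)
```

A link like this opens the library's lookup page for the item in a browser; it is a way to check the resolver by hand, not a way to download the PDF.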

My next step was to scrutinize the URL as it appears in my browser after Zotero Connector logs me into my school's proxy. Here is an example:

> https://www-mitpressjournals-org.libproxy.newschool.edu/doi/10.1162/074793601750357196

Sure enough, this matches what I see in Zotero Connector Preferences>Proxies:

> %h.libproxy.newschool.edu/%p
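
To make the pattern concrete, here is a minimal sketch of the rewrite it describes, where `%h` stands for the publisher's host and `%p` for the path. Two things are assumptions inferred from the example URL rather than documented behavior: that the original host was `www.mitpressjournals.org`, and that the proxy encodes the host by replacing dots with hyphens. The function name `to_proxied_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_proxied_url(url: str, proxy_suffix: str = "libproxy.newschool.edu") -> str:
    """Rewrite a publisher URL into the %h.<proxy_suffix>/%p form."""
    parts = urlsplit(url)
    # Assumption: dots in the original host become hyphens, as in the
    # example URL. Hosts that already contain hyphens are not handled.
    proxied_host = parts.hostname.replace(".", "-") + "." + proxy_suffix
    return urlunsplit(
        (parts.scheme, proxied_host, parts.path, parts.query, parts.fragment)
    )


print(to_proxied_url(
    "https://www.mitpressjournals.org/doi/10.1162/074793601750357196"
))
```

Opening the rewritten URL in a browser still requires logging in to the proxy, which is the step the Connector handles.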

Back in Zotero Standalone, I scoured the preferences for anything that might be able to make use of that. Sadly, I haven't found anything.

Is there a kind soul who might be able to tell me how to get my Zotero Standalone to use my school's proxy? I've spent hours searching for how to do this in addition to tinkering away on my own, but I'm completely stumped.

dstillman · November 23, 2019

Find Available PDF can only use institutional subscriptions when you're on campus or connected via a VPN. It doesn't use web-based proxies.

From the announcement blog post:

> Zotero can also now take better advantage of PDFs available via institutional subscriptions. When you use “Add Item by Identifier” or “Find Available PDF”, Zotero will load the page associated with the item’s DOI or URL and try to find a PDF to download before looking for OA copies. This will work if you have direct or VPN-based access to the PDF. If you use a web-based proxy, only open-access PDFs will be automatically retrieved using this new functionality, but you can continue to save items with gated PDFs from the browser using the Zotero Connector.
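
The lookup order described in that passage can be sketched as a simple fallback chain: try the publisher page first, then open-access sources. This is a rough illustration of the ordering only, not Zotero's actual code. Every name here (`find_pdf`, `publisher_page`, `open_access`, the `open_access` flag, the `example.org` URL) is a hypothetical stand-in.

```python
from typing import Callable, Iterable, Optional

# A resolver takes an item (a dict of metadata) and returns a PDF URL or None.
Resolver = Callable[[dict], Optional[str]]


def find_pdf(item: dict, resolvers: Iterable[Resolver]) -> Optional[str]:
    """Return the first PDF URL any resolver finds, trying them in order."""
    for resolve in resolvers:
        pdf_url = resolve(item)
        if pdf_url is not None:
            return pdf_url
    return None


def publisher_page(item: dict) -> Optional[str]:
    # Stand-in: simulates a gated publisher page reached without direct
    # or VPN access, so no PDF is found here.
    return None


def open_access(item: dict) -> Optional[str]:
    # Stand-in: pretends an OA copy exists when the item is flagged as such.
    if item.get("open_access"):
        return "https://example.org/oa/" + item["DOI"] + ".pdf"
    return None


# Publisher page first, OA sources second, matching the order quoted above.
order = [publisher_page, open_access]
print(find_pdf({"DOI": "10.1/x", "open_access": True}, order))
print(find_pdf({"DOI": "10.1/y"}, order))
```

With a web-based proxy, the first step behaves like the gated stand-in here, so only the open-access step can succeed.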

It's possible we'll add the ability to log in to web-based proxies in Zotero itself in a future version, but it's a fairly minor limitation. If you use a web-based proxy, you just save via the Zotero Connector, and it should automatically attach a PDF if one is available. If one isn't, it will look for open-access PDFs. So this would only be relevant when using Add Item by Identifier or when trying to find PDFs for an existing metadata library or an import via RIS/BibTeX/etc. that didn't include files. For the latter cases, the solution would be to use Find Available PDFs on all the new items when on campus or connected via a VPN.

squarebottle · November 23, 2019

"Fairly minor limitation?" It's not minor to me! The reason I've spent so many hours trying to get this to work is because I was trying to *save* myself from far, far more hours of tedious work! Sigh. If anybody needs me, I'll be sobbing fairly minor tears all by my fairly minor self.

All fairly minor cathartic grumbling aside, Zotero is a great tool. The truth is that I'm spoiled by it. I really am grateful for everything it does, and I don't want to imagine doing my thesis without it. So, thanks for your work! :)

dstillman · November 23, 2019

> The reason I've spent so many hours trying to get this to work is because I was trying to *save* myself from far, far more hours of tedious work!

Why, though? Can you explain how you ended up in a situation where this is a major issue? What I'm trying to explain is that, in normal usage of Zotero, it really shouldn't matter that much.

squarebottle · November 23, 2019

I use Zotero's RSS feature to track journals relevant to my field. When I see any articles that sound interesting, I add them to my library to take a closer look. I think this is the normal usage of the RSS feature.

Over time, I've amassed quite a collection like this. I love how easy the RSS feature makes it for me to stay up to date with my field's journals, and I love how easy it is for me to add articles from the RSS feeds to my library. The problem is that when I actually want to take a closer look at the articles, I have to manually fetch them. If there was only one PDF for me to fetch, then that wouldn't be an issue. But in what I expect is ordinary human behavior, I end up saving lots of articles to my library at a time and then putting off the task of fetching the PDFs.

So, like I said, Zotero has spoiled me. If the RSS feature was taken away, then I'd have to manually keep up to date with the journals and manually compile the lists of interesting papers. What takes me seconds with the RSS feature would take me hours without it, and it's something I have to do relatively often. On top of that extra work, I'd probably still put off the "take a closer look" subtask. Therefore, Zotero is 100% benefit and 0% drawback when it comes to the larger task of staying up to date with the journals in my field and finding interesting articles. That's why I feel like it's important for me to express my gratitude for all the work that has already been done to make Zotero what it is.

In short, Zotero already saves me hours of work when it comes to staying up to date with my field. The only thing left for Zotero to do for this task (other than read and annotate the PDFs for me, haha) is to fetch the PDFs for me, which is a tedious task that takes a long time.

Here is my usage as a step-by-step list:

1. Use Zotero to subscribe to RSS feeds of academic journals.
2. Scroll through the listed journal papers in the RSS feeds.
3. Add papers that catch my eye from the RSS feeds to my library.
4. Go to the URL for a saved paper.
5. Wait for the page to load.
6. Click the download button.
7. Select "Save download as" to tell the computer where the file should be saved.
8. Wait for the download to finish.
9. Open (or alt-tab/command-tab if already opened) the target download folder.
10. Click and drag the file from the folder onto its entry in Zotero.
11. Repeat steps 4-10 for every paper saved from the RSS feeds.

The Zotfile extension helps. It helped even more when its "Source Folder for Attaching New Files" feature worked, but that stopped working on both my PC and my Mac. Even when it worked, the task still took a lot of tedious clicking for every paper. In addition to being boring and repetitive, the amount of time it takes adds up very quickly.

If "Find Available PDFs" worked, then it'd be this:

1. Use Zotero to subscribe to RSS feeds of academic journals.
2. Scroll through the listed journal papers in the RSS feeds.
3. Add papers that catch my eye from the RSS feeds to my library.
4. If the PDFs aren't automatically fetched already, then right-click the group of added papers and select "Find Available PDFs."
5. Go make a cup of tea or check reddit while all of the PDFs are automatically downloaded and added to their Zotero entries.

But alas, it rarely works. If I was to try incorporating Zotero Connector into my usage of the RSS feature, then these would be the steps:

1. Use Zotero to subscribe to RSS feeds of academic journals.
2. Scroll through the listed journal papers in the RSS feeds.
3. Add papers that catch my eye from the RSS feeds to my library.
4. Go to the URL for a saved paper and wait for the page to load.
5. Right-click the page and select Zotero Connector>Save to Zotero>[Metadata option]. (Note: You may need to select Zotero Connector>Reload via proxy>[Proxy option] first, but you should only need to do this once per journal at most.)
6. Alt-tab/command-tab back to Zotero.
7. Right-click the entry that you added from the RSS feed, select "Move item to trash," and hit okay.
8. Repeat steps 4-7 for every paper saved from the RSS feeds.

I hope I've answered your question and demonstrated that I'm using Zotero in an ordinary, intended manner. Specifically, I'm using the RSS feature to find articles and add them to my library. And THAT is how users can end up in my situation without straying from normal usage!

As I did before, I want to end by emphasizing that I love Zotero and am immensely grateful for the work that you and other contributors do. Thank you!

dstillman · November 23, 2019

Ah, OK, yes, the feeds feature is another case where you could end up with items without PDFs, though it's still a bit of a misunderstanding to frame this in terms of the Find Available PDF feature.

When you use Add Item by Identifier or add from a feed, the translation process is run within Zotero itself, as opposed to within your browser when saving from the connector. If you have direct (on-campus/VPN) access to the PDF via the publisher page or an open-access version is available, a PDF will be automatically attached. That's the same thing that Find Available PDF would do, so it doesn't really make sense to think of this in terms of Find Available PDF — if it could do it, it would just happen automatically when you added the items in the first place. (It wouldn't make sense for you to have to manually perform an extra step after every save.)

But you're right that feeds functionality makes the lack of support for web-based proxies within Zotero itself a bit more problematic. We'll see what we can do in the future to address this.

squarebottle · November 23, 2019

I don't feel that I've incorrectly or misleadingly framed the problem at all. "Find Available PDF" is the thing that's failing to do what it's supposed to do from the perspective of the user's experience. You asked for an explanation of how normal usage of Zotero could result in a situation where the failure of "Find Available PDF" would matter very much. To your apparent surprise, I was able to provide you with exactly such an explanation. However, this isn't necessarily the only possible explanation!

Scenario A: Alice is a new user who has been storing her photocopied papers in a filing cabinet system. Zotero looks like something that could make her life a lot easier, but she is (understandably) daunted by the size of her filing cabinet. Just as she's about to give up, she discovers the magic wand button that offers to magically import all the info about a paper from just an ISBN or DOI. "Wow, that'll make this so much easier!" She takes a paper out of her filing cabinet, types the DOI into Zotero, and smiles when all the info appears exactly as promised. That's when she decides to start using Zotero. "I'll just add my papers to Zotero as I use them." Months later, she has a sizable number of entries in her Zotero library. That's how she ended up in a situation where "Find Available PDF" being able to fetch PDFs matters.

Scenario B: Bob, another new user but a bit more tech savvy than Alice, has spent years carefully growing and organizing the research bookmarks on his computer. He opens Zotero and begins to look for a way to import those bookmarks into Zotero. A surprisingly short amount of time later, he's found and completed the import process. Bob actually starts chuckling because it was so much easier than he thought it would be. He installs Zotero Connector and easily changes his habit from bookmarking papers to saving them to Zotero. Just as quickly, he comes to appreciate how Zotero Connector automatically downloads the PDFs and how great it is to have them. The delight he originally felt from importing his bookmarks turns to frustration when "Find Available PDF" fails to do the same for them.

Scenario C: Colleen has been happily using Zotero Standalone and Zotero Connector for years. One of the upperclassmen had recommended it to her during her university's freshmen mentoring program, so she has been spared the horrors of a college life without Zotero. Every so often, she exports her Zotero database. She sees the option to export it with files, but because she has noticed the "Find Available PDFs" button in the right-click menu, she figures that export option is for people who don't have reliable internet connections. She's seen other programs offer that option with that explanation. Even her Spotify playlist has trained her to think this way by giving her the option to download songs for offline listening. One day, her laptop stops working. "Good thing I made those backups!" she thinks to herself as she sets up a replacement. She downloads Zotero, imports one of the backups, and then all is right again--or so she thinks. All of her PDFs are gone! She remembers the "Find Available PDFs" button and thinks that if ever there was a time for that feature to shine, this was it. A chill runs down her spine and the color leaves her face when it fails to fetch a single PDF.

There's nothing particularly exotic about any of these scenarios. Neither Alice, Bob, nor Colleen are power users trying to get Zotero to do anything more than what it's intended. So you see, there are plenty of ways that regular people using Zotero in regular ways might end up in situations where "Find Available PDFs" could make a big difference. Alice wouldn't blame her filing cabinet, Bob wouldn't blame his browser bookmarks, and Colleen wouldn't blame her exports. (Well, maybe Colleen would *also* blame her exports.) In the same way, I don't blame the RSS feature. Their and my common experience is that "Find Available PDFs" isn't fetching the PDFs, so "Find Available PDFs" is where the failure happens from the perspective of user experience design.

But even if my situation was the only one, the underlying logic would be the same. "P happened and then Q failed" doesn't imply that Q failed because of P, nor does it mean that P is required for Q to fail, nor does it mean that not-P will keep Q from failing, nor does it mean that Q will only fail when P is involved... but none of this is actually the point of me writing all this. The point I'm trying to illustrate is that the problem with "Find Available PDFs" not using proxies (or failing for any other reason, from a UX perspective) affects more than just the people who use RSS. As such, I'm suggesting that you may need to reconsider your impact assessment.

Anyway, thanks again for all the work you do. Alice, Bob, Colleen, and I all agree that you rock. ;)

dstillman · November 24, 2019

Right — we're aware of all that. I mentioned Add Item by Identifier and import above. I'm just explaining that 1) adding items from the Zotero Connector is by far the primary way people add things to Zotero, which is why this isn't a higher priority and 2) supporting web-based proxies would be a feature of Zotero's translation layer, so when implemented it would run automatically for Add Item by Identifier and feeds. (In other words, Alice wouldn't be using Find Available PDFs at all, and neither would you for feeds.) And as I say, the workaround for finding yourself with a huge number of items without files is to go to a network that has direct access to files, or to connect via a VPN if possible, and then use Find Available PDFs. Supporting web-based proxies within Zotero is a totally separate, much more complicated problem on a technical and UX level, which is why it wasn't implemented at the same time as Find Available PDFs.

(C isn't a good example, by the way. An export isn't a backup and would have all sorts of other problems. A proper backup would include files. And the Spotify equivalent would be her using Zotero syncing, which would allow her to restore her library, including files, just by syncing.)

cdhender1024 · April 6, 2023

I am connected via VPN and "Find available PDF" has never worked. Not even once. If I find myself with a reference that doesn't have the full text for some reason, I have to download it to my computer then attach it to the reference. Why have the "Find available PDF" feature if it doesn't do anything?

dstillman · April 6, 2023

@cdhender1024: We’d need to see a Debug ID to say anything. It obviously works in general.