"Find available PDF" - look at url in metadata and attachment urls?

edited November 24, 2018

It would be amazing if "Find available PDF" also looked at the URL in the metadata and at attachment URLs. We imported a large RIS library, some items with a DOI, some without. But many items have a (freely available) PDF that is linked either from the URL in the metadata or from an attachment URL. While it's not difficult to get the items, we still have to click the item (or copy to wget) and then click again (or drag) to get it into Zotero. It would be so much neater if it was just one click.

We have ~2,000 items. So one right click "Find available PDF" vs. 2,000 3-step actions makes a big difference... (Especially when the PDFs are accessible without paywall and we have the direct URLs already ...)

(Of course, a workaround is to get the local files from the RIS, and then re-import the RIS - but this means we have to decide up-front, and might be adding a lot of never-used PDFs to the library. So we'd rather be able to get the PDFs in chunks, as we need them, distributed across different users - it's a shared library.)

  • At the risk of sounding like a nincompoop: isn't the "Find available PDF" function done by the add-in ZotFile and not Zotero proper?
  • Why would you be working with 2000 items and a beta release?!?

    I don't know if you can have a production release and a beta release installed on the same PC, but I'd never ever try to do "real" work with a beta release. I've just seen too many programs blow up.
  • To get access to „find available pdf“ (though that’s now come out of beta) and to help test new features :)

    I haven’t tried this, but according to the beta page it would seem possible to switch between the standard version and the beta version without problems!
  • edited November 25, 2018
    @max1836: Zotero betas are fairly stable — Zotero has a huge test suite that's run automatically on every commit, and most changes don't endanger data in any way regardless. You certainly shouldn't run the beta if you're uncomfortable, but a decent number of people do, it's generally perfectly safe, and it is indeed a great way for people to help us test new features (and one that we're grateful for).

    @bjohas: I'm not sure I'm understanding you. Can you provide an example RIS entry that shows what you're describing? "Find Available PDF" will already follow the URL in the URL field, if that's what you mean, and either look for a PDF it can translate from there or, if it's a direct link to an ungated PDF, download that PDF and add it as an attachment (even though the PDF URL itself generally isn't supposed to be in the URL field).
  • edited November 25, 2018
    @dstillman For example, this RIS entry has a URL that leads to a (free) PDF, but Find Av PDF doesn't download it.

    TY - JOUR
    TI - Reforming Instruction, Curriculum, Assessment, and Structure to Teach Vocational and 21st Century Skills
    AU - A Blom, X Cao, H Andriamihamina
    UR - http://documents.worldbank.org/curated/en/297731506359700310/pdf/119989-WP-P159532-PUBLIC-wbBotswanaetcpublication.pdf
    ER -

    As a follow-up question: Are URLs in attachments also checked, i.e. if the URL in the metadata was unsuccessful, will Find Av PDF proceed to url-type attachments? (Edit: Basically, I wonder whether this behaviour is implemented: If a smart search identifies the item as having a PDF attachment, and assuming that this PDF isn't behind a paywall, will Find Av PDF retrieve it?)
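For anyone wanting to see which items in an RIS export already carry a direct PDF link (the "copy to wget" route mentioned in the first post), here is a minimal sketch. It is not part of Zotero; the function name `direct_pdf_urls` is made up for illustration, and it assumes that a direct PDF link is one whose path ends in `.pdf`. The line pattern is deliberately lenient about spacing so that entries pasted like the one above still parse.

```python
import re

# Matches an RIS line such as "UR  - http://..." (two-character tag, dash, value).
# Spacing is matched leniently because pasted RIS often loses the double space.
RIS_LINE = re.compile(r"^\s*([A-Z][A-Z0-9])\s+-\s?(.*)$")


def direct_pdf_urls(ris_text):
    """Return (title, url) pairs for records whose UR field ends in .pdf."""
    results = []
    title, urls = None, []
    for line in ris_text.splitlines():
        match = RIS_LINE.match(line)
        if not match:
            continue
        tag, value = match.group(1), match.group(2).strip()
        if tag == "TI":
            title = value
        elif tag == "UR":
            urls.append(value)
        elif tag == "ER":
            # End of record: keep only URLs that look like direct PDF links
            # (assumption: the path, ignoring any query string, ends in .pdf).
            for url in urls:
                if url.lower().split("?")[0].endswith(".pdf"):
                    results.append((title, url))
            title, urls = None, []
    return results
```

The resulting list could be written to a file and handed to a downloader of your choice; attaching the files to the right Zotero items would still be a separate step.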
  • Right, so this works generally, except:
    documents.worldbank.org uses an invalid security certificate.

    The certificate is only valid for the following names:
    wbsvcnow.worldbank.org, wbsvcnowdev.worldbank.org, wbsvcnowqa.worldbank.org, wbsvcnowtst.worldbank.org
    Zotero only uses HTTPS for this feature, so if it's an HTTP URL and the equivalent HTTPS URL fails, as it does here, the PDF won't download.

    It doesn't have to work this way, but we do it because 1) the feature was designed primarily around Unpaywall data, which in many cases still has an HTTP URL when the HTTPS URL works fine, 2) this is a mostly automated function, and we don't want to expose details of potentially thousands of saved items to anyone who might be watching traffic, and 3) the future is HTTPS (e.g., with browsers already marking HTTP sites as "Not Secure"), and with certificates now freely available there's really no excuse for a site not to support HTTPS.
    "Are URLs in attachments also checked, i.e. if the URL in the metadata was unsuccessful, will Find Av PDF proceed to url-type attachments?"
    No. If there's already a PDF file attachment, the file should exist either locally or in the online library and shouldn't need to be retrieved again. Link attachments aren't checked, because they don't have any particular semantic meaning, and there's no way to know that a link attachment points to the primary PDF or a page with one.
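The HTTPS-only behaviour described in this reply can be sketched roughly as follows. This is an illustration in Python, not Zotero's actual code; the function names `https_candidate` and `fetch_over_https` are hypothetical. The idea is that an `http://` URL is rewritten to `https://` before the request, and a certificate problem such as the one reported for documents.worldbank.org simply results in no download rather than a fallback to plain HTTP.

```python
import urllib.request
from urllib.parse import urlsplit


def https_candidate(url):
    """Return the HTTPS form of url, or None if the scheme is not HTTP(S).

    Simplification: an explicit port in the URL (e.g. :80) is kept as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        return url
    if parts.scheme == "http":
        return parts._replace(scheme="https").geturl()
    return None


def fetch_over_https(url, timeout=30):
    """Fetch url over HTTPS only; return the body bytes, or None on failure."""
    target = https_candidate(url)
    if target is None:
        return None
    try:
        # urlopen verifies the server certificate by default, so a certificate
        # that does not cover the host name raises an error here.
        with urllib.request.urlopen(target, timeout=timeout) as response:
            return response.read()
    except OSError:
        # Covers URLError, HTTP errors, TLS errors and timeouts.
        # There is deliberately no retry over plain HTTP.
        return None
```

For the World Bank URL in the example above, `https_candidate` would return the same address with an `https` scheme, and the fetch would fail as long as the site's certificate does not cover that host name.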
  • @dstillman: Of course, I don't mean to knock those who help. Beta testing is important. However, I'd just play around with a beta version and not use it for "production" work.

    I fully appreciate that there is no way for the Zotero development team to do testing with every set of products that users could configure. For example there are only about 5000 favorite browsers. :-)
  • "Link attachments aren't checked, because they don't have any particular semantic meaning, and there's no way to know that a link attachment points to the primary PDF or a page with one."
    ... except if that link ends in .pdf, right? But ok, fair enough. Maybe it would be nice to have a Help link on the Find Av PDF window, that links to the relevant docs?

    Many thanks for the help with this - very much appreciated!
  • It's not that we wouldn't know if it's a PDF — it's that there's no way of knowing if it's the primary PDF. It could be, say, a link to supplemental materials.
  • Ah ok, I get it now! Thanks!!
  • Hello @dstillman - I've got an entry with this https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/155470/eth-48718-01.pdf in the URL field - yet find available pdf doesn't fetch the file. That's strange, right?
  • The site is returning an invalid value for the content type (specifying a character set for a PDF), but I've relaxed the check for this in the latest Zotero beta, so it should work now.
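To illustrate the kind of relaxed check mentioned in the last reply, here is a minimal sketch in Python. It is an assumption about the general approach, not Zotero's actual code, and `is_pdf_content_type` is a hypothetical name. The point is to compare only the media type of the `Content-Type` header and ignore any parameters, so that a value like `application/pdf; charset=UTF-8` is still accepted.

```python
def is_pdf_content_type(header_value):
    """Return True if the media type is application/pdf, ignoring parameters."""
    # Everything after the first ";" is a parameter (e.g. charset=UTF-8),
    # which is meaningless for a PDF but should not cause a rejection.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"
```

A strict equality test against the whole header value would reject such responses, which matches the symptom reported for the ETH research-collection link.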