Zotero 7 beta: Download format selection?

edited August 8, 2023
Now that there is an epub reader and annotatable HTML, how are you thinking about importing full text?
Some sites do offer epub alongside PDF (e.g., Frontiers: https://www.frontiersin.org/articles/10.3389/fsoc.2023.1190872/full). Many pages offer full-text HTML, of course.
I think it's fairly clear that most people will continue to prefer PDFs for now, but both HTML and epub have distinct advantages, not least for accessibility (PDFs can be accessible-ish, but are much less likely to be, and even accessible PDFs are a mess, e.g., for people who want larger fonts).

Getting this right for the largest number of users seems tricky. Conceptually, I guess you'd want something like a preference order (such as: ideally PDF; if that's not available, HTML; never epub), but that seems impossible UX-wise. Any thoughts on this?
  • It hadn't occurred to me that people might now want HTML for full text, but I guess that's right. And that's actually part of the problem here.

    I think we could come up with something reasonable for the preference: we'd basically need a list with "PDF", "EPUB", and "HTML" where the options could be both moved up/down and individually disabled. But, of course, we already save snapshots if snapshots are enabled, and often not full text. So then if someone puts "HTML" first, what does that mean?

    It's a bit hacky, but #3078 suggests a potential solution: there, we're likely going to just look for, e.g., "full[ -]?text snapshot" in the attachment titles the translators provide and use a localized string for the title instead. Maybe a snapshot matching that counts as a PDF alternative and anything else doesn't.

    Alternatively, we could just drastically scale back what we save snapshots for so that we're really only saving them as alternatives to the PDF. I assume there are some exceptions where we don't want to do that, but otherwise we have a situation where "HTML" is a full-text option along with "PDF"/"EPUB" but we also have the existing snapshot preference? Explaining that does seem fairly close to impossible.

    (And then we have the problem that the updated translators will need to return all of these for Z7 to pick from, but they can't be served to Z6, because Z6 would just happily save them all. I'm not sure we ever came up with a great solution for this sort of situation, but I'll think about what we can do.)
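A minimal Python sketch of the title heuristic, assuming each attachment a translator offers can be reduced to a title and a MIME type. `Attachment`, `FORMAT_BY_MIME`, `full_text_format()`, and `full_text_candidates()` are hypothetical names for illustration, not Zotero code or API; only the `full[ -]?text snapshot` pattern comes from the discussion, and matching it case-insensitively is an assumption. An HTML snapshot whose title doesn't match is treated as non-full-text and dropped, so it never competes with a PDF or EPUB.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Heuristic from the thread (case-insensitive here by assumption):
# a snapshot counts as full text only if its title looks like "Full Text Snapshot".
FULL_TEXT_SNAPSHOT = re.compile(r"full[ -]?text snapshot", re.IGNORECASE)

# Hypothetical mapping from MIME type to the formats in the preference list.
FORMAT_BY_MIME = {
    "application/pdf": "PDF",
    "application/epub+zip": "EPUB",
    "text/html": "HTML",
}


@dataclass
class Attachment:
    # Hypothetical stand-in for an attachment offered by a translator.
    title: str
    mime_type: str


def full_text_format(att: Attachment) -> Optional[str]:
    """Return "PDF", "EPUB", or "HTML" if the attachment is a full-text
    candidate; None for unknown types and non-full-text snapshots."""
    fmt = FORMAT_BY_MIME.get(att.mime_type)
    if fmt == "HTML" and not FULL_TEXT_SNAPSHOT.search(att.title):
        return None
    return fmt


def full_text_candidates(offered: List[Attachment]) -> Dict[str, Attachment]:
    """Map each full-text format to the first attachment offered in it."""
    candidates: Dict[str, Attachment] = {}
    for att in offered:
        fmt = full_text_format(att)
        if fmt is not None:
            candidates.setdefault(fmt, att)  # keep the first one offered
    return candidates


offered = [
    Attachment("Snapshot", "text/html"),  # plain snapshot: dropped
    Attachment("Full Text PDF", "application/pdf"),
    Attachment("Full Text Snapshot", "text/html"),
]
print(sorted(full_text_candidates(offered)))  # ['HTML', 'PDF']
```

Because the plain "Snapshot" never becomes a candidate, a user who ranks HTML first can't end up with a non-full-text page instead of the PDF.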
  • Leaving epub aside for simplicity for a second, let's think about two sites -- NY Times and Cambridge UP -- and two sets of preferences:
    > But, of course, we already save snapshots if snapshots are enabled, and often not full text. So then if someone puts "HTML" first, what does that mean?
    Preference PDF > HTML
    Cambridge UP: Save PDF
    NY Times: Save HTML

    Preference HTML > PDF
    Cambridge UP: Save HTML
    NY Times: Save HTML

    So far that seems doable and simple enough.
    > but otherwise we have a situation where "HTML" is a full-text option along with "PDF"/"EPUB" but we also have the existing snapshot preference?
    This would primarily seem to be an issue of accommodating the snapshot preference of people who don't want snapshots?

    If we're thinking of the preferences above, would it be possible to also omit options entirely, so that they *never* get saved? That gives two more options:

    Preference PDF
    Cambridge UP: Save PDF
    NY Times: Nothing

    Preference HTML
    Cambridge UP: Save HTML
    NY Times: Save HTML
    JSTOR (i.e., pages without useful full-text HTML): Nothing

    Going that way, you'd be able to add epub into the logic easily, but it seems fairly complicated to convey?

    The remaining question is whether we ever want both HTML and PDF, and I think my answer there would be no, although we're currently doing that in a number of places. (We may want a PDF plus an attached link, but that seems different enough to accommodate.)

    (I don't have anything on Z6/Z7 but yes, that's... tricky)
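The worked examples above can be checked with a short Python sketch, assuming each site is reduced to the set of full-text formats it offers (JSTOR standing in for pages with a PDF but no useful full-text HTML, which is an assumption about what it offers). `SITES`, `PREFERENCES`, and `choose_format()` are hypothetical names; the rule is simply "the first format in preference order that the site offers, otherwise nothing", so leaving a format out of the list means it is never saved.

```python
from typing import List, Optional, Set

# Full-text formats each site offers in the examples (epub left aside).
SITES = {
    "Cambridge UP": {"PDF", "HTML"},
    "NY Times": {"HTML"},
    "JSTOR": {"PDF"},  # assumed: PDF only, no useful full-text HTML
}

# Ordered preferences; a format missing from a list is never saved.
PREFERENCES = {
    "PDF > HTML": ["PDF", "HTML"],
    "HTML > PDF": ["HTML", "PDF"],
    "PDF only": ["PDF"],
    "HTML only": ["HTML"],
}


def choose_format(available: Set[str], preference: List[str]) -> Optional[str]:
    """Return the highest-priority format the site offers, or None."""
    for fmt in preference:
        if fmt in available:
            return fmt
    return None


for label, pref in PREFERENCES.items():
    for site, available in SITES.items():
        chosen = choose_format(available, pref) or "Nothing"
        print(f"{label:10} {site:12} -> {chosen}")
```

Adding epub would only mean one more entry in each set and list; the selection rule itself doesn't change.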
  • edited August 8, 2023
    I was really just referring to sites where we save snapshots that aren't full text, or sometimes aren't full text, but maybe I'm overestimating how much we do that. We just don't want to create cases where we skip a full-text PDF in favor of a non-full-text snapshot the translator is programmed to offer, just because the user has HTML > PDF. But maybe there aren't many of these, and we can fix any that exist.

    If we were sure snapshots were always full-text snapshots, we could remove the separate snapshot preference and just have the prioritization pref. It's always been clumsy that you had to configure those separately when some sites offered both.
    > If we're thinking of the preferences above, would it be possible to also omit options, which will then *never* get saved
    Sure. Just imagine a reorderable listbox with checkboxes and grippies for dragging:

    ==============
    | ✓ PDF   == |
    | ✓ EPUB  == |
    | ✓ HTML  == |
    ==============

    ==============
    | ✓ EPUB  == |
    | ✓ HTML  == |
    | ✓ PDF   == |
    ==============

    ==============
    | ✓ PDF   == |
    | ✓ EPUB  == |
    |   HTML  == |
    ==============
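A minimal Python sketch of the state behind such a listbox, assuming a hypothetical `FormatPreference` model (not an existing Zotero class): row order is the priority, the checkbox decides whether a format may be saved at all, and `effective_order()` flattens both into the single ordered list a selection step would walk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FormatRow:
    name: str
    enabled: bool = True  # the checkbox


@dataclass
class FormatPreference:
    """Hypothetical model of the reorderable listbox."""
    rows: List[FormatRow] = field(default_factory=lambda: [
        FormatRow("PDF"), FormatRow("EPUB"), FormatRow("HTML"),
    ])

    def move(self, name: str, new_index: int) -> None:
        """Drag a row to a new position (what the grippy would do)."""
        row = next(r for r in self.rows if r.name == name)
        self.rows.remove(row)
        self.rows.insert(new_index, row)

    def set_enabled(self, name: str, enabled: bool) -> None:
        next(r for r in self.rows if r.name == name).enabled = enabled

    def effective_order(self) -> List[str]:
        """Enabled formats in priority order; unchecked ones are never saved."""
        return [r.name for r in self.rows if r.enabled]


pref = FormatPreference()
pref.move("PDF", 2)                 # second listbox above
print(pref.effective_order())       # ['EPUB', 'HTML', 'PDF']

pref = FormatPreference()
pref.set_enabled("HTML", False)     # third listbox above
print(pref.effective_order())       # ['PDF', 'EPUB']
```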