Publication lists for authors'/researchers' pages, adding full-text search of PDFs...

I've been asked to add functionality to a lab's self-hosted WordPress website that provides the typical publication lists: one master list for the lab and one for each scientist ("authors").

I've looked at so many different tools my head is spinning. None provides all the functions. I went into this having heard of EndNote but not Mendeley, Zotero, BibTeX, bibtex2html, Zotpress, bibtexbrowser, BibSonomy, ... ack. I've come to the conclusion I'll need to write my own, relying on existing tools to do as much as possible, and Zotero looks like the best tool for managing the data itself.

Still, a few complications leave me unsure what path makes the most sense.

- They want full text search.
- They want links to download the documents (have to track copyright issues in some form)
- There is no standard tool used by the authors - a good portion only have Word or text files with their list (about half have Google Scholar profiles)
- They don't want to pay much at all in licensing fees (I'm doing this as a volunteer effort, part of redoing their very old website)

On the whole they will use a tool if the institute administration office says "hey, this is a good choice... And we'll set it up for you!" They're already maintaining a list manually. This, in turn, means I do the homework and help with the importing and training. I'm pretty convinced Zotero is the best fit for both the individuals and the website list project. A few who have a tool in place may not change, but they are also the ones who can be expected to provide well-formatted exports going forward.

I'm not sure if I am approaching this right. I'd naively assumed I'd be able to use a Google Scholar API. Oh well. Now I naively assume I should go the other extreme:

A database with tables populated from exports from the tools the authors use, with various manual cleanup/additions along the way as needed. These would include, say, tables for Document (bibtex plus additional information), Author, Publication, probably ones for Publisher, Tags, etc.
In addition, I'd need a table or tables for Self-Archive/Copyright Policies, populated via the RoMEO API.

The Publication table would include an indexed column for Extracted Full Text from the PDF, populated by a script that runs pdftotext and performs some sanity checking and cleanup. The full-text search would run off this column.
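To make the idea concrete, here is a minimal sketch of the extract-and-index step, assuming Python with SQLite's FTS5 extension compiled in (common, but not guaranteed) and pdftotext from poppler-utils on the PATH. The table names `document` and `document_text` and all function names are hypothetical placeholders, not part of any existing tool.

```python
import sqlite3
import subprocess


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text (assumes poppler-utils)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    # Very light cleanup: collapse runs of whitespace and newlines.
    return " ".join(result.stdout.split())


def create_schema(conn):
    """Create a metadata table plus an FTS5 index for the extracted text."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS document (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            year  INTEGER
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS document_text USING fts5(body);
    """)


def add_document(conn, title, year, body):
    """Insert metadata, then index the full text under the same row id."""
    cur = conn.execute(
        "INSERT INTO document (title, year) VALUES (?, ?)", (title, year)
    )
    conn.execute(
        "INSERT INTO document_text (rowid, body) VALUES (?, ?)",
        (cur.lastrowid, body),
    )
    return cur.lastrowid


def search(conn, query):
    """Return (id, title) pairs whose full text matches, best match first."""
    rows = conn.execute(
        "SELECT document.id, document.title "
        "FROM document_text "
        "JOIN document ON document.id = document_text.rowid "
        "WHERE document_text MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()
```

In use, the import script would call something like `add_document(conn, title, year, extract_text(path))` for each PDF, and the website's search box would call `search(conn, user_query)`. MySQL (which WordPress already uses) has its own FULLTEXT indexes that could play the same role; SQLite is used here only to keep the sketch self-contained.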

Does this seem utterly misguided?
The alternatives include relying on Zotero directly for the data, combining it with self-archive rights data from RoMEO and full text data stored locally. I'm just not sure how to have the office manage the data for those who won't take up doing so themselves if the website draws from Zotero on the fly. The very large benefit here is relying on Zotpress for much of the work.

Sorry this is so long!
  • For people who won't use Zotero themselves, your best bet is probably a private group that you import people's exports into.

    For the people who would manage their own lists in Zotero, if you can wait a short while, much of this will be done for you. Zotero 5.0 will offer a My Publications feature, which will let people drag items they've authored into it to have those items published to the person's publications page on zotero.org. It will also offer public file sharing of included files, which isn't the case with other Zotero libraries, so without that you'd have to rig something up using a private group and an API key.

    This will all be accessible via the Zotero API. We'll also be offering a tool that can receive pushed changes from the API and perform custom actions — the initial use case will actually be to publish work to an institutional repository — but you'd also be able to use Zotpress if you didn't need to store much separately.

    The Zotero API offers full-text search of attached files (though metadata search is currently limited to Title/Creator/Year).

    Zotero 5.0 should be available in beta within the next couple months (possibly with a preview release coming much sooner).
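For readers who want to try the API search mentioned above, here is a rough sketch of building such a request in Python. It assumes the Zotero Web API v3 conventions as I understand them (the `q` and `qmode=everything` query parameters, and the `Zotero-API-Key` and `Zotero-API-Version` headers); the group ID and key below are placeholders, and the function names are hypothetical. Check the current API documentation before relying on it.

```python
from urllib.parse import urlencode

API_BASE = "https://api.zotero.org"


def fulltext_search_url(group_id, query, limit=25):
    """Build an item-search URL for a group library.

    qmode=everything is assumed to include full-text content;
    the default mode searches only title, creator, and year.
    """
    params = {"q": query, "qmode": "everything", "limit": limit}
    return f"{API_BASE}/groups/{group_id}/items?{urlencode(params)}"


def api_headers(api_key):
    """Headers for an authenticated request (needed for private groups)."""
    return {"Zotero-API-Version": "3", "Zotero-API-Key": api_key}


# Placeholder values, for illustration only.
url = fulltext_search_url(12345, "gene expression")
headers = api_headers("YOUR_API_KEY")
```

The resulting `url` and `headers` can be passed to any HTTP client; the response should be a JSON list of matching items.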
  • You just made my week.
    I'm not sure why I missed the full-text search ability. I haven't explored the API but had read over the API Basics. I see now I skimmed too quickly. My searching and reading have been too scattershot as I try to get a sense of what matters and what is out there. I know once it clicks it will seem so simple, but I went into this without a map.

    Thank you!
  • @hlovette: Perhaps the institutional repository of your university offers the required functionality?
  • "Zotero 5.0 should be available in beta within the next couple months"

    Did I miss the beta announcement? If not yet announced, can we have an ETA for the beta?
  • There's no beta yet and I don't think there's a reliable ETA (as you can tell...).
  • The wait is killing me :( .... but I am sure it will be worth the wait :)
  • Hmmm ... seems like this must be very far behind?
  • The beta for 5.0 is out:
    https://forums.zotero.org/discussion/59829/zotero-50-beta/
    The "My Publications" feature is functional.