Better parsing of journal article keywords and finer control over file renaming standards requested

ashishmehta · January 7, 2020

First of all, I just want to say I really like Zotero. I just switched from Mendeley and there are many things I love about Zotero (e.g. use of my own PDF viewer, great parsing of title and authors, unencrypted database, open source, seamless integration with my browser, not Elsevier).

There are just a few things that I would suggest for the future:
1) I'd really appreciate better parsing of the article keywords from articles. It seems that with most articles it does not identify the author's keywords. I'm surprised this is not already implemented since the title and author parsing is quite effective (better than Mendeley).
2) It would be very useful if I could control the renaming standards of files better. I prefer to name my files FirstAuthorLastName_Year.pdf. I don't see how this is possible right now.
3) When adding parent item meta-data, it would be much more convenient to enter in comma/line-break delimited lists rather than separate fields for first and last name for each author. Then I could copy-paste blocks of text and make minor adjustments to comply with the formatting standard.
4) This one is a bonus "dream app" feature. It would be cool to implement some text clustering machine learning to automatically suggest folder groupings of similar articles, either by scanning the whole library and running a clustering algorithm or by having the user select an article and having Zotero find other semantically similar articles.
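
To make the "select an article and find similar ones" idea in point 4 concrete, here is a minimal sketch. It is not a Zotero feature, and it uses plain word-overlap (cosine similarity over word counts) as a stand-in for real semantic similarity. The function names and the `library` dictionary of item key to title text are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(selected, library):
    """Rank every other item in `library` by similarity to `selected`.

    `library` is a hypothetical mapping of item key -> title or abstract text.
    """
    target = Counter(tokenize(library[selected]))
    scores = [
        (cosine_similarity(target, Counter(tokenize(text))), key)
        for key, text in library.items()
        if key != selected
    ]
    # Highest score first.
    return [key for score, key in sorted(scores, reverse=True)]
```

A real implementation would more likely weight rare words (TF-IDF) or use text embeddings, and would work from abstracts or full text rather than titles alone.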

Thank you to the developers and to everyone behind the creation of this app. You are doing a great service to me and the academic community at large!

dstillman · January 7, 2020

article keywords from articles

This depends entirely on the site you're saving from. For many sites, Zotero will save keywords as automatic tags by default. If you're not seeing that somewhere you expect it, we'd need an example URL to say more.

renaming standards of files better

See the ZotFile plugin.
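
For reference, the naming rule asked about (FirstAuthorLastName_Year.pdf) amounts to something like the sketch below. This is only an illustration of the rule in plain Python, not ZotFile's code or its configuration syntax. The function name and its parameters are hypothetical.

```python
import re


def attachment_filename(first_author_last_name, year, extension="pdf"):
    """Build a file name such as 'Smith_2020.pdf' from a last name and a year."""
    # Assumption: drop anything that is not a letter, digit, underscore, or
    # hyphen, so spaces and apostrophes in a name do not end up in the file name.
    safe_name = re.sub(r"[^\w-]", "", first_author_last_name)
    return f"{safe_name}_{year}.{extension}"
```

With this rule, a last name like "O'Neil" would come out as `ONeil_2020.pdf`. Whether to strip or replace such characters is a design choice, and a renaming tool may handle it differently.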

When adding parent item meta-data, it would be much more convenient to enter in comma/line-break delimited lists rather than separate fields for first and last name for each author.

While not quite the same, you can switch to single-field mode, paste in a newline-separated list of "First Last" creators, and then toggle the field mode for each, which at least for Western names will parse the line into separate "Last" and "First" fields. Parsing of comma-separated names could be done (and maybe even worked at some point), though generally speaking you want to avoid manual entry whenever possible.
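
As a rough illustration of what splitting a pasted list of "First Last" lines into separate fields involves, here is a minimal sketch. It is not Zotero's actual parsing logic. It assumes the last whitespace-separated word is the last name, which is exactly why this only works reliably for simple Western names. The function names are hypothetical.

```python
def split_creator(line):
    """Split one 'First [Middle] Last' line into a (last, first) pair.

    Assumption: the final word is the last name. Multi-word surnames
    such as 'van Beethoven' would be split incorrectly.
    """
    parts = line.split()
    if not parts:
        return None  # blank line
    if len(parts) == 1:
        return (parts[0], "")  # single name, no first name available
    return (parts[-1], " ".join(parts[:-1]))


def parse_creators(block):
    """Parse a newline-separated block of names, skipping blank lines."""
    creators = []
    for line in block.splitlines():
        creator = split_creator(line)
        if creator is not None:
            creators.append(creator)
    return creators
```

For example, `parse_creators("Jane Q. Doe\nJohn Smith")` yields a `("Doe", "Jane Q.")` pair followed by a `("Smith", "John")` pair.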

ashishmehta · January 7, 2020

Thank you for this information!

In regard to article keywords, here is an example: http://doi.org/10.1093/scan/nst043
Here is another example:
https://doi.org/10.1007/s00406-014-0510-z

It doesn't work for most of my articles in fact. Maybe I am missing something?

dstillman · January 8, 2020

If you hover over the save button you can see the translator Zotero is using to save the item.

For the first page it's using Embedded Metadata, which means it's getting metadata embedded in the page's source code, and they're not providing the keywords in the format Zotero is using. There's another format available in the page (JSON-LD), and they are providing keywords for that, but Zotero doesn't yet support that format (though it's planned).
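
To show what "keywords provided in JSON-LD" means in practice, here is a minimal sketch that pulls keywords out of JSON-LD script blocks using only the Python standard library. It is not how a Zotero translator is written. It assumes the page uses a top-level schema.org `keywords` property (a comma-separated string or a list), and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def jsonld_keywords(html):
    """Return keywords found in top-level JSON-LD objects on a page."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    keywords = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks
        if not isinstance(data, dict):
            continue  # assumption: only plain top-level objects are handled
        value = data.get("keywords", [])
        if isinstance(value, str):
            value = value.split(",")
        if not isinstance(value, list):
            continue
        keywords.extend(k.strip() for k in value if isinstance(k, str) and k.strip())
    return keywords
```

Real pages may nest the article object (for example inside a list or an `@graph` array), which this sketch deliberately ignores.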

For the second one, Zotero's Springer Link translator uses the RIS available on the page, and they're not including the keywords in the RIS. While the Springer Link translator could probably be adjusted to augment the data with the keywords on the page, it would be better for them to add the keywords to the RIS, since getting metadata from the visible parts of the page is usually pretty fragile.
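
For context, keywords in an RIS record are carried on `KW` lines, so an RIS export without them gives an importer nothing to turn into tags. A minimal sketch of reading them, assuming the usual "two-character tag, two spaces, hyphen, space, value" line layout (the function name is hypothetical, and this is not the Springer Link translator's code):

```python
import re

# Assumed RIS line layout: two-character tag, two spaces, hyphen, optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def ris_keywords(ris_text):
    """Return the values of all KW (keyword) lines in an RIS record."""
    keywords = []
    for line in ris_text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if match and match.group(1) == "KW":
            value = match.group(2).strip()
            if value:
                keywords.append(value)
    return keywords
```

Run on a record that has no `KW` lines, this returns an empty list, which matches the situation described for the second article.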

ashishmehta · January 8, 2020

Oh, I see. This clears things up a bit. I had thought all data was being parsed from the page. I can see how that would be fragile. It might still be better than nothing, though, depending on how effective or ineffective it ended up being. In any case, thanks for the help!