Feeds: Quality of metadata

mjthoraval · August 28, 2022

I have tested the feeds of several journals in my field. For some of them, the main metadata displayed in the columns of the centre pane (Title, Creator, Date, Publication, ...) is extracted nicely. But in most cases, a large part of the extracted metadata is not attributed to the correct fields and usually ends up all together in the "Abstract" field. In the worst case, only the title is extracted correctly.

Is there a way to improve the metadata extraction for each journal feed, as with the Zotero Connector?
Or does the problem come from the publisher of the feed? In that case, what is the best way to report the problem to them? Is there a standard that they should follow so that their feeds work nicely with Zotero?

dstillman · August 28, 2022

The metadata that's displayed for unsaved items for feeds isn't the same as the data that Zotero saves. The former is just standard feed metadata that comes from the feed, and you should see the same in any feed reader. When you save, it should be more or less equivalent to going to the page and saving using the Zotero Connector.

mjthoraval · August 28, 2022

Thank you for your explanations. The metadata I was discussing is the feed metadata displayed in the centre pane of the Zotero Feeds. It is good to know that the metadata used when importing to Zotero relies on the Zotero translators and therefore should be more reliable.

Then the problem comes from the publishers. It seems that they format the data themselves to look OK in some feed readers, without correctly filling in the feed metadata that is used by Zotero.
I will contact some publishers to see if they can fix their feeds.

I can probably use the feed of Journal of Fluid Mechanics as a good example to follow (except the inlineFormula that cannot be displayed in Zotero) to explain what is needed:
https://www.cambridge.org/core/rss/product/id/1F51BCFAA50101CAF5CB9A20F8DEA3E4

Elsevier seems to use its own ad hoc formatting, putting all the metadata in the description element without filling in the standard metadata elements.
For example, the RSS feed of the Journal of Computational Physics: https://rss.sciencedirect.com/publication/science/00219991
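
To illustrate the difference, here is a minimal Python sketch that reads two feed items: one that fills a Dublin Core element for the author, and one that puts everything in description. The feed fragment, the item contents, and the function name summarize_items are hypothetical and are not copied from any publisher's feed. It only shows why a reader that relies on the structured elements finds nothing to put in the Creator or Date columns for the second item.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment for illustration only (not a real publisher feed).
SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example Journal</title>
    <item>
      <title>Structured item</title>
      <dc:creator>A. Author</dc:creator>
      <dc:date>2022-08-01</dc:date>
      <description>Abstract text only.</description>
    </item>
    <item>
      <title>Description-only item</title>
      <description>Publication date: August 2022. Author(s): A. Author. Abstract text.</description>
    </item>
  </channel>
</rss>"""

# Dublin Core namespace, written in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"


def summarize_items(xml_text):
    """Return one dict per item with the structured fields a reader could show."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title"),
            "creator": item.findtext(DC + "creator"),  # None if the element is absent
            "date": item.findtext(DC + "date"),        # None if the element is absent
            "description": item.findtext("description"),
        })
    return rows


rows = summarize_items(SAMPLE)
for row in rows:
    print(row)
```

For the second item, creator and date come back as None: the information exists, but only as free text inside description.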

The AIP journal Physics of Fluids also uses its own formatting for the information put in description:
https://aip.scitation.org/action/showFeed?type=etoc&feed=rss&jc=phf

The feed of the Springer journal Experiments in Fluids does not mention the authors at all:
https://link.springer.com/search.rss?facet-content-type=Article&facet-journal-id=348&channel-name=Experiments in Fluids

DWL-SDCA · August 29, 2022

In my experience, what comes through with the publisher RSS feed is _not_ the metadata that Zotero captures when the items are moved from the feed view into a Zotero collection. As @dstillman said above, the metadata obtained by moving a feed item is almost always identical to what you would get if you went to the article itself on the publisher's website and clicked the Zotero import button in your browser. In my opinion, the purpose of a feed is to let a viewer learn what is new and go to the source to read the full article. Zotero eliminates at least one step by saving you a visit to the publisher's website to download the desired article metadata yourself.

Items inside the RSS feed "collections" are not yet fully accepted / integrated into your true Zotero library collections. Zotero feeds are great for capturing newly published articles. However, Zotero recognizes that a user will seldom want to add everything in every journal issue to their library. Zotero lets you select only the items you need and place them in the collection you choose.

mjthoraval · April 5, 2024

I have found this page on the formatting of RSS Feeds:
https://www.crossref.org/wp/labs/whitepapers/rss-best-practice/

But this is quite old, and the link to the PRISM Module is not working anymore:
http://www.prismstandard.org/resources/mod_prism.html

Is there anything more recent on the standard way to implement RSS Feeds for scientific journals?
I also found this page, but it is dead now as well:
https://idealliance.org/specifications/prism-metadata/

The Elsevier support directed me to this page, saying that their RSS Feeds work perfectly well in these RSS readers:
https://service.elsevier.com/app/answers/detail/a_id/10818/supporthub/sciencedirect/kw/RSS
They obviously do not want to support Zotero, but I would like to have the correct information on the standards to argue that they should fix their RSS Feeds.

Has anyone tried to contact Elsevier to ask them to provide proper RSS Feeds?
Most of my attempts to contact different publishers to fix their RSS Feeds have failed so far...

dstillman · April 5, 2024

> The Elsevier support directed me to this page, saying that their RSS Feeds work perfectly well in these RSS readers

I mean, Elsevier obviously doesn't care about this sort of thing. The feeds work "perfectly well" in that all feed readers will display the contents of the description field, but it's just a block of text that they're dumping there. Zotero displays it as well, in Abstract. The main thing we can do is preserve formatting and newlines when displaying description, since right now we're just stripping the HTML tags. We'll look into doing that. But that won't help with the center columns, of course. We could add some custom hard-coded rules to try to parse known lines out of description from specific publishers, but that would be kind of ridiculous — this is a format with predefined fields, and the publishers should use them.

(Again, though, this doesn't affect what actually gets saved to Zotero.)
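
The idea of preserving newlines instead of only stripping tags can be sketched as follows. This is a minimal illustration using Python's standard html.parser module, not Zotero's actual code; the class name DescriptionText, the function description_to_text, the choice of break tags, and the sample description are all assumptions made for the example.

```python
from html.parser import HTMLParser


class DescriptionText(HTMLParser):
    """Collect text from an HTML fragment, turning block-level tags into newlines."""

    # Assumption: these tags are treated as line breaks; a real reader may use more.
    BREAK_TAGS = {"br", "p", "div", "li"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Insert a newline where the markup starts a new visual line.
        if tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def description_to_text(html):
    """Return plain text with one non-empty line per visual line of the fragment."""
    parser = DescriptionText()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical description content, for illustration only.
example = "<p>Publication date: August 2022</p><p>Author(s): A. Author<br/>B. Author</p>"
print(description_to_text(example))
```

Dropping the tags without adding newlines would run these lines together into a single block, which is the behaviour described above.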

mjthoraval · April 15, 2024

I have found the following RSS Feed that contains information for the Volume, Number and Page: https://www.mdpi.com/rss/journal/fluids

For example, the item
"Deep Reinforcement Learning-Augmented Spalart–Allmaras Turbulence Model: Application to a Turbulent Round Jet Flow"
contains the fields:
- prism:volume
- prism:number
- prism:startingPage

However, they are not displayed in Zotero:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/0oz7rvazyw3cdbbnwlvv.png
Is there a problem in the RSS Feed, or are these fields not supported in Zotero?

dstillman · May 23, 2024

Additional PRISM fields should work now in beta 81

And just to note here, we added support for rendering HTML in abstracts last month.

mjthoraval · May 23, 2024

Thank you very much.
I can now see the Volume, Number and Page:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/yy53j1d0o09sjpzqrl30.png

But my feeds from the American Physical Society broke along the way and can no longer be refreshed:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/irqm3vivu1wti0ntnrjw.png

[JavaScript Error: "Error processing feed from http://feeds.aps.org/rss/recent/prfluids.xml:

TypeError: feedText.createDocumentFragment is not a function"]

[JavaScript Error: "Error processing feed from http://feeds.aps.org/rss/tocsec/PRE-Fluiddynamics.xml:

TypeError: feedText.createDocumentFragment is not a function"]

[JavaScript Error: "Error processing feed from http://feeds.aps.org/rss/tocsec/PRL-NonlinearDynamicsFluidDynamicsClassicalOpticsetc.xml:

TypeError: feedText.createDocumentFragment is not a function"]

They still look fine in Feedly.
And the other Feeds can be refreshed.

Debug ID D357392977
Zotero 7.0.0-beta.81+721f54fe4 (64-bit)
Windows 10

AbeJellinek · May 23, 2024

Will be fixed in the next beta, thanks. Sorry about that.

mjthoraval · May 23, 2024

Thank you.

dstillman · May 30, 2024

Fixed now in beta 82

mjthoraval · May 30, 2024

Thank you. It is working nicely now, also showing the Pages:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/84qw7effhkx1fojvm38n.png