CrossRef
Automatic retrieval of metadata for PDFs is a very useful feature of Zotero. When a DOI can be found, the lookup goes via a CrossRef query, which is also preferable, since it produces fewer errors than Google Scholar. However, the data generated this way is missing page/article numbers for APS journals. The missing journal abbreviation is also something that could be fixed automatically.
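For reference, a CrossRef lookup of the kind described above might look roughly like this (a sketch only, using CrossRef's public REST API and a well-known PRL DOI as an example; this is not what crossref.js actually does, and the endpoint and field names are my assumptions):

// Sketch: fetch metadata for a known DOI from CrossRef's public REST API.
// Assumes an environment with fetch (e.g. a recent browser or Node 18+).
async function lookupByDoi(doi) {
    const res = await fetch('https://api.crossref.org/works/' + encodeURIComponent(doi));
    if (!res.ok) throw new Error('CrossRef returned ' + res.status);
    const { message } = await res.json();
    // CrossRef records can carry the journal name, abbreviation, volume, issue,
    // and page or article number - exactly the fields reported missing above.
    return {
        title: (message.title || [])[0],
        journal: (message['container-title'] || [])[0],
        journalAbbr: (message['short-container-title'] || [])[0],
        pages: message.page || message['article-number'],
        volume: message.volume,
        issue: message.issue,
    };
}

// Example: lookupByDoi('10.1103/PhysRevLett.116.061102').then(console.log);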
Would it be possible to update crossref.js in such a way that it follows the URL to find the missing info?
I don't think going to the actual publisher page and scraping from there is feasible for a DOI lookup.
In other words, I can see how this works when you already know where the DOI will take you, but I don't see what you would do in the normal case, where you don't.
If you manage, there are no guarantees about getting this back into Zotero. I think it sounds interesting, but there might be concerns about leaking user data (you'd be making requests to multiple third-party URLs) and excessive HTTP requests. I don't make these types of calls; I'm just flagging these issues.
At the present time the behavior of Zotero is rather inconsistent. Following adamsmith's logic, when getting the bibliographic data from the publisher's web page one should only aim at getting the DOI, and the rest should be accomplished by sending a request to CrossRef. That is not how Zotero, with its rather large number of translators, works!
Very close to the request for some verification tool would be allowing users to fill in missing fields themselves using some simple scripting language. Say,
for all Journal_Articles in My_Collection
    if (Publication == "Physical Review Letters") { Journal_Abbr = "Phys. Rev. Lett." }
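For illustration, a rough version of such a script against the Zotero client's JavaScript API might look like the following. This is only a sketch: the API names (Zotero.Items.getAll, getField/setField, saveTx) and field names are my assumptions and may differ between Zotero versions; nothing like this exists as a built-in feature.

// Sketch: set the journal abbreviation for all PRL articles in the library.
// Assumes Zotero's internal JavaScript API; names may vary by version.
var journalArticleTypeID = Zotero.ItemTypes.getID('journalArticle');
var items = await Zotero.Items.getAll(Zotero.Libraries.userLibraryID);
for (let item of items) {
    if (!item.isRegularItem() || item.itemTypeID != journalArticleTypeID) continue;
    if (item.getField('publicationTitle') == 'Physical Review Letters') {
        item.setField('journalAbbreviation', 'Phys. Rev. Lett.');
        await item.saveTx();
    }
}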
Batch editing and journal abbreviations are separate topics; there is, in fact, an experimental plugin to address the abbreviations issue, which should work with this or the next version of the Zotero beta:
http://citationstylist.org/tools/?#abbreviations-gadget-entry
The name of the journal is "Physical Review Letters", and CrossRef correctly provides this info. However, Google Scholar always uses a different capitalization: "Physical review letters". If some subordination rules were set (say, the journal's web page has priority over CrossRef, and CrossRef has priority over Google Scholar), this inconsistency would never take place.
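A toy sketch of such subordination rules (the source names and data shapes here are hypothetical, just to illustrate the idea):

// Pick a field value from the highest-priority source that provides one.
const SOURCE_PRIORITY = ['publisher', 'crossref', 'google-scholar'];

function pickField(candidates) {
    // candidates: e.g. { crossref: 'Physical Review Letters',
    //                    'google-scholar': 'Physical review letters' }
    for (const source of SOURCE_PRIORITY) {
        if (candidates[source]) return candidates[source];
    }
    return undefined;
}

// pickField({ crossref: 'Physical Review Letters',
//             'google-scholar': 'Physical review letters' })
// -> 'Physical Review Letters'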
Batch editing could just be an alternative to this approach if adamsmith considers it too HTTP-request-intensive.
And as I note above, the principal problem with querying the publisher site first is that I don't think it's feasible technically.
If I'm wrong about that and it can be done technically, there may be other issues to worry about (the data leaking and HTTP requests; though, again, this isn't something I decide).
An option to update existing data, e.g. from CrossRef, is yet another topic. I think that'd be nice to have, but it's not trivial to do, as you'd have to deal with merge conflicts (i.e. what do you do if the local and the remote version of the record have different information for a field; not saying that can't be done, but it takes work).
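One conservative merge policy, just as an illustration of how that conflict could be sidestepped (not an existing Zotero option): only fill fields that are currently empty and never overwrite local edits.

// Sketch: merge a remote (e.g. CrossRef) record into a local one,
// filling only fields the local record does not have yet.
function mergeRecord(localFields, remoteFields) {
    const merged = Object.assign({}, localFields);
    for (const [field, value] of Object.entries(remoteFields)) {
        if (!merged[field] && value) merged[field] = value;
    }
    return merged;
}

// mergeRecord({ pages: '', volume: '116' }, { pages: '061102', volume: '116' })
// -> { pages: '061102', volume: '116' }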
What I think could be a valuable feature is if Zotero were able to give context-sensitive hints to the search based on the indexed material (a la Google): show similar articles...
Concerning CrossRef vs. Google Scholar: why not go a step further with the following scenario:
i) DOI can be found in the PDF -> request to CrossRef -> successful return
ii) DOI cannot be found in the PDF -> Google Scholar search -> free DOI lookup at CrossRef -> successful return, or back to i).
An extra query to CrossRef is not going to kill your internet traffic, but it can provide some verification.
I made manual tests with old articles that do not contain a DOI. By querying CrossRef with the results of a Google Scholar search, I was always able to find a unique DOI and fill in the missing bibliographic data!
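For illustration, step ii) could be done roughly like this (a sketch using CrossRef's public REST API and its free-form bibliographic query; the endpoint, parameters, and the example citation are my choices, not part of Zotero):

// Sketch: take the citation text found via Google Scholar and ask CrossRef
// for the best-matching record, then read off its DOI.
async function findDoi(citationText) {
    const url = 'https://api.crossref.org/works?rows=1&query.bibliographic='
        + encodeURIComponent(citationText);
    const res = await fetch(url);
    const { message } = await res.json();
    const best = (message.items || [])[0];
    return best ? best.DOI : null;
}

// findDoi('Anderson, Absence of Diffusion in Certain Random Lattices, Phys. Rev. 109, 1492 (1958)')
//     .then(doi => console.log(doi));  // the DOI can then be fed back into step i)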