DSpace Translators no longer valid?
Hi all,
I recently sent an email to the dev list
http://groups.google.com/group/zotero-dev/browse_thread/thread/b6eda90295e5be3f
about issues with some DSpace sites. More precisely, I have noticed that when the DSpace translator is used, it doesn't work properly and no data is imported into Zotero.
I am continuing the discussion in this thread because it seems the most appropriate place for it.
You can reproduce the issue with this URL (a standard DSpace installation):
http://dspace-testhaton.cilea.it/jspui/handle/123456789/60
Anyway, I learned from Richard Karnesky's mail (thanks for your answer) that the DSpace translator is used when the keyword "dspace" is present in the site URL. After reading some of the code, I think it also looks for a specific HTML element in the page, one that is present in only one of the two UIs that DSpace now provides.
The DSpace platform has been updated since the translator was written, so some HTML details have changed and these changes break the translator. Finally, in the latest version of DSpace we have RDF info embedded in one of the UIs (JSPUI) and COinS (Z39.88) data in the other (XMLUI). Where the DSpace translator is not used (because the URL doesn't contain the "dspace" keyword), the RDF import works well.
See, for example:
http://researchspace.auckland.ac.nz/handle/2292/3065
http://www.archenvimat.pz.cnr.it/handle/10122/365
So should the DSpace translator be removed? If we want to maintain it for "old" DSpace sites, could the priority of the site translator be reduced so that the RDF translator always wins?
Thanks,
Andrea
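To make the priority question concrete, here is a rough sketch in Python. It is not Zotero's actual code: the `Translator` class, both detector functions, and the numeric priorities are hypothetical, and it assumes that a lower priority number is tried first.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Translator:
    """Hypothetical stand-in for a translator: a name, a priority, a detector."""
    name: str
    priority: int  # assumption: lower number = tried first
    detect: Callable[[str, str], bool]  # (url, html) -> does it apply?


def pick_translator(translators: List[Translator], url: str, html: str) -> Optional[Translator]:
    """Return the matching translator with the lowest priority number, if any."""
    matching = [t for t in translators if t.detect(url, html)]
    return min(matching, key=lambda t: t.priority, default=None)


# Hypothetical detectors, loosely modelled on the behaviour described above.
def dspace_detect(url: str, html: str) -> bool:
    return "dspace" in url.lower()


def embedded_rdf_detect(url: str, html: str) -> bool:
    return 'rel="schema.DC"' in html  # crude marker for embedded metadata


translators = [
    Translator("DSpace (site-specific)", 100, dspace_detect),
    Translator("Embedded RDF", 400, embedded_rdf_detect),
]

page = '<head><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"></head>'
url = "http://dspace-testhaton.cilea.it/jspui/handle/123456789/60"

# Both detectors match, so the site-specific translator wins on priority.
chosen = pick_translator(translators, url, page)
print(chosen.name)

# Reducing the site translator's priority lets the generic one win instead.
translators[0].priority = 500
print(pick_translator(translators, url, page).name)
```

With these made-up numbers, the broken site translator shadows the working generic one until its priority is lowered, which is the change being asked about.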
Note that both the eRDF & COinS translators in Zotero are still somewhat limited. The COinS translator is about as rich as it could be without extending the OpenURL spec further. The RDF vocabulary is getting richer. However, it is a shame that DSpace offers up a lot of data that Zotero can't capture through either of these methods. In particular, the abstract & attached files are not captured (and cannot ever be captured via COinS 1.0). Also, there is no batch import (and currently there can't be via eRDF). I would say "no": the translator should be improved, if possible, to work with modern versions of DSpace, and/or DSpace should use a method that gets very rich data into Zotero.
Re this latter point: look into unAPI+RDF/MODS XML for now & please follow the development of the newer bibo ontology for RDF.
As far as improving the DSpace-specific translator: are there robust ways to get rich information from DSpace, regardless of the version? Or, at least methods to get the version? If the latter, we could default to RDF (or COinS) & supplement the info w/ abstract & attachments and also support batch import.
I think we would rather not see an impractical constraint like DSpace instances having to have "/dspace/" in their URLs. That was never an accurate expectation (nor was /handle/ or any other URL structure, for that matter).
Ideally we could identify later versions of a DSpace instance by adding html/head/meta identifying it as such. Would this allow a DSpace translator for Zotero to not be reliant on the URL structure?
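As a rough illustration of that idea, the following Python sketch reads a version from a meta tag in the page instead of looking at the URL. The tag name `Generator` and the value format `DSpace 1.6.0` are assumptions made for the example, not a confirmed DSpace convention, and `dspace_version` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import List, Optional


class GeneratorMetaParser(HTMLParser):
    """Collects the content of <meta name="Generator" ...> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.generators: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Assumed tag name; compare case-insensitively.
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def dspace_version(html: str) -> Optional[str]:
    """Return the version string if the page identifies itself as DSpace."""
    parser = GeneratorMetaParser()
    parser.feed(html)
    for value in parser.generators:
        if value.lower().startswith("dspace"):
            # "DSpace 1.6.0" -> "1.6.0" (empty string if no version is given)
            return value[len("dspace"):].strip()
    return None


sample = '<html><head><meta name="Generator" content="DSpace 1.6.0"/></head></html>'
print(dspace_version(sample))
```

A check like this only works once the page has been fetched and parsed, so it is better suited to confirming a match than to replacing a cheap URL test.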
If at all possible, there should eventually be no "DSpace" translator at all, and we should rely on metadata fields in the HTML, RDFa and/or attached RDF, and finally the OpenURL COinS. The existing DSpace translator is probably more appropriate for existing pre-1.5.2 DSpace sites and should be kept for legacy purposes until we can clean up the Zotero behavior and end-of-life the versions earlier than 1.5.0.
Mark Diggory
Most translators use URI-specific code so that Zotero does not have to interrogate every page with every translator. So, putting information in head/meta is useful for version info & confirmation for a site-specific translator, but it is probably not the best solution here.
Yes. And I would still add unAPI+MODS to that list: it is not that hard to code & it works now. Batch import, abstracts, file attachments are all handled.
We've been working on an implementation of unAPI for our repository, but Zotero doesn't offer unAPI on our site because the DSpace translator (which does not work for our repository as it a) is DSpace 1.5 and b) does not follow the stock DSpace site design) gets there first. So it looks like the current behaviour is for Zotero to apply the translators first and only fall back to unAPI if none of them match.
For anyone adding unAPI to an existing DSpace repository, this isn't going to work. If unAPI is better than scraping the HTML (which it unquestionably would be) why isn't this the other way around?
If we're comfortable saying that unAPI and COinS should always take precedence over DSpace, we can adjust the translator priority accordingly.
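A minimal sketch of what "generic metadata first" could look like, written in Python rather than as a real Zotero translator. It assumes the usual page markers: a `link rel="unapi-server"` element plus `abbr class="unapi-id"` elements for unAPI, and `span class="Z3988"` elements for COinS. The `choose_strategy` function and its return labels are hypothetical.

```python
from html.parser import HTMLParser


class GenericMetadataDetector(HTMLParser):
    """Records unAPI and COinS markers found in a page."""

    def __init__(self):
        super().__init__()
        self.has_unapi_server = False
        self.unapi_ids = []
        self.coins_titles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "unapi-server" in rel:
            self.has_unapi_server = True
        elif tag == "abbr" and "unapi-id" in classes:
            self.unapi_ids.append(attributes.get("title") or "")
        elif tag == "span" and "Z3988" in classes:
            self.coins_titles.append(attributes.get("title") or "")


def choose_strategy(html):
    """Prefer unAPI, then COinS, and only then site-specific scraping."""
    detector = GenericMetadataDetector()
    detector.feed(html)
    if detector.has_unapi_server and detector.unapi_ids:
        return "unapi"
    if detector.coins_titles:
        return "coins"
    return "site-specific"


# Hypothetical repository page that advertises an unAPI server.
page = """
<html><head>
<link rel="unapi-server" type="application/xml" title="unAPI"
      href="http://repository.example.org/unapi"/>
</head><body>
<abbr class="unapi-id" title="hdl:123456789/60"></abbr>
</body></html>
"""
print(choose_strategy(page))
```

The ordering is the whole point: the site-specific scraper is consulted only when the page offers neither of the generic mechanisms.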
Maybe someday the default DSpace translator can be improved so it can cope with various versions, but currently it seems suboptimal that so many repositories are being locked out from offering Zotero.
As we are ramping up to include more and more theses in our repository, this is becoming a problem for us and our (Zotero) users.
Can we go ahead with giving unAPI and COinS precedence?
On this issue, I would suggest:
1) a tag added to the HTML content to repel the old DSpace->Zotero translator from sites that do not want it to be used. It could even be a tag that tells Zotero which strategy to use for the site. For instance:
<meta name="zotero.translator" value="meta"/>
2) If the DSpace translator is repelled, the <meta name="DC{TERM}.element.qualifier" content="xxxx"/> tags seem just fine to me: the DSpace community would certainly be open to improving meta generation in the DSpace item display. Andrea Bollini gives a nice example of this (see the source of http://researchspace.auckland.ac.nz/handle/2292/3065)
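The two points above could be sketched like this in Python. The `zotero.translator` tag is only the proposal from point 1, not something Zotero is known to recognize, and the helper names `requested_strategy` and `dublin_core_fields` are hypothetical.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append({key: (value or "") for key, value in attrs})


def collect_metas(html):
    collector = MetaCollector()
    collector.feed(html)
    return collector.metas


def requested_strategy(html):
    """Return the strategy named by the proposed opt-out tag, or None."""
    for meta in collect_metas(html):
        if meta.get("name") == "zotero.translator":
            # The proposal uses value="..."; also accept content="...".
            return meta.get("value") or meta.get("content") or None
    return None


def dublin_core_fields(html):
    """Group DC.* and DCTERMS.* meta tags by term, keeping repeated values."""
    fields = {}
    for meta in collect_metas(html):
        prefix, _, term = meta.get("name", "").partition(".")
        if prefix.upper() in ("DC", "DCTERMS") and term:
            fields.setdefault(term, []).append(meta.get("content", ""))
    return fields


page = """
<head>
<meta name="zotero.translator" value="meta"/>
<meta name="DC.creator" content="Last Name1, First Name1"/>
<meta name="DC.creator" content="Last Name2, First Name2"/>
<meta name="DCTERMS.abstract" content="Example abstract"/>
</head>
"""
print(requested_strategy(page))
print(dublin_core_fields(page))
```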
Where is Zotero's scraping of <meta name=... /> tags documented?
Thanks!
Christophe
1.) There is no guarantee that "/handle" will be available over the long term or be appropriate for DSpace instances.
2.) We've worked very hard to provide appropriate metadata in the HTML head meta fields and, likewise, using COinS.
Further improvements will come in the future. There is no reason for Zotero to differentiate DSpace sites from other, more generic resources if it were possible to give priority to META tags and COinS first, and then to various DSpace-centric features.
Also note that in 1.6.0 we will be introducing a specific DSpace version META tag, which should alleviate things a bit.
My biggest question... who is supposed to be doing this work?
Mark
--
Mark R. Diggory
Head of U.S. Operations
http://www.atmire.com - Institutional Repository Solutions
Sarah Shreeves
IDEALS - http://www.ideals.illinois.edu/
There was a discussion today about the support of Google Scholar indexing by DSpace:
http://jira.dspace.org/jira/browse/DS-396
It is mainly based on the following meta tags:
<meta name="citation_journal_title" content="Journal Name">
<meta name="citation_authors" content="Last Name1, First Name1; Last Name2, First Name2">
<meta name="citation_title" content="Article Title">
<meta name="citation_date" content="01/01/2007">
<meta name="citation_volume" content="10">
<meta name="citation_issue" content="1">
<meta name="citation_firstpage" content="1">
<meta name="citation_lastpage" content="15">
<meta name="citation_doi" content="10.1074/jbc.M309524200">
<meta name="citation_pdf_url" content="http://www.publishername.org/10/1/1.pdf">
<meta name="citation_abstract_html_url" content="http://www.publishername.org/cgi/content/abstract/10/1/1">
<meta name="citation_fulltext_html_url" content="http://www.publishername.org/cgi/content/full/10/1/1">
<meta name="dc.Contributor" content="Last Name1, First Name1">
<meta name="dc.Contributor" content="Last Name2, First Name2">
<meta name="dc.Title" content="Article Title">
<meta name="dc.Date" content="01/01/2007">
<meta name="citation_publisher" content="Publisher Name">
Maybe the presence of the pattern <meta name="citation_xxxx" should be treated by Zotero as "bibliographically friendly" and scraped with priority, whatever the underlying software is?
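A small Python sketch of that suggestion, using a few of the tags listed above. The function names are hypothetical, and splitting authors on ";" is an assumption based on the `citation_authors` example, not a documented rule.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collects <meta name="citation_*"> tags, keyed without the prefix."""

    PREFIX = "citation_"

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith(self.PREFIX):
            key = name[len(self.PREFIX):]
            self.fields.setdefault(key, []).append(attributes.get("content") or "")


def citation_fields(html):
    parser = CitationMetaParser()
    parser.feed(html)
    return parser.fields


def is_bibliographically_friendly(html):
    """True if the page carries at least one citation_* meta tag."""
    return bool(citation_fields(html))


def split_authors(value):
    """Split a citation_authors value on ';' (assumed separator)."""
    return [part.strip() for part in value.split(";") if part.strip()]


page = """
<head>
<meta name="citation_title" content="Article Title">
<meta name="citation_authors" content="Last Name1, First Name1; Last Name2, First Name2">
<meta name="citation_date" content="01/01/2007">
<meta name="dc.Title" content="Article Title">
</head>
"""
fields = citation_fields(page)
print(is_bibliographically_friendly(page))
print(fields["title"], split_authors(fields["authors"][0]))
```

Because the check depends only on the tag names, it would apply equally to DSpace and to any other software that emits them.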
Have a nice evening!
Christophe
http://drupal.org/node/641580
A report by someone who experimented with it:
http://www.monperrus.net/martin/accurate+bibliographic+metadata+and+google+scholar
Many things are not publicly documented by Google (the search engine's spelling approximations? the sorting rule for search results?). They are the mainstream: alternatives must be encouraged, but the mainstream must be perfectly supported (IMHO).
This being said, others have a similar approach to make Dublin Core more precise:
http://iodeweb1.vliz.be/odin/handle/1834/882?mode=full
http://www.ceemar.org/dspace/handle/11099/897?mode=full
https://doclib.uhasselt.be/dspace/handle/1942/10024?mode=full
(they are using "bibliographicCitation" instead of "citation" but the approach is similar: better map MARC to an extended DC).
Something that may help.
http://wiki.lib.sun.ac.za/index.php/SUNScholar/XMLUI_Theme/Tutorial#DRI2XHTML_Transformers