SSOAR translator?

felwert · April 14, 2014

SSOAR <http://ssoar.info/> is an open access repository for the social sciences. It already exports BibTeX and EndNote, so importing entries into Zotero is not that difficult. But having a native translator would be great, especially in order to directly download the available PDF documents. Would anybody be willing to contribute a translator for the site?

adamsmith · April 14, 2014

looks cool, sure. Probably won't be fast, though we've had a German social science librarian contributing a lot lately, maybe he'll be interested and do it more quickly.

zuphilip · April 16, 2014

SSOAR is using DSpace [ http://en.wikipedia.org/wiki/DSpace ] as their repository software. Maybe a more general approach is possible here, but I guess that the BibTeX export has to be configured in DSpace itself... @adamsmith: I guess that DSpace was discussed earlier here as well? On a page with a single item (e.g. http://www.ssoar.info/ssoar/handle/document/1919 ) the metadata normally seems to be pretty good, and Zotero's metadata translator is showing up. @felwert: Can you give a specific example of a URL (document) where the translator symbol is not showing up? (If you want to batch save a larger number of the PDFs, there is also an OAI-PMH interface from SSOAR.)

adamsmith · April 16, 2014

@zuphilip - there used to be a dedicated dspace translator, we removed it in favor of the embedded metadata translator, which, as you say, handles most dspace installations quite nicely.

But for the page you link to, e.g. - even if we patch RDF to recognize DC.type content="incollection" as a book section - the data in the bibtex is much better/clearer than what's in the site header. So using the bibtex seems worthwhile.

zuphilip · April 17, 2014

@adamsmith I am not convinced that the BibTeX data is better than what could be extracted from the metadata. The booktitle is missing from the BibTeX data, and a volume on a bookSection is at least unusual and not really useful without the name of the series the book is in.

The meta tags also contain the pages and ISBN; maybe these should be moved to another field. I could speak with GESIS about such things. What do you think?

adamsmith · April 17, 2014

I see.
Obviously if we could just get better data in the site header w/o a translator, that'd be ideal, yes. The DC vocabulary, IIRC, isn't great for that - adding google highwire metatags would probably produce the clearest data, but even with DC they could likely be clearer.

As things are now, using the metadata is too much of a guessing game (e.g. both publisher and book title are in DC.source etc.)

zuphilip · April 26, 2014

There is now a separate SSOAR translator. Update manually, or it will update automatically tomorrow.

zurpher · June 12, 2016

@zuphilip

I have noticed that the SSOAR import could possibly be improved. When importing data from:

http://www.ssoar.info/ssoar/handle/document/46968

I do not get the DOI and tags (e.g. Thesaurusschlagwörter, Klassifikation, Freie Schlagwörter, i.e. thesaurus keywords, classification, free keywords).

Do you think this can be included in the SSOAR site translator?

DWL-SDCA · June 12, 2016

For most journal articles the headers contain the DOI. For example:

<meta name="DC.identifier" content="http://dx.doi.org/10.17645/si.v1i1.109" xml:lang="de" />

Sometimes, even when the DOI is provided on the web page of an item, it isn't in the meta tag information.

Edit: Let me add, however, that I almost never download the article metadata from this site but follow the link to the article on the publisher's site. The publisher's site almost always contains more recent information, while the SSOAR site metadata is missing volume, issue, and page information.

zuphilip · June 14, 2016

If the DOI is in the meta field, then we can extract it from there. I'll prepare a commit for that.
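To illustrate the idea of pulling the DOI out of the meta field, here is a minimal Python sketch. It is not the code of the SSOAR translator (Zotero translators are written in JavaScript); the names `DCIdentifierParser`, `extract_doi`, and the loose `DOI_PATTERN` are assumptions made up for this example.

```python
import re
from html.parser import HTMLParser

# Loose DOI pattern, chosen for illustration only (an assumption,
# not the cleaning logic Zotero itself uses).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


class DCIdentifierParser(HTMLParser):
    """Collect the content of every <meta name="DC.identifier"> tag."""

    def __init__(self):
        super().__init__()
        self.identifiers = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "DC.identifier" and attributes.get("content"):
            self.identifiers.append(attributes["content"])


def extract_doi(html):
    """Return the first DOI found in a DC.identifier meta tag, or None."""
    parser = DCIdentifierParser()
    parser.feed(html)
    for identifier in parser.identifiers:
        match = DOI_PATTERN.search(identifier)
        if match:
            return match.group(0)
    return None


header = '<meta name="DC.identifier" content="http://dx.doi.org/10.17645/si.v1i1.109" xml:lang="de" />'
print(extract_doi(header))
```

Pages whose DC.identifier only holds the handle URL would yield `None` here, which matches the caveat above that the DOI is not always present in the meta tags.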

However, taking all tags looks like too much, and the result would not be useful anymore. The example page shows Thesaurusschlagwörter, Klassifikation, and Freie Schlagwörter, and moreover they are present in both English and German:


<meta name="DC.subject" content="Allgemeine Soziologie, Makrosoziologie, spezielle Theorien und Schulen, Entwicklung und Geschichte der Soziologie" xml:lang="de" />
<meta name="DC.subject" content="General Sociology, Basic Research, General Concepts and History of Sociology, Sociological Theories" xml:lang="en" />
<meta name="DC.subject" content="Quantifizierung" xml:lang="de" />
<meta name="DC.subject" content="quantification" xml:lang="en" />
<meta name="DC.subject" content="Frankreich" xml:lang="de" />
<meta name="DC.subject" content="France" xml:lang="en" />
<meta name="DC.subject" content="Klassifikation" xml:lang="de" />
<meta name="DC.subject" content="classification" xml:lang="en" />
<meta name="DC.subject" content="Konvention" xml:lang="de" />
<meta name="DC.subject" content="convention" xml:lang="en" />
<meta name="DC.subject" content="Institution" xml:lang="de" />
<meta name="DC.subject" content="institution" xml:lang="en" />
<meta name="DC.subject" content="Neoliberalismus" xml:lang="de" />
<meta name="DC.subject" content="neoliberalism" xml:lang="en" />
<meta name="DC.subject" content="sozialer Prozess" xml:lang="de" />
<meta name="DC.subject" content="social process" xml:lang="en" />
<meta name="DC.subject" content="Institutionstheorie" xml:lang="de" />
<meta name="DC.subject" content="theory of institutions" xml:lang="en" />
<meta name="DC.subject" content="Institutionenökonomie" xml:lang="de" />
<meta name="DC.subject" content="institutional economics" xml:lang="en" />
<meta name="DC.subject" content="Forschungsansatz" xml:lang="de" />
<meta name="DC.subject" content="research approach" xml:lang="en" />

I don't see any (easy) option to grab just a handful of non-repeating tags from these.
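One partial workaround might be to keep only the entries for a single language and drop exact duplicates. The Python sketch below shows that idea; it is not what the SSOAR translator does, it assumes the xml:lang attributes are reliable, and the names `DCSubjectParser` and `subjects_for_language` are hypothetical.

```python
from html.parser import HTMLParser


class DCSubjectParser(HTMLParser):
    """Collect (content, language) pairs from <meta name="DC.subject"> tags."""

    def __init__(self):
        super().__init__()
        self.subjects = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "DC.subject" and attributes.get("content"):
            self.subjects.append((attributes["content"], attributes.get("xml:lang")))


def subjects_for_language(html, lang="en"):
    """Return DC.subject values for one language, in page order, without repeats."""
    parser = DCSubjectParser()
    parser.feed(html)
    seen = set()
    tags = []
    for content, subject_lang in parser.subjects:
        tag = content.strip()
        key = tag.casefold()  # treat "France" and "france" as the same tag
        if subject_lang == lang and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags
```

This would roughly halve the list, but it cannot tell the classification entry apart from thesaurus or free keywords, because the markup above does not distinguish them.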