Excel to BibTex to Zotero

postiffm · July 13, 2015

I have my library in Excel, want to get it into Zotero.

I exported to CSV, wrote perl to convert like this:

@book{
author = {Williams, James B. & Randolph Shaylor, eds.}
title = {God's Word in Our Hands: The Bible Preserved for Us}
publisher = {Ambassador Emerald}
lccn = {220.09 G589}
location = {Office}
}

But in the Info box, the lccn goes to the extra field, like so:

Extra: LCCN: 220.09 G589

and the location goes to the place field, like so:

Place: Office

Neither of these seem right, for Place should refer to publication location, and I would think that location in Archive would be suitable for where I shall find my resource, either in my print or digital libraries. And, my call number is technically not the same as the LCCN; but it should go to the "Call Number" field.

Can I automate putting my personal call number into the "Call Number" field and my location (Office, Kindle, Logos, iBooks) into the Loc. in Archive field?

Looking at the documentation at https://www.zotero.org/support/kb/field_mappings, I see that "Loc. In Archive" is supposed to be mapped to "location = {LOC. IN ARCHIVE}" and lccn to "Call Number." Doesn't seem to be happening that way.

Any ideas? Thanks!

noksagt · July 13, 2015

Bibtex has no generally agreed upon standard for those two fields. Lccn in unambiguous enough that we should probably support it on import, but I've not seen location used in the way you describe. See http://www.fb10.uni-bremen.de/anglistik/langpro/bibliographies/jacobsen-bibtex.html for a different take.

So, I'd advise you to either modify your own copy of the bibtex translator or to use a different intermediate format.

postiffm · July 13, 2015

What's the easiest way to find out the "latest truth" about what BibTex fields are mapped to what Zotero fields? Any easy place to look in source code or other reference material?

noksagt · July 13, 2015

The code is reasonably readable:
https://github.com/zotero/translators/blob/master/BibTeX.js

postiffm · July 27, 2015

I ended up using RDF. Generated sample output from Zotero, then made my input like that. It took some trial and error with a Perl script. But, I was able to take my Excel data and change it into crude but workable RDF format. Saved a lot of typing. Here is the perl script if anyone is interested. Have at it.

#!/usr/bin/perl
# Usage (in cygwin console): myLibToZoteroRDF LibraryCatalog.txt

# The txt file is saved from Excel as txt (not comma separated, but like
# that, with tabs as separators.)
# The headings of the columns are:
# Author, Title, Volumes (that is, # of vols), Publisher, Call Number, Truncated Call #, Location.
# 0, 1, 2, 3, 4, 5, 6
# If your layout is different it is simple to change the indices below to
# make it work. Remainder of columns are ignored.

# Key to getting this to import properly is to remove all & from the
# input fields, like in multiple author names, and all strange
# characters like "..." which is \205.

# chmop the first line - headings
$line = <>;
$count = 0;

# Print the head of the RDF file
print <<EOF;
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:z="http://www.zotero.org/namespaces/export#"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:bib="http://purl.org/net/biblio#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
EOF

# xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">

while ($line = <>) {
@fields = split('\t', $line);
#print @fields;

# iterate on this field
$numVols = $fields[2];
#print " number = {$numVols}\n";
#for ($i = 1; $i <= $numVols; $i++) {

# I purposefully ignore this Excel-computed field that is a
# short form of call number (213 G327 => 213)
#$category = $fields[5];
#print " note = {$category},\n";

$publisher = $fields[3];
$publisher =~ s/\&/ and /g; # replace & with and

$author = $fields[0];
$author =~ s/\"//g; # Could split last name out, etc. but try without
$author =~ s/\&/ and /g; # replace & with and
# Could add <foaf:givenname>Alva J.</foaf:givenname> for first name/initial
# but for now I'm just doing a quick and dirty implementation because
# there are so many possibilities of authors/editors/combinations.
$title = $fields[1];
$title =~ s/\"//g;
$title =~ s/\&/ and /g; # replace & with and
$callNum = $fields[4];
$resourceLocation = $fields[6]; # Office, Kindle, Logos, etc.
# I iterate on this field
#if ($numVols > 1) { print " number = {$i},\n"; }
$itemNo = $count+1;

print <<EOF;
<bib:Book rdf:about="#item_$itemNo">
<z:itemType>book</z:itemType>
<dc:publisher>
<foaf:Organization><foaf:name>$publisher</foaf:name></foaf:Organization>
</dc:publisher>
<bib:authors>
<rdf:Seq>
<rdf:li>
<foaf:Person>
<foaf:surname>$author</foaf:surname>
</foaf:Person>
</rdf:li>
</rdf:Seq>
</bib:authors>
<dc:subject>Own</dc:subject>
<dc:subject>$resourceLocation</dc:subject>
<dc:subject>
<dcterms:LCC><rdf:value>$callNum</rdf:value></dcterms:LCC>
</dc:subject>
<z:numberOfVolumes>$numVols</z:numberOfVolumes>
<dc:title>$title</dc:title>
</bib:Book>
EOF

# <prism:volume>1</prism:volume>

$count+= $numVols;
# } # for loop that iterates over number of volumes
}

# Close the RDF file
print "</rdf:RDF>\n";

print STDERR "The total number of books in my library is $count\n";