Is there a way to "Retrieve Metadata" for EPUB files within Zotero?
I have over 5,000 articles, books, and other bits in my library right now. Nearly a sixth of them are books in EPUB format. Are there any internal or external tools to batch retrieve the metadata for these files?
That could be done in Zotero, probably more easily than for Calibre, actually, but it's a completely separate workflow that'd have to be implemented. Something like that is generally planned, but will take time.
- Many formats have tools that can dump the plain text, much as we use the xpdf/poppler pdftotext program, so our current method of getting PDF data could conceivably be extended to other formats
- Many of these formats (including PDF) have the ability for structured metadata to be embedded & there are various tools that could read the metadata included with the file
I believe that Calibre ships with tools that read/write the embedded metadata. For PDFs, specifically, I think the current state of XMP is so bad that it's entirely useless (i.e. you get so many false positives that importing it is worse than nothing).
I'd imagine that for epub that'd be better, so that could be an approach.
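To make the embedded-metadata idea concrete, here is a minimal sketch in Python (not Zotero code, and not how Zotero or Calibre actually do it). It relies on an EPUB being a ZIP archive whose META-INF/container.xml points to an OPF package file carrying Dublin Core fields. The function name read_epub_metadata and the list of fields are just assumptions for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file, the OPF package file,
# and the Dublin Core metadata elements inside it.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Illustrative subset of Dublin Core fields; real files may carry more.
FIELDS = ("title", "creator", "publisher", "date", "language", "identifier")


def read_epub_metadata(path):
    """Return a dict mapping Dublin Core field names to lists of values."""
    with zipfile.ZipFile(path) as zf:
        # META-INF/container.xml names the location of the OPF package file.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        if rootfile is None:
            return {}
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    metadata = package.find("opf:metadata", NS)
    if metadata is None:
        return {}

    result = {}
    for field in FIELDS:
        values = [
            el.text.strip()
            for el in metadata.findall(f"dc:{field}", NS)
            if el.text and el.text.strip()
        ]
        if values:
            result[field] = values
    return result
```

Whether what comes back is good enough to import unchecked is a separate question; publishers fill these fields in with varying care.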
Is there an equivalent of pdftotext for epub?
The last question is how many individual formats Zotero can reasonably have custom approaches for, and which ones.
Calibre ships with ebook-convert, which can convert an EPUB to a text file. There are likely others.
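As a rough sketch of what calling it from a script might look like, assuming Calibre is installed and ebook-convert is on the PATH: the tool takes an input file and an output file and picks the formats from the file extensions. The helper names build_convert_command and epub_to_text are made up for this example, and the output is assumed to be UTF-8.

```python
import shutil
import subprocess


def build_convert_command(epub_path, txt_path, tool="ebook-convert"):
    """Build the argument list; formats are inferred from the file extensions."""
    return [tool, epub_path, txt_path]


def epub_to_text(epub_path, txt_path):
    """Convert an EPUB to plain text with Calibre's CLI and return the text."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found on PATH; is Calibre installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(epub_path, txt_path), check=True)
    # Assumption: the TXT output is UTF-8; undecodable bytes are replaced.
    with open(txt_path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

Something like epub_to_text("book.epub", "book.txt") would then play the same role that pdftotext plays for PDFs.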
No idea which attachment formats we should be supporting (djvu? rtf? doc(x)?). I do think a more generalized approach, where different tools can be plugged in either to extract metadata or to generate a text file from the attachment, would be better than what we have now.
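Purely to illustrate the plug-in idea (this is a hypothetical design, not anything that exists in Zotero), a registry keyed by file extension would let each format bring its own tool. All the names here (register, extract_text, _EXTRACTORS) are invented for the sketch.

```python
import os

# Hypothetical registry mapping a lowercase file extension to an extractor function.
_EXTRACTORS = {}


def register(extension):
    """Decorator that registers a text extractor for one file extension."""
    def decorator(func):
        _EXTRACTORS[extension.lower()] = func
        return func
    return decorator


def extract_text(path):
    """Dispatch to whichever extractor is registered for the file's extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        extractor = _EXTRACTORS[ext]
    except KeyError:
        raise ValueError(f"no extractor registered for {ext!r}") from None
    return extractor(path)


@register(".txt")
def _plain_text(path):
    # Trivial extractor: a text file is already plain text.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read()
```

Supporting another format would then mean registering one more function (for example, one that shells out to an external converter) without touching the dispatch code.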