Available for beta testing: new PDF recognizer
https://www.rfc-editor.org/info/rfc4291
Oddly, when given RFC 4291 (published in 2006), the recognizer finds the metadata for the previous (1998) version of this spec.
Thanks for all your work on this!!
If you're asking whether you can get a PDF from a DOI with Add Item by Identifier: no, not currently, though that should become more feasible in the future.
If you are considering improvements, I have some suggestions for things I’d find quite helpful:
1. A setting for the maximum character count of the title.
2. A string setting for connectors, or at least a set of options (e.g., “ - ”, “_”, etc.).
3. An option to set the overall format, e.g., FIRST_AUTHOR - YEAR - TITLE.
4. Options for authors: first author only, first plus second, etc.
5. An option to either truncate the final word at the maximum character count or omit the partial final word.
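To make suggestions 1 - 5 concrete, here is a minimal Python sketch of how such settings might combine into a file name. It is only an illustration of the request, not how the recognizer or the renaming feature actually works; `RenameOptions`, `shorten_title`, `build_filename`, and every field name are hypothetical, and the " et al." suffix is my own assumption about how omitted authors could be marked.

```python
from dataclasses import dataclass


@dataclass
class RenameOptions:
    """Hypothetical settings mirroring suggestions 1 - 5 above."""
    max_title_chars: int = 40                    # 1. title maximum character count
    connector: str = " - "                       # 2. string placed between fields
    fields: tuple = ("author", "year", "title")  # 3. overall format / field order
    max_authors: int = 1                         # 4. how many authors to include
    keep_partial_word: bool = False              # 5. cut mid-word, or drop the partial word


def shorten_title(title: str, limit: int, keep_partial_word: bool) -> str:
    """Limit the title to `limit` characters according to the chosen mode."""
    if len(title) <= limit:
        return title
    cut = title[:limit]
    # Keep the cut as-is if partial words are allowed, or if the cut
    # happens to fall exactly on a word boundary.
    if keep_partial_word or title[limit] == " ":
        return cut.rstrip()
    # Otherwise drop the partial final word (if there is an earlier space).
    head, sep, _ = cut.rpartition(" ")
    return head.rstrip() if sep else cut


def build_filename(authors: list, year: int, title: str, opts: RenameOptions) -> str:
    """Assemble a PDF file name from metadata and the options above."""
    shown = authors[:opts.max_authors]
    author_part = " and ".join(shown)
    if len(authors) > len(shown):
        author_part += " et al."  # assumption: mark omitted authors this way
    values = {
        "author": author_part,
        "year": str(year),
        "title": shorten_title(title, opts.max_title_chars, opts.keep_partial_word),
    }
    # Skip empty fields so the connector is never doubled.
    parts = [values[name] for name in opts.fields if values[name]]
    return opts.connector.join(parts) + ".pdf"


# Example using the RFC 4291 metadata mentioned earlier in this thread.
name = build_filename(
    ["Hinden", "Deering"], 2006, "IP Version 6 Addressing Architecture",
    RenameOptions(max_title_chars=20),
)
print(name)  # Hinden et al. - 2006 - IP Version 6.pdf
```

With `keep_partial_word=True` the same call would keep the cut-off word ("IP Version 6 Address"), which is the difference suggestion 5 is about.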
Also, some PDFs don’t turn up any metadata, usually older ones in my experience. Could you render the first page to an offscreen buffer and use OCR to at least get the author and title? Or pull them out of the PDF directly if it contains text (as opposed to a bitmap)?
If the rename feature also had a selectable mode that lets the user approve or cancel each proposed rename, that would be totally awesome. You could also consider letting the user choose between several candidate file names when there isn’t a clear winner, including one that can be edited before it is applied.
It sounds like you may have something like 1 - 5 in the works already.
Thanks again for the useful tool.
http://spate-irrigation.org/wp-content/uploads/2017/06/Technical-Sheet_Geomembrane_bag.pdf
http://www.itnphil.org.ph/docs/How to construct a rainwater harvesting tank.pdf
https://cleancookstoves.org/binary-data/CMP_CATALOG/file/000/000/102-1.pdf
Any advice on how I can retrieve said data?