How to Index National Standard/Regulation PDFs?
I understand that Zotero is primarily designed as a literature management tool, mostly for scientific papers. However, I had hoped it could also organize other types of published documents in standard formats, such as national standards or regulations.
In my work, I rely heavily on Chinese national standards, and it would be extremely helpful if Zotero could manage my large collection of such files. Unfortunately, Zotero currently seems unable to index these PDFs, so it never generates the .zotero-ft-cache file needed for global searching, even though the PDFs themselves are text-searchable.
Here is an example of such a file:
https://drive.google.com/file/d/1MrB0O5ODbeIrr7yP6ZMALd2DZXZeWuw9/view?usp=sharing
I would like to ask whether support for indexing this type of file might be possible in the future, or whether there is an add-on that could do it.
Thanks!
The first step for metadata retrieval would be to identify a reliable source for that metadata - there would be further requirements after that, but this one is essential (and not always possible for standards, unfortunately). Perhaps you can list the original source of the PDF, so that someone could take a look?
Yes, I understand that the metadata of a document can be obtained either from the website (translator) or directly from the PDF. For example, metadata for Chinese national standards can be found here:
https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=5CD35A41D3B485E0F073F68EF987246D
and the Jasmine add-on is able to recognize this information. Failing that, I can enter the metadata manually.
However, this does not solve the problem of generating the .zotero-ft-cache required for global searching. When I right-click the PDF and select reindex, nothing happens. From what I have read, Zotero seems to use pdftotext to generate the index for PDFs, though I am not certain of that. Do you think there is a way I could generate the cache manually?
Thanks!
Using pdftotext, I can indeed extract some text from your PDF, but text extraction isn't the only step where something could go wrong. I hope someone with a better knowledge of that specific functionality can offer more assistance.
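For anyone who wants to repeat that extraction check locally, here is a minimal sketch in Python. It assumes the pdftotext command-line tool (from Poppler or Xpdf) is installed and on your PATH. The function names and the five-page limit are illustrative choices, not anything Zotero itself uses. The script only reports how much text comes out of the PDF; it does not create or modify the .zotero-ft-cache file, and whether Zotero would accept a hand-made cache is not something this thread confirms.

```python
import shutil
import subprocess


def build_command(pdf_path, first_page=1, last_page=5):
    """Build a pdftotext command that writes UTF-8 text to stdout.

    The page range is an arbitrary choice to keep the check fast.
    """
    return [
        "pdftotext",
        "-enc", "UTF-8",          # force UTF-8 so Chinese text survives
        "-f", str(first_page),    # first page to convert
        "-l", str(last_page),     # last page to convert
        pdf_path,
        "-",                      # "-" sends the text to stdout
    ]


def summarize(text):
    """Count non-whitespace characters and CJK ideographs in the text."""
    visible = [ch for ch in text if not ch.isspace()]
    # U+4E00..U+9FFF is the basic CJK Unified Ideographs block.
    cjk = [ch for ch in visible if "\u4e00" <= ch <= "\u9fff"]
    return {"chars": len(visible), "cjk": len(cjk)}


def extract_text(pdf_path):
    """Run pdftotext on the file and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_command(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Calling summarize(extract_text("standard.pdf")), where standard.pdf is a placeholder for your own file, gives a rough indication of whether the PDF's text layer is readable outside Zotero. If the counts are near zero, the problem is likely in the PDF itself; if they look reasonable, the failure is probably somewhere later in the indexing process.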