How to Index National Standard/Regulation PDFs?

I understand that Zotero is primarily designed as a literature management tool, mostly for scientific papers. However, I also expect that it could be used to organize other types of published documents in standard formats, such as national standards or regulations.

In my work, I rely heavily on Chinese national standards, and it would be extremely helpful if Zotero could manage my large collection of such files. Unfortunately, Zotero currently seems unable to index these PDF files, which prevents it from generating the .zotero-ft-cache needed for global searching, even though the PDFs themselves are text searchable.

Here is an example of such a file:
https://drive.google.com/file/d/1MrB0O5ODbeIrr7yP6ZMALd2DZXZeWuw9/view?usp=sharing

Could support for indexing this type of file be added in the future, or is there an add-on that can already do it?

Thanks!
  • By "indexing" I suppose you mean retrieving metadata? Zotero can definitely accept standards as references, and full-text indexing should work in general.

    The first step for metadata retrieval would be to identify a reliable source for that metadata - there would be further requirements after that, but this one is essential (and not always possible for standards, unfortunately). Perhaps you can list the original source of the PDF, so that someone could take a look?
  • (It is about indexing PDF to search the full text)
  • (right. .zotero-ft-cache, I should have recognized that! My bad)
  • Hi Aborel,

    Yes, I understand that the metadata of a document can be obtained either from the website (translator) or directly from the PDF. For example, metadata for Chinese national standards can be found here:
    https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=5CD35A41D3B485E0F073F68EF987246D
    and the Jasmine add-on is able to recognize this information. At the very least, I can even enter the metadata manually.

    However, this does not solve the problem of generating the .zotero-ft-cache required for global searching. When I right-click the PDF and select reindex, nothing happens. From what I’ve read, Zotero uses pdftotext (?) to generate the index for PDFs. Is there a way I could generate this manually?

    Thanks!
  • Sorry, as pointed out by poettli I had misunderstood your question. I can indeed reproduce the problem, but at this point I don't see a cause.

    Using pdftotext, I can indeed extract some text from your PDF, but it isn't the only place where something could go wrong. I hope someone with a better knowledge of that specific functionality can offer more assistance.
  • I can reproduce the issue in Zotero 7.0.24, but it works fine in the Zotero 8 beta.
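
For anyone who wants to repeat the manual pdftotext check mentioned above, here is a minimal sketch in Python. It assumes a standalone pdftotext (from Poppler or Xpdf) is installed and on the PATH. The function names and file paths are hypothetical. The thread does not confirm that Zotero accepts a manually written .zotero-ft-cache, so this only tests whether text can be extracted from the PDF at all.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, out_path, exe="pdftotext"):
    """Return the argument list for a plain UTF-8 text extraction.

    Assumption: a standalone pdftotext (Poppler or Xpdf), which is not
    necessarily the same build or option set that Zotero uses internally.
    """
    return [exe, "-enc", "UTF-8", str(pdf_path), str(out_path)]


def extract_text(pdf_path, out_path):
    """Run pdftotext on pdf_path, write out_path, and return the text."""
    exe = shutil.which("pdftotext")
    if exe is None:
        raise FileNotFoundError("pdftotext was not found on PATH")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, out_path, exe), check=True)
    return Path(out_path).read_text(encoding="utf-8", errors="replace")
```

Calling `extract_text("standard.pdf", "standard.txt")` (both names are placeholders) and checking that the result is non-empty shows whether the PDF's text layer is readable outside Zotero. If it is, as reported above, the failure is more likely somewhere else in the indexing process than in text extraction itself.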