My Zotero doesn't have the "PDF Indexing" function

edited December 19, 2018
I can't find the "PDF tools" section under "Search".
I downloaded the Xpdf project, but nothing changed.
My computer runs Windows 10, and I have reinstalled Zotero, but it made no difference.
my email address [removed — D.S.]
  • The PDF tools are now bundled with Zotero, so there's no need to install them separately.

    Where did you see information that made you think you need to install something?
  • I can't upload a picture here, so I will send it to your email for help. Thanks.
  • Or post the picture to Dropbox or similar and give a link in this thread
  • Right, that second picture is of a very old version of Zotero. The current version bundles the PDF tools automatically, so there is no need (or option) to manually install them. What makes you think you need to do anything manually here? Is something not working for you when you are trying to use Zotero?
  • Thanks. I have now found that Zotero can't index Chinese PDFs perfectly, although it handles English PDFs very well. So I may have misunderstood the situation.
  • Can you please just describe in detail what you are trying to do and what is happening? You are trying to index a Chinese PDF and having problems? What exactly is happening?
  • My English is not very good, so I may not be able to describe this in detail. Sorry for that. When I use a Chinese PDF, Zotero can't automatically update the index (such as the title and the author) the way it does for an English PDF. (Maybe Zotero can't use Google Scholar for the update.) There is also another problem: when I insert a citation, no footnote or endnote appears in Word. I don't know why. https://docs.google.com/document/d/1RmWYiP37NeW8zIOaLh8rmHA9GMtP76CZk2Xbl_4b56o/edit?usp=sharing
  • Okay, so what you are describing is not "indexing" at all. "Indexing" refers to making the full text of the PDF searchable by Zotero.

    What you are referring to is "Retrieve Metadata from PDF". Zotero doesn't use Google Scholar for that at all anymore. However, if the PDF doesn't contain searchable text or Zotero can't find information about that PDF online, then it won't be able to find metadata for the PDF automatically. In that case, you can right-click on the item and choose "Create Parent Item", then enter the information manually.

    Note that using Retrieve Metadata from PDF is NOT the best way to import items into Zotero during your daily usage. Instead, it is best to use the Save to Zotero button from your web browser toolbar: https://zotero.org/support/getting_stuff_into_your_library

    Regarding the second issue, the final image you linked to shows that you are using an in-text (Author Date) citation style, not a footnote/endnote citation style. Select a style that uses footnotes/endnotes (e.g., Chicago full-note) and you will both see the option to choose between footnotes or endnotes and will see citations inserted as footnotes/endnotes.
  • @bwiernik There are two issues with retrieving metadata from Chinese PDFs. First, Chinese PDFs usually don't have valid metadata, because most of them were converted from MS Word (by an editor, maybe?). Second, decoding with GBK works better than decoding with UTF-8 for Chinese PDFs (I don't know why, but that is generally the case).
    I know that collecting items from web pages is the most efficient way to use Zotero, but drag-and-drop is quite easy for everybody, especially for people coming from other software.
  • edited January 20, 2019
    Zotero doesn’t use metadata embedded in the PDF at all. It uses the text from the PDF to search for metadata online.

    What software are you coming from? Dragging PDFs is similarly not the best method for importing from other software in many cases.
  • @bwiernik Yes, but the "text from the PDF" comes from the full-text indexing process, and that process used UTF-8 as the default decoding method in the past (when users had to download the `pdfinfo` and `pdftotext` binaries, which now seem to be bundled with the standalone client), and most probably still does.
    Also, `pdftotext` limits this process to UTF-8 unless it is modified manually (by adding extra files and changing the command).
    Therefore, the full text that Zotero gets from Chinese PDFs will lack some information, because it is decoded with UTF-8 rather than GBK.

    I have used Zotero for about 5 years, and almost all my items were created with web translators, so I think I can be considered an experienced user. I definitely agree that drag-and-drop is not the best method, but in my opinion it is at least an easy one. Although it has some disadvantages, Zotero has still improved the metadata retrieval process recently. It seems that Zotero still values dragging in PDFs and retrieving metadata.

    Zotero now uses its own web service for metadata, so I really hope it can support Chinese PDFs (which have to be searched for at cnki.net and wanfangdata.com.cn).

    Finally, supporting GBK in PDF full-text indexing and adding a Chinese metadata engine would benefit a lot of Chinese researchers.
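To illustrate the general concern raised in this post (not Zotero's actual pipeline), here is a minimal Python sketch of what happens when GBK-encoded bytes are decoded as UTF-8. The sample string and variable names are made up for illustration, and it assumes the bytes really are GBK.

```python
# Hypothetical sample: a short Chinese string standing in for text from a PDF.
original = "中文全文索引"

# Assumption: the bytes were stored using the GBK encoding.
gbk_bytes = original.encode("gbk")

# Decoding with the matching codec recovers the text exactly.
decoded_as_gbk = gbk_bytes.decode("gbk")

# Decoding the same bytes as UTF-8 hits invalid sequences; errors="replace"
# substitutes U+FFFD for them instead of raising an exception.
decoded_as_utf8 = gbk_bytes.decode("utf-8", errors="replace")

print(decoded_as_gbk == original)    # codec matches the bytes
print("\ufffd" in decoded_as_utf8)   # replacement characters appear
```

Whether this applies to a given tool depends on where the decoding happens. The sketch only shows why a mismatched codec would lose text.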
  • The bundled pdftotext is provided with all the necessary encoding mappings including GBK, and should be identical to the regular Xpdf pdftotext. We use UTF-8 only for the pdftotext output, which is for internal usage only, and shouldn't have any effect on PDF indexing or search. Though it's possible that before bundling pdftotext and pdfinfo into Zotero, there were issues with encoding for some languages.

    I'm curious: what information exactly did you find missing in the full-text content extracted from Chinese PDFs?

    Can you provide PDF examples?

    Regarding our PDF metadata recognition service, there are some limitations for non-Latin script languages, which is necessary to reduce the probability of incorrect metadata. But if there is a DOI in the PDF, it should be recognized.
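For readers who want to check extraction on their own files, here is a rough Python sketch of calling a standalone `pdftotext` with UTF-8 output. It is an assumption about how one might test this by hand, not how Zotero invokes its bundled copy. The function names and the file names `paper.pdf` and `paper.txt` are placeholders.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path, encoding="UTF-8"):
    """Build an argument list for a standalone pdftotext binary.

    The -enc option selects the encoding of the *output* text file;
    it does not describe how text is stored inside the PDF.
    """
    return ["pdftotext", "-enc", encoding, pdf_path, txt_path]


def extract_text(pdf_path, txt_path):
    """Run pdftotext if it is on PATH; return True if it was run."""
    if shutil.which("pdftotext") is None:
        return False
    subprocess.run(build_pdftotext_command(pdf_path, txt_path), check=True)
    return True


# Placeholder file names, shown only to display the resulting command.
print(build_pdftotext_command("paper.pdf", "paper.txt"))
```

Comparing the resulting text file against the PDF is one way to see whether any content is actually missing.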
  • @martynas_b Thanks. I can't reproduce the case where GBK works better than UTF-8. I tested two Chinese PDF files, and the two methods returned similar full text. The Chinese PDF I originally converted to text may have been too old.

    As for the DOI, both cnki.net and wanfangdata.com have valid DOIs, but their DOIs don't work in Zotero.

    For example, from http://www.doi.org.cn/portal/index.htm (which belongs to wanfangdata.com):

    http://dx.doi.org/10.3866/PKU.WHXB201112303

    I think their DOIs lack some APIs, so it isn't Zotero's issue.
  • Yeah, it seems the DOI isn't registered in any of the APIs we use. Although the metadata can be retrieved from the publisher web page linked by doi.org. Maybe we should start doing that, if all other DOI search APIs fail.

    (And it looks like we are already loading the publisher website for this DOI, but the Airiti translator only cares about Airiti pages: https://github.com/zotero/translators/blob/master/Airiti.js#L150 )
  • (This is an ISTIC-registered DOI, not an Airiti one.
    curl "https://api.datacite.org/prefixes/10.3866"
    I've looked for an ISTIC API but couldn't find anything)
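The prefix lookup in the curl command can be sketched in Python as follows. The helper names are hypothetical, and the regular expression is a common heuristic for spotting DOIs rather than a full DOI grammar. The example DOI and the DataCite URL are the ones quoted in this thread.

```python
import re

# Heuristic: registrant prefix ("10." plus 4-9 digits), a slash, then a suffix.
DOI_PATTERN = re.compile(r"(10\.\d{4,9})/(\S+)")


def doi_prefix(doi_or_url):
    """Return the DOI prefix found in the string, or None if there is none."""
    match = DOI_PATTERN.search(doi_or_url)
    return match.group(1) if match else None


def datacite_prefix_url(prefix):
    """Build the DataCite prefix lookup URL for a given DOI prefix."""
    return "https://api.datacite.org/prefixes/" + prefix


prefix = doi_prefix("http://dx.doi.org/10.3866/PKU.WHXB201112303")
print(prefix)
print(datacite_prefix_url(prefix))
```

The sketch only builds the URL. Fetching it is left out so the example runs without network access.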
  • I mean the Airiti translator is one of the default DOI search translators, which loads doi.org for all DOIs that are failing with other search translators. And this is kind of inefficient.