Bizarre Indexing and Metadata Problems
I suspected that something was wrong with the PDF indexing because the ratio of documents to words was very odd: 4087 documents, each averaging over 10 pages, but only 308278 words indexed. So I took a one-page Word document that I had written (575 words), converted it to PDF, and imported it into Zotero. By reflex, after importing the PDF, I ran Retrieve Metadata. Of course it should have failed, as I had just written the document and it had never been published anywhere. Imagine my surprise when I was told that the document was a book by Saul Bellow entitled Seize the Day. When I checked the indexing statistics, they now read 4088 documents (correct) but 308280 words, so Zotero claims to have added just two words from the entire document. Can anyone give me a clue as to what is going on and how to work around it? I'd really like to have my documents indexed, and I would like to know how Zotero decided that my one-pager had been written by Saul Bellow, who fortunately is no longer around to take insult at my woeful writing skills.
If you provide a Debug ID for the Retrieve Metadata attempt, we can take a look.
I'm still confused about the word count issue. Before I posted, I checked the Oxford English Dictionary, which said there were only 141,000 distinct words and a few tens of thousands of obsolete words. Together, that does not come close to the 308,000 words the index summary reports. Since you wrote the code, you obviously know what is going on, but I am curious whether you count words with and without pluralizing suffixes, etc., as one word or as different words.
Thanks
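For anyone puzzling over the same numbers: a jump from 308278 to 308280 after adding a 575-word document would be consistent with a count of distinct words rather than total words, since most words in a new document are already in the index. The sketch below is only an illustration of that idea, not Zotero's actual code. The `tokenize` and `add_document` functions and the sample sentences are hypothetical, and the tokenizer simply assumes that words are lowercased runs of letters and digits with no stemming, so "cat" and "cats" count separately.

```python
import re


def tokenize(text):
    # Hypothetical tokenizer (an assumption, not Zotero's): lowercase the
    # text and treat each run of letters or digits as one word. No stemming,
    # so singular and plural forms are different words.
    return re.findall(r"[a-z0-9]+", text.lower())


def add_document(index, text):
    """Add a document's words to a set of distinct words.

    Returns (total words in the document, words that were new to the index).
    """
    tokens = tokenize(text)
    before = len(index)
    index.update(tokens)
    return len(tokens), len(index) - before


index = set()
add_document(index, "The cat sat on the mat. The cats sat on the mats.")

# A second document that mostly reuses words already in the index.
total, new = add_document(index, "The dog sat on the mat with the cats.")
print(total, new)  # 9 words in the document, only 2 of them new
```

Under these assumptions, names, numbers, abbreviations, and inflected forms all count as distinct entries, which is one way an index could end up with more "words" than a dictionary lists.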
Do you want me to continue submitting these as I find them, or should I wait until you have another update and see whether they persist despite all the other things you are fixing? I do not want to inundate you with related problems.
I think Zotero is fabulous and am moving my entire cache of references to it.
Feel free to post that one, and then maybe hold off until after the next release.
Thanks