Dissociate PDF
Hi,
I use "metadata retrieval" to find data for many PDFs, but in some cases the data is completely wrong. I'm trying to work out how to correct it. If I delete the entry, the PDF is deleted along with it. How can I dissociate the PDF from the metadata attached to it?
Thanks
Mel
If this happens a fair amount we'd be interested to know about the PDF (e.g. a link if it's somewhere online) - retrieve metadata is designed to have a very low rate of false positives.
Sorry for the long delay, but I had many files and I was blocked many times by Google Scholar...
So from my 2000 PDFs, I get:
- many that were not found, but they are too old, not indexed, or always from the same journal (J Hepatol), which has really poor metadata extraction (also tested with JabRef)
- these 5 PDFs with various errors (wrong author, title or journal): http://ubuntuone.com/42q7ZShd6NAgQAm9fSFMaX
Thanks for the drag tips, I had never seen that before.
Mel
- Groeneweg et al Hepatology 1998 28.pdf - picks up an ISBN from the bibliography. Maybe restrict the ISBN search more?
- Holroyd & Overdyke J Neuropsychiatry Clin Neurosci 2012 24(3).pdf
This is just bad luck. They published two nearly identical articles _and_ Google Scholar's data for the first one is flawed. Not much we can do about that.
http://scholar.google.com/scholar?q="had elevated ammonia. This occurred in" "visual perception, construction," "treatment of dementia with behavioral" "of VPA in this population, potential side" "this retrospective, chart-review study, all patients" &hl=en&lr=&btnG=
- Mike Garcia Mdel Ann Hepatol Jun 10 Suppl 2.pdf
This works correctly, it just doesn't get complete data, because Google Scholar doesn't have anything beyond author and title. Nothing to be done.
- Filippini et al Reumatismo 2002 54(2).pdf
Gets everything but the year right for me (and uses the English title). The year comes from Google Scholar, but generally I think that's pretty good.
- Jalan J Hepatol 2010 Sep 53(3).pdf
This is a review of the study Zotero retrieves and contains large amounts of verbatim text from the original study. I doubt we can do much about that.
So I'd call this three false positives. The first one I think we might be able to avoid, which would bring us down to two. Either way, we're looking at false positive rates in the range of 0.1-0.25 percent, which seems pretty reasonable to me.
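For anyone who wants to check the arithmetic, here is a rough sketch in Python. The denominator is an assumption: it uses all 2000 PDFs mentioned above, since the thread does not say how many were actually matched. Counting only matched PDFs would give a smaller denominator and a higher rate, which is consistent with the upper end of the quoted range. The function name is made up for illustration.

```python
def false_positive_rate(false_positives, total):
    """Return the false positive rate as a percentage of `total` PDFs."""
    return 100 * false_positives / total

# Assumption: all 2000 PDFs from the thread are used as the denominator.
TOTAL_PDFS = 2000

for count in (2, 3):
    rate = false_positive_rate(count, TOTAL_PDFS)
    print(f"{count} false positives: {rate:.2f}%")
```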
How about the first x pages _and_ the first 25%?
Generally, super-short books are likely very rare. My concern would be
1. Items where Zotero just isn't able to index much of the text, but does find the ISBNs
2. Reports or other potentially short items which may get an ISBN
Both of these are unlikely scenarios, but I'm not sure how unlikely.
If you meant text, then maybe we can bump the number of extracted pages to, say, the first 10 and then only search the first 50% of the text. That should skip the references in most cases.
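To make the "only search the first 50% of the text" idea concrete, here is a minimal sketch in Python. It is not Zotero's actual code: the regular expression, the function name and the `fraction` parameter are all assumptions for illustration, and the input is taken to be the text already extracted from the first pages of the PDF.

```python
import re

# Hypothetical pattern: an "ISBN" label followed by digits and hyphens.
ISBN_RE = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9\-]{8,15}[0-9Xx])")

def find_isbns(text, fraction=0.5):
    """Return ISBN-like strings found in the leading `fraction` of `text`."""
    cutoff = int(len(text) * fraction)
    head = text[:cutoff]  # ignore the tail, where the references usually are
    found = []
    for match in ISBN_RE.finditer(head):
        # Strip hyphens and keep only plausible ISBN-10 / ISBN-13 lengths.
        digits = re.sub(r"[^0-9Xx]", "", match.group(1))
        if len(digits) in (10, 13):
            found.append(digits.upper())
    return found

# Made-up sample text: one ISBN near the start, one in the references.
sample = (
    "Title page ISBN 978-3-16-148410-0. "
    + "body " * 50
    + "References: ISBN 0-306-40615-2"
)
print(find_isbns(sample))        # searches only the first half
print(find_isbns(sample, 1.0))   # searches everything, including references
```

With the default `fraction`, the ISBN in the trailing references is never seen, which is the behaviour wanted for a case like the Groeneweg PDF. The trade-off is the one raised above: a short item whose own ISBN appears late in the extracted text would be missed.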