Retrieve Metadata false positive (ISBN in bib)

This PDF, http://journal.r-project.org/archive/accepted/leeper.pdf,
is falsely recognized as a book based on an ISBN in its bibliography. It's a relatively short paper (8 pages), but the ISBN is on page 7. I don't know how common that is, but it seems like we should, and should be able to, avoid that relatively easily.
(4.0.18beta-r14)
  • For reference, here's the original logic behind MAX_PAGES = 7, though that didn't help in this case.

    I'll have to take a closer look at whether this is possible, but it seems to me that we would want to look at no more than 7 pages or 3/5 of the PDF, whichever is smaller, but always at least 1 page. We would just need a good way to get the total number of pages in the PDF.
  • Yes, that's exactly what I had in mind. I thought we get the total number of pages from pdfinfo?
  • We do. I just realized that we should also have the number of pages already in the DB.
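
To make the limit proposed above concrete, here is a minimal Python sketch (not the actual Zotero code). The function names `pages_to_scan` and `parse_page_count` are hypothetical, rounding 3/5 down is an assumption, and the parser only assumes that pdfinfo prints a "Pages:" line.

```python
import re

MAX_PAGES = 7  # the existing cap mentioned in this thread


def parse_page_count(pdfinfo_output: str) -> int:
    """Read the total page count from pdfinfo's "Pages:" line (hypothetical helper)."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def pages_to_scan(total_pages: int) -> int:
    """Return how many leading pages to search for an ISBN (hypothetical helper)."""
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    # 3/5 of the document, rounded down (assumption); integer math avoids float error.
    fraction_limit = (total_pages * 3) // 5
    # Whichever limit is smaller, but never fewer than 1 page.
    return max(1, min(MAX_PAGES, fraction_limit))


# The 8-page paper from the report: only the first 4 pages would be scanned,
# so the ISBN on page 7 would no longer be picked up.
print(pages_to_scan(8))
```

With these assumptions, the 7-page cap only takes over once a PDF has 12 or more pages.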