normadize

About

Username
normadize
Joined
Roles
Member

Comments

  • Added DOI detection, but right now it's only showing the dx.doi.org url -- it's not parsing the redirected page. Should be ok as it's mostly a proof of concept. Same url: https://www.dropbox.com/s/7xwuf3vemtlhhsf/pdf-meta.sh Execute without argume…
  • I updated the script to do 3 searches, but I have not implemented any DOI searches as I've been too busy. It is now fetching the bibtex and displaying it in the console for all 3 searches. https://www.dropbox.com/s/7xwuf3vemtlhhsf/pdf-meta.sh There w…
  • I don't think GS likes wget's user agent, and I think it also wants a cookie from you, but you could probably do this. It may be easier to modify recognizePDF.js at this point, especially if that's your eventual goal. You can use mozISpellCheckingEn…
  • No problem, I'll implement 3 searches (from 3 different text chunks to obtain 3 different sane search strings) and also look for a DOI so a comparison is more pertinent. What do you mean by automating GS lookup? Currently the script calls firefox t…
  • I agree that it's possible that other characters could cause problems. If we find that your approach improves our ability to detect PDFs without increasing the false positive rate, we should certainly change the current implementation. My concern is…
  • Updated script: https://www.dropbox.com/s/7xwuf3vemtlhhsf/pdf-meta.sh Syntax is: pdf-meta.sh article.pdf I'm curious to hear your results for PDFs for which Zotero fails. Execute without any arguments to see more options. It should work on most sy…
  • @adamsmith, @Simon: Ok, I assumed we wouldn't rely that much on what GS suggests as publishers. I still think Zotero should have a ranked list but not necessarily defined by the user (can come with Zotero) of publishers to try successively, based o…
  • Use -layout. I just corrected my childish error above. It's actually -raw. Using -layout causes double-column pages to come out as double-column text, which breaks semantics when fetching a text line. I have my doubts regarding whether your other changes…
  • It does have a few quirks: it concatenates lines itself, but not always; it depends on the document. This sometimes results in a large standard deviation in line length, as text lines can be very long but few compared to the lines with garbage text f…
  • As far as I can tell, you've been discussing searching an alternative database using the results from Google Scholar. My suggestion is to follow links from Google Scholar to other databases for which we have translators instead of using a second sea…
  • @Simon: I posted above a link to a PDF for which pdftotext produces no output at all for me: this one. Binary encoded -- I meant binary charset. If you're using Linux, try "file -bi filename.pdf" to get the mime type. You'll see that for the pdf I linke…
  • From how your proposal developed - if I understood your first posts correctly you hadn't initially envisioned using google scholar for this at all, e.g. - you got something out of this discussion that was helpful to you as well, so I really do wish …
  • adamsmith, you're sorry but label me hostile. Some more valid logic is that if that was the case, I'd surely waste time offering suggestions and possible solutions despite channel noise, e.g. http://forums.zotero.org/discussion/23748/automating-mass…
  • Please point me to where Rintze described that full algorithm that I just did. I can't see it. You first said it should work in general and not for specific cases like mine. I provided an idea and an algorithm exactly for that. Now you seem to …
  • @aurimas: I also pointed to CrossRef's PDF-EXTRACT tools for content extraction. Here's a technique that I just tried and worked surprisingly well. It's borrowed from a former colleague, Stephen Kell, who wrote a set of quite nifty scripts a long w…
  • I mean: How does the user tell Zotero s/he wants to search for something? I have absolutely no sense of the GUI vision behind all this yet. I made several suggestions in this thread about that: a) a search box (you said it's too much work), b) a bu…
  • Well, I have a ton of pdfs I'd like to import and start using. You guys can take your time. > - do we take people to the search result page (aurimas' solution), or auto-import the first item? I think both myself and auri…
  • Brilliant!! Thanks very much. p.s. are you always this active on these forums? I'm impressed.
  • @aurimas: that's exactly what I suggested in a post above yours when describing the HTTP GET request. Virtually all functionality is already there in Zotero so when adding a PDF and then clicking on "import metadata for pdf", the user could select w…
  • > windows likes to mess with read only tags, though No, it doesn't. However, I apparently do ... it was my fault as the same PDF was opened inside the browser by Zotero (maybe Zotero could make Firefox release the file handle in that case when…
  • Thanks for the code entry point. How about the source code entry point for the "Retrieve Metadata for PDF" feature (from the right-click menu of a drag-and-dropped PDF)? I know you pointed me to the dev list, but it would save me some time to reex…
  • This might be one of those "leave the choice to the user, don't make it for them" issues. I think the title should be used and then Zotero would present the user with a list of results and checkboxes, just as it does when clicking the icon on a sear…
  • > something like - you right-click on the title and select "complete using IEEEXplore" - that might even allow users to get the PDFs. That's pretty much what I was suggesting above (the 2nd suggestion) although I was talking a…
  • But you'd only have to send the search string to the online publisher/database search engine, exactly as you would with a browser, i.e. Zotero would send the same HTTP GET query as if it came from the browser on that webpage (*). The ret…
  • > You can't easily define another search engine, no. And even if you could - what would you use? Well, sadly, like most researchers, I too have a narrow field of research so I would use IEEE Xplore which has most of the papers for …