Create item from current webpage feature in Zotero for Firefox

VanjaAvdic · October 3, 2013

Hello,

I conducted a number of Google searches for a project and saved the search result pages via the "create item from webpage" feature in Firefox. I wanted to do all the searches on the same day so that my results were not affected by the day-to-day changes that can occur with Google results.

I saved each result page (i.e. Pg. 1, Pg. 2 of however many results pages there were) to my Zotero library with "create item from current webpage" and then went back to sort through them. For the pages that I saved first, I was able to access the links that the Google results returned, but for the pages that I saved later, none of the links work. I.e. I can see the Google results page, like Pg. 10 of 40 for the search that I did, but none of the links work anymore.

I am just wondering if I went about this incorrectly, i.e. the saved page would just be the search result page and not have access to any of the links. And if this is the case, how come the earlier ones worked? (I did the searches over the course of two days, and the ones from the 2nd day are the ones with the broken links. I did not change any settings over the course of the two days.) When I click on a link that no longer works the page that loads has this message "Attachment id is not an integer". If I select "view online" a webpage with working links loads but they are different from the ones in the snapshot version.

My goal here is to conduct the searches in the same time period and then analyze the results later because if I analyzed as I went, the searches would end up being done over a long period of time. If you have any recommendations as to how to do this, that would be great. Would I just have to take the extra time and conduct the search, click on each link and save those instead of the search result page? Bookmark in Firefox instead of Zotero?

Sorry about the long description! Any advice will be greatly appreciated.

Thanks,

Vanja

aurimas · October 3, 2013

> I am just wondering if I went about this incorrectly, i.e. the saved page would just be the search result page and not have access to any of the links.

When you "create item from webpage" you are telling Zotero that you want to create a bibliographic (citable) item that refers to the current web page you have open. Zotero also attaches a snapshot of the webpage. It does not attempt to follow all links from the page and save those pages as well, so if the links go dead on the server, they will not work from the snapshot either. Though I don't think that this is the case here.

> When I click on a link that no longer works the page that loads has this message "Attachment id is not an integer". If I select "view online" a webpage with working links loads but they are different from the ones in the snapshot version.

This suggests that there is an issue with the way web page snapshots are created. I will take a closer look.

In general though, besides being a one-click solution, you do not gain anything by using Zotero for this (I think). You can just as easily (and I think in this case bug-free) save the page you are viewing via the browser's Save As function (Ctrl + S on most browsers).
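If what you mainly need is a record of the result URLs rather than a clickable page, one option is to pull the link targets out of the saved HTML afterwards. Below is a minimal Python sketch using only the standard library. The sample page is a made-up stand-in for a saved results page, and the names `LinkCollector` and `extract_links` are just illustrative.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_links(html_text):
    """Return all href values found in the given HTML text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


# Hypothetical stand-in for a saved results page.
sample_page = """
<html><body>
  <a href="https://example.org/first">First result</a>
  <a href="https://example.com/second">Second result</a>
  <a>No href here</a>
</body></html>
"""

for link in extract_links(sample_page):
    print(link)
```

For a real file you would read the saved page from disk and pass its text to `extract_links`. A real results page will also contain navigation and other links, so some filtering would probably be needed.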

VanjaAvdic · October 3, 2013

Thanks for the quick reply, much appreciated!

Vanja

adamsmith · October 3, 2013

My guess would actually be that this has to do with the way Google links work. If you right-click on one of the search results in a Google search and select "copy link location", you'll see that the links actually go through Google. I don't actually know what they're doing, but I wouldn't bet on those links being stable.
There are specific tools that will allow you to download webpages with every link one level deep. Depending on what you're doing, those might be better suited for your task than Zotero.
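On the point about links going through Google: if a link is a redirect that carries the real destination in its query string, the destination can often be recovered without following the redirect. Here is a rough Python sketch. It assumes the destination sits in a query parameter named `url` or `q`; that is an assumption about the redirect format, not something I have verified, and the example links are made up.

```python
from urllib.parse import urlsplit, parse_qs


def unwrap_redirect(link, param_names=("url", "q")):
    """Return the destination carried in a redirect-style link.

    The parameter names are an assumption; if none of them holds an
    http(s) URL, the link is returned unchanged.
    """
    query = parse_qs(urlsplit(link).query)
    for name in param_names:
        values = query.get(name)
        if values and values[0].startswith(("http://", "https://")):
            return values[0]
    return link


# Hypothetical redirect-style link and a plain direct link.
wrapped = "https://www.google.com/url?sa=t&url=https%3A%2F%2Fexample.org%2Farticle"
print(unwrap_redirect(wrapped))
print(unwrap_redirect("https://example.org/direct"))
```

Since `parse_qs` decodes percent-encoding, the first call gives back the plain destination URL, and a link with no such parameter passes through untouched.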

aurimas · October 3, 2013

That's true, but it looks like in the snapshot, the links become relative links, which breaks it even more. I can reproduce this on my end and it certainly should not be happening.

aurimas · October 5, 2013

Nvm, there is no bug in Zotero.

=== Technical details ===

The way Google results work is that they have proper links to the pages in the search results, but the links also have onmousedown handlers, which rewrite the href value with a URL that redirects through Google. So by the time the link is followed (essentially the onmouseup event), the URL has changed and you're going through Google's servers. This way you can right-click copy links and get proper URLs, but clicking on them sends you through Google. The JavaScript that rewrites the links uses relative URLs, which is why this breaks in a snapshot: the page is no longer being served from Google, so the relative URLs no longer resolve to Google's servers.
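To make the relative-URL point concrete, here is a minimal Python sketch of how the same relative href resolves differently depending on where the page was loaded from. The href value and both base URLs are made-up examples, not the actual values Google or Zotero use.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical relative URL of the kind a rewriting script might put in href.
REWRITTEN_HREF = "/url?q=https://example.org/article"

# Hypothetical locations of the live results page and of a local snapshot.
LIVE_BASE = "https://www.google.com/search?q=zotero"
SNAPSHOT_BASE = "file:///home/user/snapshots/results.html"

# A relative URL is always resolved against the page it appears on.
live = urljoin(LIVE_BASE, REWRITTEN_HREF)
snapshot = urljoin(SNAPSHOT_BASE, REWRITTEN_HREF)

print("live page:", live)
print("snapshot: ", snapshot)

# The live link keeps pointing at the redirect on the original host...
print("live host:", urlsplit(live).netloc)
# ...while the snapshot link has no host at all, so the redirect is unreachable.
print("snapshot host:", repr(urlsplit(snapshot).netloc))
```

On the live page the relative URL lands on the original host, while from the snapshot it resolves against wherever the snapshot lives, which has nothing at that path.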

=== End of technical details ===

There is nothing that we will be able to do about this.

Edit (fixed description above): so basically what adamsmith said. I just didn't catch it at the time. I was opening pages in a new tab, so my links were being rewritten and I was seeing the relative URLs.