Snapshot feature does not work

lycopodiopsida · October 22, 2009

Snapshots do not work properly on articles from http://www.heise.de/tp/
To reproduce:

1) open some TP article, for example this one (http://www.heise.de/tp/r4/artikel/31/31330/1.html)
2) make a snapshot
3) try to view the snapshot

You will see something like this: GIF89a, followed by several lines of garbled binary characters (raw GIF image data rendered as text).

adamsmith · October 22, 2009

Yes, this seems to be a general problem on Telepolis: the snapshot saves a damaged HTML file.

fbennett · October 22, 2009

Odd. Maybe there's another reason for this, but the publishers may be attempting to sabotage attempts at storage here? When the page is downloaded with all of its prerequisites, two files turn up in the bundle with the name "1.html" (the page name at the end of the URL):

./tp/r4/artikel/31/31348/1.html
and
./ivw-bin/ivw/CP/tp/r4/artikel/wissenschaft/31/31348/1.html

The latter isn't an HTML page at all, but a GIF image consisting of a single pixel. Go figure.

Because the Zotero snapshot downloader places the prerequisites of the page in a single folder, without mangling the names to prevent namespace conflicts, the HTML for the page is being clobbered by this image, and that's what you see when you hit the preview button.
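To make the clobbering concrete, here is a minimal Python sketch of what happens when files from different directories are stored under their bare file names. It is only an illustration, not Zotero's actual code (Zotero is written in JavaScript); the `flatten` helper and the dict standing in for the snapshot folder are assumptions made for the example.

```python
import posixpath


def flatten(paths):
    """Map each downloaded path to its bare file name, as a flat folder would.

    Hypothetical helper: a dict stands in for the snapshot folder.
    """
    folder = {}
    for path in paths:
        name = posixpath.basename(path)
        # No name mangling: a later file silently replaces an earlier one.
        folder[name] = path
    return folder


downloads = [
    # The article itself.
    "./tp/r4/artikel/31/31348/1.html",
    # The one-pixel GIF that also happens to be called 1.html.
    "./ivw-bin/ivw/CP/tp/r4/artikel/wissenschaft/31/31348/1.html",
]

folder = flatten(downloads)
print(folder["1.html"])
```

With the files processed in this order, the only surviving "1.html" is the one from the ivw-bin path, which matches the GIF data shown in the preview.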

Looks like a short-term fix in the Zotero code might be to ensure that the HTML of the page is written into the snapshot folder last, after the prerequisites derived from the page? That wouldn't address all possible problems resulting from namespace conflicts, but it would fix this one.
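A rough Python sketch of that ordering idea, under the same caveat: this is not Zotero's implementation, and `save_snapshot`, the dict used as a folder, and the sample contents are all hypothetical.

```python
import posixpath


def save_snapshot(page, prerequisites):
    """Store (path, content) pairs in a flat folder, main page last.

    Hypothetical helper: a dict stands in for the snapshot folder.
    """
    folder = {}
    for path, content in prerequisites:
        folder[posixpath.basename(path)] = content
    # Writing the main page last means it wins any file name clash.
    path, content = page
    folder[posixpath.basename(path)] = content
    return folder


page = ("./tp/r4/artikel/31/31348/1.html", b"<html>article text</html>")
prerequisites = [
    (
        "./ivw-bin/ivw/CP/tp/r4/artikel/wissenschaft/31/31348/1.html",
        b"GIF89a",  # placeholder for the one-pixel image data
    ),
]

snapshot = save_snapshot(page, prerequisites)
print(snapshot["1.html"][:5])
```

Note that the colliding image is still lost in this sketch; only the page itself is protected, so clashes between two prerequisites would remain unsolved.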

adamsmith · October 22, 2009

weird - it's a decidedly leftist and pro open-web journal - it seems bizarre that they of all people would sabotage storage. Could there be any other reason?

fbennett · October 22, 2009

It might be meant to step around some browser quirk (I seem to remember that these almost-empty GIF images were needed to get IE to behave correctly in some situations). No idea why the file would need to have an HTML extension, but there could be a reason. I'm 'way out of my depth beyond this point, so I should sit down now and stay out of the way while the grownups deal with this. :)

MarSraM · June 7, 2012

Has there been any progress so far?

This issue kills quite a bunch of snapshots and hence deserves some attention. A properly working snapshot feature would fortify Zotero as a serious web reference tool, reliably producing URLs and snapshots. :-)

Zotero's snapshotter fails on the following page, for example: http://www.h-online.com/security/news/item/LinkedIn-passwords-in-circulation-Update-1612022.html Its snapshot says merely "GIF89a€ááá!ù,@D;". :-(

aurimas · November 12, 2012

This should be fixed in Zotero 3.0.9

MarSraM · November 12, 2012

Thank you, aurimas! :-)