How to reduce the number of files stored in web page attachments?
I may be a little confused about attachments vs snapshots.
Let's say I've found a webpage I want to store as a new item, but it's full of ads and has some extraneous divs. I delete the unwanted divs using the Aardvark Firefox extension. Now I save the current page as a new item.
Then I click "view snapshot". The result is just what I expected. Then I go to "show file", and find that there are a lot of files downloaded from the site that are not used at all.
As a second test, I stored a webpage without doing any manipulation, verified that the snapshot was fine, and then went and deleted most of the files in "show file", and got the same webpage back.
As a result, it looks to me like Zotero is storing a lot of files that it shouldn't have to. Since I'm going to end up with thousands of references, this could really add up...
Plus, it makes syncing between computers with FolderShare impossible, since there's a 10K file limit.
Am I misunderstanding something? How can I cut down the # of files being stored in attachments?
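For anyone who wants to repeat the "delete files and see if the page still works" test less blindly, here is a rough Python sketch that lists files in a snapshot folder that the main HTML file never mentions in a `src` or `href` attribute. The function name `unreferenced_files`, the flat folder layout, and the file names in the usage note are all assumptions for illustration, not anything Zotero provides. Files pulled in only from CSS or scripts would be flagged too, so treat the output as candidates to check, not a safe delete list.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class _RefCollector(HTMLParser):
    """Collects the file names found in src/href attributes."""

    def __init__(self):
        super().__init__()
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in ("src", "href") or not value:
                continue
            try:
                path = unquote(urlsplit(value).path)
            except ValueError:
                continue  # skip malformed URLs
            if path:
                # Only the last path component is compared (flat folder assumed).
                self.refs.add(Path(path).name)


def unreferenced_files(snapshot_dir, main_html):
    """Return sorted names of files in snapshot_dir not referenced by main_html.

    Hypothetical helper: assumes all snapshot files sit directly in one folder.
    """
    folder = Path(snapshot_dir)
    collector = _RefCollector()
    html = (folder / main_html).read_text(encoding="utf-8", errors="replace")
    collector.feed(html)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name != main_html and p.name not in collector.refs
    )
```

Calling `unreferenced_files("/path/to/snapshot/folder", "page.html")` (both arguments are placeholders) returns a list of file names to inspect by hand.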
It is great to be able to capture web pages so accurately when you need it, but it would be nice to have a "simple snapshot" option which, at a minimum, didn't download pictures...
Linux/Mac users might want to see this thread, which suggests a tool you can use to automatically create hard links (though you'd want to then be using a sync tool that was smart enough to copy the links themselves and not full copies of the linked files).
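To make the hard-link idea concrete, here is a minimal Python sketch, not the tool from that thread, that replaces byte-identical files under a directory with hard links to a single copy. The function names are made up for illustration, and it assumes everything lives on one filesystem that supports hard links. Note that this saves disk space only: the number of directory entries stays the same, so it would not help with a file-count limit.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hardlink_duplicates(root):
    """Replace duplicate files under root with hard links; return how many were replaced.

    Hypothetical helper: back up the directory before trying it on real data.
    """
    seen = {}  # digest -> first path seen with that content
    replaced = 0
    # sorted() materializes the listing before any file is modified.
    for path in sorted(Path(root).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        digest = file_digest(path)
        original = seen.get(digest)
        if original is None:
            seen[digest] = path
            continue
        if os.path.samefile(original, path):
            continue  # already linked to the same inode
        # Link under a temporary name, then swap it in place of the duplicate.
        tmp = path.with_name(path.name + ".linktmp")
        os.link(original, tmp)
        os.replace(tmp, path)
        replaced += 1
    return replaced
```

Running it a second time on the same directory should report zero replacements, since the duplicates already share an inode.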
We usually recommend saving the print-friendly versions of pages when they're available.
My request is for a function that does step 5 of the following use case:
1) go to the webpage
2) make a new item from the web page
3) show the snapshot attached to the new item
4) edit the displayed snapshot/webpage with Aardvark
5) take a new snapshot of the edited webpage using the Firefox "save as complete webpage" functionality (except dump the files into the Zotero tree structure, not the "web page complete" tree structure) and replace the original snapshot with the new one
The button for the new function for step 5 could be placed next to the add new item button if a checkmark in the preferences section was selected.
BTW, the reason the use case works this way (versus aardvarking a desired webpage and then adding a new item) is that the site translators must be applied to the original webpage, not the aardvarked webpage.
Scrapbook can effectively delete unwanted items on a page.
Note that there's no difference between the WebPageDump code and the built-in Save As Complete Webpage functionality in this regard. If you save an Aardvarked page from File->Save and look at the source, you'll see all the original elements. WebPageDump is just much better at saving pages accurately, so it grabs many more elements that would only be displayed under certain circumstances (which, unfortunately, often means ad content).
You can always save a snapshot of a snapshot if you want to edit pages saved via translators. You lose the original URL (replaced with a file:/// URL), but 1) it's often the URL of the parent item anyway and 2) as an archival tool, Zotero by design doesn't modify existing pages when saving highlighting and annotations, instead saving the data for those elements in the database and overlaying them on the page. Directly supporting modification of saved snapshots tied to an original URL seems a little problematic (even if you can in practice do this anyway by using Aardvark before saving a page)...
Does anybody have an idea what I'm doing wrong?
Welcome to the forums!!
Sorry, I don't have an answer for you, but...Since your problem is different from the one indicated in the thread title, I suggest you start a new thread with your problem. Otherwise people won't know that you have a different problem, and may not read your post.
Thanks for your attention and the suggestion. I will follow your advice.