Pruning database, but cannot find pdf file size
I need to prune my database to bring it under 6GB, but the listing contains no column for pdf size. I should start by deleting duplicates, but it is not clear whether I can simply delete all the duplicates, leaving the original intact. Next I can probably delete any really large files like books, but I need file details. I'd like to keep any pdf upon which I have annotations. Help would be appreciated. Thanks.
Deleting and merging duplicates are not the same thing, and merging will not remove duplicate attachments (items will keep both attached files to prevent you from losing, e.g., a PDF file with annotations).
If you've already merged the items, it's going to be tricky to now remove individual files other than by going into the items and removing them individually.
There's no way to show PDF file sizes in Zotero itself, but you can just run a storage analyzer app on your Zotero data directory and that will show you the largest files.
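If you'd rather not install a separate app, a few lines of Python can produce the same kind of listing. This is only a sketch under assumptions: it guesses that the data directory is `~/Zotero` with attachments under `storage` (adjust the path if yours differs), and `largest_files` is a name made up for this example, not anything Zotero provides.

```python
from pathlib import Path


def largest_files(root, top_n=20):
    """Return the top_n largest files under root as (size_bytes, path) pairs."""
    files = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            files.append((path.stat().st_size, path))
    # Sort on size only, largest first.
    files.sort(key=lambda pair: pair[0], reverse=True)
    return files[:top_n]


if __name__ == "__main__":
    # ASSUMPTION: default-style location; change this to your own data directory.
    storage_dir = Path.home() / "Zotero" / "storage"
    if not storage_dir.is_dir():
        print(f"Folder not found, edit the path: {storage_dir}")
    else:
        for size, path in largest_files(storage_dir):
            print(f"{size / 1_000_000:10.1f} MB  {path}")
```

The script only reads file sizes and never deletes anything, so removing the large attachments themselves would still be done from inside Zotero.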
For a new user, the available help has a steep learning curve. I didn't pick up the existence of the storage analyser. Even when I Google "Zotero Storage Analyser" now, it returns no hits. Maybe its existence could be added to the "Tools" menu? If it depends upon a Cloud connection, that topic is not clear.
It might now be easier for me to go back to my original pdf folders and start again from scratch and to find a way to import them all as a batch process which can be done while I'm asleep.
I guess my first wishlist item is for the user manual to be clearer on the issue of pruning, specifically to say whether the duplicate list does, or does not, include all the copies, to distinguish from my expectation that the "original" copy is not a duplicate. I searched on pruning and didn't find anything directly relevant, but what I came up with is that in Zotero duplicate management is achieved by merging. So I merged 6000 pdfs+duplicates laboriously, one by one, and that wiped out a day.
Secondly, I would wish that duplicate management is handled at the importing stage (which I mistakenly thought/hoped it was).
Thirdly, I would request that the duplicate list show which pdfs have annotations, and that it could be set to begin by deleting duplicates whose metadata is clearly incomplete and/or duplicates without annotations. I have no knowledge of SQL, but in MATLAB I could scan my 50 libraries of pdfs and try to look for duplicates that way. But that sounds like reinventing the wheel?
Fourthly, I would like a means of handling references (metadata no pdfs = 'vanillas') so that I can routinely record the existence of a paper, even if I cannot currently obtain a copy, and thus make a "to get list" from that.
Fifthly, I wish for a database which can handle unpublished material for which no pubmed metadata is available; I simply make my own.
If any of these wishes are already handled, I would much appreciate links to the help file or to YouTube tutorials, e.g. for the storage analyser, which I didn't know existed.
Thanks 1e+06 for your further comments.
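On the idea of scanning the PDF folders for duplicates outside Zotero: a rough sketch of that scan, written in Python rather than MATLAB, might look like the following. The function names are invented for this example. It groups files whose bytes are identical, so an annotated copy, which differs from the unannotated file once the annotations have been saved into the PDF, will not be reported as a duplicate of it.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_pdfs(root):
    """Return lists of PDF paths under root that are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf":
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    # HYPOTHETICAL folder name; point this at the folder holding your PDFs.
    pdf_root = Path.home() / "pdf-library"
    for group in find_duplicate_pdfs(pdf_root):
        print("Identical files:")
        for path in group:
            print(f"  {path}")
```

Because it only prints the groups, deciding which copy to keep (for instance the one in the topic folder you care about most) stays a manual step.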
As for clearing your duplicates, if you really just repeated bulk imports, you can sort by Date Added in the middle pane and delete entire ranges of items that way, though if you've already gone through and merged duplicates that wouldn't be as safe an option.
Why does the Zotero online help instruct that the manner to remove duplicates is to merge files, if merging them does not reclaim all the storage being used? I laboriously selected one file of often three or four, not being able to look at the file during that selection process, not being able to tell if they had annotations.
Please recommend the most efficient means of dealing with this. Wiping my database and account and starting again from scratch? If so, what is the standard means of importing an existing database and checking each pdf to see if it already exists amongst those already imported?
Papership seemed to be the only way I could look at my pdfs stored in Zotero on my iPad. Is this not correct?
Appreciated.
Depending on how easy it is to re-import your existing items, wiping your database may well be the easiest way to go, yes. You'd do so by: 1) disabling sync, 2) closing Zotero, 3) moving your Zotero data folder to a different location, 4) restarting Zotero, and 5) using "Restore to Online Library" from the "Reset" tab in the Zotero Sync preferences.
(There are other ways to do this; this is the safest and most painless one).
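Read literally, step 3 means renaming or moving the folder itself, not changing a setting. For anyone who prefers to script that step, here is a minimal sketch; `set_aside` is a made-up helper name, the `~/Zotero` path in the usage comment is an assumption about where your data folder lives, and it should only be run while Zotero is closed.

```python
import shutil
import time
from pathlib import Path


def set_aside(data_dir, suffix=None):
    """Rename data_dir to a sibling folder with a backup suffix; return the new path."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"No such folder: {data_dir}")
    # Default suffix is a timestamp so repeated runs never collide.
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    target = data_dir.with_name(f"{data_dir.name}-backup-{suffix}")
    if target.exists():
        raise FileExistsError(f"Refusing to overwrite: {target}")
    shutil.move(str(data_dir), str(target))
    return target


# Example usage (ASSUMED path; uncomment only with Zotero closed):
# print(set_aside(Path.home() / "Zotero"))
```

Nothing is deleted: the old folder stays on disk under its new name, so it can be moved back if the restore does not go as hoped.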
I don't quite understand the question about standard ways of checking for duplicate PDFs. Is there any reason to believe you should have duplicate PDFs after importing?
(Also, some of your confusion seems to stem from a misunderstanding of what Zotero _is_. It's a reference manager, _not_ a PDF manager, though it allows you to work with PDFs attached to references. But Zotero's model always thinks about references, never about PDFs (or any other file), as the principal unit.)
However, Qiqqa became orphanware and other users chose to migrate to Zotero. In Qiqqa I had close to 10K pdfs organised in 50 folders devoted to separate topics. Many publications belonged to more than one topic, but the system had no way of keeping just one copy of each, so there were hundreds of duplicates. The attraction of Zotero for me was my impression that it only needed one copy of any file, but each could be referred to from multiple topic folders. (That would be my ideal, but with Zotero I have evidently dipped out again.)
Fortunately, over the last 35 or so years, I have (re)named all my pdf files systematically and uniquely, but without indicating the existence of annotations.
Last night I merged all 50 folders into a single folder and largely pruned them down to a minimum set (alas losing their folder associations and many annotations).
Today I will save the original Zotero data folder to HD, and bring the merged data into the MacBook and Restore to Online.
Zotero seems to have a steeper learning curve than Qiqqa did. I've yet to try citing with it, and I'm interested in what other word processors can use it, since I dislike MSWord with a passion. Thanks.
Not sure why you think they don't. I think you're still conflating references and PDFs (e.g., a single reference can have multiple PDFs attached in Zotero. That's what you were seeing after merging duplicates). I don't quite understand why you did that, but hopefully not for working with Zotero, because that obviously wouldn't have been necessary.
Secondly, I'm still confused about duplicates. I'm always going to have the problems of duplicates, because I cannot always be sure whether I've previously loaded in a pdf. Is there a way I can configure Zotero to prune the database of duplicates of the main pdfs relieving the user of that worry?
Thanks.
Don't point your data directory to anything other than an empty folder.
Before you do anything -- what is your preferred outcome in terms of where files are etc.? As I said, Zotero is not a PDF manager. It gives you a fair bit of flexibility about how to store your PDFs and you should decide that first and then we can look at how Zotero can (or maybe cannot) help you with that.
Second -- the typical way to move from one reference manager to the other isn't to just dump your PDFs in. Overall, that's not going to give you great results, especially if some of them are older. Qiqqa should export to common formats like BibTeX and RIS that Zotero can import, typically including the files.
Third, duplicates: The only thing Zotero does in terms of duplicates is to show you which _references_ (again, not PDFs) are duplicate and allow you to merge the _references_. If you're concerned about duplicate files, you'll have to individually delete them (how depends on the set-up chosen above).
Re-exporting these files is no longer an option. Qiqqa is defunct and no longer runs under the latest version of Windows. I had previously exported them as pdfs.
I now have a folder "Zotero" on my MacBook (running Parallels) under User:Me, largely cleaned of duplicates, size 9.81GB for 7779 pdfs.
I have increased my quota to "unlimited".
Until I started today's reconfiguration Zotero and Papership were doing a reasonable job of allowing me to see and annotate these pdfs. I had not yet tried to use it to cite.
Since then I've tried to follow:
"Depending on how easy it is to re-import your existing items, wiping your database may well be the easiest way to go, yes. You'd do so by: 1) Disabling sync, 2.) Closing Zotero 3.) Moving your zotero data folder to a different location 4.) Restarting Zotero and 5.) Using "Restore to Online Library" from the "Reset" tab in the Zotero Sync preferences."
It seemed to me that your step 3) had to be run before step 2). But for me 3) is ambiguous. Do you mean literally moving the folder, or just changing the link?
On restarting Zotero, I am confronted with the question of how to disable sync. There is no "disable Sync" button, but I tried unchecking Sync Automatically. When I go to the Reset tab I see "Library" with a blank box beside it, without any suggestion of how it is to be filled in.
If it is my login, OAERICLE, then (since I have not unlinked my account) why is it not already filled in?
Next, "Restore to Online Library". Restore doesn't mean much to me when I don't know whether Zotero will try to match my file list with existing entries and folder topics. This carries many implicit concepts which are obviously clear to the designers but alas not to me, just wanting to re-import all the files in my folder. A little help box saying that this will overwrite the online library's content, if there is any, would help. I don't care what it does. I'd just like to be able to search and read my files on my iPad.
I have now copied the original contents of Zotero folder to external HD and created a new folder "Zotero" in its place. Nothing sensible happens.
My difficulty is that, skimming the help manual, I do not see any general discussion of concepts, like why it places some files in folders and others not.
Please describe what I need to do from scratch, including if necessary deleting my account, but keeping my credit. I'm sure I will be delighted once it's working. Thanks very much.
You say that Zotero is a reference manager, but not a file-manager, but it will do some file management, but not including removal of duplicates.
Because of the explosion of scientific literature, this is going to be an ongoing problem for most users like myself. Any Google Scholar search these days throws up dozens of relevant articles, not just one or two, many of which may already be in the user's database. Over years one just cannot remember, so one downloads just in case. Preferably at the time of the search any manager needs to 1) check if an article already exists in the user's database and ideally, inhibit downloading the same file and 2) check if there exist annotations and flag that. So I guess this is a wishlist.
Meanwhile, Zotero has gone ToZero. To try to become operational again, I have acted to remove the duplicates in my list of files and need to start again, dragging my files into a new database. But it is not clear to me how I do that and Zotero storage is relatively expensive. Some more details about the options available in the sync dialog box would be helpful. Thanks much.
You asked about starting from scratch, so it seems like we're where we'd want to be.
I mean I have no functionality.
I need to understand the dialog boxes.
Zotero issue: How do I tell if my Zotero Cloud holding is in fact complete? How do I tell how many pdfs are up there to compare with what shows on my MacBook? I've submitted support queries, and received a deafening silence. Can I get a summary comparison of what Zotero thinks is in the cloud and on my MacBook?
I even uninstalled and reinstalled Papership and downloaded the lot a second time. It seems from the comments online that Papership has been orphaned. I wish I'd known that before paying so much for Zotero.
Zotero MUST have a large number of loyal users using iPads (in my case with top memory). So what does Zotero recommend as a Papership replacement for reading, annotating and resyncing with MacBook? Thanks much.
Papership isn't actively developed or even supported anymore, so I'd be wary of it.
You can't easily compare local vs. cloud Zotero storage. The assumption should be that all your files are synced, though, and they normally are if you're not receiving sync errors.
You can test whether individual files are only by checking if you can open them through.
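For the local half of that comparison, a short script can at least count the PDFs on the MacBook and total their size. This is a sketch only: `summarize_pdfs` is a name invented here, the `~/Zotero/storage` path is an assumption, and it says nothing about what is in the cloud, so the result can only be set against whatever usage figure your online account reports.

```python
from pathlib import Path


def summarize_pdfs(storage_dir):
    """Return (pdf_count, total_bytes) for all PDFs found under storage_dir."""
    count = 0
    total = 0
    for path in Path(storage_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf":
            count += 1
            total += path.stat().st_size
    return count, total


if __name__ == "__main__":
    # ASSUMPTION: attachments live here; edit the path if your setup differs.
    storage_dir = Path.home() / "Zotero" / "storage"
    if not storage_dir.is_dir():
        print(f"Folder not found, edit the path: {storage_dir}")
    else:
        count, total = summarize_pdfs(storage_dir)
        print(f"{count} PDFs, {total / 1_000_000_000:.2f} GB")
```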