Multiple notes need deletion

I have a library of approx 2.5k references; duplication over time has generated many (many) repeated child-notes and URL links.

The number within each reference varies from 1 to 6, which ends up being over 13,000 notes! This generates an enormous drag on replenishing the database.

Has anyone encountered or solved how you can identify and delete these notes at once? Using an automated script or tag?

Many thanks
  • OK, with some further investigation I have managed to export the whole library **without notes** then set about rebuilding a new database.

    This was done by creating a new folder and importing the same library. All folder structure and tags came back.

    BUT... there are two issues with this: the extensive handwritten notes peppered throughout the library are now lost, and so are the attached PDF files.

    Any suggestions on how to save these?
  • no, exporting/importing is a bad idea. Breaks old documents, breaks syncs, and you'll lose your original notes.

    Look at some of the notes and links you'd like to get rid of? Anything they have in common? Maybe a specific phrase (for notes) or a title (for links)? If you have a reliable way to search for them, we can give you instructions for how to delete them.
  • Hi Adam, thanks for the comments. No trouble, I have kept my old database separate in case it needs reloading.

    Yes, there are common threads. There are three main types: upload dates, language type and generic author names, i.e. 'Jun21', 'eng' or 'Joe B et al.', respectively. Can you suggest an automated function? There are thousands to purge.

    Many thanks for your help.
  • could you paste the full content of such a note? I'm still a little unclear of what they look like.
    Also, if the partent item has a URL, could you also post that? If not, title and Library Catalog of the parent item.
  • Example 1:
    Bansal-Pakala, P., A. G. Jember, and M. Croft. “Signaling through OX40 (CD134) Breaks Peripheral T-Cell Tolerance.” Nat Med 7, no. 8 (2001): 907–12.
    “1078-8956 (Print)Journal Article,” n.d.
    “1078-8956 (Print)Journal Article,” n.d.
    “Aug,” n.d.
    “Aug,” n.d.

    Example 2:
    Barouch, D. H., J. Kunstman, M. J. Kuroda, J. E. Schmitz, S. Santra, F. W. Peyerl, G. R. Krivulka, et al. “Eventual AIDS Vaccine Failure in a Rhesus Monkey by Viral Escape from Cytotoxic T Lymphocytes.” Nature 415, no. 6869 (2002): 335–39.
    “0028-0836 (Print)Journal Article,” n.d.
    “0028-0836 (Print)Journal Article,” n.d.
    “0028-0836 (Print)Journal Article,” n.d.
    “0028-0836 (Print)Journal Article,” n.d.
    “Jan 17,” n.d.
    “Jan 17,” n.d.
    “Jan 17,” n.d.
    “Jan 17,” n.d.
    “query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11797012.” Accessed September 10, 2012. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11797012.

    Example 3:
    Barton, G. M. “Viral Recognition by Toll-like Receptors.” Semin Immunol 19, no. 1 (2007): 33–40. doi:S1044-5323(07)00005-X [pii] 10.1016/j.smim.2007.01.003 [doi].
    “Barton, Gregory MReviewEnglandSeminars in immunologySemin Immunol. 2007 Feb;19(1):33-40. Epub 2007 Mar 2.,” n.d.
    “Eng,” n.d.
    “Feb,” n.d.
    “Nlm,” n.d.

    I'm happy to keep the main reference and URL at the end to quickly refer online; it's really the multiple notes in between that are the problem. They don't contain any more text than what you see above.

    Many thanks for your help. Let me know if you need more detail
  • How utterly bizarre. Could you also answer the question about where those came from? I.e.
    Also, if the partent item has a URL, could you also post that? If not, title and Library Catalog of the parent item.
    That may both help to get you a better search strategy and to hopefully fix where all those garbage notes are coming from in the first place.
  • Hi Adam, my thoughts exactly.

    Sorry, I don't know how to provide you with that information. Do you mean pull down links from Zotero online?

    Many thanks
  • Or do you mean this?

    Example 1:
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11479622
    Example 2:
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11797012
    Example 3:
    http://www.ncbi.nlm.nih.gov/pubmed/?term=10.1016%2Fj.smim.2007.01.003
  • What I mean is this:
    The way I understood you, the notes you have above are all attached to/grouped under a regular Zotero item (or rather 3 of them -- one for each example), what I'd call a parent item (if I had spelled that right the first time...). That's correct, yes?
    What I'm curious about is what is in the URL field of those parent items.
    And if that's indeed what you posted right above, what's the date added for those?
    And to be quite sure I'm looking in the right place, what does it say under "Library Catalog"? With that, I should have some good suggestions on how to at least filter out & delete most of the notes.
  • I see. That is correct, I understand what is a parent item. The answers to your questions are as follows:

    Example 1: 8.8.2011
    Example 2: 9.10.2012
    Example 3: 9.13.2011

    So, the dates on the notes don't bear relation to when the notes were generated. In fact, for Example 2, the note “0028-0836 (Print)Journal Article,” n.d., which appears in quadruplicate, was generated at three different times: 8.8.2011 @ 7.02pm and 12.21am, and 9.10.2012 twice @ 10.55am.

    In fact, there are other notes from parent items not mentioned here that say 'October 14, 1996'. Which predates any of this work.

    Just checked under library catalog and the field is empty for all three examples. I think it is empty for all of my references.

    I tend to use PMID or DOI in the wizard to import references. One of the most useful tools I have found in Zotero.
  • Now this is interesting!

    Just sorted by 'Library catalog'.
    1. No entry = 2086 references with duplicate notes everywhere
    2. NCBI Pubmed = 292 items with hardly any duplicate notes
    3. Google scholar = 18 items with no duplicate notes
    4. CrossRef = 105 items with five parent items with duplicate notes.

    Overwhelming majority of duplicates come from parent items without a Library catalog entry
  • FYI the three dates I mention for the notes above also apply to the URL
  • You never imported some other way, did you? Like a file from another reference manager?

    My best guess would be those all come from PubMed via PMID, but that should still say so in the Library Catalog (as it does when you try now).

    Anyway -- doesn't look like this is something that would still occur. If you find more recent examples of this (like in the last 3-6 months), I'd be curious about those.

    But as promised, to the constructive part:
    Create an advanced search (magnifying glass) with "Match all" selected at the top and these two search conditions:

    Item Type -- is -- Note
    Note -- contains -- n.d.

    Then click "Create saved search" and create the search.

    The saved search will appear at the bottom of your collections. Go into it, click into the middle panel and then select all via keyboard shortcut (ctrl+a /cmd+a on Mac). Note how this only selects the notes, not the items they're attached to. Now use right-click ---> move items to trash to delete them all.

    Note that Zotero may freeze in the process. Don't worry & just let it do its work -- may take up to 30mins, though hopefully much faster.

    Then immediately empty your Zotero trash.

    Repeat for other relevant search terms.

    Let me know if that works and/or if you need help with anything else.
  • It is possible that I have imported from another reference manager ie. Endnote (urgh), as they are all around the 2011 mark.

    Now, that search term is a good suggestion, however, 'n.d.' appears only after I copy and paste the item for you. Note, it is outside the parentheses. Therefore I will need to generate multiple searches for 'eng', 'nlm' and for each month of the year. Does that sound correct?

    That should whittle it down by a few thousand....
  • The other question I had was about which database to use. It is all pulled down from Cloud storage by default, correct?

    So, I should use my local database to upload as the master copy, correct?
  • Ah, too bad, didn't realize how you'd created those.
    And yes, I think our old EN importer may have done just that.

    So then yes, months and language codes (or whatever that is) would be your best bet -- set the saved search for match any (i.e. an OR search) and search for Note -- contains -- nlm etc.
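
A rough way to sanity-check such a "match any" list before trashing anything is to try the terms on a few sample notes first. The sketch below is only an illustration in Python, not something Zotero runs: the pattern list and the `is_junk_note` helper are assumptions built from the example notes quoted in this thread, and the notes are treated as plain-text strings. It also deliberately tests the whole note rather than "contains", since a plain substring search for a term like 'eng' could plausibly also catch handwritten notes that happen to contain those letters.

```python
import re

# Assumed junk patterns, taken only from the example notes quoted in this thread.
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")
CODES = {"eng", "nlm"}

# A month abbreviation, optionally followed by a day number: "Aug", "Jan 17", "Jun21".
MONTH_RE = re.compile(r"(?:%s) ?(?:\d{1,2})?" % "|".join(MONTHS))
# ISSN residue such as "1078-8956 (Print)Journal Article".
ISSN_RE = re.compile(r"\d{4}-\d{3}[\dx] \(print\)journal article")


def is_junk_note(text):
    """Return True if the whole note matches any one junk pattern (an OR test)."""
    cleaned = text.strip().lower()
    return (
        cleaned in CODES
        or MONTH_RE.fullmatch(cleaned) is not None
        or ISSN_RE.fullmatch(cleaned) is not None
    )


# Usage: only the short residue notes are flagged, the real note is kept.
notes = ["Aug", "Jan 17", "Eng", "Strong response, check figure 2"]
print([n for n in notes if is_junk_note(n)])  # ['Aug', 'Jan 17', 'Eng']
```

If a term flags anything that looks like a real note, it is probably safer to leave that term out of the saved search and delete those notes by hand.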
  • Not sure I understand the question about database:

    Just do this locally and then sync -- Zotero will do everything correctly automatically.
  • Thanks for that,

    I only ask because the notes still exist in the cloud, so if I haven't deleted them there won't they just reappear?

    I thought that all the standalone software, be that Firefox or the Mac app, would use a reference sql database?
  • not quite -- they use a local sqlite database and then sync that to the server. But the sync function is smart -- it logs what you delete and then deletes it remotely (and vice versa). (Remember to empty your Zotero trash, though. Otherwise the items stick around there).
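
The idea of a deletion log can be pictured with a toy model. This is a simplified Python sketch for illustration only, not Zotero's actual sync code: the `LocalLibrary` class, the `sync` function and the item keys are all hypothetical, and the "server" is just a dictionary. It assumes, as described above, that a deletion is only logged once the trash is emptied.

```python
class LocalLibrary:
    """Toy local library: items, a trash, and a log of permanent deletions."""

    def __init__(self, items):
        self.items = dict(items)
        self.trash = {}
        self.deletion_log = []

    def move_to_trash(self, key):
        # Trashed items leave the library view but are not yet logged as deleted.
        self.trash[key] = self.items.pop(key)

    def empty_trash(self):
        # Emptying the trash records each key so the deletion can be replayed.
        self.deletion_log.extend(self.trash)
        self.trash.clear()


def sync(local, server_items):
    """Replay logged deletions on the (toy) server copy, then clear the log."""
    for key in local.deletion_log:
        server_items.pop(key, None)
    local.deletion_log.clear()


local = LocalLibrary({"A": "parent item", "B": "junk note"})
server = {"A": "parent item", "B": "junk note"}

local.move_to_trash("B")
sync(local, server)      # nothing logged yet, so the server still has B
local.empty_trash()
sync(local, server)      # the logged deletion is now replayed remotely
print(sorted(server))    # ['A']
```

In this model the junk note does not come back from the server, because the sync step carries the deletion across instead of comparing the two copies from scratch.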
  • Ok great, thanks very much Adam. I'll get cracking.

    Appreciated.
  • OK, sorry one more thing Adam. I have conditioned the library to a point where I'm happy.

    Really need to be careful with the next step.... I hit sync after trashing the extra notes. The standalone warned me there were many instances (all 2340) where the local copy will be deleted by the server copy. AAhhhh!!

    I hit cancel sync.

    So there is a reset button... Preferences > Sync > Reset.

    'Erase all server data and overwrite with local data'.

    Just to confirm I press that button to make our hard work pay off?

    Many thanks
  • I don't understand why that would happen, but yes, if you're happy with the library the way you have it locally and there's nothing on the server you need that's not present locally, go ahead and use that option (you may want to make sure to have a backup of the database just in case:
    https://www.zotero.org/support/zotero_data#backing_up_your_zotero_library , though I don't expect any issues).
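
For the backup step, the linked page is the authoritative guide. Purely as an illustration, a timestamped copy of a database file could be made with a few lines of Python. The `backup_database` helper and both paths in the usage comment are hypothetical, and the sketch assumes Zotero is closed while the file is copied.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_database(db_path, backup_dir):
    """Copy db_path into backup_dir under a timestamped name and return the new path."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps where possible
    return dest


# Usage (hypothetical paths, adjust to wherever your data directory actually is):
# backup_database("/path/to/Zotero/zotero.sqlite", "/path/to/backups")
```

Copying the whole data directory, not just the database file, would also keep attached files with the backup.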