Is there a built-in function to "find broken & duplicate attachment links"?
I had to merge some duplicate bibliographies in Zotero, each of which had a correct PDF file attachment of the paper.
I found that after the duplicate bibliographies were merged, the PDF file attachments were all gathered together under the remaining bibliography. Now the PDF files themselves are duplicated... And the Zotero storage folder is about 1GB now, with 300 bibliographies...
So I used third-party software to directly remove the duplicate PDFs in the Zotero storage folder. Matching on the size and CRC32 of the PDFs, most of the duplicate PDFs were happily deleted (for the same paper, PDFs with different notes inside remained because their size and CRC32 differ, but I don't know which ones they are). Unfortunately, now I have a lot of broken attachment links...
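For anyone wondering what matching on size and CRC32 amounts to, here is a minimal Python sketch. It is only an illustration, not what the third-party software does internally and not a Zotero feature; `crc32_of` and `find_duplicate_pdfs` are made-up names. It only reports groups of identical files and deletes nothing, since deleting files from storage behind Zotero's back is exactly what leaves broken attachment links.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file, read in chunks to keep memory use low."""
    crc = 0
    with Path(path).open("rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicate_pdfs(storage_dir):
    """Group PDFs under storage_dir by (size, CRC32).

    Hypothetical helper: returns only the groups that contain two or
    more files, each group sorted by path. Nothing is deleted.
    """
    groups = defaultdict(list)
    for pdf in Path(storage_dir).rglob("*.pdf"):
        if pdf.is_file():
            key = (pdf.stat().st_size, crc32_of(pdf))
            groups[key].append(pdf)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

A PDF that has been annotated or watermarked has different bytes, so it will not land in the same group as the clean copy of the same paper.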
I hope there could be a built-in function that will:
1. Scan the storage for missing attachments and possible duplicates.
2. Directly delete the broken attachment links.
3. Let users compare and choose which version of the PDF file should be kept, then delete the discarded PDF files.
In fact, the broken attachment links are useless. We can download the PDF again only if we know which bibliography has no attachment. Meanwhile, the duplicate PDFs are a real pain: they increase the size of the Zotero storage folder and make it unclear which is the latest version of a PDF file attachment. Just let the users choose the right version of the PDF files, please. This would save Zotero development time and allow users to review their own notes and work.
@emilianoheyns Are the listed 3 functions critical and necessary? I think this kind of interaction logic is more urgent, but is it easy to implement at the code level?
https://github.com/retorquere/zotero-storage-scanner
I believe this function is necessary, since the way Zotero handles broken file links and duplicates is rather unfriendly to researchers who hold lots of references.
P.S. The colleagues in my research team refuse to use Zotero. After I introduced Zotero to them, we all agreed that reference management software should come with the essential functions built in as soon as the standalone version is downloaded. Newcomers want to just download the software (such as the cracked version of Endnote X8) and transfer their library to it, so that within 10 minutes they can continue reading their references in the new software.
Now the software has begun to show its ambitions, trying to cover everything in the field of knowledge management, including personal blogs and sites, data interfaces to APIs, and different programming languages. However, researchers such as students and professors just need to find the literature, read the literature, write notes, and then insert the literature into MS Word as required by most publishers and conferences. The necessary functions and plugins in this workflow should be refined and integrated into the standalone version.
The development of the software should focus on deeper, more efficient functions for its specific users. Look at Endnote X8. You have no plugins to add, but you can leave all the problems of references and PDFs to it. It will prevent attachment failures. Your job is to search for keywords and write notes about them, which is enough for authors.
A version that tries to satisfy every requirement is destined to lose both casual and core users, leaving only a few hobbyists interested in programming. Because of the uneven quality of the various plugins, more focused software will take most of the non-professional users away once the enthusiasm is gone. I sincerely hope that Zotero can do better on the standalone version itself. Thanks for your selfless efforts!
For instance, doing an advanced search for all PDF attachments in my library, I find 10047 items. Outside Zotero, if I locate all PDFs in the /storage/ folder, I find 11925 files. So over ~10 years, I have accumulated about 1900 stray PDFs. Not too big a worry in terms of file size etc. (especially since unlinked stuff doesn't sync) but it would be great to be able to clean that up.
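The comparison above (attachments in the library versus PDFs in the /storage/ folder) could be sketched in Python roughly like this. It assumes you already have a list of the attachment paths Zotero knows about, written relative to the storage folder; how to get that list out of Zotero is not shown here, and `pdfs_on_disk` and `compare_storage` are hypothetical helpers, not part of Zotero.

```python
from pathlib import Path


def pdfs_on_disk(storage_dir):
    """Return every PDF under storage_dir as a POSIX path relative to it."""
    root = Path(storage_dir)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    }


def compare_storage(storage_dir, known_attachments):
    """Compare files on disk with the attachment paths the library lists.

    known_attachments is an ASSUMED input: relative paths such as
    "SUBFOLDER/paper.pdf", obtained separately.
    Returns (stray, missing):
      stray   = PDFs on disk that no attachment refers to
      missing = attachment paths whose file is not on disk (broken links)
    """
    on_disk = pdfs_on_disk(storage_dir)
    known = set(known_attachments)
    return sorted(on_disk - known), sorted(known - on_disk)
```

The same set difference gives both lists: stray files in one direction, broken attachment links in the other.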
"
However, researchers such as students and professors just need to find the literature, read the literature, write notes, and then insert the literature into MS Word as required by most publishers and conferences. The necessary functions and plugins in this workflow should be refined and integrated into the standalone version.
"
Zotero is VERY good at this. I often hear complaints about EndNote at my research organization. The few people who have used both EndNote and Zotero (in the way it should be used) almost never go back to EndNote.
As for transferring libraries, I have heard EndNote is worse than Zotero.
However, let's focus on Endnote X8, the version released in 2016. Here are some of its basic functions.
1. Delete the attachment when deleting the bibliography
In Zotero, we have to use Zotfile, as it is really a core plugin for the standalone version of Zotero. PDF attachments added by Zotfile are shown as attachments that are linked to a PDF in the /storage/ folder. When I delete a bibliography from the Trash collection, the PDF attachment is not deleted.
In Endnote X8, all the attachments will be deleted when you delete the bibliography.
2. Auto-index the PDF without failure
In Zotero, some PDFs downloaded from ACS Publications could not be indexed. See my discussion https://forums.zotero.org/discussion/69771/unindexed-pdf-attachment-cannot-be-indexed-by-click-the-indexing-button-the-green-one#latest
In Endnote X8, the bibliographies listed in my discussion could be correctly indexed and searched.
3. Find Duplicates and Broken attachment links
In Zotero, the “Find Duplicates” function is deficient and “Find Broken attachment links” is missing, as discussed above. The crude merging of the fields of the duplicated bibliographies creates a lot of duplicate attachments.
In Endnote X8, 3.1. Go to Menu - References - Find Duplicates. You will see a popup window which lists the duplicated bibliographies one by one. You can keep and modify one version of the duplicated bibliography in the window, and then remove the other one. 3.2. Go to Menu - Tools - Find Broken attachment links. Endnote X8 will search for the stray links and remove them directly.
4. Highlight the words when you search them
In Zotero, when you search for phrases such as “mass transfer coefficient” and “CO2 dissolution”, the words are broken up, so I cannot judge whether the search results are suitable. There is no highlighting of the searched words in the listed results. I have to open them one by one to check.
In Endnote X8, when I search for phrases such as “mass transfer coefficient” and “CO2 dissolution”, the software highlights the phrases with yellow shading in the fields of the listed bibliographies (even in the Notes field). Then a glance at the search results is enough to decide which references to read.
5. Insert citation in the standalone version.
In Zotero, when I want to insert a citation into MS Word, I have to copy the name of the reference and switch to MS Word to search for the name in the Quick Format bar. See my discussion https://forums.zotero.org/discussion/69318/add-an-icon-of-insert-citation-and-search-paper-more-efficiently#latest
In Endnote X8, I just click the Insert Citation button in the menu, and the bibliography is inserted into MS Word. Endnote X8 just checks the front-most document.
In general, we identify research hotspots with Web of Science, search for papers in Google Scholar, download PDFs with Sci-Hub, read PDFs in Foxit Reader and write papers with MS Word, as templates are provided by most journals. These tools all have a low learning threshold and a high degree of integration. Now it is 2018. Zotero, as reference management software, lacks some essential functions compared with its major competitors. Though the plugins are colorful, the developers’ enthusiasm for plugin updates is ephemeral. In addition, too many users are fickle. A highly integrated standalone version is appropriate for most researchers. If they decide to stay with Zotero after a trial, they will no longer need to transfer their libraries between programs.
Better handling of identical files when merging is planned, but it's far from simple, because of the possibility of different metadata or different notes. And many downloaded PDFs are watermarked, so they wouldn't match anyway.
If you put search terms "in quotes" then they'll only show up when they match as phrases.
Richer search results with snippets are planned but for technical reasons can't happen for a while.
You certainly don't need to go to Zotero first and copy names. Just type the name of the thing you're trying to cite. If you prefer to browse by collection, you can use the classic view, or make that the default. As we say in the linked thread, a collection browser will likely be integrated into the Quick Format bar in a future version.
Just to set your expectations appropriately, with the exception of a collection browser in the Quick Format bar, nothing you've mentioned in this thread is anywhere close to a high priority relative to other things.
In the current version, is there any way to check which file attachments are broken?
Or is there any way to convert the PDF links created by Zotfile back into PDF attachments (where the attachment file type is PDF)?