Questions about renaming and metadata

vezione · October 8, 2023

As my library has grown, and as I try to manage my docs and database while switching computers, I'm starting to get confused by the renaming options. I've searched the forums for my question but don't feel any the wiser. A lot of my PDFs were collected as assigned readings at university; they have random file names, and most needed the OCR (Zotero OCR plugin) treatment.

Question 1: Does metadata get updated after a PDF goes through that process?

Question 2: The number of PDFs I have with "No matching references" is astoundingly high. Can I use Zotero to fix this? I have some reference plugins, but I imagine either I'm doing something wrong or it's not a thing Zotero handles.

Question 3: What's the difference between Zotero file renaming and using Zotfile to rename? I currently use Zotfile to name things by author, date, and title, which I think is the default Zotero naming convention, but I'm not sure. The only trouble I run into is when I have separate PDFs for chapters within a book. If I rename the chapters, they all take on the name of the book and its author. How can I make it clear that the chapter title should be used as the title, not the name of the book? I think this is a metadata issue, so everything comes back to making sure I have correct metadata.

(Note: if I can automate the process, that would be great. Going one by one would take so long. I'm at the point where trying to figure this all out is taking up more time than getting research accomplished.)

tim820 · October 9, 2023

1. OCR should have no effect on metadata.
2. What do you mean by PDFs with "No matching references"? If you mean PDFs with no parent item, you can add one by right-clicking and choosing Create Parent Item:
https://www.zotero.org/support/attaching_files#child_versus_standalone_attachment_files
3. Since you have Zotfile enabled, the easiest way to see if the alternative renaming offered by Zotero (also) does what you want is to set 'Use Zotero to rename' in Zotfile Preferences under Renaming Rules. The Zotero (v6) renaming rules (default or modified) are described here:
https://www.zotero.org/support/preferences/hidden_preferences

If you create a separate item for each book chapter ('book section'), with the chapter's title, Zotfile will rename the chapter PDF file with the chapter title (with right-click Manage Attachments\Rename&Move if necessary). If you want some other file name, you can select the PDF title in the middle pane and edit it in the right pane (with 'Rename associated file' ticked).
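To illustrate why a separate 'book section' item fixes the chapter names, here is a minimal sketch of an author, date, title naming rule. It is not Zotero's or Zotfile's actual code: the function `attachment_filename`, the dictionary keys, and the sample items are all hypothetical. The point is only that the file name is built from the metadata of the item the PDF is attached to, so a PDF attached to the book item gets the book's title, while a PDF attached to its own chapter item gets the chapter's title.

```python
import re


def attachment_filename(item: dict, max_title_len: int = 80) -> str:
    """Build an 'Author_Year_Title.pdf' name from one item's own metadata.

    `item` is a hypothetical stand-in for a parent item; the keys
    'creators', 'year' and 'title' are assumptions for this sketch.
    """
    creators = item.get("creators", [])
    author = creators[0] if creators else "Unknown"
    year = item.get("year", "n.d.")
    title = item.get("title", "Untitled")[:max_title_len]
    raw = f"{author}_{year}_{title}"
    # Drop characters that are commonly not allowed in file names.
    safe = re.sub(r'[\\/:*?"<>|]', "", raw)
    # Collapse runs of whitespace left behind by the removal.
    return re.sub(r"\s+", " ", safe).strip() + ".pdf"


# Hypothetical sample data: one book item and one chapter item.
book = {"creators": ["Smith"], "year": "1999", "title": "A History of Things"}
chapter = {"creators": ["Jones"], "year": "1999", "title": "Early Things"}

print(attachment_filename(book))     # name used if the PDF hangs off the book item
print(attachment_filename(chapter))  # name used if the PDF has its own chapter item
```

In other words, the renaming rule itself does not need to change; what matters is which item each chapter PDF is attached to.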

In terms of time taken, different people approach that in different ways. Of course, it's easiest if you get everything correct when items are first added. So it's best to figure out the correct way to do things when you first start using Zotero, and not just 'wing it'. If you don't get things right then, you can either devote the time to fixing everything at once later, or do so in smaller chunks (e.g. all the 'A' titles, then the B's, etc.). Or you can wait until you need to cite an item and then notice that you can't cite it correctly until it's fixed. ;)

adamsmith · October 9, 2023

1. OCR should have no effect on metadata.

In pure theory that's mostly true, but in practice, any PDF that requires actual OCR is likely older or from a book or similar source where CrossRef lookup, Zotero's primary data source for retrieve metadata, isn't going to work.

vezione · October 10, 2023

My usage at the beginning was really just me throwing things into Zotero to keep track of them, so I didn't pay much attention to items when they were first added. That's since evolved as things got more complicated in my studies, and as I realized there's a lot of potential that I'm missing out on. There's probably no hope for the things in my library that I added before all this. Now I take more care when adding things, making sure they follow some kind of database structure.

1. In terms of OCR and metadata - it's correctly been pointed out that the items I need to OCR are older or are just scanned pages that we're given to read. I don't have an easy time with reading, so I use text-to-speech, and I'm constantly frustrated by how many things I'm required to read that can't be read aloud - hence the OCR. I want to keep these things in my database for reference since I'm making notes and annotations in them. This is where the metadata question comes from. Let's say I have 50 articles or chapters per class as assigned reading. Usually about half of them don't have metadata, either because they're older or because they're scans. So they just sit as files in my library. After running the OCR plugin, if I look up metadata, sometimes there's a match and a parent item gets created. Most of the time there isn't. Which leads to...

2. If I right-click on a standalone PDF file (Attachment) and "retrieve metadata from PDF" - a plugin task - it will either find a match and create a parent item or return "No matching reference." Now I get that it's telling me that the file doesn't have the metadata needed to make a parent item, and this is where time becomes an issue. It seems to me that my options are either to find a copy of that piece somewhere online and import that, or to create a new item and enter all the info manually. These are all assigned readings which I have to read anyway, but it already takes me extra time to make them readable by text-to-speech. I am building a flow to take all the annotations I make and bring them into Zotero with my readings - all with the goal of making coursework easier.

Is there anything to be done to make it easier to get those pdfs without metadata turned into an item? As soon as something doesn't work in this flow, I end up having to use different programs and tools and never get everything consolidated into one place, so the work I put into it is for nothing. Ultimately, I don't think this should be my problem to deal with, and professors should be providing accessible material - but we're not there yet, so I have to struggle along. That's the spirit of where my question is coming from. I don't want to add even more time to my workload, but given that I have to right now, what's the easiest way to do it?

adamsmith · October 10, 2023

Is there anything to be done to make it easier to get those pdfs without metadata turned into an item?

Not really, I'm afraid. If you're able to get ISBNs for books, you can add items using those in the Create Parent Item dialog, but otherwise it's pretty much what you say under 2.