Proposal: Parallel PDF Finding for Large Libraries

Hi all,

I have a library with ~12,000 items, many missing PDFs. When I run "Find Available PDF" on a large selection, it processes items one at a time, which takes hours. I've been experimenting with a patch to `attachments.js` that runs lookups concurrently (3 at a time by default), and it dramatically speeds up the process while still respecting the existing per-domain rate limiting.

## The Problem

In `addAvailableFiles()`, items are processed sequentially:

```javascript
processNextItem(); // starts ONE item, waits for completion, then next
```

For a library with thousands of items missing PDFs, that means 1+ second of waiting per item, even when successive lookups target different domains that could be queried in parallel.
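
To make the bottleneck concrete, here's a minimal self-contained sketch of that sequential shape. The `queue` and `findPDFForItem` below are stand-ins for illustration, not the actual Zotero code:

```javascript
// Minimal sketch of the current sequential shape. `queue` and
// `findPDFForItem` are stand-ins, not the actual Zotero code.
const queue = ['item1', 'item2', 'item3'];
const findPDFForItem = item =>
	new Promise(resolve => setTimeout(resolve, 1000)); // simulate ~1s lookup latency

async function processNextItem() {
	const item = queue.shift();
	if (!item) return;           // queue drained
	await findPDFForItem(item);  // full round trip before the next item starts
	return processNextItem();
}

processNextItem(); // total time ≈ queue.length × per-item latency
```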

## Proposed Solution

Start N items concurrently (configurable, default 3):
```javascript
const MAX_CONCURRENT = Zotero.Prefs.get('findPDF.maxConcurrent') || 3;
for (let i = 0; i < MAX_CONCURRENT; i++) {
	processNextItem(); // each call becomes a worker chain, pulling items until the queue drains
}
```

The key is that `processNextItem()` already pulls the next queued item whenever a lookup finishes, so the N initial calls become N independent worker chains draining one shared queue. The existing per-domain rate limiting (`SAME_DOMAIN_REQUEST_DELAY`, `MAX_CONSECUTIVE_DOMAIN_FAILURES`) remains intact, so no single service gets hammered; we're only allowing requests to *different* domains to happen in parallel, as sketched below.
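
Here's a self-contained sketch of how the two pieces compose: N worker chains draining one queue, with a per-domain cooldown that concurrency can't shorten. The queue, domains, and delay value are stand-ins for illustration, not the actual code in `attachments.js`:

```javascript
// Sketch only: stand-ins for the real queue, lookup, and delay logic in
// attachments.js. In the actual patch the rate-limiting code is untouched.
const SAME_DOMAIN_DELAY = 1000; // stand-in for SAME_DOMAIN_REQUEST_DELAY
const lastSlotByDomain = new Map();

const queue = [
	{ id: 1, domain: 'oa-service.example' },
	{ id: 2, domain: 'publisher-a.example' },
	{ id: 3, domain: 'oa-service.example' }, // must wait out item 1's cooldown
	{ id: 4, domain: 'publisher-b.example' },
];

async function fetchWithDomainDelay(item) {
	// Reserve this domain's next request slot synchronously (no await
	// before the set), so two workers can never claim the same slot
	const now = Date.now();
	const last = lastSlotByDomain.get(item.domain) || 0;
	const slot = Math.max(now, last + SAME_DOMAIN_DELAY);
	lastSlotByDomain.set(item.domain, slot);
	if (slot > now) {
		await new Promise(resolve => setTimeout(resolve, slot - now));
	}
	// ... the real HTTP request for this item would happen here ...
}

async function processNextItem() {
	const item = queue.shift();
	if (!item) return;                // this worker's chain ends
	await fetchWithDomainDelay(item);
	return processNextItem();         // immediately pull the next item
}

const MAX_CONCURRENT = 3;
for (let i = 0; i < MAX_CONCURRENT; i++) {
	processNextItem(); // requests to *different* domains overlap
}
```

With this shape, same-domain requests stay at least `SAME_DOMAIN_DELAY` apart no matter how many workers are running, while requests to different domains proceed in parallel.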

## Testing

I've tested this with:

- 1000+ items in a batch
- Various concurrency levels (3, 5, 10)
- Verified Unpaywall/Sci-Hub rate limits are still respected

With 3 concurrent requests, processing is roughly 3x faster with no increase in failures.

## Questions for the Team

1. Is this something you'd consider for Zotero core?
2. Should concurrency be user-configurable via a hidden pref, or fixed at a reasonable default?
3. Any concerns about UI responsiveness with concurrent progress updates?

Happy to submit a PR if there's interest. Thanks for considering!

---

*Note: I can also share the full patch diff if helpful.*