Unexpected attached files [was: Attached files replicated, substituted, renamed. Possible sync bug]

The attached files for one of my items appear to be corrupt. In this screenshot http://www.b3sz.com/files/zotero-screen1.png you'll see a few things:

1) There should only be one attached file but Zotero shows 33.

2) The name of the file attachment is "ACM Full Text PDF" which doesn't match the file name. (I only recently found you could rename a file - more on this later.)

3) A search of the Zotero storage directory does find 33 instances of a file with this name, each in a different storage subdirectory (for example, storage\4FVMI6MM\Landry - 2009 - Analyzing the London ambulance service's computer.pdf and storage\625HQ7DR\Landry - 2009 - Analyzing the London ambulance service's computer .pdf).

4) The contents of these files are all (mostly) different. Two seem to be the right file. I don't recognize the others.

5) I just searched for files named "ACM Full Text PDF" which should be common because that is the default unless I rename it. There aren't any.

6) A Zotero database check finds no problems.

7) Zotero version is 2.0.9, running in Firefox 3.6.13 on Windows XP SP3.

8) I'm syncing to my own WebDAV server (Apache on Fedora)
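
The storage search described in point 3 can be scripted. This is only a rough sketch: it assumes the layout shown above (one subfolder per attachment under the storage directory) and treats file names that differ only in whitespace or case as the same name. The function name and the normalization rule are my own, not anything Zotero provides.

```python
import os
from collections import defaultdict


def find_replicated_files(storage_dir):
    """Map each normalized file name to the storage subfolders that contain it."""
    by_name = defaultdict(list)
    for folder, _subdirs, files in os.walk(storage_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            # Normalize so "computer.pdf" and "computer .pdf" count as one name.
            key = " ".join(stem.split()).lower() + ext.lower()
            by_name[key].append(os.path.basename(folder))
    # Keep only names that show up in more than one place.
    return {name: sorted(folders)
            for name, folders in by_name.items() if len(folders) > 1}
```

Calling find_replicated_files on the storage directory returns a dictionary of file names mapped to the subfolders holding them. A name that appears in many subfolders is only a hint that something is off; two unrelated items can legitimately share a file name.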

One item for consideration: I do most of my work on a desktop but also sync to my laptop. I mark up the PDFs as I read them (highlight, etc.). I know this gets the file contents out of sync, and I don't think Zotero is intended to handle this. Just the other day I had the idea that I could force an upload of the file by renaming it. I think it was this file (Landry - 2009...). I clicked on the title, deleted a space, checked "Rename associated file", and saved. Sure enough, the file synced back to my desktop fine. This may have nothing to do with anything, but it smelled close enough to mention. That was 2 days ago.

At least one other item has the same problem, but with only 4 replications of the PDF. The file name is different, but the attachment title in Zotero is the same ("ACM Full Text PDF").

I backed up the Zotero database. I have access to the WebDAV logs. I don't know when the laptop was last synced, but it wasn't today, so there is an older copy of the database on it.

I'd like to know: a) How can I find all affected items? So far I have only found the 2. b) How do I fix my library? c) How do I prevent it from happening again?

Thanks in advance.
  • edited March 4, 2011
    2 and 5 aren't actual problems, but a display bug. The label in the central panel of Zotero is not the same as the file name and, for some reason, the label on the right, which _should_ be the filename, isn't either. If you click on "show file" you'll see that the actual file name corresponds to the article - that's done automatically on import.

    So the main question to me is how those 33 file attachments appeared. I don't think your re-naming could have caused this, but I'm not sure.

    edit: deleted wrong stuff - attachment name vs. file name works as intended
  • 2 and 5 are not bugs. The attachment title isn't necessarily the same as the filename. That's all.
    "I mark up the PDFs as I read them (highlight, etc.). I know this gets the file contents out of sync and I don't think zotero is intended to handle this."
    As long as the file mod time gets updated, that's fine, and Zotero is designed to support it. If the mod time doesn't get updated, Zotero has no way of knowing the file changed.
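
If a PDF tool saves changes without updating the modification time, the time can be bumped by hand instead of renaming the file. A minimal sketch, assuming only what is said above (that the mod time is how a change is noticed); the function name is made up for illustration, and whether the next sync then uploads the file has not been verified here.

```python
import os
import time


def bump_modification_time(path):
    """Set the file's modification time to now, leaving its contents untouched."""
    before = os.stat(path).st_mtime
    now = time.time()
    os.utime(path, (now, now))  # (access time, modification time)
    # Return both values so the caller can confirm the time actually moved.
    return before, os.stat(path).st_mtime
```

Most PDF editors already update the mod time on save, so this should only be needed for tools that preserve it.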

    For the 33 items, nobody has ever reported this before. But all Zotero items have date added, for example, so there shouldn't be any mystery as to when they were created.
    "The contents of these files are all (mostly) different. Two seem to be the right file. I don't recognize the others."
    Can you be more specific?
  • Support for Zotero is amazingly responsive. Thanks all.

    After investigating the other files in detail I've found the cause of the "problem". Now I don't know if it is a bug or just unexpected behavior.

    All of the PDFs are from the same conference (SIGMIS-CPR'09). I just reproduced the issue by going to the ACM entry for the Landry paper (DOI 10.1145/1542130.1542163) and clicking on the icon in the URL bar to save it to Zotero. Sure enough, it downloaded all 33 papers in the conference into one conference journal entry for the Landry paper. I didn't know it was possible for Zotero to download multiple files for an item this way.

    No sync errors, no corruption, no file renaming issues.

    If this DOI were an entry for the entire conference proceeding, then I could understand this behavior. But it seems to be a bug to stuff 33 separate conference papers into one journal entry for only one of those papers. So this is a site translator bug?
  • That certainly sounds like a translator issue, yes. Maybe ajlyon can have a quick look. That's good news, obviously, thanks for helping to track this down.
  • I'm posting this as a new issue under Translators. We can consider this thread closed.
  • No need to create a new thread.
  • I can't replicate this with http://portal.acm.org/citation.cfm?doid=1542130.1542163

    Are you perhaps seeing a different URL? Or the page content is different? The translator would save multiple PDFs if there were multiple matches for the XPath expression //a[@name="FullTextPdf" or @name="FullTextHtml" or @name="FullText Html"], so if you are seeing multiple such A elements in the page source, this would make sense. But I don't see that on the ACM page, and I don't get the behavior you describe.
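
To check a saved copy of the page source for multiple matches, something like the following could be used. It is not the translator's code: it only mimics the XPath predicate quoted above with Python's standard HTML parser, and the class and function names are made up for this sketch.

```python
from html.parser import HTMLParser

# The three name values tested by the XPath expression quoted above.
FULL_TEXT_NAMES = {"FullTextPdf", "FullTextHtml", "FullText Html"}


class FullTextLinkCollector(HTMLParser):
    """Collect the href of every <a> element whose name attribute matches."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if attributes.get("name") in FULL_TEXT_NAMES:
            self.links.append(attributes.get("href"))


def full_text_links(page_source):
    """Return the hrefs of all matching anchors, in document order."""
    parser = FullTextLinkCollector()
    parser.feed(page_source)
    parser.close()
    return parser.links
```

If full_text_links returns more than one entry for the page you are viewing, the multiple attachments would be consistent with the explanation above.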
  • Someone once said everything is obvious once you understand it. Here's the deal.

    ACM supports two views: tabbed and single page. In the single page view, the table of contents for the publication and links to all of the PDFs are included in the main page.

    For example (my raw, proxied URLs):

    Tabbed view: http://portal.acm.org.proxy-bc.researchport.umd.edu/citation.cfm?id=42375&preflayout=tabs

    Single page view: http://portal.acm.org.proxy-bc.researchport.umd.edu/citation.cfm?id=42375&preflayout=flat

    Your last view is sticky, so now this will give you a single page view:
    http://portal.acm.org.proxy-bc.researchport.umd.edu/citation.cfm?id=42375

    So I must have clicked on the single page view some time ago and it stuck and I didn't notice the difference.
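
Since the last view is sticky, one workaround is to request a layout explicitly in the URL. A small sketch, assuming the preflayout parameter and its two values (tabs, flat) behave as in the example URLs above; the helper name is made up, and whether the site keeps honoring the parameter is not something I can confirm.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_layout(url, layout):
    """Return url with its preflayout query parameter set to layout."""
    parts = urlsplit(url)
    # Drop any existing preflayout value, keep every other parameter in order.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "preflayout"]
    query.append(("preflayout", layout))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, with_layout("http://portal.acm.org/citation.cfm?id=42375", "tabs") yields the same URL with &preflayout=tabs appended.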

    Interestingly, if you choose the tabbed view and click on the table of contents, and then save to Zotero, it prompts for which entry to save. This prompt doesn't happen in the single page view.
  • Hmm. I'm surprised that this is the end of the line for this issue. I understand there may be higher priority issues, but I would ask that if Zotero is going to import more than one item that it should at least prompt first, as it does when viewing the table of contents.
  • Please go to http://github.com/ajlyon/zotero-bits/raw/master/ACM.js and save the file to the translators directory of your Zotero data directory (http://www.zotero.org/support/zotero_data).

    It should start working again. If this works for you, please post here so that I can submit this change to be pushed to all users.
  • I'm using Firefox 3.5.16 and Zotero 2.1.1. I dropped ACM.js into the translators directory without effect. So I then removed "ACM Digital Library.js", restarted Firefox, and it seems to work fine. I tested on two documents, both in tabbed view and single page view with a number of PDFs listed. It downloaded only one, the correct one.

    Thanks.
  • Glad to hear that this works. Out of curiosity, did you ever edit the ACM translator using Scaffold? I think that the file in your directory should have been "ACM.js" -- but editing it using Scaffold could have renamed it.
  • Fixed in trunk; should be in the next release of Zotero.
  • No, no editing of this file. I've not peeked in the translators directory previously.

    My laptop also has "ACM Digital Library.js" that contains "lastUpdated":"2011-02-24 23:30:00". It does not have "ACM.js" either.

    The laptop is still on Zotero 2.0.9 and Firefox 3.6.13.
  • Translator filenames in XPIs are based on the 'label' field, which may not be the same as the filename in SVN.
  • Ok. I suppose I should bring the two in line to simplify things. Is there any reason not to?