Problem importing EndNote citations with linked PDFs

Hi there -
I'm testing out Zotero as a possible replacement for EndNote X. I've successfully imported a few references, but I'm having trouble with references with linked PDFs. These citations imported into Zotero have the following in the URL field:

internal-pdf://author date - number/author date.pdf

which links to the EndNote file structure for PDFs, but Zotero cannot open this link. How do I fix this?

Also, in addition to the imported citation are 2 additional zotero items - web pages - with everything empty except the URL field showing the PDF link (same as above) or the URL associated with that reference
(e.g. www.blackwell-synergy.com/.../j.1526-100X.2006.00186.x )

If I import a few thousand citations this will be a royal pain to fix manually. Any hints as to how to deal with these issues?

I'm very excited about Zotero and would love to make this work. Thanks!
  • edited September 13, 2007
    but I'm having trouble with references with linked PDFs. These citations imported into Zotero have the following in the URL field:

    internal-pdf://author date - number/author date.pdf
    See previous discussion.
    How do I fix this?
    Submit a bug report to Thomson about this EndNote issue (past experience suggests that they won't do anything, but they should be pestered nonetheless). For now, you should remove the URLs or use a text editor to correct the path.
    Also, in addition to the imported citation are 2 additional zotero items - web pages - with everything empty except the URL field showing the PDF link (same as above) or the URL associated with that reference
    (e.g. www.blackwell-synergy.com/.../j.1526-100X.2006.00186.x )
    If this is related to the above, the above fix should suffice. If it isn't, please submit the smallest example which exhibits this behavior.
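For a few thousand records, the text-editor fix suggested above can be scripted. This is only a minimal Python sketch: it assumes the export is a plain-text RIS file and that you know where the PDFs actually live. The function name and the `file:///C:/refs/MyLibrary.Data/PDF` directory are made-up placeholders, not anything Zotero or EndNote provides.

```python
def rewrite_internal_pdf_links(ris_text: str, pdf_dir: str) -> str:
    """Replace EndNote's internal-pdf:// prefix with a real directory."""
    # Make sure the directory ends with exactly one slash before joining.
    prefix = pdf_dir.rstrip("/") + "/"
    return ris_text.replace("internal-pdf://", prefix)


# Hypothetical one-line sample; the directory is a placeholder for your own.
sample = "L1  - internal-pdf://author date - number/author date.pdf"
fixed = rewrite_internal_pdf_links(sample, "file:///C:/refs/MyLibrary.Data/PDF")
print(fixed)
```

To use it on a real export, read the file into a string, pass it through the function, and write the result to a new file so the original export stays untouched.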
  • Thanks! Replacing 'internal-pdf://' with the appropriate directory in the export file works somewhat. After importing, clicking the URL on the info tab will open the PDF. (However, unlike PDFs that I add in manually, the PDF is not listed as an attachment & doesn't show up as an item in the center pane. One thing I like about EndNote is that it's easy to see which refs have a PDF and which I still need to track down.)

    The import process still generates 2-3 items for every citation:
    1. the full citation (with the thankfully working URL to the PDF)

    2. a web page item with all fields empty except for the URL link to the PDF

    and if the original endnote citation had an associated web page link,
    3. a web page item with all fields empty except for the URL link to web page

    Items 2 and 3 are not hierarchically associated with the full reference; when importing multiple records it's impossible to tell which of these web page items go with which reference.

    I'm running Firefox 2.0.0.6, Zotero 1.0.0rc3, Windows XP.

    I'll write to Thomson, but any time I have raised an issue with them, their response is: "Oh yeah, well, we're looking into that problem." And it might get fixed in their next overpriced version. Which is why I very much appreciate your help!
  • However, unlike PDFs that I add in manually, the PDF is not listed as an attachment & doesn't show up as an item in the center pane.
    Yes--this is intended behavior. A URL is different than a file attachment. The "legacy" bibliographic formats that EndNote exports do not support a robust method of having multiple URLs or specifying the type of resource the URL points to (I think this might be true of EndNote's internal data model too, but am not sure).

    Zotero is doing the right thing. These are not "attachments" that Zotero tracks and stores with the rest of your Zotero data, but merely links to the files.

    I suppose an RFE for Zotero could be to make it easier to auto-download attachments (including copying them if file:// URLs are used). I don't know what this would do to disk space & speed, though...
    One thing I like about EndNote is that it's easy to see which refs have a PDF and which I still need to track down.
    A "hack" until any changes to Zotero are made could be to use your text editor to add a keyword, depending on whether or not there was a link to a PDF. This will get transformed into a tag in Zotero & you can use it to represent whether a reference has a PDF somewhere in your file system or does not.
    The import process still generates 2-3 items for every citation
    Which file format are you trying to export/import? Can you try to export a file that has only one citation in endnote that still exhibits this bug & post here?
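The keyword hack could also be done with a short script instead of a text editor. A rough Python sketch, assuming the export is RIS and that the PDF link sits in the L1 field: the function name and the tag names `has-pdf` and `needs-pdf` are invented for illustration, and the sample records are trimmed-down placeholders.

```python
import re

# A RIS tag line: two-character tag, one or more spaces, then a hyphen.
TAG_LINE = re.compile(r"([A-Z][A-Z0-9]) +-")


def tag_pdf_status(ris_text: str,
                   has_pdf: str = "has-pdf",
                   no_pdf: str = "needs-pdf") -> str:
    """Add a KW line to each record saying whether it carries an L1 link."""
    out = []
    record_has_pdf = False
    for line in ris_text.splitlines():
        match = TAG_LINE.match(line)
        tag = match.group(1) if match else None
        if tag == "TY":
            record_has_pdf = False  # a new record starts here
        elif tag == "L1":
            record_has_pdf = True   # assumed: the PDF link is in L1
        elif tag == "ER":
            # Insert the keyword just before the end-of-record line.
            out.append("KW  - " + (has_pdf if record_has_pdf else no_pdf))
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical two-record sample: one with a PDF link, one without.
sample = (
    "TY  - JOUR\n"
    "TI  - Understanding the long-term effects of species invasions\n"
    "L1  - internal-pdf://Strayer/Strayer.pdf\n"
    "ER  - \n"
    "TY  - JOUR\n"
    "TI  - A record without a PDF\n"
    "ER  - \n"
)
tagged = tag_pdf_status(sample)
print(tagged)
```

After import, filtering on the `needs-pdf` tag would list the references that still have to be tracked down.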
  • Thanks for the explanation of how Zotero handles this. Makes more sense now.

    File formats:
    I used EndNote X output style RefMan (RIS); exported to a text file; here's an example export of 1 journal article below. This creates 3 unrelated items when imported into Zotero. (At least it does on my machine.) EndNote PDF link is in L1 field; \\Iris\ points to where my data is stored on our server.

    TY - JOUR
    AU - Strayer, David L.
    AU - Eviner, Valerie T.
    AU - Jeschke, Jonathan M.
    AU - Pace, Michael L.
    PY - 2006
    TI - Understanding the long-term effects of species invasions
    SP - 645-651
    JF - Trends in Ecology & Evolution
    VL - 21
    IS - 11
    N1 - Understanding the long-term effects of species invasions
    L1 - \\Iris\\\PDF\Strayer-2401669120\Strayer.pdf
    N2 - We describe ..abstract text..chronic effects of species invasions.
    UR - http://www.sciencedirect.com/science/article/B6VJ1-4KFV34H-1/2/b086d017af345ed4a03c75600e9b8833
    ID - 886
    ER -
  • edited September 14, 2007
    That imports as a single entry on my machine. Does a file with only this single record exhibit the bug on your machine?

    (I replaced ' -' with '  -', i.e. two spaces instead of one. Without this change it doesn't import, so I think the single space is only an artifact of the way the record is rendered in this forum.)
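For anyone else copying the record out of this thread, the spacing can be restored in one pass. A small Python sketch, assuming every tag line starts with a two-character RIS tag followed by a hyphen; the function name is made up, and it also drops the indentation that a forum copy-paste may add in front of the tag.

```python
import re

# Optional indentation, a two-character tag, some spaces, then the hyphen.
_TAG_PREFIX = re.compile(r"^[ \t]*([A-Z][A-Z0-9]) +-", flags=re.MULTILINE)


def restore_ris_spacing(text: str) -> str:
    """Rewrite each tag prefix as TAG, two spaces, hyphen."""
    return _TAG_PREFIX.sub(r"\1  -", text)


pasted = "    TY - JOUR\n    SP - 645-651\n    ER -\n"
fixed = restore_ris_spacing(pasted)
print(fixed)
```

Lines that already use two spaces come out unchanged, so running it twice is harmless.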
  • The above example imports to three top level items on my machine. Only the main (expected) one appears in the "Imported Friday..." collection. The other two are dummy "Web Page" files not in any collections. These are the "phantom files" I reported a few days ago.
    http://forums.zotero.org/discussion/1246/intercepting-ris-files-diacritics-and-and-phantom-items/
    There are a few links to RIS text files on that page which also import unsubordinated dummy items.

    And does Zotero really 'do the right thing' in this case? Sure, it's great if it keeps the URL, but if we are talking about a PDF instance of an (otherwise paper) journal article, I don't want the URL showing up in my citation. At least the Chicago Styles presently include the contents of Zotero's URL field by default. Simon (I think) helpfully created a ticket for a pref to turn this off, but if I understood Bruce correctly, the inclusion of the URI for an item is really the desired behavior (since it meets the CSL principle of 'fallback to something intelligent').
    http://forums.zotero.org/discussion/1198/ugly-url-inserted-in-chicago-citations-imported-from-jstor-etc/

    I'm quite sure I don't understand all the issues at stake, but it seems that Zotero should preserve the following distinct types of URI/item location information:

    1) The web location and access date of the version I intend to make reference to. I want this in any bibliographic references to the item (and I signal that by including it in my Zotero metadata). Rare in my case. Used for web pages, blog entries, and online articles without print versions.

    2) A link to the local copy of the item I referenced. Mostly we'll just attach the item and let Zotero keep it internally, but there may be non-static things for which we don't want to do this.

    3) The location where an online instance of the item can be found (a PDF, for example). This is very useful to keep, not least if I want to pass the link to a colleague, but I don't want it in my citation. Until URI citation is widely accepted in my field, I'm still going to follow the common practice of only citing the paper edition of journal articles, even if the only paper I ever saw it on was the white background provided by Adobe Reader.

    4) The URL of the database entry (JSTOR, EBSCOhost) from which I got the metadata. This should never go into a bibliographic citation as the item's URI, since it is not a URI for the item. And according to Bruce, Zotero's default behavior of putting it there should be changed, and doesn't correspond with the CSL spec.

    5) Anything else related ("Google Scholar Linked Page").

    Some of these (4,5) are good candidates for attachments (though as I understand, too many of those increase database unwieldiness). Others (1,2,3) should be kept on the 'Info' pane itself--but distinguished from one another. (Or, how does everyone else see this?) Perhaps fritillaria does want his 'science direct' URI in the citation, but in my case I wouldn't. For me, it's a (3) not a (1).
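To make the five cases concrete, here is one way they could be written down as a data type. This is purely an illustrative Python sketch of the distinction argued for above, not how Zotero stores anything. All names are invented, and treating only case (1) as citable is an assumption that goes slightly beyond the list, which says nothing explicit about (2) and (5).

```python
from enum import Enum


class LinkKind(Enum):
    CITED_VERSION = 1    # (1) web location of the version being cited
    LOCAL_COPY = 2       # (2) link to a local copy of the item
    ONLINE_INSTANCE = 3  # (3) where an online copy (e.g. a PDF) lives
    DATABASE_ENTRY = 4   # (4) database page the metadata came from
    OTHER_RELATED = 5    # (5) anything else related


def belongs_in_citation(kind: LinkKind) -> bool:
    """Assumed rule: only the cited web version goes into the citation."""
    return kind is LinkKind.CITED_VERSION


citable = [kind.name for kind in LinkKind if belongs_in_citation(kind)]
print(citable)
```

With something like this, an importer would only need to decide which kind a given RIS URL is, and the citation style would never have to guess.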
  • OK, created some tickets:

    #759, RIS import creates unlinked webpage items outside of parent

    #760, Files specified in file:// URLs should be imported

    #761, Support EndNote internal-pdf:// URLs in RIS import

    Once #760 is implemented, replacing the internal-pdf:// link with a file:// URL should cause the PDF to be imported as a child attachment rather than as a linked URL (and/or a URL field value; right now it's doing both, which is obviously incorrect, but that'll probably get worked out when we address this).

    #761 will eliminate that step entirely.

    Thanks for the delineation of the various cases, Scot. We'll review the various import mechanisms and try to correct the ones that are doing the wrong thing. Imported RIS files (as opposed to those saved via translators) may be tricky, though, since there's much more ambiguity. In general, due to the citation behavior, we should probably err on the side of just creating linked URLs rather than putting values in the URL field, though an interface option for converting between the two would probably be helpful too.

    (As a side note, the fact that using a UNC path (or any filesystem path) works when clicking the URL field is just a fluke of how Firefox handles those (i.e. by opening them). Really, that field should probably only accept valid URLs, though as I note in #760, we could try to interpret URLs beginning with slashes as file:// URLs.)
  • edited September 14, 2007
    Dan, I assume you're on US Eastern time, which makes you either a maniac developer or an insomniac for responding in such detail at this hour. I am constantly surprised by the times you all seem to be doing real productive work. Cheers to you all, and thanks.
  • The above example imports to three top level items on my machine. Only the main (expected) one appears in the "Imported Friday..." collection. The other two are dummy "Web Page" files not in any collections. These are the "phantom files" I reported a few days ago.
    Yes--you're right. Or at least it does when I start from an empty database & it is easy to see what is added that isn't in a collection ;-)
    #761, Support EndNote internal-pdf:// URLs in RIS import
    Note that this will require user interaction. In recent versions of EndNote, the PDFs are all stored in the 'BIBNAME.Data/PDF/' directory. This is kept in the same directory as BIBNAME.enl (and so can be anywhere on the drive) & the RIS file that is exported knows nothing of the original location & can of course be saved anywhere. It looks doable, but getting Thomson to implement file:// links would seem preferable in the long run for everyone.
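The lookup described here can be sketched in a few lines of Python. Assumptions: the user supplies the location of BIBNAME.enl (that is the user interaction), the PDFs sit in BIBNAME.Data/PDF/ beside it, and paths are POSIX-style so the result is the same on any machine; a Windows drive or UNC path would need different handling. The function name and the `/home/user/refs/MyLibrary.enl` path are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def resolve_internal_pdf(link: str, enl_path: str) -> str:
    """Turn an internal-pdf:// link into a file:// URL beside the .enl file."""
    scheme = "internal-pdf://"
    if not link.startswith(scheme):
        raise ValueError(f"not an internal-pdf link: {link!r}")
    library = PurePosixPath(enl_path)
    # BIBNAME.enl -> BIBNAME.Data/PDF/ in the same directory.
    pdf_dir = library.with_suffix(".Data") / "PDF"
    target = pdf_dir / link[len(scheme):]
    # Percent-encode spaces and other unsafe characters, keeping the slashes.
    return "file://" + quote(str(target))


url = resolve_internal_pdf(
    "internal-pdf://author date - number/author date.pdf",
    "/home/user/refs/MyLibrary.enl",  # hypothetical library location
)
print(url)
```

Since the RIS file itself carries no trace of where the library lives, the second argument is the one piece of information that has to come from outside the export.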
  • Scot, I agree with how you'd like Zotero to approach things. I definitely don't want URIs in my citations - in my field I almost always use the standard paper citation format. The only exception is citing various grey literature documents which are only found on the web. It would be good to be able to separate out these cases.

    As for importing PDF links from EndNote, I would like to eventually stop using EndNote entirely, so it would be great if there was somehow the option of transferring the PDF file structure from BIBNAME/BIBNAME.Data/PDF to the Zotero/storage directory, so that all references & stored PDFs are in the same directory.

    Sadly, I doubt Thomson is going to implement file:// links to help this migration practice along....

    Dan/Scot/noksagt thank you so much for your help & hard work. I look forward to the day when I can remove EndNote from my computer!
  • Note that this will require user interaction. In recent versions of EndNote, the PDFs are all stored in the 'BIBNAME.Data/PDF/' directory. This is kept in the same directory as BIBNAME.enl (and so can be anywhere on the drive) & the RIS file that is exported knows nothing of the original location & can of course be saved anywhere.
    Thanks, I misread that other thread. I've updated the ticket to reflect this. It seems that just requiring the RIS file to be in the same directory as the .enl file may be the best approach, since popping up a second file dialog would be totally confusing for most users.