Strip newlines from PDF when pasting in Notes?

When I copy-paste some text from a PDF to the note editor, the pasted text contains a lot of newlines, e.g.

This is a
piece of
pasted text
instead of
This is a piece of pasted text
Would it be possible to automatically strip all these newlines - converted to breaks in HTML - upon pasting? A special 'Paste from PDF' would do as well :-)
  • There are system tools to do this. This is not Zotero's job.
  • edited June 7, 2010
    Thanks for your quick reply!

    I am sorry, but I am not sure whether I understand what you mean. Do you mean I should use a third-party tool like http://www.textfixer.com/tools/remove-line-breaks.php to remove the linebreaks?

    I use the copy-paste feature a lot to copy pieces of the PDF to the Note editor, so I can add my own comments to those pieces.
  • edited June 7, 2010
    That's exactly what I mean.

    Or, for OS X users, Preview in Snow Leopard is much smarter about copying text from PDFs (supposedly—I haven't really tested it).
  • edited June 7, 2010
    Mmm... for me this third-party solution is not really a solution. And unfortunately, my University forces me into the desert of WinXp :(

    Thanks anyway.
  • To be clear, I'm certainly not recommending a particular third-party solution—just saying that there are better ways to fix this than on an application-specific basis.
  • This is a niggle that I'd also be glad to see solved. Basically there are three places where it can be done:

    (1) at the point of copying (by Adobe Reader, Sumatra, OS X's Preview, Evince or the like, when the copy command is invoked)

    (2) at the system level, either automatically (which would have side effects you might not always want) or manually, by means of a keyboard shortcut that runs a linebreak-removing routine.

    (3) at the point of pasting, by the app into which the text is pasted.

    I agree with Dan that the best place for this to happen is not (3). And individual apps don't typically have this functionality. The last time I checked, OpenOffice and Word didn't do it either. I'd love to see it happen at (1), since copying from PDFs is almost the only time I run into this problem these days. Most times when you copy a bit from a PDF you *want* soft-wrapped text. It's the format that we work in in almost all our tools these days. But unfortunately I've never (in Linux and Windows) seen a PDF reader that does this.

    Dan is suggesting (2). Perhaps we could use this thread to gather a few ideas for the various operating systems. The ideal, it seems to me, would be a tool that cleans up the text in the clipboard before pasting, activated by a global keyboard shortcut, for example.

    Having said that, Zotero's notes application is a place where pasting from PDFs is probably a major activity. If there are not good system tools available, and since PDF viewers are still lame at this point, it doesn't seem completely unreasonable to think that Zotero might be made to 'do the friendly thing' and make up for the weaknesses of the PDF readers and clipboard tools. For the users' sake.
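
    A minimal sketch of the kind of clipboard cleanup routine described in (2), written in Python so that it is not tied to one operating system. It only covers the text transformation; reading the clipboard and binding a global shortcut are left out because they differ per OS. The function name `unwrap_pdf_text` is hypothetical, and the rules (a blank line marks a paragraph break, a hyphen at the end of a line splits a single word) are assumptions about typical PDF text, not something any PDF reader guarantees.

```python
import re


def unwrap_pdf_text(text: str) -> str:
    """Join hard-wrapped lines into soft-wrapped paragraphs (hypothetical helper)."""
    # Normalize Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Assumption: a blank line is an intentional paragraph break.
    paragraphs = re.split(r"\n\s*\n", text)

    cleaned = []
    for paragraph in paragraphs:
        # Assumption: a hyphen directly before a line break splits one word,
        # so "exam-\nple" becomes "example".
        paragraph = re.sub(r"(?<=\w)-\n(?=\w)", "", paragraph)
        # Every remaining run of whitespace, single newlines included,
        # becomes one space.
        paragraph = re.sub(r"\s+", " ", paragraph).strip()
        if paragraph:
            cleaned.append(paragraph)

    # Paragraphs are separated by one blank line in the result.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    print(unwrap_pdf_text("This is a\npiece of\npasted text"))
```

    One known weakness of the hyphen rule: a genuinely hyphenated compound that happens to break at the hyphen (for example "well-" at the end of a line and "known" on the next) loses its hyphen.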
  • I got annoyed with the same problem and wrote a little AutoHotkey script to take care of it. It's set up so that when I press control-alt-V, it strips newlines and pastes, leaving the original text on the clipboard.

    Once I got this script to do what I wanted I stopped tweaking it. I'm not that great with AutoHotkey, and really bad with regular expressions, so this is far from a complete solution, but here it is in case it helps.

    #Persistent
    #NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.

    SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
    SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.

    ;Define hot key: control alt V
    ^!v::

    ;Get clipboard text
    orig := clipboard

    ; Note for following code that `r`n = newline

    ;Remove a dash followed by newline, since that's probably a single word across a linebreak
    StringReplace clipboard, clipboard, -`r`n, , All

    ;Replace a single newline with a space
    StringReplace clipboard, clipboard, %A_Space%`r`n, %A_Space%, All
    StringReplace clipboard, clipboard, `r`n, %A_Space%, All

    ;Comma followed by newline needs no rule of its own: the replacement above
    ;already turns it into a comma followed by a space

    ;Collapse any run of whitespace (spaces, tabs, stray newlines) into a single space
    clipboard := RegExReplace(clipboard, "\s+" , " ")

    Send ^v ;paste
    sleep 100
    clipboard := orig ; return clipboard to original state

    ; To do
    ; double line break --> leave alone as intentional new paragraph

    return

  • I should add, for clarification: if you want to try the script I just posted, all you have to do is install AutoHotkey and save the script as a text file with the extension .ahk somewhere on your computer. If you double-click that file, an AutoHotkey icon will appear in the system tray and you should be able to press control-alt-V to paste the reformatted text.
  • @dwg, thanks for the script. This solved the same issue I had pasting from Acrobat into OneNote. Here is a modified version of your script that acts during the copy action, with a crude means to maintain paragraph breaks...

    ; REMOVES LINE BREAKS WHEN COPYING

    #Persistent
    #NoEnv ; Recommended for performance and compatibility with future AutoHotkey releases.

    SendMode Input ; Recommended for new scripts due to its superior speed and reliability.
    SetWorkingDir %A_ScriptDir% ; Ensures a consistent starting directory.

    ;Define hot key: control alt c
    ^!c::

    Send ^c ;copy
    sleep 100

    ; Note for following code that `r`n = newline

    ;Remove a dash followed by newline, since that's probably a single word across a linebreak
    ;(done first so it cannot eat the trailing dash of the codes below)
    StringReplace clipboard, clipboard, -`r`n, , All

    ;Code the paragraph breaks with special combinations (double newline first,
    ;so that a period before a blank line is not coded twice)
    StringReplace clipboard, clipboard, `r`n`r`n, -*-, All
    StringReplace clipboard, clipboard, .`r`n, -.-, All

    ;Replace a single newline with a space
    StringReplace clipboard, clipboard, %A_Space%`r`n, %A_Space%, All
    StringReplace clipboard, clipboard, `r`n, %A_Space%, All

    ;Collapse any run of whitespace (spaces, tabs, stray newlines) into a single space
    clipboard := RegExReplace(clipboard, "\s+" , " ")

    ;Replace the paragraph break codes with newlines
    StringReplace clipboard, clipboard, -.-, .`r`n, All
    StringReplace clipboard, clipboard, -*-, `r`n, All

    return
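
    The paragraph heuristic used in this script (a line that ends in a period is assumed to end a paragraph) can also be expressed without placeholder strings, by walking the text line by line. A rough Python sketch of that idea follows; the function name `unwrap_with_period_heuristic` is hypothetical, and the heuristic is just as crude as in the script, since any wrapped line that happens to end in a period will start a new paragraph.

```python
def unwrap_with_period_heuristic(text: str) -> str:
    """Join wrapped lines; a line ending in '.' is guessed to end a paragraph."""
    # Normalize line endings, then look at one line at a time.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")

    paragraphs = []
    current = ""
    for raw in lines:
        # Trim the line and collapse inner runs of whitespace to one space.
        line = " ".join(raw.split())
        if not line:
            # A blank line closes the paragraph that is being built, if any.
            if current:
                paragraphs.append(current)
                current = ""
            continue

        if current.endswith("-"):
            # Assumption: a trailing dash splits one word across two lines.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line

        if line.endswith("."):
            # Crude guess: sentence end at line end means paragraph end.
            paragraphs.append(current)
            current = ""

    if current:
        paragraphs.append(current)
    # One newline between paragraphs, as in the script above.
    return "\n".join(paragraphs)
```

    Because nothing is encoded and decoded, text that already contains sequences such as -.- or -*- passes through unchanged.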