Re: [dev] ssam rocks! unwrapping paragraphs

From: Felix Van der Jeugt <felixvdj+suckless_AT_posteo.be>
Date: Wed, 23 Mar 2022 07:50:59 +0000

On Tue, Mar 22, 2022, at 9:49 PM, 201009-suckless_AT_planhack.com wrote:
> sed is the canonical paragraph mangler. It's worth spending a bit to
> grok how that is true.
>
> tr -d '\r' | sed '/^$/!{H;d;};p;x;s/\n/ /g;'
>
> Gutenberg lines are CRLF-terminated so `tr` is needed.

"Greg Reagle" <list_AT_speedpost.net> wrote:
> Right I forgot to mention that I had to
> tr -d '\r'
> first. Thanks for mentioning that.
>
> Close, but no cigar. That sed command introduces extra blank lines.
> It is incorrect. ssam reigns supreme!
>
> tr -d '\r' < 2488-0.txt | ssam -e 'x/\n+/ v/\n\n+/ c/ /' | wc -l
> 7667
> tr -d '\r' < 2488-0.txt | sed '/^$/!{H;d;};p;x;s/\n/ /g;' | wc -l
> 7782

Both commands are incorrect. ssam turns a file that ends in a single
newline into one that ends with a single space and no newline. sed
prints the empty line before each paragraph rather than after it, and
drops the last paragraph when the file does not end with an empty line
(two newlines).
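The sed problems are easy to reproduce on a tiny hypothetical input
(two paragraphs, the second not followed by a blank line), assuming a
POSIX shell and GNU sed:

```shell
# Hypothetical sample: paragraph "a" / "b", a blank line, then
# paragraph "c" with no blank line after it.
# This is the sed command quoted above, unchanged.
printf 'a\nb\n\nc\n' | sed '/^$/!{H;d;};p;x;s/\n/ /g;'
```

This prints an empty line followed by " a b" (with a leading space,
from the newline that H puts in front of the first line), and "c"
never appears: it is still in the hold space when the input runs out.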

I'm not sure how to fix the ssam command (though that fix would
probably be more elegant), but this should work for sed:

  sed 'H;$!{/^$/!d};x;s/^\n//;s/\n\(.\)/ \1/g;p;d'
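For anyone working through how this does it, here is the same sed
program spread out one command per line with comments. Wrapping it in
a shell function called unwrap is only for illustration (the name is
hypothetical). It assumes CRs were already stripped with tr -d '\r'
and a sed that accepts # comment lines inside a script, such as GNU
sed:

```shell
# unwrap: hypothetical helper name. Reads stdin, writes stdout.
# Joins each paragraph onto one line and keeps the blank lines.
unwrap() {
  sed '
    # Append the current line to the hold space, prefixed by a newline.
    H
    # On any line but the last, discard non-empty lines and read on.
    $!{
      /^$/!d
    }
    # Blank line or last line: swap the collected paragraph into the
    # pattern space. The current line is left behind in the hold space.
    x
    # Remove the newline that H put in front of the first line.
    s/^\n//
    # A newline followed by another character becomes a space.
    s/\n\(.\)/ \1/g
    # Print the joined paragraph once and start the next cycle.
    p
    d
  '
}

# Usage with the Gutenberg file from this thread:
#   tr -d '\r' < 2488-0.txt | unwrap
```

Because the blank line itself is appended with H before the swap, the
flushed paragraph ends in a newline, so p emits the empty line after
the paragraph rather than before it. The $ address covers a last
paragraph with no blank line after it.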

Cheers,
Felix
Received on Wed Mar 23 2022 - 08:50:59 CET
