Re: [dev] ssam rocks! unwrapping paragraphs

From: Greg Reagle <list_AT_speedpost.net>
Date: Tue, 22 Mar 2022 22:04:51 -0400

On Tue, Mar 22, 2022, at 9:49 PM, 201009-suckless_AT_planhack.com wrote:
> sed is the canonical paragraph mangler. It's worth spending a bit to
> grok how that is true.
>
> tr -d '\r' | sed '/^$/!{H;d;};p;x;s/\n/ /g;'
>
> Gutenberg lines are CRLF-terminated so `tr` is needed.

Right, I forgot to mention that I had to run
  tr -d '\r'
first. Thanks for pointing that out.

Close, but no cigar. That sed command introduces extra blank lines. It is incorrect. ssam reigns supreme!
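
To see where the extra blank lines come from, here is a minimal sketch on a made-up seven-line input (not the Gutenberg file). The helper names sample and sed_unwrap are only for illustration. For every blank input line the sed command prints a blank line and then whatever is in the hold space, so the second blank line of a run comes out twice, and each joined paragraph lands after its separator with a leading space.

```shell
# Made-up input: two wrapped paragraphs, separated by two blank lines,
# with one trailing blank line (7 lines, 3 of them blank).
sample() { printf 'a\nb\n\n\nc\nd\n\n'; }

# The sed command from the quoted message, unchanged.
sed_unwrap() { sed '/^$/!{H;d;};p;x;s/\n/ /g;'; }

# Append '$' to each output line so blank lines are visible.
sample | sed_unwrap | sed 's/$/$/'
# Output is 6 lines: "$", " a b$", "$", "$", "$", " c d$"
```

A faithful unwrap of that input would be 5 lines (2 paragraphs plus the 3 original blank lines). Note also that the hold space is never flushed at end of input, so a last paragraph with no blank line after it would be dropped.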

  tr -d '\r' < 2488-0.txt | ssam -e 'x/\n+/ v/\n\n+/ c/ /' | wc -l
7667
  tr -d '\r' < 2488-0.txt | sed '/^$/!{H;d;};p;x;s/\n/ /g;' | wc -l
7782
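
For anyone without ssam at hand, the same idea (join the lines of a paragraph with spaces, leave every blank line alone) can be sketched in awk. This is only a sketch under my own assumptions: the names sample and unwrap are made up, it has been reasoned through on the toy input below rather than run against 2488-0.txt, and it is not guaranteed to match ssam byte for byte (for instance, it always ends the last paragraph with a newline).

```shell
# Made-up input: two wrapped paragraphs, separated by two blank lines,
# with one trailing blank line (7 lines, 3 of them blank).
sample() { printf 'a\nb\n\n\nc\nd\n\n'; }

# Hypothetical helper: buffer non-blank lines, flush the buffer as one
# line when a blank line (or end of input) is reached, and pass every
# blank line through unchanged.
unwrap() {
  awk '
    /^$/ { if (buf != "") { print buf; buf = "" } print ""; next }
         { buf = (buf == "" ? $0 : buf " " $0) }
    END  { if (buf != "") print buf }
  '
}

sample | unwrap | wc -l
# prints 5: "a b", "", "", "c d", ""
```

On real Gutenberg text it would sit after the same tr step, as in tr -d '\r' < 2488-0.txt | unwrap.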
Received on Wed Mar 23 2022 - 03:04:51 CET
