Re: [dev] [st] DCS strings cause Unicode glitches

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Wed, 17 Jun 2020 21:38:18 +0200

On Thu, Jun 18, 2020 at 02:05:19AM +1000, Tim Allen wrote:
> I discovered recently that if an application running inside st tries to
> send a DCS string, subsequent Unicode characters get messed up. For
> example, consider the following test-case:
>
> printf '\303\277\033P\033\\\303\277'
>
> ...where:
>
> - \303\277 is the UTF-8 encoding of U+00FF LATIN SMALL LETTER Y WITH
> DIAERESIS (ÿ).
> - \033P is ESC P, the token that begins a DCS string.
> - \033\\ is ESC \, a token that ends a DCS string.
> - \303\277 is the same ÿ character again.
>
> If I run the above command in a VTE-based terminal, or xterm, or
> QTerminal, or pterm (PuTTY), I get the output:
>
> ÿÿ
>
> ...which is to say, the empty DCS string is ignored. However, if I run
> that command inside st (as of commit 9ba7ecf), I get:
>
> ÿÃ¿
>
> ...where those last two characters are \303\277 interpreted as ISO8859-1
> characters, instead of UTF-8.
>
> I spent some time tracing through the state machines in st.c, and so far
> as I can tell, this is how it works currently:
>
> - ESC P sets the "ESC_DCS" and "ESC_STR" flags, indicating that
> incoming bytes should be collected into the strescseq buffer, rather
> than being interpreted.
> - ESC \ sets the "ESC_STR_END" flag (when ESC is received), and then
> calls strhandle() (when \ is received) to interpret the collected
> bytes.
> - If the collected bytes begin with 'P' (i.e. if this was a DCS
> string) strhandle() sets the "ESC_DCS" flag again, confusing the
> state machine.
>
> If my understanding is correct, fixing the problem should be as easy as
> removing the line that sets ESC_DCS from strhandle():
>
> diff --git a/st.c b/st.c
> index ef8abd5..b5b805a 100644
> --- a/st.c
> +++ b/st.c
> @@ -1897,7 +1897,6 @@ strhandle(void)
>  		xsettitle(strescseq.args[0]);
>  		return;
>  	case 'P': /* DCS -- Device Control String */
> -		term.mode |= ESC_DCS;
>  	case '_': /* APC -- Application Program Command */
>  	case '^': /* PM -- Privacy Message */
>  		return;
>
> I've tried the above patch and it fixes my problem, but I don't know if
> it introduces any others.
>

Hi Tim,

Thanks for the detailed report and the patch. It looks good to me and I pushed
it to master.

-- 
Kind regards,
Hiltjo
Received on Wed Jun 17 2020 - 21:38:18 CEST
