[hackers] [st] fix unicode glitch in DCS strings, patch by Tim Allen || Hiltjo Posthuma

From: <git_AT_suckless.org>
Date: Wed, 17 Jun 2020 21:37:23 +0200 (CEST)

commit 818ec746f4caae453d09368b101c3e841cf39870
Author: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
AuthorDate: Wed Jun 17 21:35:39 2020 +0200
Commit: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
CommitDate: Wed Jun 17 21:35:39 2020 +0200

    fix unicode glitch in DCS strings, patch by Tim Allen
    
    Reported on the mailing list:
    
    "
    I discovered recently that if an application running inside st tries to
    send a DCS string, subsequent Unicode characters get messed up. For
    example, consider the following test-case:
    
        printf '\303\277\033P\033\\\303\277'
    
    ...where:
    
      - \303\277 is the UTF-8 encoding of U+00FF LATIN SMALL LETTER Y WITH
        DIAERESIS (ÿ).
      - \033P is ESC P, the token that begins a DCS string.
      - \033\\ is ESC \, a token that ends a DCS string.
      - \303\277 is the same ÿ character again.
    
    If I run the above command in a VTE-based terminal, or xterm, or
    QTerminal, or pterm (PuTTY), I get the output:
    
        ÿÿ
    
    ...which is to say, the empty DCS string is ignored. However, if I run
    that command inside st (as of commit 9ba7ecf), I get:
    
        ÿÃ¿
    
    ...where those last two characters are \303\277 interpreted as ISO8859-1
    characters, instead of UTF-8.
    
    I spent some time tracing through the state machines in st.c, and so far
    as I can tell, this is how it works currently:
    
      - ESC P sets the "ESC_DCS" and "ESC_STR" flags, indicating that
        incoming bytes should be collected into the strescseq buffer, rather
        than being interpreted.
      - ESC \ sets the "ESC_STR_END" flag (when ESC is received), and then
        calls strhandle() (when \ is received) to interpret the collected
        bytes.
      - If the collected bytes begin with 'P' (i.e. if this was a DCS
        string) strhandle() sets the "ESC_DCS" flag again, confusing the
        state machine.
    
    If my understanding is correct, fixing the problem should be as easy as
    removing the line that sets ESC_DCS from strhandle():
    
    diff --git a/st.c b/st.c
    index ef8abd5..b5b805a 100644
    --- a/st.c
    +++ b/st.c
    @@ -1897,7 +1897,6 @@ strhandle(void)
                    xsettitle(strescseq.args[0]);
                    return;
            case 'P': /* DCS -- Device Control String */
    -               term.mode |= ESC_DCS;
            case '_': /* APC -- Application Program Command */
            case '^': /* PM -- Privacy Message */
                    return;
    
    I've tried the above patch and it fixes my problem, but I don't know if
    it introduces any others.
    "
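
The state-machine walk-through in the report can be sketched as a toy model.
This is not st's code: the flag names F_ESC, F_STR, F_STR_END and F_DCS, their
values, and the function count_printed() are all hypothetical. The sketch also
assumes that a stale DCS flag makes each following byte count as one character
instead of being decoded as UTF-8, which is one way to get the reported output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for this sketch; names echo the report, values are made up. */
enum { F_ESC = 1, F_STR = 2, F_STR_END = 4, F_DCS = 8 };

/*
 * Feed bytes through a toy escape-sequence state machine and return the
 * number of characters that would be printed. With buggy != 0 the string
 * handler sets F_DCS again after the string has ended, as described in
 * the report. Only two-byte UTF-8 sequences are recognised.
 */
static int
count_printed(const unsigned char *buf, size_t len, int buggy)
{
	int flags = 0, printed = 0;
	size_t i = 0;

	assert(buf != NULL);

	while (i < len) {
		unsigned char c = buf[i];

		if (flags & F_STR) {
			/* Inside a string: ESC ends collection, other bytes are dropped. */
			if (c == 033) {
				flags &= ~(F_STR | F_DCS);
				flags |= F_STR_END;
			}
			i++;
			continue;
		}
		if (flags & F_STR_END) {
			flags &= ~F_STR_END;
			if (c == '\\') {
				/* Stand-in for the string handler. */
				if (buggy)
					flags |= F_DCS; /* the stale flag */
				i++;
				continue;
			}
		}
		if (flags & F_ESC) {
			flags &= ~F_ESC;
			if (c == 'P')
				flags |= F_STR | F_DCS; /* ESC P begins a DCS string */
			i++;
			continue;
		}
		if (c == 033) {
			flags |= F_ESC;
			i++;
			continue;
		}
		/* Printable data: decode UTF-8 only while F_DCS is clear. */
		if (!(flags & F_DCS) && (c & 0xE0) == 0xC0 &&
		    i + 1 < len && (buf[i + 1] & 0xC0) == 0x80)
			i += 2; /* two-byte sequence, one character */
		else
			i += 1; /* raw byte, one character */
		printed++;
	}
	return printed;
}
```

For the eight bytes of the test case above, count_printed() returns 2 with
buggy set to 0 (the two ÿ characters) and 3 with buggy set to 1 (ÿ followed by
the two raw bytes), which matches the outputs quoted in the report.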

diff --git a/st.c b/st.c
index ef8abd5..b5b805a 100644
--- a/st.c
+++ b/st.c
@@ -1897,7 +1897,6 @@ strhandle(void)
                 xsettitle(strescseq.args[0]);
                 return;
         case 'P': /* DCS -- Device Control String */
-                term.mode |= ESC_DCS;
         case '_': /* APC -- Application Program Command */
         case '^': /* PM -- Privacy Message */
                 return;
Received on Wed Jun 17 2020 - 21:37:23 CEST
