Re: [hackers] [st][PATCH] fix C1 bytes (0x80-0x9F) shown as garbage in UTF-8 mode

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Sun, 15 Mar 2026 16:35:00 +0100

On Sun, Mar 15, 2026 at 01:22:55PM +0000, amrit44404 wrote:
> From 39dd8d1a573f76d969a8c55e80358ec33a1c6c76 Mon Sep 17 00:00:00 2001
> From: amritxyz <amrit44404_AT_proton.me[1]>
> Date: Sun, 15 Mar 2026 18:43:11 +0545
> Subject: [PATCH] st: fix C1 bytes (0x80-0x9F) shown as garbage in UTF-8
> mode
>
> Raw C1 bytes are not valid UTF-8. utf8decode() returns U+FFFD for
> them which gets drawn on screen as a replacement character.
>
> Fix this by skipping C1 bytes in twrite() before utf8decode() sees
> them. The ESC_STR guard lets them through when inside a STR sequence
> so they can still act as sequence terminators.
>
> Also add an early return in tputc() as a safety net for any direct
> callers, and call strhandle() when a C1 byte terminates a STR
> sequence so OSC sequences are not silently lost.
>
> Tested: printf '\x8f' now produces no output.
> ---
>  st.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/st.c b/st.c
> index 6f40e35..d0bf933 100644
> --- a/st.c
> +++ b/st.c
> @@ -2396,6 +2396,9 @@ tputc(Rune u)
>   Glyph *gp;
>  
>   control = ISCONTROL(u);
> + /* in UTF-8 mode, ignore C1 control characters early */
> + if (IS_SET(MODE_UTF8) && ISCONTROLC1(u) && !(term.esc & ESC_STR))
> + return;
>   if (u < 127 || !IS_SET(MODE_UTF8)) {
>   c[0] = u;
>   width = len = 1;
> @@ -2455,8 +2458,11 @@ check_control_code:
>   */
>   if (control) {
>   /* in UTF-8 mode ignore handling C1 control characters */
> - if (IS_SET(MODE_UTF8) && ISCONTROLC1(u))
> + if (IS_SET(MODE_UTF8) && ISCONTROLC1(u)) {
> + if (term.esc & ESC_STR_END)
> + strhandle();
>   return;
> + }
>   tcontrolcode(u);
>   /*
>   * control codes are not shown ever
> @@ -2546,6 +2552,11 @@ twrite(const char *buf, int buflen, int show_ctrl)
>  
>   for (n = 0; n < buflen; n += charsize) {
>   if (IS_SET(MODE_UTF8)) {
> + /* skip C1 bytes before utf8decode() mangles them */
> + if (ISCONTROLC1(buf[n] & 0xFF) && !(term.esc & ESC_STR)) {
> + charsize = 1;
> + continue;
> + }
>   /* process a complete utf8 char */
>   charsize = utf8decode(buf + n, &u, buflen - n);
>   if (charsize == 0)
> --
> 2.53.0
>
> References
>
> 1. mailto:amrit44404_AT_proton.me

Hi,

The patch looks garbled (no TAB indent). Can you fix it and resend?

Also, are there particular applications where you noticed this?

I hope this doesn't break anything...
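
For context on the commit message's claim that raw C1 bytes are not valid
UTF-8: every byte in 0x80-0x9F has the bit pattern 10xxxxxx, which UTF-8
reserves for continuation bytes, so a lone one can never start a valid
sequence. The C1 code points U+0080-U+009F themselves are encoded as two
bytes (0xC2 followed by 0x80-0x9F). Below is a minimal sketch of that
distinction. The helper names are hypothetical and are not taken from st.c;
this only illustrates the encoding facts, not how the patch should be written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from st.c): true for a byte in the C1 range. */
static int
is_c1_byte(unsigned char b)
{
	return b >= 0x80 && b <= 0x9f;
}

/* True if b has the bit pattern 10xxxxxx, i.e. a UTF-8 continuation byte. */
static int
is_utf8_continuation(unsigned char b)
{
	return (b & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: encode a code point below U+0800 as UTF-8.
 * Returns the number of bytes written, or 0 if cp is out of range
 * for this simplified encoder.
 */
static size_t
encode_small(unsigned int cp, unsigned char out[2])
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xc0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	}
	return 0;
}

/* Checks the facts stated above for the byte used in the patch's test. */
static void
check_c1_facts(void)
{
	unsigned char out[2];

	/* A raw 0x8f byte is in the C1 range and is a continuation byte. */
	assert(is_c1_byte(0x8f));
	assert(is_utf8_continuation(0x8f));

	/* The code point U+008F is encoded as the two bytes 0xc2 0x8f. */
	assert(encode_small(0x8f, out) == 2);
	assert(out[0] == 0xc2 && out[1] == 0x8f);
}
```

So a raw byte such as the one from printf '\x8f' and a properly encoded
U+008F (printf '\xc2\x8f') arrive at the decoder differently: only the
second decodes to a C1 code point, while the first is an invalid sequence.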

-- 
Kind regards,
Hiltjo
Received on Sun Mar 15 2026 - 16:35:00 CET
