Re: [hackers] [st][PATCH v2] st: fix C1 bytes (0x80-0x9F) shown as garbage in UTF-8 mode


From: Страхиња Радић <sr_AT_strahinja.org>
Date: Tue, 17 Mar 2026 15:51:50 +0100

On 26/03/16 07:33PM, NRK wrote:
> > check the behaviour before and after the patch:
> > $ printf "\xef\xea\xek\xee"
>
> Xterm also shows the diamond question mark (U+FFFD).
>
> > Turns out those bytes are not valid UTF-8 so st was showing U+FFFD
> > instead of ignoring them.
>
> What's the justification for ignoring it? Showing U+FFFD for invalid
> utf8 is pretty standard behaviour.

As I wrote earlier: there is nothing to "fix". This patch amounts to
OP's personal preference.

I wanted to link my previous message, but weirdly it seems to be
missing from the archives[1], so I'll just quote it here with a few
modifications:

> What is there to fix? IMHO, invalid UTF-8 should show as a replacement
> character. Silently ignoring it is, at best, a personal preference or
> opinion. There are, however, precedents which support the current
> behavior. For example, the Linux console (VT, VC) also displays
> U+FFFD for invalid UTF-8 when in UTF-8 mode, even if the bytes are
> part of escape sequences. Try
>
> printf '\e%%G\x9b1mBold\x9b0m\n'
>
> in a Linux VC (Ctrl+Alt+Fn). In Artix Linux, this will output
>
> <black square>1mBold<black square>0m
>
> where the black squares are U+FFFD replacement characters standing in
> for the invalid UTF-8 bytes, because the console is in UTF-8 mode, but
>
> printf '\e%%@\x9b1mBold\x9b0m\n'
>
> prints the text "Bold" in bold.

[1]: https://lists.suckless.org/hackers/date.html
Received on Tue Mar 17 2026 - 15:51:50 CET
