On 26/03/16 07:33PM, NRK wrote:
> > check the behaviour before and after the patch:
> > $ printf "\xef\xea\xek\xee"
>
> Xterm also shows the diamond question mark (U+FFFD).
>
> > Turns out those bytes are not valid UTF-8 so st was showing U+FFFD
> > instead of ignoring them.
>
> What's the justification for ignoring it? Showing U+FFFD for invalid
> utf8 is pretty standard behaviour.
As I wrote earlier: there is nothing to "fix". This patch amounts to
OP's personal preference.
I wanted to link my previous message, but weirdly it seems to be
missing from the archives[1], so I'll just quote it here with a few
modifications:
> What is there to fix? IMHO, invalid UTF-8 should show as a replacement
> character. At best, silently ignoring it is a matter of personal
> preference. There are, however, counterexamples that support the current
> behavior. For example, the Linux console (VT, VC) also displays
> U+FFFD for invalid UTF-8 sequences when in UTF-8 mode, even if they are
> part of escape sequences. Try
>
> printf '\e%%G\x9b1mBold\x9b0m\n'
>
> in a Linux VC (Ctrl+Alt+Fn). On Artix Linux, this outputs
>
> <black square>1mBold<black square>0m
>
> where the black squares are U+FFFD characters standing for the invalid
> UTF-8 bytes, because the console is in UTF-8 mode, but
>
> printf '\e%%@\x9b1mBold\x9b0m\n'
>
> prints the text "Bold" in bold, because outside UTF-8 mode the console
> interprets 0x9b as CSI, the 8-bit equivalent of ESC [.
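A quick way to see what the two commands actually send is to hex-dump them. This is a minimal sketch that assumes a POSIX sh and od(1). It spells \e as \033 and \x9b as \233 (octal), because \e and \xHH are extensions that not every printf supports. The two streams differ only in the third byte: G (0x47) selects UTF-8 mode and @ (0x40) selects the default charset. The lone 0x9b has the bit pattern 10xxxxxx, i.e. it is a UTF-8 continuation byte with no lead byte. In UTF-8 mode it is therefore invalid and shows up as U+FFFD. Outside UTF-8 mode it is taken as CSI.

```shell
#!/bin/sh
# Bytes of the UTF-8 mode example: ESC % G, then two CSI (0x9b) sequences.
# \033 is ESC (same as \e) and \233 is 0x9b (same as \x9b), in POSIX octal form.
printf '\033%%G\2331mBold\2330m\n' | od -An -tx1

# Bytes of the default-charset example: ESC % @ instead of ESC % G.
printf '\033%%@\2331mBold\2330m\n' | od -An -tx1
```

With GNU od this prints the following (whitespace may vary between od implementations):

 1b 25 47 9b 31 6d 42 6f 6c 64 9b 30 6d 0a
 1b 25 40 9b 31 6d 42 6f 6c 64 9b 30 6d 0a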
[1]: https://lists.suckless.org/hackers/date.html
Received on Tue Mar 17 2026 - 15:51:50 CET