[dev] [dmenu] What's the expected behavior on invalid utf8? from NRK on 2024-07-04 (dev mail list archive)

From: NRK <nrk_AT_disroot.org>
Date: Thu, 4 Jul 2024 03:04:42 +0000

Hello all,

A couple days ago I was looking into how dmenu deals with invalid utf8
sequences and noticed a couple odd things. Here's the testcase for those
who want to follow along:

        $ printf "0\xef1234567\ntest" | dmenu

In drw.c::utf8decode(), an invalid utf8 sequence is decoded as U+FFFD (�),
and drw_text() then continues its width calculation as if there were a
U+FFFD codepoint in the text.
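
For anyone who wants to poke at this outside of dmenu, here is a minimal
standalone sketch of a decoder with the same contract: an invalid or
truncated sequence yields U+FFFD and consumes one byte. It is NOT dmenu's
utf8decode(). `decode_utf8` and `count_codepoints` are hypothetical names
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFD

/* Decodes one UTF-8 sequence from s[0..len) into *cp and returns the
 * number of bytes consumed. Any invalid or truncated sequence yields
 * U+FFFD and consumes one byte, so the result is >= 1 whenever len > 0. */
static size_t
decode_utf8(const unsigned char *s, size_t len, uint32_t *cp)
{
	size_t need, i;
	uint32_t c, min;

	*cp = REPLACEMENT_CHAR;
	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		need = 2; c = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		need = 3; c = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		need = 4; c = s[0] & 0x07; min = 0x10000;
	} else {
		return 1; /* stray continuation byte or invalid lead byte */
	}
	if (need > len)
		return 1; /* sequence cut off by the end of the buffer */
	for (i = 1; i < need; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 1; /* expected a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* reject overlong forms, UTF-16 surrogates and values past U+10FFFF */
	if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
		return 1;
	*cp = c;
	return need;
}

/* Counts decoded codepoints; *nbad receives how many of them are U+FFFD. */
static size_t
count_codepoints(const unsigned char *s, size_t len, size_t *nbad)
{
	size_t i = 0, n, count = 0;
	uint32_t cp;

	*nbad = 0;
	while (i < len) {
		n = decode_utf8(s + i, len - i, &cp);
		if (cp == REPLACEMENT_CHAR)
			(*nbad)++;
		i += n;
		count++;
	}
	return count;
}
```

With the test input above ("0\xef" followed by "1234567"), the 0xEF lead
byte is followed by '1' instead of a continuation byte, so it decodes to a
single U+FFFD and the string counts as 9 codepoints. Consuming one byte per
error is a simple choice; the Unicode Standard's recommended "maximal
subpart" practice would instead fold a valid-but-truncated prefix into one
U+FFFD.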

However, when it comes to actually rendering the text via
XftDrawStringUtf8(), we simply pass it `utf8str`, which obviously
doesn't contain any U+FFFD but instead still has the invalid utf8 sequences.

I'm not sure whether this is documented, but on my system Xft
basically just cuts the text off at the error. In other words, only 0 is
rendered, followed by a large blank area (see pic0.png).

Is this actually the expected behavior? If so, why not break out
early on error instead of calculating the width with a made-up U+FFFD that
will never be rendered?

I have a rough patch that renders invalid utf8 as � instead of
cutting the text off (see pic1.png). IMO that's nicer behavior, but I wanted
to ask what everyone else expects before polishing the patch and sending
it over.
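
One possible way to get that result (not necessarily what my patch does) is
to build a sanitized copy of the string in which every invalid byte is
replaced by the UTF-8 encoding of U+FFFD (EF BF BD), and hand that copy to
XftDrawStringUtf8() instead of the raw text. A minimal sketch, assuming the
caller provides a large enough output buffer; `valid_seq_len` and
`sanitize_utf8` are hypothetical helpers, not existing dmenu or Xft
functions.

```c
#include <stddef.h>
#include <string.h>

/* Length of the valid UTF-8 sequence at s[0..len) (len must be > 0), or 0
 * if the bytes there don't form one: bad lead byte, missing continuation
 * byte, truncation, overlong form, surrogate or value above U+10FFFF. */
static size_t
valid_seq_len(const unsigned char *s, size_t len)
{
	size_t need, i;
	unsigned long c, min;

	if (s[0] < 0x80)
		return 1;
	else if ((s[0] & 0xE0) == 0xC0) { need = 2; c = s[0] & 0x1F; min = 0x80; }
	else if ((s[0] & 0xF0) == 0xE0) { need = 3; c = s[0] & 0x0F; min = 0x800; }
	else if ((s[0] & 0xF8) == 0xF0) { need = 4; c = s[0] & 0x07; min = 0x10000; }
	else
		return 0;
	if (need > len)
		return 0;
	for (i = 1; i < need; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	if (c < min || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
		return 0;
	return need;
}

/* Copies src[0..len) into dst, replacing each invalid byte with EF BF BD.
 * dst must hold at least 3 * len + 1 bytes, since every input byte may
 * expand to three. Returns the output length, excluding the final NUL. */
static size_t
sanitize_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
	size_t i = 0, o = 0, n;

	while (i < len) {
		n = valid_seq_len(src + i, len - i);
		if (n == 0) {
			dst[o++] = 0xEF; dst[o++] = 0xBF; dst[o++] = 0xBD;
			i++;
		} else {
			memcpy(dst + o, src + i, n);
			o += n;
			i += n;
		}
	}
	dst[o] = '\0';
	return o;
}
```

Because the width pass and the drawing pass would then both see the same
U+FFFD, the measured width and the rendered text agree.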

I also noticed that in utf8decode() there's this line:

        if (j < len)
                return 0;

Is this ever reachable? If so, wouldn't it cause an infinite loop, since
`text` would never advance inside drw_text()?
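
To illustrate the concern: any loop that advances by the decoder's return
value stalls forever if the decoder can return 0 while input remains. The
sketch below uses a hypothetical `stub_decode` that only mimics the shape
of that branch (it returns 0 when a multi-byte sequence runs past the end
of the input) plus a one-byte guard that guarantees forward progress. It
does not reproduce drw_text() itself, and `count_steps` is likewise a
made-up name.

```c
#include <stddef.h>

/* Hypothetical stub: returns the sequence length implied by the lead byte,
 * or 0 ("consumed nothing") if that sequence would run past s[avail-1]. */
static size_t
stub_decode(const unsigned char *s, size_t avail)
{
	size_t need = 1;

	if (s[0] >= 0xF0)
		need = 4;
	else if (s[0] >= 0xE0)
		need = 3;
	else if (s[0] >= 0xC0)
		need = 2;
	return need <= avail ? need : 0;
}

/* Walks s[0..len) and counts decode steps. Without the guard, a 0 return
 * would leave i unchanged and the loop would never terminate. */
static size_t
count_steps(const unsigned char *s, size_t len)
{
	size_t i = 0, steps = 0, n;

	while (i < len) {
		n = stub_decode(s + i, len - i);
		if (n == 0)
			n = 1; /* guard: always make forward progress */
		i += n;
		steps++;
	}
	return steps;
}
```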

- NRK

(image/png attachment: pic0.png)

(image/png attachment: pic1.png)

Received on Thu Jul 04 2024 - 05:04:42 CEST
