Re: [dev] [dmenu] What's the expected behavior on invalid utf8?

From: GMD Ephir <ephir_AT_anche.no>
Date: Thu, 4 Jul 2024 11:32:34 +0300

On Thu, Jul 04, 2024 at 03:04:42AM +0000, NRK wrote:
> Hello all,
>
> A couple days ago I was looking into how dmenu deals with invalid utf8
> sequences and noticed a couple odd things. Here's the testcase for those
> who want to follow along:
>
> $ printf "0\xef1234567\ntest" | dmenu
>
> In drw.c::utf8decode(), invalid utf8 sequence is set to U+FFFD (�) and
> drw_text continues on doing its width calculation as if there was a
> U+FFFD codepoint in the text.
>
> However when it comes to actually rendering the text via
> XftDrawStringUtf8(), we simply pass it `utf8str`; which obviously
> doesn't have any U+FFFD but instead has invalid utf8 sequences.
>
> I'm not sure if this is documented or not, but on my system xft
> basically just cuts the text off at the error. In other words, only 0 is
> rendered, followed by a large blank area (see pic0.png).
>
> Is this actually the expected behavior? If yes, then why not break out
> early on error instead of calculating width with a made up U+FFFD which
> will never be rendered?
>
> I have a rough patch which actually renders invalid utf8 as � instead of
> cutting it off (see pic1.png). IMO it's a nicer behavior. But I wanted
> to ask what everyone else expects before polishing the patch and sending
> it over.
>
> I also noticed that in utf8decode() there's this line:
>
>     if (j < len)
>         return 0;
>
> Is this ever reachable? If yes, wouldn't it be an infinite loop since
> `text` would never advance inside drw_text()?
>
> - NRK

I'm pretty sure that rendering invalid utf8 as U+FFFD is how it should
be done, so I recommend that you send this patch to the "hackers"
mailing list.

If you are still in doubt about whether to send the patch there, please
send it to this thread instead. Personally, I would use it.
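
For anyone following along, here is a minimal sketch of the behavior
being discussed: copy the text and substitute U+FFFD for every invalid
byte, so that what gets measured is also what gets rendered. This is
not NRK's patch and not code from drw.c; utf8_seqlen() and
utf8_sanitize() are hypothetical names made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of dmenu: returns the byte length of the
 * valid UTF-8 sequence starting at s (n bytes available), or 0 if the
 * bytes at s do not form a valid sequence. */
static size_t
utf8_seqlen(const unsigned char *s, size_t n)
{
	size_t i, len;
	unsigned long cp;

	if (n == 0)
		return 0;
	if (s[0] < 0x80)
		return 1;
	else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
	else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
	else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
	else
		return 0; /* stray continuation byte or invalid lead byte */
	if (len > n)
		return 0; /* sequence truncated by end of input */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0; /* expected a continuation byte */
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	/* reject overlong encodings, surrogates and out-of-range values */
	if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
	    (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
	    (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	return len;
}

/* Hypothetical helper: copies in[0..inlen) to out, replacing each invalid
 * byte with U+FFFD. Returns the number of bytes written (excluding the
 * terminating NUL), or (size_t)-1 if out is too small. */
static size_t
utf8_sanitize(const char *in, size_t inlen, char *out, size_t outcap)
{
	static const char repl[] = "\xEF\xBF\xBD"; /* U+FFFD in UTF-8 */
	const char *src;
	size_t i = 0, o = 0, len, srclen;

	while (i < inlen) {
		len = utf8_seqlen((const unsigned char *)in + i, inlen - i);
		src = len ? in + i : repl;
		srclen = len ? len : sizeof(repl) - 1;
		if (o + srclen >= outcap)
			return (size_t)-1; /* keep room for the NUL */
		memcpy(out + o, src, srclen);
		o += srclen;
		i += len ? len : 1; /* always advance, so the loop ends */
	}
	if (o >= outcap)
		return (size_t)-1;
	out[o] = '\0';
	return o;
}
```

Consuming at least one byte per iteration also sidesteps the "text
never advances" concern from the quoted message. Whether dmenu should
sanitize into a buffer like this or handle the substitution while
drawing is a design question for the actual patch.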

-- 
Gracefully, Ephir
Received on Thu Jul 04 2024 - 10:32:34 CEST
