Re: [hackers] [st][patch] replace utf8strchr with wcschr

From: Lauri Tirkkonen <lotheac_AT_iki.fi>
Date: Thu, 14 Mar 2019 09:57:02 +0200

Hi,

On Wed, Mar 13 2019 20:35:09 +0100, Hiltjo Posthuma wrote:
> I don't like mixing of the existing functions with wchar_t.
> I think st should (at the very least internally) use utf-8.

I think I explained my position poorly, so let me try to clarify.
My apologies if this seems a bit pushy :)

First - I agree with using UTF-8. That's actually how I ended up with
this diff -- I was trying to configure U+3000 IDEOGRAPHIC SPACE as a
delimiter, but seeing that worddelimiters was char *, I started
wondering whether I could actually use Unicode characters in it and had
to go read the code, thus finding utf8strchr().

utf8strchr() is a bit peculiar - on every call to ISDELIM(), it decodes
the worddelimiters UTF-8 string into Runes (so that it can compare them
to the Rune argument). It seems a little strange to me to be doing that --
the delimiters string cannot change at runtime, so storing the
codepoints instead of the multibyte string feels like a better fit. And
that's what wchar_t * is, with the added bonus that we can use libc
wcschr() instead of rolling our own search function.
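
To make the comparison concrete, here is a rough sketch of the two
lookups side by side. This is not st's actual code: decode_one(),
isdelim_utf8() and isdelim_wide() are hypothetical names, the decoder is
a simplified stand-in (it does not reject overlong forms), and the Rune
typedef is my assumption of a 32-bit unsigned codepoint type. It also
assumes wchar_t is wide enough to hold the codepoints in question.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

typedef uint_least32_t Rune; /* assumption: a 32-bit unsigned codepoint type */

/* Hypothetical, simplified UTF-8 decoder: reads one codepoint from s.
 * Returns the number of bytes consumed, or 0 at end of string or on
 * malformed input. Overlong encodings are not rejected. */
static size_t decode_one(const char *s, Rune *out)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t len, i;
    Rune r;

    if (p[0] == 0)
        return 0;
    if (p[0] < 0x80)                { r = p[0];        len = 1; }
    else if ((p[0] & 0xE0) == 0xC0) { r = p[0] & 0x1F; len = 2; }
    else if ((p[0] & 0xF0) == 0xE0) { r = p[0] & 0x0F; len = 3; }
    else if ((p[0] & 0xF8) == 0xF0) { r = p[0] & 0x07; len = 4; }
    else
        return 0;

    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80) /* also catches a premature NUL */
            return 0;
        r = (r << 6) | (p[i] & 0x3F);
    }
    *out = r;
    return len;
}

/* Approach A: delimiters kept as UTF-8, decoded again on every lookup. */
static int isdelim_utf8(const char *delims, Rune u)
{
    Rune r;
    size_t n;

    while ((n = decode_one(delims, &r)) > 0) {
        if (r == u)
            return 1;
        delims += n;
    }
    return 0;
}

/* Approach B: delimiters kept as codepoints, searched with libc wcschr().
 * The u != 0 check is needed because wcschr() matches the terminator. */
static int isdelim_wide(const wchar_t *delims, Rune u)
{
    return u != 0 && wcschr(delims, (wchar_t)u) != NULL;
}
```

With this, isdelim_utf8(" \xe3\x80\x80", 0x3000) and
isdelim_wide(L" \u3000", 0x3000) should both report a match; the second
one does no decoding at lookup time.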

I already mentioned that Rune is being passed to wcwidth(wchar_t), so it
seems like there is a built-in assumption that Rune and wchar_t hold
equivalent values. I actually don't understand why that typedef exists
instead of just using wchar_t; maybe I'm missing something.
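
For what it's worth, that assumption can be probed from C. The sketch
below is only an illustration: wchar_is_unicode() and rune_fits_wchar()
are hypothetical helpers, and the Rune typedef is again my assumption.
It relies on the standard __STDC_ISO_10646__ macro, which an
implementation defines when wchar_t values are ISO 10646 codepoints, and
on WCHAR_MAX for the range check.

```c
#include <stdint.h>
#include <wchar.h>

typedef uint_least32_t Rune; /* assumption: a 32-bit unsigned codepoint type */

/* Returns 1 if the implementation promises that wchar_t values are
 * ISO 10646 (Unicode) codepoints, 0 if it makes no such promise. */
static int wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if u can be stored in a wchar_t without truncation. */
static int rune_fits_wchar(Rune u)
{
    return (uintmax_t)u <= (uintmax_t)WCHAR_MAX;
}
```

Where wchar_is_unicode() returns 1 and rune_fits_wchar() holds for every
codepoint we care about, a Rune and a wchar_t carry the same value, which
is what passing a Rune to wcwidth() already relies on.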

Could you explain what it is that you don't like about wchar_t?

-- 
Lauri Tirkkonen | lotheac _AT_ IRCnet

Received on Thu Mar 14 2019 - 08:57:02 CET
