Hi,
On Wed, Mar 13 2019 20:35:09 +0100, Hiltjo Posthuma wrote:
> I don't like mixing of the existing functions with wchar_t.
> I think st should (at the very least internally) use utf-8.
I think I explained my position poorly, so let me try to clarify.
My apologies if this seems a bit pushy :)
First - I agree with using UTF-8. That's actually how I ended up with
this diff -- I was trying to configure U+3000 IDEOGRAPHIC SPACE as a
delimiter, but seeing that worddelimiters was char *, I started
wondering whether I could actually use Unicode characters in it and had
to go read the code, thus finding utf8strchr().
utf8strchr() is a bit peculiar - on every call to ISDELIM(), it decodes
the worddelimiters UTF-8 string into Runes (so that it can compare to
the Rune argument). It seems a little strange to me to be doing that --
the delimiters string cannot change at runtime, so storing the
codepoints instead of the multibyte string feels like a better fit. And
that's what wchar_t * is, with the added bonus that we can use libc
wcschr() instead of rolling our own search function.
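
To make the idea concrete, here is a rough sketch of what I mean. It is
not the actual diff: isdelim() is a hypothetical name, the Rune typedef
is my assumption of what st uses, and the whole thing assumes that
wchar_t values are Unicode code points (true on glibc and the BSDs, but
not guaranteed by ISO C).

```c
#include <stdint.h>
#include <wchar.h>

/* Stand-in for st's Rune typedef; assumed here to be a 32-bit
 * unsigned code point. */
typedef uint_least32_t Rune;

/* Delimiters stored as code points instead of UTF-8 bytes, so nothing
 * has to be decoded at lookup time. Includes U+3000 IDEOGRAPHIC SPACE.
 * Assumes wchar_t holds Unicode code points. */
static const wchar_t worddelimiters[] = L" \u3000";

/* Hypothetical replacement for the ISDELIM() check. */
static int
isdelim(Rune u)
{
	/* wcschr() also matches the terminating L'\0', so rule out 0. */
	return u != 0 && wcschr(worddelimiters, (wchar_t)u) != NULL;
}
```

With this, isdelim(0x3000) is a plain libc search over code points, and
the string is never re-decoded per call.
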
I already mentioned that Rune is being passed to wcwidth(wchar_t), so it
seems like there is a builtin assumption that Rune and wchar_t hold
equivalent values. I actually don't understand why that typedef exists
instead of just using wchar_t; maybe I'm missing something.
Could you explain what it is that you don't like about wchar_t?
--
Lauri Tirkkonen | lotheac _AT_ IRCnet
Received on Thu Mar 14 2019 - 08:57:02 CET