Re: [hackers] [st][patch] replace utf8strchr with wcschr

From: Laslo Hunhold <dev_AT_frign.de>
Date: Thu, 14 Mar 2019 11:44:18 +0100

On Thu, 14 Mar 2019 11:17:28 +0100
Jules Maselbas <jmaselbas_AT_kalray.eu> wrote:

Dear Jules,

> What about having an array of Rune to store worddelimiters and have a
> simple search function such as:
>
> Rune *
> utf8strchr(Rune *s, Rune u)
> {
> 	for (; *s; s++)
> 		if (*s == u)
> 			return s;
> 	return NULL;
> }
>
> The worddelimiters definition will become:
>
> Rune worddelimiters[] = { ' ', 0 };
>
> Which will allow adding Unicode codepoints from wide char literals.
> Even if wchar_t is 16 bits wide, the constant will be stored
> in a Rune, which I believe is a 32-bit value, and should work
> fine.

This would just be less efficient than the current solution, given
you'd have to convert everything to a Rune.

Now, to clear it up: A Rune literally is only a codepoint and just a
typedef for an (at least) 32-bit integer. If we at any point decide to
support grapheme clusters (which can consist of multiple codepoints) in
st, we would have to implement worddelimiters as an array of arrays of
Runes.
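
To illustrate the point about grapheme clusters, here is a minimal
sketch of what an "array of arrays of Runes" could look like. The names
clusterdelimiters and isclusterdelim and the example delimiters are
hypothetical and not part of st; only the Rune typedef mirrors st's.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint_least32_t Rune; /* one codepoint, at least 32 bits wide */

/* Hypothetical: each delimiter is a 0-terminated sequence of codepoints. */
static const Rune delim_space[] = { 0x20, 0 };
/* Two regional indicator symbols (D + E) forming one grapheme cluster. */
static const Rune delim_flag[] = { 0x1F1E9, 0x1F1EA, 0 };
static const Rune *const clusterdelimiters[] = {
	delim_space, delim_flag, NULL
};

/*
 * Return the length in codepoints of the delimiter that matches at the
 * start of the 0-terminated Rune string s, or 0 if none matches.
 */
static size_t
isclusterdelim(const Rune *s)
{
	size_t i, n;

	for (i = 0; clusterdelimiters[i]; i++) {
		const Rune *d = clusterdelimiters[i];

		/* Stops at the first mismatch, including the end of s. */
		for (n = 0; d[n] && s[n] == d[n]; n++)
			;
		if (n > 0 && d[n] == 0)
			return n;
	}
	return 0;
}
```

Every comparison here runs on decoded Runes, so the text being searched
has to be converted first.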

This is why I proposed the offset idea, because you don't have to
juggle codepoints or Runes at runtime. We should in some way
leverage the power UTF-8 gives us in this regard.
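
As a rough illustration of working on the encoded bytes directly (not
necessarily the exact offset scheme), the sketch below compares UTF-8
sequences bytewise without decoding them. The names utf8len and
isdelim_utf8 are hypothetical, and both inputs are assumed to be valid
UTF-8.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length in bytes of the UTF-8 sequence introduced by lead byte c.
 * Invalid lead bytes are treated as length 1.
 */
static size_t
utf8len(unsigned char c)
{
	if (c < 0x80)
		return 1;
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 1;
}

/*
 * Hypothetical helper: return 1 if the encoded character at c (len
 * bytes) occurs in delims, 0 otherwise. delims is assumed to be a
 * valid, NUL-terminated UTF-8 string; no codepoint is ever decoded.
 */
static int
isdelim_utf8(const char *delims, const char *c, size_t len)
{
	size_t n;

	for (; *delims; delims += n) {
		n = utf8len((unsigned char)*delims);
		if (n == len && memcmp(delims, c, len) == 0)
			return 1;
	}
	return 0;
}
```

With this layout worddelimiters could stay an ordinary UTF-8 string,
and a character is matched by length and memcmp() alone.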

With best regards

Laslo

-- 
Laslo Hunhold <dev_AT_frign.de>

Received on Thu Mar 14 2019 - 11:44:18 CET
