Re: [dev] [libgrapheme] Some questions about libgrapheme

From: NRK <nrk_AT_disroot.org>
Date: Sat, 3 Sep 2022 02:21:45 +0600

On Fri, Sep 02, 2022 at 02:08:03PM -0300, atrtarget_AT_cock.li wrote:
> Quite inefficient really, but I guess it's fine since my usage would be
> only user input (left arrow)

If efficiency is not a concern, then you can easily use something like
this (just a quick prototype, didn't verify if it's correct or not):

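        #include <assert.h>
        #include <stddef.h>
        #include <grapheme.h>

        /* returns the offset into `s` where the character (grapheme
         * cluster) preceding byte offset `off` starts */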
        static size_t
        prev_char_offset(const char *s, size_t slen, size_t off)
        {
                assert(s != NULL);
                assert(slen > 0);
                assert(off <= slen);
        
                size_t ret = 0;
                const char *const end = s + slen;
                while (s < end) {
                        size_t n = grapheme_next_character_break_utf8(s, end - s);
                        if (ret + n >= off)
                                return ret;
                        ret += n;
                        s += n;
                }
                return 0; /* not reached as long as off <= slen */
        }

If I were expecting a decent amount of non-ASCII input, I would use the
bitvector approach described by Thomas Oltmann. An overhead of 1 bit per
byte should be fine for most use cases.
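
A rough sketch of how I read the bitvector idea (not necessarily the exact
scheme Thomas described): segment the string once, set one bit for every
byte at which a character starts, and then walk backwards over the bits to
find the previous break. The names break_bits_build, break_bits_prev and
codepoint_next_break are made up for illustration. The segmenter is passed
in as a function pointer with the same signature as
grapheme_next_character_break_utf8(); the stand-in below only splits on
UTF-8 code points, so the sketch builds without libgrapheme.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* same signature as grapheme_next_character_break_utf8() */
typedef size_t (*next_break_fn)(const char *, size_t);

/* hypothetical stand-in segmenter: breaks after every UTF-8 code
 * point, NOT after every grapheme cluster */
static size_t
codepoint_next_break(const char *s, size_t len)
{
        size_t n = 1;

        if (len == 0)
                return 0;
        /* swallow continuation bytes (10xxxxxx) */
        while (n < len && ((unsigned char)s[n] & 0xC0) == 0x80)
                n++;
        return n;
}

/* returns a bitvector in which bit `i` is set if a character starts
 * at byte `i` of `s`; caller frees; NULL on allocation failure */
static unsigned char *
break_bits_build(const char *s, size_t slen, next_break_fn next_break)
{
        unsigned char *bits = calloc(slen / CHAR_BIT + 1, 1);
        size_t off = 0;

        if (bits == NULL)
                return NULL;
        while (off < slen) {
                size_t n = next_break(s + off, slen - off);
                if (n == 0) /* defensive: never loop forever */
                        break;
                bits[off / CHAR_BIT] |= (unsigned char)(1u << (off % CHAR_BIT));
                off += n;
        }
        return bits;
}

/* returns the offset of the closest character start before `off`,
 * or 0 if there is none */
static size_t
break_bits_prev(const unsigned char *bits, size_t off)
{
        assert(bits != NULL);

        while (off > 0) {
                off--;
                if (bits[off / CHAR_BIT] & (1u << (off % CHAR_BIT)))
                        return off;
        }
        return 0;
}
```

For real use you would pass grapheme_next_character_break_utf8 as the
segmenter and rebuild (or patch) the bitvector whenever the buffer is
edited. The backwards walk only touches the bytes of the character being
skipped, instead of re-segmenting from the start of the string.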

- NRK
Received on Fri Sep 02 2022 - 22:21:45 CEST
