Re: [hackers] [libgrapheme] Convert GRAPHEME_STATE to uint_least16_t and remove it || Laslo Hunhold

From: NRK <nrk_AT_disroot.org>
Date: Tue, 4 Oct 2022 05:07:13 +0600

On Mon, Oct 03, 2022 at 11:28:49PM +0200, git_AT_suckless.org wrote:
> The expression
>
> GRAPHEME_STATE state = 0;
>
> admittedly looks a little fishy, given you don't really know what happens
> behind the scenes unless you look in the header,

Another possibility is wrapping the integer inside a struct:

        typedef struct { unsigned internal_state; } GRAPHEME_STATE;

the benefit of this is that the type GRAPHEME_STATE clearly states the
purpose, whereas a `uint_least16_t` doesn't.

Wrapping an enum into a struct is also a common trick to get stronger
type-checking from the compiler; I don't think it matters in this case
though, since the state is always passed via pointer.

> and I want all of the semantics to be crystal clear to the end-user.

Other way of looking at it is that the state is an internal thing so the
user shouldn't be concerned about what's going on behind the scene.

> +static inline void
> +state_serialize(const struct character_break_state *in, uint_least16_t *out)
> +{
> + *out = (uint_least16_t)(in->prop & UINT8_C(0xFF)) | /* first 8 bits */
> + (uint_least16_t)(((uint_least16_t)(in->prop_set)) << 8) | /* 9th bit */
> + (uint_least16_t)(((uint_least16_t)(in->gb11_flag)) << 9) | /* 10th bit */
> + (uint_least16_t)(((uint_least16_t)(in->gb12_13_flag)) << 10); /* 11th bit */
> +}
> +
> +static inline void
> +state_deserialize(uint_least16_t in, struct character_break_state *out)
> +{
> + out->prop = in & UINT8_C(0xFF);
> + out->prop_set = in & (((uint_least16_t)(1)) << 8);
> + out->gb11_flag = in & (((uint_least16_t)(1)) << 9);
> + out->gb12_13_flag = in & (((uint_least16_t)(1)) << 10);
> +}

The `(uint_least16_t)1` casts don't really do much since `int` is
guaranteed to be 16bits anyways. But if you want to be explicit, you can
still use `UINT16_C(1)`, which is shorter thus less noisy, instead of
casting:

- out->prop_set = in & (((uint_least16_t)(1)) << 8);
+ out->prop_set = in & (UINT16_C(1) << 8);

I'd also return by value in these 2 functions. Expressions like these
are more clear compared to out pointers:

        *s = state_deserialize(&state);
        state = state_serialize(*s);

- NRK
Received on Tue Oct 04 2022 - 01:07:13 CEST

This archive was generated by hypermail 2.3.0 : Tue Oct 04 2022 - 01:12:35 CEST