Re: [hackers] [libgrapheme][PATCH] Simplify cp_decode and be more strict

From: Mattias Andrée <maandree_AT_kth.se>
Date: Thu, 28 May 2020 16:49:22 +0200

On Thu, 28 May 2020 16:45:20 +0200
Mattias Andrée <maandree_AT_kth.se> wrote:

> Hi Laslo,
>
> On Thu, 28 May 2020 15:53:32 +0200
> Laslo Hunhold <dev_AT_frign.de> wrote:
>
> > On Thu, 28 May 2020 13:48:18 +0200
> > Mattias Andrée <maandree_AT_kth.se> wrote:
> >
> > Dear Mattias,
> >
> > > Looks good, and I especially like the simplification it brings for
> > > using partially loaded strings.
> >
> > I'm glad to hear that. Thanks!
> >
> > > However, I have three minor comments:
> > >
> > > I preferred `lut[off].mask` over `(lut[off].upper - lut[off].lower)`.
> > > It is clearer what it means, and storing the mask in `lut` doesn't
> > > even increase its size since it is padded anyway because `mincp` is
> > > (atleast on x86-64 and i386) aligned to 4 bytes. An alternative,
> > > is to use `~lut[off].lower` which I think is clearer than
> > > `(lut[off].upper - lut[off].lower)`, but again, I prefer
> > > `lut[off].mask`. You could also write
> > > *cp = s[0] - lut[off].lower;
> > > I think this alternative is about as clear as using `lut[off].mask`.
> >
> > I was first vary of this way, because it would be problematic if s[0] <
> > lut[off].lower, but because we check this beforehand this is possible.
> > I'll note it and add it later.
> >
> > > In POSIX (but not Linux) `1 << 16` can be either 0, 1, or 2¹⁶,
> > > since `1` is an `int` which minimum width is 16, not 32. Similarly,
> > > `0x10FFFF` could overflow to 0xFFFF.
> >
> > So would you recommend an explicit cast to uint32_t, i.e.
> >
> > (uint32_t)1 << 16
> >
> > to overcome this?
>
> Yes.

Don't forget about 0x10FFFF, where you need UINT32_C(0x10FFFF).
Casting doesn't work here unless you also add the L suffix.

>
> >
> > > I think `(s[i] & 0xC0) != 0x80` is clearer than `!BETWEEN(s[i], 0x80,
> > > 0xBF)`, but since you changed this I assume you disagree.
> >
> > I don't disagree either way. The comment I added above is sufficient in
> > terms of readability. I'm not a big fan of micro-optimizations and
> > prefer higher "readability". Both solutions are readable enough, given
> > a proper comment, but I just went with the "BETWEEN"-approach as it is
> > similar to how we check it earlier.
> >
> > With best regards
> >
> > Laslo
> >
> > PS: No need to CC me, I am subscribed to the list. :P
> >
>
> Sorry, I pressing reply to all instead of reply to list
> without really looking, I need to remove the former option.
>
>
> Regards,
> Mattias Andrée
>
Received on Thu May 28 2020 - 16:49:22 CEST

This archive was generated by hypermail 2.3.0 : Thu May 28 2020 - 17:00:34 CEST