Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold

From: Mattias Andrée <maandree_AT_kth.se>
Date: Sat, 18 Dec 2021 21:19:59 +0100

It appears you are correct. I was misled by some tool that
checked for undefined behaviour at runtime (I don't remember
which; it was a website that forced it upon the user). Converting
a signed value X to an N-bit unsigned integer type shall
result in the in-range number that is congruent with X modulo
2^N. So, 2^N + X for negative X of magnitude at most 2^N.
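
To make that concrete, here is a minimal sketch of the conversion
rule. The function names to_size and check_wraparound are made up
for this example and are not part of libgrapheme; the asserts only
restate what the standard guarantees for any conforming C99
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a signed value to size_t. The conversion is fully defined:
 * the result is the value reduced modulo SIZE_MAX + 1.
 */
size_t
to_size(long long x)
{
	return (size_t)x;
}

/* Hypothetical self-check: none of these conversions is undefined. */
void
check_wraparound(void)
{
	/* -1 maps to 2^N - 1, which is exactly SIZE_MAX */
	assert((size_t)-1 == SIZE_MAX);
	assert(to_size(-1) == SIZE_MAX);

	/* -2 maps to 2^N - 2 */
	assert(to_size(-2) == SIZE_MAX - 1);

	/* unsigned arithmetic wraps as well, so SIZE_MAX + 1 is 0 */
	assert((size_t)(SIZE_MAX + (size_t)1) == 0);
}
```

So (size_t)-1 and SIZE_MAX are the same value; the commit only
changes how clearly the intent reads.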

And yes, signed overflow is undefined, despite some LinkedIn
Learning course I took claiming otherwise (it even claimed that
C always used two's complement). (And no, LinkedIn Learning is
not worth your money, whatever it may cost; my employer pays
for it.)
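
Since signed overflow is undefined, the check has to happen before
the addition rather than after it. A rough sketch of one common way
to do that, assuming plain int operands (checked_add is a name
invented for this example, not something from libgrapheme):

```c
#include <limits.h>

/*
 * Add a and b without ever evaluating an overflowing expression.
 * Stores the sum in *res and returns 0 on success; returns -1 and
 * leaves *res untouched if the sum would not fit in an int.
 */
int
checked_add(int a, int b, int *res)
{
	if ((b > 0 && a > INT_MAX - b) ||
	    (b < 0 && a < INT_MIN - b)) {
		return -1;
	}

	*res = a + b;

	return 0;
}
```

The comparisons only compute INT_MAX - b for positive b and
INT_MIN - b for negative b, so they cannot overflow themselves.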


On Sat, 18 Dec 2021 15:07:30 -0500
Ethan Sommer <e5ten.arch_AT_gmail.com> wrote:

> On Sat, Dec 18, 2021 at 3:02 PM Mattias Andrée <maandree_AT_kth.se> wrote:
>
> > (size_t)-1 is also undefined behaviour.
>
>
> It isn't, wrap-around with unsigned types is defined, it's only signed
> overflow that isn't.
>
>
> > On Sat, 18 Dec 2021 20:21:42 +0100
> > <git_AT_suckless.org> wrote:
> >
> > > commit cb7e9c00899ae0ed57a84991308b7f880f4ddef6
> > > Author: Laslo Hunhold <dev_AT_frign.de>
> > > AuthorDate: Sat Dec 18 20:21:04 2021 +0100
> > > Commit: Laslo Hunhold <dev_AT_frign.de>
> > > CommitDate: Sat Dec 18 20:21:04 2021 +0100
> > >
> > > Use SIZE_MAX instead of (size_t)-1
> > >
> > > This makes a bit clearer what we mean, and given the library is C99
> > > we can rely on this constant to exist.
> > >
> > > Signed-off-by: Laslo Hunhold <dev_AT_frign.de>
> > >
> > > diff --git a/man/grapheme_decode_utf8.3 b/man/grapheme_decode_utf8.3
> > > index 26e3afb..d5c7c9d 100644
> > > --- a/man/grapheme_decode_utf8.3
> > > +++ b/man/grapheme_decode_utf8.3
> > > @@ -31,8 +31,8 @@ Given NUL has a unique 1 byte representation, it is safe to operate on
> > > NUL-terminated strings by setting
> > > .Va len
> > > to
> > > -.Dv (size_t)-1
> > > -and terminating when
> > > +.Dv SIZE_MAX
> > > +(stdint.h is already included by grapheme.h) and terminating when
> > > .Va cp
> > > is 0 (see
> > > .Sx EXAMPLES
> > > @@ -87,7 +87,7 @@ print_cps_nul_terminated(const char *str)
> > > uint_least32_t cp;
> > >
> > > for (off = 0; (ret = grapheme_decode_utf8(str + off,
> > > - (size_t)-1, &cp)) > 0 &&
> > > + SIZE_MAX, &cp)) > 0 &&
> > > cp != 0; off += ret) {
> > > printf("%"PRIxLEAST32"\\n", cp);
> > > }
> > > diff --git a/src/character.c b/src/character.c
> > > index 015b4e0..8f1143f 100644
> > > --- a/src/character.c
> > > +++ b/src/character.c
> > > @@ -197,19 +197,19 @@ grapheme_next_character_break(const char *str)
> > > * miss it, even if the previous UTF-8 sequence terminates
> > > * unexpectedly, as it would either act as an unexpected byte,
> > > * saved for later, or as a null byte itself, that we can catch.
> > > - * We pass (size_t)-1 to the length, as we will never read beyond
> > > + * We pass SIZE_MAX to the length, as we will never read beyond
> > > * the null byte for the reasons given above.
> > > */
> > >
> > > /* get first codepoint */
> > > - len += grapheme_decode_utf8(str, (size_t)-1, &cp0);
> > > + len += grapheme_decode_utf8(str, SIZE_MAX, &cp0);
> > > if (cp0 == GRAPHEME_INVALID_CODEPOINT) {
> > > return len;
> > > }
> > >
> > > while (cp0 != 0) {
> > > /* get next codepoint */
> > > - ret = grapheme_decode_utf8(str + len, (size_t)-1, &cp1);
> > > + ret = grapheme_decode_utf8(str + len, SIZE_MAX, &cp1);
> > >
> > > if (cp1 == GRAPHEME_INVALID_CODEPOINT ||
> > > grapheme_is_character_break(cp0, cp1, &state)) {
> > >
> >
> >
> >
Received on Sat Dec 18 2021 - 21:19:59 CET
