On 2021-12-15, Laslo Hunhold <dev_AT_frign.de> wrote:
> thanks for clearing that up! After more thought I made the decision to
> go with uint8_t, though. I see the point regarding character types, but
> this notion is more of a smelly foot in the C standard. We are moving
> towards UTF-8 as _the_ default encoding format, so considering
> character strings as such is justified.
I think this is a mistake. It makes it very difficult to use the API
correctly if you have data in an array of char or unsigned char, which
is usually the case.
Here's an example of some real code that has a char * buffer:
https://git.sr.ht/~exec64/imv/tree/a83304d4d673aae6efed51da1986bd7315a4d642/item/src/console.c#L54-58
How would you suggest that this code be written for the new API? The
only thing I can think of is
	if (buffer[position] != 0) {
		size_t bufferlen = strlen(buffer) + 1 - position;
		uint8_t *newbuffer = malloc(bufferlen);
		if (!newbuffer) ...
		memcpy(newbuffer, buffer + position, bufferlen);
		position += grapheme_bytelen(newbuffer);
		free(newbuffer);
	}
	return position;
This sort of thing would turn me off of using the library entirely.
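For comparison, here is a rough sketch of what the same call site could look like if the function took a char * directly. Note that utf8_seqlen_stub() and advance() are made-up names for illustration, not libgrapheme functions, and the stub only measures a single UTF-8 code point rather than a full grapheme cluster. The only point is that char data can always be read through an unsigned char pointer, so no allocation or copy is needed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a length function that accepts char *.
 * It measures one UTF-8 code point, NOT a grapheme cluster. */
static size_t
utf8_seqlen_stub(const char *s)
{
	/* Accessing char data as unsigned char is always permitted. */
	const unsigned char *p = (const unsigned char *)s;
	size_t len, i;

	if (p[0] == 0)
		return 0;
	if (p[0] < 0x80)
		len = 1;
	else if ((p[0] & 0xE0) == 0xC0)
		len = 2;
	else if ((p[0] & 0xF0) == 0xE0)
		len = 3;
	else if ((p[0] & 0xF8) == 0xF0)
		len = 4;
	else
		return 1; /* invalid lead byte: skip a single byte */

	/* Stop early at a terminator or a non-continuation byte. */
	for (i = 1; i < len; i++) {
		if ((p[i] & 0xC0) != 0x80)
			return i;
	}
	return len;
}

/* Call site: works on the existing char buffer in place. */
static size_t
advance(const char *buffer, size_t position)
{
	if (buffer[position] != 0)
		position += utf8_seqlen_stub(buffer + position);
	return position;
}
```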
> Any other way would have introduced too many implicit assumptions.
Like what?
If you really want your code to break when CHAR_BIT != 8, you could
use a static assert (there are also ways to emulate this in C99). But
even if CHAR_BIT > 8, unsigned char is perfectly capable of representing
all the values used in UTF-8 encoding, so I don't see the problem.
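A minimal sketch of the static assert mentioned above, with one common C99 emulation as a fallback. The typedef name and the helper fits_utf8_byte() are invented for this example; the negative-array-size trick is just one of several known ways to emulate a compile-time assertion.

```c
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11: refuse to compile unless bytes are exactly 8 bits wide. */
_Static_assert(CHAR_BIT == 8, "CHAR_BIT must be 8");
#else
/* C99 emulation: an array of negative size is a constraint
 * violation, so this typedef fails to compile when the condition
 * is false. The typedef name is made up for this sketch. */
typedef char assert_char_bit_is_8[(CHAR_BIT == 8) ? 1 : -1];
#endif

/* Even where CHAR_BIT > 8, unsigned char can hold at least 0..255,
 * which covers every byte value that appears in UTF-8. */
static int
fits_utf8_byte(void)
{
	return UCHAR_MAX >= 0xFF;
}
```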
> And even if all fails and there simply is no 8-bit-type, one can always
> use the lg_grapheme_isbreak()-function and roll his own de/encoding.
I'm still confused as to what you mean by rolling your own
de/encoding. What would that look like?
If there is no 8-bit type, libgrapheme could not be compiled or used
at all since uint8_t would be missing.
Received on Wed Dec 15 2021 - 21:24:21 CET