Re: [hackers] [PATCH v3][sbase] paste: Support -d '\0'

From: Michael Forney <mforney_AT_mforney.org>
Date: Tue, 7 Apr 2020 01:54:32 -0700

On 2020-03-27, Richard Ipsum <richardipsum_AT_vx21.xyz> wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.
> ---
> libutf/utf.c | 12 ++++++++++++
> libutf/utftorunestr.c | 12 ++++++++++++
> paste.c | 27 +++++++++++++++------------
> utf.h | 4 +++-
> 4 files changed, 42 insertions(+), 13 deletions(-)
>
> diff --git a/libutf/utf.c b/libutf/utf.c
> index 897c5ef..fc78f29 100644
> --- a/libutf/utf.c
> +++ b/libutf/utf.c
> @@ -62,6 +62,18 @@ utfnlen(const char *s, size_t len)
> return i;
> }
>
> +size_t
> +utfmemlen(const char *s, size_t len)
> +{
> + const char *p = s, *end = s + len;
> + size_t i;
> + Rune r;
> +
> + for(i = 0; p < end; i++)
> + p += charntorune(&r, p, end - p);
> + return i;
> +}

It looks like charntorune can return 0 even when p < end, if it
encounters a truncated UTF-8 sequence. In that case p never advances,
so this loop would never terminate.

I think something similar to utfntorunestr should work here.
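
For illustration, here is a minimal sketch of what that could look like:
the loop condition tests charntorune's return value rather than p < end,
so a return of 0 always ends the loop. The Rune typedef and the
charntorune body below are simplified stand-ins written only to make the
sketch compile on its own; they are assumptions, not libutf's actual
code, and model just the "returns 0 on empty or truncated input"
behaviour described above.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for the Rune type in utf.h */

/*
 * Simplified stand-in for libutf's charntorune (an assumption, not the
 * real implementation). Returns 0 if len is 0 or the sequence at s is
 * truncated; otherwise stores the decoded value in *r and returns the
 * number of bytes consumed. Continuation bytes are not validated.
 */
static int
charntorune(Rune *r, const char *s, size_t len)
{
	unsigned char c;
	int i, n;

	if (len == 0)
		return 0;
	c = (unsigned char)s[0];
	if (c < 0x80) {
		*r = c;
		return 1;
	} else if ((c & 0xE0) == 0xC0) {
		n = 2;
	} else if ((c & 0xF0) == 0xE0) {
		n = 3;
	} else if ((c & 0xF8) == 0xF0) {
		n = 4;
	} else {
		*r = 0xFFFD;	/* invalid lead byte: consume one byte */
		return 1;
	}
	if ((size_t)n > len)
		return 0;	/* truncated sequence */
	*r = c & (0xFF >> (n + 1));
	for (i = 1; i < n; i++)
		*r = (*r << 6) | ((unsigned char)s[i] & 0x3F);
	return n;
}

/* Count the runes in the first len bytes of s, embedded NULs included. */
size_t
utfmemlen(const char *s, size_t len)
{
	const char *p = s, *end = s + len;
	size_t i;
	Rune r;
	int n;

	/* Stops as soon as charntorune returns 0, so p always advances. */
	for (i = 0; (n = charntorune(&r, p, end - p)); i++)
		p += n;
	return i;
}
```

With this shape, a buffer ending in a truncated sequence (for example
"a\xc3" with len 2) yields a count of 1 instead of hanging. Whether the
truncated tail should be counted or reported some other way is a design
choice this sketch does not settle.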

> +
> char *
> utfrune(const char *s, Rune r)
> {
> diff --git a/libutf/utftorunestr.c b/libutf/utftorunestr.c
> index 005fe8a..d350c77 100644
> --- a/libutf/utftorunestr.c
> +++ b/libutf/utftorunestr.c
> @@ -11,3 +11,15 @@ utftorunestr(const char *str, Rune *r)
>
> return i;
> }
> +
> +int
> +utfntorunestr(const char *str, size_t len, Rune *r)
> +{
> + int i, n;
> + const char *p = str, *end = str + len;
> +
> + for(i = 0; (n = charntorune(&r[i], p, end - p)) && p < end; i++)
> + p += n;

I don't think the `&& p < end` is necessary, since if p == end,
charntorune returns 0.

Also, I think this function should return size_t, not int. I pushed a
change to make utftorunestr return size_t as well.
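
Put together, the two suggestions might look roughly like the sketch
below: the `&& p < end` test is dropped and both the counter and the
return type become size_t. As before, the Rune typedef and the
charntorune body are simplified stand-ins (assumptions made so the
example compiles on its own, not libutf's real code). The caller is
assumed to supply room for at least len runes in r.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for the Rune type in utf.h */

/*
 * Simplified stand-in for libutf's charntorune (an assumption, not the
 * real implementation). Returns 0 if len is 0 or the sequence at s is
 * truncated; otherwise stores the decoded value in *r and returns the
 * number of bytes consumed. Continuation bytes are not validated.
 */
static int
charntorune(Rune *r, const char *s, size_t len)
{
	unsigned char c;
	int i, n;

	if (len == 0)
		return 0;
	c = (unsigned char)s[0];
	if (c < 0x80) {
		*r = c;
		return 1;
	} else if ((c & 0xE0) == 0xC0) {
		n = 2;
	} else if ((c & 0xF0) == 0xE0) {
		n = 3;
	} else if ((c & 0xF8) == 0xF0) {
		n = 4;
	} else {
		*r = 0xFFFD;	/* invalid lead byte: consume one byte */
		return 1;
	}
	if ((size_t)n > len)
		return 0;	/* truncated sequence */
	*r = c & (0xFF >> (n + 1));
	for (i = 1; i < n; i++)
		*r = (*r << 6) | ((unsigned char)s[i] & 0x3F);
	return n;
}

/* Decode up to len bytes of str into r; return the number of runes. */
size_t
utfntorunestr(const char *str, size_t len, Rune *r)
{
	size_t i;
	int n;
	const char *p = str, *end = str + len;

	/* When p == end, charntorune sees len 0 and returns 0. */
	for (i = 0; (n = charntorune(&r[i], p, end - p)); i++)
		p += n;

	return i;
}
```

Because a zero-length remainder already makes charntorune return 0, the
return value alone is enough to terminate the loop at the end of the
buffer.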

> +
> + return i;
> +}
Received on Tue Apr 07 2020 - 10:54:32 CEST
