[hackers] [PATCH] strings: Correctly handle non-UTF-8 data

From: Santtu Lakkala <inz_AT_inz.fi>
Date: Thu, 1 Sep 2022 14:51:31 +0300

Make fgetrune() push the breaking byte back when it was not the only
byte read, as it may be the start of the next valid rune.

Also treat erroneous runes as breaking a valid sequence; otherwise
strings reports false-positive matches.
---
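For anyone who wants to try the behaviour outside the tree, here is a
rough standalone sketch of both changes. It is not the libutf code: the
names sketch_fgetrune(), sketch_count_runs(), sketch_open_bytes() and
SKETCH_RUNEERROR are made up for illustration, the decoder skips the
overlong and surrogate checks, and "printable" is narrowed to ASCII
0x20..0x7E as a stand-in for isprintrune().

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_RUNEERROR 0xFFFDL /* hypothetical stand-in for Runeerror */

/* Total sequence length implied by a lead byte, 0 if it cannot lead. */
static int
sketch_seqlen(int c)
{
	if (c < 0x80)
		return 1;
	if ((c & 0xE0) == 0xC0)
		return 2;
	if ((c & 0xF0) == 0xE0)
		return 3;
	if ((c & 0xF8) == 0xF0)
		return 4;
	return 0;
}

/*
 * Read one code point. Returns the number of bytes consumed, 0 at EOF.
 * A byte that breaks a multi-byte sequence is pushed back and not
 * counted, so the next call can decode it as the start of a new rune.
 */
static int
sketch_fgetrune(long *r, FILE *fp)
{
	int c, i, n;

	if ((c = fgetc(fp)) == EOF)
		return 0;
	if ((n = sketch_seqlen(c)) == 0) {
		*r = SKETCH_RUNEERROR; /* lone continuation or invalid lead */
		return 1;
	}
	*r = (n == 1) ? c : (c & (0xFF >> (n + 1)));
	for (i = 1; i < n; i++) {
		if ((c = fgetc(fp)) == EOF || (c & 0xC0) != 0x80) {
			if (c != EOF)
				ungetc(c, fp); /* may start the next rune */
			*r = SKETCH_RUNEERROR;
			return i;
		}
		*r = (*r << 6) | (c & 0x3F);
	}
	return n;
}

/* Count runs of at least min (>= 1) printable runes; errors end a run. */
static size_t
sketch_count_runs(FILE *fp, size_t min)
{
	long r;
	size_t len = 0, runs = 0;

	while (sketch_fgetrune(&r, fp) > 0) {
		if (r == SKETCH_RUNEERROR || r < 0x20 || r > 0x7E) {
			len = 0;
			continue;
		}
		if (++len == min)
			runs++;
	}
	return runs;
}

/* Put n bytes into a temporary file and rewind it for reading. */
static FILE *
sketch_open_bytes(const char *s, size_t n)
{
	FILE *fp = tmpfile();

	assert(fp != NULL); /* sketch only; real code would report this */
	fwrite(s, 1, n, fp);
	rewind(fp);
	return fp;
}
```

Feeding it the bytes 'a' 'b' 0xC3 'c' 'd' with a minimum of 4 should
find no run: the truncated sequence ends "ab", and the pushed-back 'c'
starts a new run of two. If the error were skipped and the 'c' dropped
or kept in the same run, the input would wrongly count as one string.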
 libutf/fgetrune.c | 3 +++
 strings.c         | 4 +---
 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/libutf/fgetrune.c b/libutf/fgetrune.c
index 8cd78c6..97536fa 100644
--- a/libutf/fgetrune.c
+++ b/libutf/fgetrune.c
@@ -20,6 +20,9 @@ fgetrune(Rune *r, FILE *fp)
 	if (ferror(fp))
 		return -1;
 
+	if (*r == Runeerror && i > 1)
+		ungetc(buf[--i], fp);
+
 	return i;
 }
 
diff --git a/strings.c b/strings.c
index 8f5a154..13b54ea 100644
--- a/strings.c
+++ b/strings.c
@@ -20,9 +20,7 @@ strings(FILE *fp, const char *fname, size_t min)
 
 	for (off = 0, i = 0; (bread = efgetrune(&r, fp, fname)); ) {
 		off += bread;
-		if (r == Runeerror)
-			continue;
-		if (!isprintrune(r)) {
+		if (r == Runeerror || !isprintrune(r)) {
 			if (i == min)
 				putchar('\n');
 			i = 0;
-- 
2.25.1
Received on Thu Sep 01 2022 - 13:51:31 CEST
