[hackers] [libgrapheme] Add a remark on standard conformance in README || Laslo Hunhold
commit 42e58c7d3a921540f5d901b80a0cc75e234b02e9
Author: Laslo Hunhold <dev_AT_frign.de>
AuthorDate: Wed Dec 22 15:20:27 2021 +0100
Commit: Laslo Hunhold <dev_AT_frign.de>
CommitDate: Wed Dec 22 15:20:27 2021 +0100
Add a remark on standard conformance in README
Signed-off-by: Laslo Hunhold <dev_AT_frign.de>
diff --git a/README b/README
index 3b82a29..4e6ee44 100644
--- a/README
+++ b/README
@@ -7,6 +7,13 @@ up of user-perceived characters (so-called "grapheme clusters") that are
made up of one or more Unicode codepoints, which in turn are encoded in
one or more bytes in an encoding like UTF-8.
+There is a widespread misconception that it is enough to simply
+determine codepoints in a string and treat them as user-perceived
+characters to be Unicode compliant. While this may work in some cases,
+this assumption quickly breaks, especially for non-Western languages and
+decomposed Unicode strings where user-perceived characters are usually
+represented using multiple codepoints.
+
Despite the complicated multilevel structure of Unicode strings,
libgrapheme provides methods to work with them at the byte-level (i.e.
UTF-8 ‘char’ arrays) while also providing codepoint-level methods.
@@ -28,6 +35,19 @@ Afterwards enter the following command to build and install libgrapheme
make install
+Conformance
+-----------
+The libgrapheme library is compliant with the Unicode 14.0.0
+specification (September 2021).
+
+To ensure conformance, libgrapheme includes hundreds of tests, including
+all test cases from the test data provided with the standard, which is
+parsed automatically. The tests can be run with
+
+ make test
+
+to check standard conformance.
+
Usage
-----
Include the header grapheme.h in your code and link against libgrapheme
diff --git a/man/grapheme_decode_utf8.3 b/man/grapheme_decode_utf8.3
index 2536e72..0ca91eb 100644
--- a/man/grapheme_decode_utf8.3
+++ b/man/grapheme_decode_utf8.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-19
+.Dd 2021-12-22
.Dt GRAPHEME_DECODE_UTF8 3
.Os suckless.org
.Sh NAME
diff --git a/man/grapheme_encode_utf8.3 b/man/grapheme_encode_utf8.3
index 5e51ac2..cf90c5b 100644
--- a/man/grapheme_encode_utf8.3
+++ b/man/grapheme_encode_utf8.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-17
+.Dd 2021-12-22
.Dt GRAPHEME_ENCODE_UTF8 3
.Os suckless.org
.Sh NAME
diff --git a/man/grapheme_is_character_break.3 b/man/grapheme_is_character_break.3
index 507842c..f50eee3 100644
--- a/man/grapheme_is_character_break.3
+++ b/man/grapheme_is_character_break.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-18
+.Dd 2021-12-22
.Dt GRAPHEME_IS_CHARACTER_BREAK 3
.Os suckless.org
.Sh NAME
diff --git a/man/grapheme_next_character_break.3 b/man/grapheme_next_character_break.3
index 962b2ce..9e0245b 100644
--- a/man/grapheme_next_character_break.3
+++ b/man/grapheme_next_character_break.3
@@ -1,4 +1,4 @@
-.Dd 2021-12-18
+.Dd 2021-12-22
.Dt GRAPHEME_NEXT_CHARACTER_BREAK 3
.Os suckless.org
.Sh NAME
diff --git a/man/libgrapheme.7 b/man/libgrapheme.7
index 2d33112..5d96e49 100644
--- a/man/libgrapheme.7
+++ b/man/libgrapheme.7
@@ -1,4 +1,4 @@
-.Dd 2021-12-19
+.Dd 2021-12-22
.Dt LIBGRAPHEME 7
.Os suckless.org
.Sh NAME
@@ -18,11 +18,22 @@ see
that are made up of one or more Unicode codepoints, which in turn
are encoded in one or more bytes in an encoding like UTF-8.
.Pp
+There is a widespread misconception that it is enough to simply
+determine codepoints in a string and treat them as user-perceived
+characters to be Unicode compliant.
+While this may work in some cases, this assumption quickly breaks,
+especially for non-Western languages and decomposed Unicode strings
+where user-perceived characters are usually represented using multiple
+codepoints.
+.Pp
Despite this complicated multilevel structure of Unicode strings,
.Nm
provides methods to work with them at the byte-level (i.e. UTF-8
.Sq char
arrays) while also offering codepoint-level methods.
+.Pp
+Every documented function's manual page provides a self-contained
+example illustrating the possible usage.
.Sh SEE ALSO
.Xr grapheme_decode_utf8 3 ,
.Xr grapheme_encode_utf8 3 ,
Received on Wed Dec 22 2021 - 15:20:49 CET