[wiki] [sites] Update libgrapheme-page and add manuals || Laslo Hunhold

From: <git_AT_suckless.org>
Date: Thu, 06 Oct 2022 22:09:41 +0200

commit c0322961a34af28595d3f6e21f92d5af3313063e
Author: Laslo Hunhold <dev_AT_frign.de>
Date: Thu Oct 6 22:08:10 2022 +0200

    Update libgrapheme-page and add manuals
    
    Signed-off-by: Laslo Hunhold <dev_AT_frign.de>

diff --git a/libs.suckless.org/libgrapheme/index.md b/libs.suckless.org/libgrapheme/index.md
index b26d3cb3..80226c1e 100644
--- a/libs.suckless.org/libgrapheme/index.md
+++ b/libs.suckless.org/libgrapheme/index.md
@@ -1,60 +1,61 @@
 ![libgrapheme](libgrapheme.svg)
 
-libgrapheme is an extremely simple C99 library providing utilities for
-properly handling Unicode strings made up of user-perceived characters
-('grapheme clusters') according to the Unicode standard. While providing
-convenience functions to operate on UTF-8-encoded strings, you can also
-use libgrapheme for any other encoding as well.
-
-The necessary lookup-tables and test-data are automatically generated
-from the Unicode standard data, ensuring correctness and validation.
-A specialized 'Heisenstate' state-handling combined with
-O(log(n))-binary-search on the lookup-tables and data-recycling provides
-great processing-performance in the order of millions of codepoints per
-second.
+libgrapheme is an extremely simple freestanding C99 library providing
+utilities for properly handling strings according to the latest
+Unicode standard 15.0.0. It offers fully Unicode compliant
+
+* __grapheme cluster__ (i.e. user-perceived character) __segmentation__
+* __word segmentation__
+* __sentence segmentation__
+* detection of permissible __line break opportunities__
+* __case detection__ (lower-, upper- and title-case)
+* __case conversion__ (to lower-, upper- and title-case)
+
+on UTF-8 strings and codepoint arrays, which both can also be
+null-terminated.
+
+The necessary lookup-tables are automatically generated from the Unicode
+standard data (contained in the tarball) and heavily compressed. Over
+10,000 automatically generated conformance tests and over 150 unit tests
+ensure conformance and correctness.
 
 There is no complicated build-system involved and it's all done using
-one POSIX-compliant Makefile. All you need is a C99 compiler, because
-the data-generators are also written in C99.
+one POSIX-compliant Makefile. All you need is a C99 compiler, given
+the lookup-table-generators and compressors are also written in C99.
+The resulting library is freestanding and thus not even dependent on a
+standard library to be present at runtime.
 
-Motivation
-----------
-The goal of this project is to be a suckless and statically linkable
-alternative to the existing bloated, complicated and overscoped solutions
-for Unicode string handling (ICU, GNU's libunistring, etc.), motivating
-more hackers to properly handle Unicode strings in their projects and
-allowing this even in embedded applications.
+Development
+-----------
+You can [browse](//git.suckless.org/libgrapheme) the source code
+repository or get a copy with the following command:
 
-The problem can be easily seen when looking at the sizes of the respective
-libraries: The ICU library (libicudata.a, libicui18n.a, libicuio.a,
-libicutest.a, libicutu.a, libicuuc.a) is around 38MB and libunistring
-(libunistring.a) is around 2MB, which is unacceptable for static
-linking. Both take many minutes to compile even on a good computer and
-require a lot of dependencies, including Python for ICU. On
-the other hand libgrapheme (libgrapheme.a) only weighs in at around 40K
-and is compiled (including Unicode data parsing) in fractions of a
-second, requiring nothing but a C99 compiler and make(1).
+ git clone https://git.suckless.org/libgrapheme
 
-While ICU and libunistring offer a lot of functions and the weight mostly
-comes from locale-data provided by the Unicode standard, which is applied
-implementation-specifically (!) for some things, the same standard always
-defines a sane 'default' behaviour as an alternative in such cases that
-is satisfying in 99% of the cases and which you can rely on.
+Download
+--------
+libgrapheme follows the semantic versioning scheme.
 
-For some languages, for instance, it is necessary to have a dictionary
-on hand to always accurately determine when a word begins and ends. The
-defaults provided by the standard, though, already do a good job
-respecting the language's boundaries in the general case and are not too
-taxing in terms of performance.
+* [libgrapheme-1.0.0](//dl.suckless.org/libgrapheme/libgrapheme-1.tar.gz) (2021-12-22)
 
-Handling user-perceived characters is not locale-dependent, though, and
-does not require locale-data.
 
 Getting Started
 ---------------
-Installing libgrapheme will install the header grapheme.h and both the
-static library libgrapheme.a and the dynamic library libgrapheme.so in
-the respective folders. Access the manual under libgrapheme(7) by typing
+Installing libgrapheme via
+
+ make install
+
+will install the header grapheme.h and both the static library
+libgrapheme.a and the dynamic library libgrapheme.so (with symlinks) in
+the respective folders. The conformance and unit tests can be run with
+
+ make test
+
+and comparative benchmarks against libutf8proc can be run with
+
+ make benchmark
+
+You can access the manual via libgrapheme(7) by typing
 
         man libgrapheme
 
@@ -109,16 +110,44 @@ and the output is
          6 bytes | நி
          1 bytes | !
 
-Development
------------
-You can [browse](//git.suckless.org/libgrapheme) the source code
-repository or get a copy with the following command:
 
- git clone https://git.suckless.org/libgrapheme
+Motivation
+----------
+The goal of this project is to be a suckless and statically linkable
+alternative to the existing bloated, complicated, overscoped and/or
+incorrect solutions for Unicode string handling (ICU, GNU's
+libunistring, libutf8proc, etc.), motivating more hackers to properly
+handle Unicode strings in their projects and allowing this even in
+embedded applications.
 
-Download
---------
-* [libgrapheme-1](//dl.suckless.org/libgrapheme/libgrapheme-1.tar.gz) (2021-12-22)
+The problem can be easily seen when looking at the sizes of the respective
+libraries: The ICU library (libicudata.a, libicui18n.a, libicuio.a,
+libicutest.a, libicutu.a, libicuuc.a) is around 38MB and libunistring
+(libunistring.a) is around 2MB, which is unacceptable for static
+linking. Both take many minutes to compile even on a good computer and
+require a lot of dependencies, including Python for ICU. On
+the other hand libgrapheme (libgrapheme.a) only weighs in at around 300K
+and is compiled (including Unicode data parsing and compression) in
+under a second, requiring nothing but a C99 compiler and POSIX make(1).
+
+Some libraries, like libutf8proc and libunistring, are incorrect by
+basing their API on assumptions that haven't been true for years
+(e.g. offering stateless grapheme cluster segmentation even though the
+underlying algorithm is not stateless). As an additional factor,
+libutf8proc's UTF-8-decoder is unsafe, as it allows overlong encodings
+that can be easily used for exploits.
+
+While ICU and libunistring offer a lot of functions and the weight mostly
+comes from locale-data provided by the Unicode standard, which is applied
+implementation-specifically (!) for some things, the same standard always
+defines a sane 'default' behaviour as an alternative in such cases that
+is satisfying in 99% of the cases and which you can rely on.
+
+For some languages, for instance, it is necessary to have a dictionary
+on hand to always accurately determine when a word begins and ends. The
+defaults provided by the standard, though, already do a great job
+respecting the language's boundaries in the general case and are not too
+taxing in terms of performance.
 
 Author
 ------
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_decode_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_decode_utf8\(3\)/index.md"
new file mode 100644
index 00000000..5d717677
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_decode_utf8\(3\)/index.md"
@@ -0,0 +1,80 @@
+ GRAPHEME_DECODE_UTF8(3) Library Functions Manual GRAPHEME_DECODE_UTF8(3)
+
+ NAME
+ grapheme_decode_utf8 – decode first codepoint in UTF-8-encoded string
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_decode_utf8(const char *str, size_t len, uint_least32_t *cp);
+
+ DESCRIPTION
+ The grapheme_decode_utf8() function decodes the first codepoint in the
+ UTF-8-encoded string str of length len. If the UTF-8-sequence is invalid
+ (overlong encoding, unexpected byte, string ends unexpectedly, empty
+ string, etc.) the decoding is stopped at the last processed byte and the
+ decoded codepoint set to GRAPHEME_INVALID_CODEPOINT.
+
+ If cp is not NULL the decoded codepoint is stored in the memory pointed
+ to by cp.
+
+ Given NUL has a unique 1 byte representation, it is safe to operate on
+ NUL-terminated strings by setting len to SIZE_MAX (stdint.h is already
+ included by grapheme.h) and terminating when cp is 0 (see EXAMPLES for an
+ example).
+
+ RETURN VALUES
+ The grapheme_decode_utf8() function returns the number of processed bytes
+ and 0 if str is NULL or len is 0. If the string ends unexpectedly in a
+ multibyte sequence, the desired length (that is larger than len) is
+ returned.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <inttypes.h>
+ #include <stdio.h>
+
+ void
+ print_cps(const char *str, size_t len)
+ {
+ size_t ret, off;
+ uint_least32_t cp;
+
+ for (off = 0; off < len; off += ret) {
+ if ((ret = grapheme_decode_utf8(str + off,
+ len - off, &cp)) > (len - off)) {
+ /*
+ * string ended unexpectedly in the middle of a
+ * multibyte sequence and we have the choice
+ * here to possibly expand str by ret - len + off
+ * bytes to get a full sequence, but we just
+ * bail out in this case.
+ */
+ break;
+ }
+ printf("%"PRIxLEAST32"\n", cp);
+ }
+ }
+
+ void
+ print_cps_nul_terminated(const char *str)
+ {
+ size_t ret, off;
+ uint_least32_t cp;
+
+ for (off = 0; (ret = grapheme_decode_utf8(str + off,
+ SIZE_MAX, &cp)) > 0 &&
+ cp != 0; off += ret) {
+ printf("%"PRIxLEAST32"\n", cp);
+ }
+ }
+
+ SEE ALSO
+ grapheme_encode_utf8(3), libgrapheme(7)
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
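The contract described in grapheme_decode_utf8(3) above (overlong encodings are rejected, and a truncated sequence is reported through a return value larger than len) can be illustrated without linking the library. The following is a minimal standalone sketch under stated assumptions, not libgrapheme's implementation: the names sketch_decode_utf8, sketch_count_valid_cps and SKETCH_INVALID_CODEPOINT are hypothetical, and U+FFFD as the value of the invalid-codepoint marker is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-in for GRAPHEME_INVALID_CODEPOINT (U+FFFD assumed) */
#define SKETCH_INVALID_CODEPOINT UINT32_C(0xFFFD)

/*
 * Decode the first codepoint of str (len bytes). Returns the number of
 * processed bytes, 0 for NULL or empty input, or the needed sequence
 * length (larger than len) if the input ends inside a multibyte sequence.
 */
size_t
sketch_decode_utf8(const char *str, size_t len, uint_least32_t *cp)
{
	/* lead-byte pattern and permitted codepoint range per length */
	static const struct {
		unsigned char mask, lead;
		uint_least32_t min, max;
	} lut[] = {
		{ 0x80, 0x00, 0x000000, 0x00007F }, /* 1 byte  */
		{ 0xE0, 0xC0, 0x000080, 0x0007FF }, /* 2 bytes */
		{ 0xF0, 0xE0, 0x000800, 0x00FFFF }, /* 3 bytes */
		{ 0xF8, 0xF0, 0x010000, 0x10FFFF }, /* 4 bytes */
	};
	const unsigned char *s = (const unsigned char *)str;
	uint_least32_t c, tmp;
	size_t n, i;

	if (cp == NULL) {
		cp = &tmp; /* caller is not interested in the codepoint */
	}
	*cp = SKETCH_INVALID_CODEPOINT;

	if (str == NULL || len == 0) {
		return 0;
	}

	/* classify the lead byte; n is the number of continuation bytes */
	for (n = 0; n < 4; n++) {
		if ((s[0] & lut[n].mask) == lut[n].lead) {
			break;
		}
	}
	if (n == 4) {
		return 1; /* stray continuation byte or invalid lead byte */
	}

	c = s[0] & (unsigned char)~lut[n].mask;
	for (i = 1; i <= n; i++) {
		if (i >= len) {
			return n + 1; /* truncated: report the needed length */
		}
		if ((s[i] & 0xC0) != 0x80) {
			return i; /* unexpected byte: stop in front of it */
		}
		c = (c << 6) | (s[i] & 0x3F);
	}

	/* reject overlong encodings, surrogates and out-of-range values */
	if (c >= lut[n].min && c <= lut[n].max &&
	    !(c >= 0xD800 && c <= 0xDFFF)) {
		*cp = c;
	}

	return n + 1;
}

/* count successfully decoded codepoints in a NUL-terminated string */
size_t
sketch_count_valid_cps(const char *str)
{
	size_t off, ret, count = 0;
	uint_least32_t cp;

	assert(str != NULL);

	/* SIZE_MAX as length is safe because NUL is never a continuation byte */
	for (off = 0; (ret = sketch_decode_utf8(str + off, SIZE_MAX,
	     &cp)) > 0 && cp != 0; off += ret) {
		count += (cp != SKETCH_INVALID_CODEPOINT);
	}

	return count;
}
```

The per-length minimum in the table is what rejects overlong forms: the two-byte sequence C0 80 decodes to the value 0, which is below the two-byte minimum of 0x80, so it is reported as invalid rather than as NUL.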
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_encode_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_encode_utf8\(3\)/index.md"
new file mode 100644
index 00000000..7ecf0e33
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_encode_utf8\(3\)/index.md"
@@ -0,0 +1,87 @@
+ GRAPHEME_ENCODE_UTF8(3) Library Functions Manual GRAPHEME_ENCODE_UTF8(3)
+
+ NAME
+ grapheme_encode_utf8 – encode codepoint into UTF-8 string
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_encode_utf8(uint_least32_t cp, char *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_encode_utf8() function encodes the codepoint cp into a
+ UTF-8-string. If str is not NULL and len is large enough it writes the
+ UTF-8-string to the memory pointed to by str. Otherwise no data is
+ written.
+
+ RETURN VALUES
+ The grapheme_encode_utf8() function returns the length (in bytes) of the
+ UTF-8-string resulting from encoding cp, even if len is not large enough
+ or str is NULL.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <stddef.h>
+ #include <stdlib.h>
+
+ size_t
+ cps_to_utf8(const uint_least32_t *cp, size_t cplen, char *str, size_t len)
+ {
+ size_t i, off, ret;
+
+ for (i = 0, off = 0; i < cplen; i++, off += ret) {
+ if ((ret = grapheme_encode_utf8(cp[i], str + off,
+ len - off)) > (len - off)) {
+ /* buffer too small */
+ break;
+ }
+ }
+
+ return off;
+ }
+
+ size_t
+ cps_bytelen(const uint_least32_t *cp, size_t cplen)
+ {
+ size_t i, len;
+
+ for (i = 0, len = 0; i < cplen; i++) {
+ len += grapheme_encode_utf8(cp[i], NULL, 0);
+ }
+
+ return len;
+ }
+
+ char *
+ cps_to_utf8_alloc(const uint_least32_t *cp, size_t cplen)
+ {
+ char *str;
+ size_t len, i, ret, off;
+
+ len = cps_bytelen(cp, cplen);
+
+ if (!(str = malloc(len + 1))) { /* + 1 for the NUL terminator */
+ return NULL;
+ }
+
+ for (i = 0, off = 0; i < cplen; i++, off += ret) {
+ if ((ret = grapheme_encode_utf8(cp[i], str + off,
+ len - off)) > (len - off)) {
+ /* buffer too small */
+ break;
+ }
+ }
+ str[off] = '\0';
+
+ return str;
+ }
+
+ SEE ALSO
+ grapheme_decode_utf8(3), libgrapheme(7)
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
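The "measure first, then write" pattern that grapheme_encode_utf8(3) enables (the needed length is returned even when str is NULL or len is too small) can be shown in isolation. This is a minimal standalone sketch, not the library's code: sketch_encode_utf8 and sketch_cps_to_utf8_alloc are hypothetical names, and replacing unencodable values by U+FFFD is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Encode cp as UTF-8. Always returns the number of bytes the encoding
 * needs; writes to str only if str is not NULL and len is sufficient.
 */
size_t
sketch_encode_utf8(uint_least32_t cp, char *str, size_t len)
{
	static const unsigned char lead[] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };
	unsigned char *out = (unsigned char *)str;
	size_t n, i;

	/* assumption: unencodable values are replaced by U+FFFD */
	if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
		cp = 0xFFFD;
	}

	n = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : (cp < 0x10000) ? 3 : 4;
	if (str == NULL || len < n) {
		return n; /* only report the needed length */
	}

	/* fill continuation bytes back to front, 6 bits each */
	for (i = n - 1; i > 0; i--) {
		out[i] = (unsigned char)(0x80 | (cp & 0x3F));
		cp >>= 6;
	}
	out[0] = (unsigned char)(lead[n] | cp);

	return n;
}

/* convert a codepoint array into a freshly allocated NUL-terminated string */
char *
sketch_cps_to_utf8_alloc(const uint_least32_t *cp, size_t cplen)
{
	size_t i, len, off;
	char *str;

	/* pass 1: measure without writing anything */
	for (i = 0, len = 0; i < cplen; i++) {
		len += sketch_encode_utf8(cp[i], NULL, 0);
	}

	/* reserve one extra byte for the NUL terminator */
	if (!(str = malloc(len + 1))) {
		return NULL;
	}

	/* pass 2: write into the exactly sized buffer */
	for (i = 0, off = 0; i < cplen; i++) {
		off += sketch_encode_utf8(cp[i], str + off, len - off);
	}
	assert(off == len);
	str[off] = '\0';

	return str;
}
```

Because the first pass yields the exact byte length, the second pass cannot run out of space, and the terminator always lands inside the allocation. The caller releases the result with free().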
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_character_break\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_character_break\(3\)/index.md"
new file mode 100644
index 00000000..dd4c323c
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_character_break\(3\)/index.md"
@@ -0,0 +1,69 @@
+ GRAPHEME_IS_CHARACTER_BREAK(3) Library Functions Manual
+
+ NAME
+ grapheme_is_character_break – test for a grapheme cluster break between
+ two codepoints
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_character_break(uint_least32_t cp1, uint_least32_t cp2,
+ uint_least16_t *state);
+
+ DESCRIPTION
+ The grapheme_is_character_break() function determines if there is a
+ grapheme cluster break (see libgrapheme(7)) between the two codepoints
+ cp1 and cp2. By specification this decision depends on a state that can
+ at most be completely reset after detecting a break and must be reset
+ every time one deviates from sequential processing.
+
+ If state is NULL grapheme_is_character_break() behaves as if it was
+ called with a fully reset state.
+
+ RETURN VALUES
+ The grapheme_is_character_break() function returns true if there is a
+ grapheme cluster break between the codepoints cp1 and cp2 and false if
+ there is not.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <stdint.h>
+ #include <stdio.h>
+ #include <string.h>
+
+ int
+ main(void)
+ {
+ uint_least16_t state = 0;
+ uint_least32_t s1[] = ..., s2[] = ...; /* two input arrays */
+ size_t i;
+
+ for (i = 0; i + 1 < sizeof(s1) / sizeof(*s1); i++) {
+ if (grapheme_is_character_break(s1[i], s1[i + 1], &state)) {
+ printf("break in s1 at offset %zu\n", i);
+ }
+ }
+ memset(&state, 0, sizeof(state)); /* reset state */
+ for (i = 0; i + 1 < sizeof(s2) / sizeof(*s2); i++) {
+ if (grapheme_is_character_break(s2[i], s2[i + 1], &state)) {
+ printf("break in s2 at offset %zu\n", i);
+ }
+ }
+
+ return 0;
+ }
+
+ SEE ALSO
+ grapheme_next_character_break(3), grapheme_next_character_break_utf8(3),
+ libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_character_break() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_lowercase\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_lowercase\(3\)/index.md"
new file mode 100644
index 00000000..465748b1
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_lowercase\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_IS_LOWERCASE(3) Library Functions Manual GRAPHEME_IS_LOWERCASE(3)
+
+ NAME
+ grapheme_is_lowercase – check if codepoint array is lowercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_lowercase(const uint_least32_t *str, size_t len,
+ size_t *caselen);
+
+ DESCRIPTION
+ The grapheme_is_lowercase() function checks if the codepoint array str is
+ lowercase and writes the length of the matching lowercase-sequence to the
+ integer pointed to by caselen, unless caselen is set to NULL.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the codepoint array str is interpreted to be NUL-terminated and
+ processing stops when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_is_lowercase_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_is_lowercase() function returns true if the codepoint array
+ str is lowercase, otherwise false.
+
+ SEE ALSO
+ grapheme_is_lowercase_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_lowercase() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_lowercase_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_lowercase_utf8\(3\)/index.md"
new file mode 100644
index 00000000..50098741
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_lowercase_utf8\(3\)/index.md"
@@ -0,0 +1,38 @@
+ GRAPHEME_IS_LOWERCASE_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_is_lowercase_utf8 – check if UTF-8-encoded string is lowercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_lowercase_utf8(const char *str, size_t len, size_t *caselen);
+
+ DESCRIPTION
+ The grapheme_is_lowercase_utf8() function checks if the UTF-8-encoded
+ string str is lowercase and writes the length of the matching lowercase-
+ sequence to the integer pointed to by caselen, unless caselen is set to
+ NULL.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the UTF-8-encoded string str is interpreted to be NUL-terminated and
+ processing stops when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_is_lowercase(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_is_lowercase_utf8() function returns true if the
+ UTF-8-encoded string str is lowercase, otherwise false.
+
+ SEE ALSO
+ grapheme_is_lowercase(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_lowercase_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_titlecase\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_titlecase\(3\)/index.md"
new file mode 100644
index 00000000..13dada25
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_titlecase\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_IS_TITLECASE(3) Library Functions Manual GRAPHEME_IS_TITLECASE(3)
+
+ NAME
+ grapheme_is_titlecase – check if codepoint array is titlecase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_titlecase(const uint_least32_t *str, size_t len,
+ size_t *caselen);
+
+ DESCRIPTION
+ The grapheme_is_titlecase() function checks if the codepoint array str is
+ titlecase and writes the length of the matching titlecase-sequence to the
+ integer pointed to by caselen, unless caselen is set to NULL.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the codepoint array str is interpreted to be NUL-terminated and
+ processing stops when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_is_titlecase_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_is_titlecase() function returns true if the codepoint array
+ str is titlecase, otherwise false.
+
+ SEE ALSO
+ grapheme_is_titlecase_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_titlecase() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_titlecase_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_titlecase_utf8\(3\)/index.md"
new file mode 100644
index 00000000..d5a842f2
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_titlecase_utf8\(3\)/index.md"
@@ -0,0 +1,38 @@
+ GRAPHEME_IS_TITLECASE_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_is_titlecase_utf8 – check if UTF-8-encoded string is titlecase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_titlecase_utf8(const char *str, size_t len, size_t *caselen);
+
+ DESCRIPTION
+ The grapheme_is_titlecase_utf8() function checks if the UTF-8-encoded
+ string str is titlecase and writes the length of the matching titlecase-
+ sequence to the integer pointed to by caselen, unless caselen is set to
+ NULL.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the UTF-8-encoded string str is interpreted to be NUL-terminated and
+ processing stops when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_is_titlecase(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_is_titlecase_utf8() function returns true if the
+ UTF-8-encoded string str is titlecase, otherwise false.
+
+ SEE ALSO
+ grapheme_is_titlecase(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_titlecase_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_uppercase\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_uppercase\(3\)/index.md"
new file mode 100644
index 00000000..b31f19b5
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_uppercase\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_IS_UPPERCASE(3) Library Functions Manual GRAPHEME_IS_UPPERCASE(3)
+
+ NAME
+ grapheme_is_uppercase – check if codepoint array is uppercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_uppercase(const uint_least32_t *str, size_t len,
+ size_t *caselen);
+
+ DESCRIPTION
+ The grapheme_is_uppercase() function checks if the codepoint array str is
+ uppercase and writes the length of the matching uppercase-sequence to the
+ integer pointed to by caselen, unless caselen is set to NULL.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the codepoint array str is interpreted to be NUL-terminated and
+ processing stops when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_is_uppercase_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_is_uppercase() function returns true if the codepoint array
+ str is uppercase, otherwise false.
+
+ SEE ALSO
+ grapheme_is_uppercase_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_uppercase() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_is_uppercase_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_is_uppercase_utf8\(3\)/index.md"
new file mode 100644
index 00000000..50098741
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_is_uppercase_utf8\(3\)/index.md"
@@ -0,0 +1,38 @@
+ GRAPHEME_IS_UPPERCASE_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_is_uppercase_utf8 – check if UTF-8-encoded string is uppercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ bool
+ grapheme_is_uppercase_utf8(const char *str, size_t len, size_t *caselen);
+
+ DESCRIPTION
+ The grapheme_is_uppercase_utf8() function checks if the UTF-8-encoded
+ string str is uppercase and writes the length of the matching uppercase-
+ sequence to the integer pointed to by caselen, unless caselen is set to
+ NULL.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the UTF-8-encoded string str is interpreted to be NUL-terminated and
+ processing stops when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_is_uppercase(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_is_uppercase_utf8() function returns true if the
+ UTF-8-encoded string str is uppercase, otherwise false.
+
+ SEE ALSO
+ grapheme_is_uppercase(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_is_uppercase_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_character_break\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_character_break\(3\)/index.md"
new file mode 100644
index 00000000..37bc2c89
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_character_break\(3\)/index.md"
@@ -0,0 +1,42 @@
+ GRAPHEME_NEXT_CHARACTER_BREAK(3) Library Functions Manual
+
+ NAME
+ grapheme_next_character_break – determine codepoint-offset to next
+ grapheme cluster break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_character_break(const uint_least32_t *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_character_break() function computes the offset (in
+ codepoints) to the next grapheme cluster break (see libgrapheme(7)) in
+ the codepoint array str of length len. If a grapheme cluster begins at
+ str this offset is equal to the length of said grapheme cluster.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_next_character_break_utf8(3) can be
+ used instead.
+
+ RETURN VALUES
+ The grapheme_next_character_break() function returns the offset (in
+ codepoints) to the next grapheme cluster break in str or 0 if str is
+ NULL.
+
+ SEE ALSO
+ grapheme_is_character_break(3), grapheme_next_character_break_utf8(3),
+ libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_character_break() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_character_break_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_character_break_utf8\(3\)/index.md"
new file mode 100644
index 00000000..f884edf4
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_character_break_utf8\(3\)/index.md"
@@ -0,0 +1,77 @@
+ GRAPHEME_NEXT_CHARACTER_BREAK_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_next_character_break_utf8 – determine byte-offset to next
+ grapheme cluster break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_character_break_utf8(const char *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_character_break_utf8() function computes the offset (in
+ bytes) to the next grapheme cluster break (see libgrapheme(7)) in the
+ UTF-8-encoded string str of length len. If a grapheme cluster begins at
+ str this offset is equal to the length of said grapheme cluster.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_is_character_break(3) and
+ grapheme_next_character_break(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_next_character_break_utf8() function returns the offset (in
+ bytes) to the next grapheme cluster break in str or 0 if str is NULL.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <stdint.h>
+ #include <stdio.h>
+
+ int
+ main(void)
+ {
+ /* UTF-8 encoded input */
+ char *s = "T\xC3\xABst \xF0\x9F\x91\xA8\xE2\x80\x8D\xF0"
+ "\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA6 \xF0"
+ "\x9F\x87\xBA\xF0\x9F\x87\xB8 \xE0\xA4\xA8\xE0"
+ "\xA5\x80 \xE0\xAE\xA8\xE0\xAE\xBF!";
+ size_t ret, len, off;
+
+ printf("Input: \"%s\"\n", s);
+
+ /* print each grapheme cluster with byte-length */
+ printf("grapheme clusters in NUL-delimited input:\n");
+ for (off = 0; s[off] != '\0'; off += ret) {
+ ret = grapheme_next_character_break_utf8(s + off, SIZE_MAX);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+ printf("\n");
+
+ /* do the same, but this time string is length-delimited */
+ len = 17;
+ printf("grapheme clusters in input delimited to %zu bytes:\n", len);
+ for (off = 0; off < len; off += ret) {
+ ret = grapheme_next_character_break_utf8(s + off, len - off);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+
+ return 0;
+ }
+
+ SEE ALSO
+ grapheme_next_character_break(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_character_break_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_line_break\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_line_break\(3\)/index.md"
new file mode 100644
index 00000000..74984b37
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_line_break\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_NEXT_LINE_BREAK(3) Library Functions Manual
+
+ NAME
+ grapheme_next_line_break – determine codepoint-offset to next possible
+ line break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_line_break(const uint_least32_t *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_line_break() function computes the offset (in
+ codepoints) to the next possible line break (see libgrapheme(7)) in the
+ codepoint array str of length len.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_next_line_break_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_next_line_break() function returns the offset (in
+ codepoints) to the next possible line break in str or 0 if str is NULL.
+
+ SEE ALSO
+ grapheme_next_line_break_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_line_break() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_line_break_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_line_break_utf8\(3\)/index.md"
new file mode 100644
index 00000000..c558caca
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_line_break_utf8\(3\)/index.md"
@@ -0,0 +1,75 @@
+ GRAPHEME_NEXT_LINE_BREAK_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_next_line_break_utf8 – determine byte-offset to next possible
+ line break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_line_break_utf8(const char *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_line_break_utf8() function computes the offset (in
+ bytes) to the next possible line break (see libgrapheme(7)) in the
+ UTF-8-encoded string str of length len.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_next_line_break(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_next_line_break_utf8() function returns the offset (in
+ bytes) to the next possible line break in str or 0 if str is NULL.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <stdint.h>
+ #include <stdio.h>
+
+ int
+ main(void)
+ {
+ /* UTF-8 encoded input */
+ char *s = "T\xC3\xABst \xF0\x9F\x91\xA8\xE2\x80\x8D\xF0"
+ "\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA6 \xF0"
+ "\x9F\x87\xBA\xF0\x9F\x87\xB8 \xE0\xA4\xA8\xE0"
+ "\xA5\x80 \xE0\xAE\xA8\xE0\xAE\xBF!";
+ size_t ret, len, off;
+
+ printf("Input: \"%s\"\n", s);
+
+ /* print each possible line with byte-length */
+ printf("possible lines in NUL-delimited input:\n");
+ for (off = 0; s[off] != '\0'; off += ret) {
+ ret = grapheme_next_line_break_utf8(s + off, SIZE_MAX);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+ printf("\n");
+
+ /* do the same, but this time string is length-delimited */
+ len = 17;
+ printf("possible lines in input delimited to %zu bytes:\n", len);
+ for (off = 0; off < len; off += ret) {
+ ret = grapheme_next_line_break_utf8(s + off, len - off);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+
+ return 0;
+ }
+
+ SEE ALSO
+ grapheme_next_line_break(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_line_break_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_sentence_break\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_sentence_break\(3\)/index.md"
new file mode 100644
index 00000000..13bc08c5
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_sentence_break\(3\)/index.md"
@@ -0,0 +1,40 @@
+ GRAPHEME_NEXT_SENTENCE_BREAK(3) Library Functions Manual
+
+ NAME
+ grapheme_next_sentence_break – determine codepoint-offset to next
+ sentence break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_sentence_break(const uint_least32_t *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_sentence_break() function computes the offset (in
+ codepoints) to the next sentence break (see libgrapheme(7)) in the
+ codepoint array str of length len. If a sentence begins at str this
+ offset is equal to the length of said sentence.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_next_sentence_break_utf8(3) can be
+ used instead.
+
+ RETURN VALUES
+ The grapheme_next_sentence_break() function returns the offset (in
+ codepoints) to the next sentence break in str or 0 if str is NULL.
+
+ SEE ALSO
+ grapheme_next_sentence_break_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_sentence_break() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_sentence_break_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_sentence_break_utf8\(3\)/index.md"
new file mode 100644
index 00000000..875f134d
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_sentence_break_utf8\(3\)/index.md"
@@ -0,0 +1,77 @@
+ GRAPHEME_NEXT_SENTENCE_BREAK_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_next_sentence_break_utf8 – determine byte-offset to next
+ sentence break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_sentence_break_utf8(const char *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_sentence_break_utf8() function computes the offset (in
+ bytes) to the next sentence break (see libgrapheme(7)) in the
+ UTF-8-encoded string str of length len. If a sentence begins at str this
+ offset is equal to the length of said sentence.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_next_sentence_break(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_next_sentence_break_utf8() function returns the offset (in
+ bytes) to the next sentence break in str or 0 if str is NULL.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <stdint.h>
+ #include <stdio.h>
+
+ int
+ main(void)
+ {
+ /* UTF-8 encoded input */
+ char *s = "T\xC3\xABst \xF0\x9F\x91\xA8\xE2\x80\x8D\xF0"
+ "\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA6 \xF0"
+ "\x9F\x87\xBA\xF0\x9F\x87\xB8 \xE0\xA4\xA8\xE0"
+ "\xA5\x80 \xE0\xAE\xA8\xE0\xAE\xBF!";
+ size_t ret, len, off;
+
+ printf("Input: \"%s\"\n", s);
+
+ /* print each sentence with byte-length */
+ printf("sentences in NUL-delimited input:\n");
+ for (off = 0; s[off] != '\0'; off += ret) {
+ ret = grapheme_next_sentence_break_utf8(s + off, SIZE_MAX);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+ printf("\n");
+
+ /* do the same, but this time string is length-delimited */
+ len = 17;
+ printf("sentences in input delimited to %zu bytes:\n", len);
+ for (off = 0; off < len; off += ret) {
+ ret = grapheme_next_sentence_break_utf8(s + off, len - off);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+
+ return 0;
+ }
+
+ SEE ALSO
+ grapheme_next_sentence_break(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_sentence_break_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_word_break\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_word_break\(3\)/index.md"
new file mode 100644
index 00000000..f59f1cc5
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_word_break\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_NEXT_WORD_BREAK(3) Library Functions Manual
+
+ NAME
+ grapheme_next_word_break – determine codepoint-offset to next word break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_word_break(const uint_least32_t *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_word_break() function computes the offset (in
+ codepoints) to the next word break (see libgrapheme(7)) in the codepoint
+ array str of length len. If a word begins at str this offset is equal to
+ the length of said word.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_next_word_break_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_next_word_break() function returns the offset (in
+ codepoints) to the next word break in str or 0 if str is NULL.
+
+ SEE ALSO
+ grapheme_next_word_break_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_word_break() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_next_word_break_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_next_word_break_utf8\(3\)/index.md"
new file mode 100644
index 00000000..c77ca5dd
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_next_word_break_utf8\(3\)/index.md"
@@ -0,0 +1,75 @@
+ GRAPHEME_NEXT_WORD_BREAK_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_next_word_break_utf8 – determine byte-offset to next word break
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_next_word_break_utf8(const char *str, size_t len);
+
+ DESCRIPTION
+ The grapheme_next_word_break_utf8() function computes the offset (in
+ bytes) to the next word break (see libgrapheme(7)) in the UTF-8-encoded
+ string str of length len. If a word begins at str this offset is equal
+ to the length of said word.
+
+ If len is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the string str is interpreted to be NUL-terminated and processing stops
+ when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_next_word_break(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_next_word_break_utf8() function returns the offset (in
+ bytes) to the next word break in str or 0 if str is NULL.
+
+ EXAMPLES
+ /* cc (-static) -o example example.c -lgrapheme */
+ #include <grapheme.h>
+ #include <stdint.h>
+ #include <stdio.h>
+
+ int
+ main(void)
+ {
+ /* UTF-8 encoded input */
+ char *s = "T\xC3\xABst \xF0\x9F\x91\xA8\xE2\x80\x8D\xF0"
+ "\x9F\x91\xA9\xE2\x80\x8D\xF0\x9F\x91\xA6 \xF0"
+ "\x9F\x87\xBA\xF0\x9F\x87\xB8 \xE0\xA4\xA8\xE0"
+ "\xA5\x80 \xE0\xAE\xA8\xE0\xAE\xBF!";
+ size_t ret, len, off;
+
+ printf("Input: \"%s\"\n", s);
+
+ /* print each word with byte-length */
+ printf("words in NUL-delimited input:\n");
+ for (off = 0; s[off] != '\0'; off += ret) {
+ ret = grapheme_next_word_break_utf8(s + off, SIZE_MAX);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+ printf("\n");
+
+ /* do the same, but this time string is length-delimited */
+ len = 17;
+ printf("words in input delimited to %zu bytes:\n", len);
+ for (off = 0; off < len; off += ret) {
+ ret = grapheme_next_word_break_utf8(s + off, len - off);
+ printf("%2zu bytes | %.*s\n", ret, (int)ret, s + off);
+ }
+
+ return 0;
+ }
+
+ SEE ALSO
+ grapheme_next_word_break(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_next_word_break_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_to_lowercase\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_to_lowercase\(3\)/index.md"
new file mode 100644
index 00000000..31d4f097
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_to_lowercase\(3\)/index.md"
@@ -0,0 +1,40 @@
+ GRAPHEME_TO_LOWERCASE(3) Library Functions Manual GRAPHEME_TO_LOWERCASE(3)
+
+ NAME
+ grapheme_to_lowercase – convert codepoint array to lowercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_to_lowercase(const uint_least32_t *src, size_t srclen,
+ uint_least32_t *dest, size_t destlen);
+
+ DESCRIPTION
+ The grapheme_to_lowercase() function converts the codepoint array src to
+ lowercase and writes the result to dest up to destlen, unless dest is set
+ to NULL.
+
+ If srclen is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the codepoint array src is interpreted to be NUL-terminated and
+ processing stops when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_to_lowercase_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_to_lowercase() function returns the number of codepoints in
+ the array resulting from converting src to lowercase, even if destlen is
+ not large enough or dest is NULL.
+
+ SEE ALSO
+ grapheme_to_lowercase_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_to_lowercase() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_to_lowercase_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_to_lowercase_utf8\(3\)/index.md"
new file mode 100644
index 00000000..6ee79dc2
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_to_lowercase_utf8\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_TO_LOWERCASE_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_to_lowercase_utf8 – convert UTF-8-encoded string to lowercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_to_lowercase_utf8(const char *src, size_t srclen, char *dest,
+ size_t destlen);
+
+ DESCRIPTION
+ The grapheme_to_lowercase_utf8() function converts the UTF-8-encoded
+ string src to lowercase and writes the result to dest up to destlen,
+ unless dest is set to NULL.
+
+ If srclen is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the UTF-8-encoded string src is interpreted to be NUL-terminated and
+ processing stops when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_to_lowercase(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_to_lowercase_utf8() function returns the number of bytes in
+ the array resulting from converting src to lowercase, even if destlen is
+ not large enough or dest is NULL.
+
+ SEE ALSO
+ grapheme_to_lowercase(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_to_lowercase_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_to_titlecase\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_to_titlecase\(3\)/index.md"
new file mode 100644
index 00000000..f51ad420
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_to_titlecase\(3\)/index.md"
@@ -0,0 +1,40 @@
+ GRAPHEME_TO_TITLECASE(3) Library Functions Manual GRAPHEME_TO_TITLECASE(3)
+
+ NAME
+ grapheme_to_titlecase – convert codepoint array to titlecase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_to_titlecase(const uint_least32_t *src, size_t srclen,
+ uint_least32_t *dest, size_t destlen);
+
+ DESCRIPTION
+ The grapheme_to_titlecase() function converts the codepoint array src to
+ titlecase and writes the result to dest up to destlen, unless dest is set
+ to NULL.
+
+ If srclen is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the codepoint array src is interpreted to be NUL-terminated and
+ processing stops when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_to_titlecase_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_to_titlecase() function returns the number of codepoints in
+ the array resulting from converting src to titlecase, even if destlen is
+ not large enough or dest is NULL.
+
+ SEE ALSO
+ grapheme_to_titlecase_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_to_titlecase() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_to_titlecase_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_to_titlecase_utf8\(3\)/index.md"
new file mode 100644
index 00000000..d86fd96e
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_to_titlecase_utf8\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_TO_TITLECASE_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_to_titlecase_utf8 – convert UTF-8-encoded string to titlecase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_to_titlecase_utf8(const char *src, size_t srclen, char *dest,
+ size_t destlen);
+
+ DESCRIPTION
+ The grapheme_to_titlecase_utf8() function converts the UTF-8-encoded
+ string src to titlecase and writes the result to dest up to destlen,
+ unless dest is set to NULL.
+
+ If srclen is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the UTF-8-encoded string src is interpreted to be NUL-terminated and
+ processing stops when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_to_titlecase(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_to_titlecase_utf8() function returns the number of bytes in
+ the array resulting from converting src to titlecase, even if destlen is
+ not large enough or dest is NULL.
+
+ SEE ALSO
+ grapheme_to_titlecase(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_to_titlecase_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_to_uppercase\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_to_uppercase\(3\)/index.md"
new file mode 100644
index 00000000..6e6bfd38
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_to_uppercase\(3\)/index.md"
@@ -0,0 +1,40 @@
+ GRAPHEME_TO_UPPERCASE(3) Library Functions Manual GRAPHEME_TO_UPPERCASE(3)
+
+ NAME
+ grapheme_to_uppercase – convert codepoint array to uppercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_to_uppercase(const uint_least32_t *src, size_t srclen,
+ uint_least32_t *dest, size_t destlen);
+
+ DESCRIPTION
+ The grapheme_to_uppercase() function converts the codepoint array src to
+ uppercase and writes the result to dest up to destlen, unless dest is set
+ to NULL.
+
+ If srclen is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the codepoint array src is interpreted to be NUL-terminated and
+ processing stops when a codepoint with the value 0 is encountered.
+
+ For UTF-8-encoded input data grapheme_to_uppercase_utf8(3) can be used
+ instead.
+
+ RETURN VALUES
+ The grapheme_to_uppercase() function returns the number of codepoints in
+ the array resulting from converting src to uppercase, even if destlen is
+ not large enough or dest is NULL.
+
+ SEE ALSO
+ grapheme_to_uppercase_utf8(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_to_uppercase() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/grapheme_to_uppercase_utf8\(3\)/index.md" "b/libs.suckless.org/libgrapheme/man/grapheme_to_uppercase_utf8\(3\)/index.md"
new file mode 100644
index 00000000..6ee79dc2
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/grapheme_to_uppercase_utf8\(3\)/index.md"
@@ -0,0 +1,39 @@
+ GRAPHEME_TO_UPPERCASE_UTF8(3) Library Functions Manual
+
+ NAME
+ grapheme_to_uppercase_utf8 – convert UTF-8-encoded string to uppercase
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ size_t
+ grapheme_to_uppercase_utf8(const char *src, size_t srclen, char *dest,
+ size_t destlen);
+
+ DESCRIPTION
+ The grapheme_to_uppercase_utf8() function converts the UTF-8-encoded
+ string src to uppercase and writes the result to dest up to destlen,
+ unless dest is set to NULL.
+
+ If srclen is set to SIZE_MAX (stdint.h is already included by grapheme.h)
+ the UTF-8-encoded string src is interpreted to be NUL-terminated and
+ processing stops when a NUL-byte is encountered.
+
+ For non-UTF-8 input data grapheme_to_uppercase(3) can be used instead.
+
+ RETURN VALUES
+ The grapheme_to_uppercase_utf8() function returns the number of bytes in
+ the array resulting from converting src to uppercase, even if destlen is
+ not large enough or dest is NULL.
+
+ SEE ALSO
+ grapheme_to_uppercase(3), libgrapheme(7)
+
+ STANDARDS
+ grapheme_to_uppercase_utf8() is compliant with the Unicode 15.0.0
+ specification.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
diff --git "a/libs.suckless.org/libgrapheme/man/libgrapheme\(7\)/index.md" "b/libs.suckless.org/libgrapheme/man/libgrapheme\(7\)/index.md"
new file mode 100644
index 00000000..d97f46ec
--- /dev/null
+++ "b/libs.suckless.org/libgrapheme/man/libgrapheme\(7\)/index.md"
@@ -0,0 +1,122 @@
+ LIBGRAPHEME(7) Miscellaneous Information Manual LIBGRAPHEME(7)
+
+ NAME
+ libgrapheme – unicode string library
+
+ SYNOPSIS
+ #include <grapheme.h>
+
+ DESCRIPTION
+ The libgrapheme library provides functions to properly handle Unicode
+ strings according to the Unicode specification in regard to character,
+ word, sentence and line segmentation and case detection and conversion.
+
+ Unicode strings are made up of user-perceived characters (so-called
+ “grapheme clusters”, see MOTIVATION) that are composed of one or more
+ Unicode codepoints, which in turn are encoded in one or more bytes in an
+ encoding like UTF-8.
+
+ There is a widespread misconception that it is enough to simply
+ determine codepoints in a string and treat them as user-perceived
+ characters to be Unicode compliant. While this may work in some cases,
+ this assumption quickly breaks, especially for non-Western languages and
+ decomposed Unicode strings where user-perceived characters are usually
+ represented using multiple codepoints.
+
+ Despite this complicated multilevel structure of Unicode strings,
+ libgrapheme provides methods to work with them at the byte-level (i.e.
+ UTF-8 ‘char’ arrays) while also offering codepoint-level methods.
+ Additionally, it is a “freestanding” library (see ISO/IEC 9899:1999
+ section 4.6) and thus does not depend on a standard library. This makes
+ it easy to use in bare metal environments.
+
+ Every documented function's manual page provides a self-contained example
+ illustrating the possible usage.
+
+ SEE ALSO
+ grapheme_decode_utf8(3), grapheme_encode_utf8(3),
+ grapheme_is_character_break(3), grapheme_is_lowercase(3),
+ grapheme_is_lowercase_utf8(3), grapheme_is_titlecase(3),
+ grapheme_is_titlecase_utf8(3), grapheme_is_uppercase(3),
+ grapheme_is_uppercase_utf8(3), grapheme_next_character_break(3),
+ grapheme_next_character_break_utf8(3), grapheme_next_line_break(3),
+ grapheme_next_line_break_utf8(3), grapheme_next_sentence_break(3),
+ grapheme_next_sentence_break_utf8(3), grapheme_next_word_break(3),
+ grapheme_next_word_break_utf8(3), grapheme_to_lowercase(3),
+ grapheme_to_lowercase_utf8(3), grapheme_to_titlecase(3),
+ grapheme_to_titlecase_utf8(3), grapheme_to_uppercase(3),
+ grapheme_to_uppercase_utf8(3)
+
+ STANDARDS
+ libgrapheme is compliant with the Unicode 15.0.0 specification.
+
+ MOTIVATION
+ The idea behind every character encoding scheme like ASCII or Unicode is
+ to express abstract characters (which can be thought of as shapes making
+ up a written language). ASCII for instance, which comprises the range 0
+ to 127, assigns the number 65 (0x41) to the abstract character ‘A’. This
+ number is called a “codepoint”, and all codepoints of an encoding make up
+ its so-called “code space”.
+
+ Unicode's code space is much larger, ranging from 0 to 0x10FFFF, but its
+ first 128 codepoints are identical to ASCII's. The additional codepoints
+ are needed as Unicode's goal is to express all writing systems of the
+ world. To give an example, the abstract character ‘Ä’ is not expressible
+ in ASCII, given no ASCII codepoint has been assigned to it. It can be
+ expressed in Unicode, though, with the codepoint 196 (0xC4).
+
+ One may assume that this process is straightforward, but as more and more
+ codepoints were assigned to abstract characters, the Unicode Consortium
+ (that defines the Unicode standard) was facing a problem: Many (mostly
+ non-European) languages have such a large number of abstract characters
+ that it would exhaust the available Unicode code space if one tried to
+ assign a codepoint to each abstract character. The solution to that
+ problem is best introduced with an example: Consider the abstract
+ character ‘Ǟ’, which is ‘A’ with an umlaut and a macron added to it. In
+ this sense, one can consider ‘Ǟ’ as a two-fold modification (namely “add
+ umlaut” and “add macron”) of the “base character” ‘A’.
+
+ The Unicode Consortium adopted this idea by assigning codepoints to
+ modifications. For example, the codepoint 0x308 represents adding an
+ umlaut and 0x304 represents adding a macron, and thus, the codepoint
+ sequence “0x41 0x308 0x304”, namely the base character ‘A’ followed by
+ the umlaut and macron modifiers, represents the abstract character ‘Ǟ’.
+ As a side-note, the single codepoint 0x1DE was also assigned to ‘Ǟ’,
+ which is a good example for the fact that there can be multiple
+ representations of a single abstract character in Unicode.
+
+ Expressing a single abstract character with multiple codepoints solved
+ the code space exhaustion-problem, and the concept has been greatly
+ expanded since its first introduction (emojis, joiners, etc.). A sequence
+ (which can also have the length 1) of codepoints that belong together
+ this way and represents an abstract character is called a “grapheme
+ cluster”.
+
+ In many applications it is necessary to count the number of user-
+ perceived characters, i.e. grapheme clusters, in a string. A good
+ example for this is a terminal text editor, which needs to properly align
+ characters on a grid. This is pretty simple with ASCII-strings, where
+ you just count the number of bytes (as each byte is a codepoint and each
+ codepoint is a grapheme cluster). With Unicode-strings, it is a common
+ mistake to simply adapt the ASCII-approach and count the number of
+ codepoints. This is wrong, as, for example, the sequence “0x41 0x308 0x304”,
+ while made up of 3 codepoints, is a single grapheme cluster and
+ represents the user-perceived character ‘Ǟ’.
+
+ The proper way to segment a string into user-perceived characters is to
+ segment it into its grapheme clusters by applying the Unicode grapheme
+ cluster breaking algorithm (UAX #29). It is based on a complex ruleset
+ and lookup-tables and determines if a grapheme cluster ends or is
+ continued between two codepoints. Libraries like ICU and libunistring,
+ which also offer this functionality, are often bloated, not correct,
+ difficult to use or not reasonably statically linkable.
+
+ Analogously, the standard provides algorithms to separate strings by
+ words, sentences and lines, convert cases and compare strings. The
+ motivation behind libgrapheme is to make Unicode handling suck less and
+ abide by the UNIX philosophy.
+
+ AUTHORS
+ Laslo Hunhold <dev_AT_frign.de>
+
+ suckless.org 2022-10-06 suckless.org
Received on Thu Oct 06 2022 - 22:09:41 CEST
