Thorsten Glaser
Date: Sat, 30 Nov 2013

Silvan Jegen dixit:

>That sounds reasonable but requires that we convert UTF-8 to UTF-32
>which should not be strictly necessary when we only map one UTF-8 value
>to another.

Arrgh, no. UTF-8 and UTF-32/UCS-4 are encodings of numerical Unicode
codepoints. When working with text documents, you always operate on
those codepoints. This was true for single-byte encodings as well,
except that there, every codepoint fit into a single byte.
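
To illustrate the point, here is a minimal sketch in Python (not the
code under discussion) of a tr-like mapping that works on codepoints
rather than on bytes. The function name tr_codepoints and its arguments
are made up for this example. It decodes the UTF-8 input, maps each
codepoint through a table, and encodes the result again.

```python
def tr_codepoints(data: bytes, src: str, dst: str) -> bytes:
    """Map each character of src to the character at the same
    position in dst, operating on codepoints (hypothetical helper)."""
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")

    # The table is keyed by codepoint number, not by byte sequence.
    table = {ord(s): ord(d) for s, d in zip(src, dst)}

    # Decode: UTF-8 bytes -> sequence of codepoints.
    text = data.decode("utf-8")

    # Map codepoint to codepoint; unmapped codepoints pass through.
    mapped = "".join(chr(table.get(ord(c), ord(c))) for c in text)

    # Encode: codepoints -> UTF-8 bytes.
    return mapped.encode("utf-8")


# 'ä' is two bytes in UTF-8 and '€' is three, yet each is one codepoint,
# so a two-byte character can be mapped to a three-byte one.
result = tr_codepoints("häh".encode("utf-8"), "ä", "€")

# In a single-byte encoding such as ISO-8859-1 the byte value is the
# codepoint: byte 0xE4 decodes to U+00E4.
latin1_codepoint = ord(b"\xe4".decode("latin-1"))
```

The mapped characters need not have the same encoded length, which is
why a table keyed on raw UTF-8 byte sequences buys nothing over decoding
first.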

