Silvan Jegen dixit:
>Wouldn't a 16-bit wchar_t be non-standard-conform when using a UTF-8
>locale?
Nope. UTF-8 is just an encoding for Unicode, and as long as I take
care to #define __STDC_ISO_10646__ 200009L (and no later date) this
is perfectly permissible.
(And please do not language-lawyer me, I’ve had enough of those,
and since I can prove that 100% POSIX compliance is probably illegal
in my country, I don’t care, even.)
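
To make the "and no later date" point concrete, here is a minimal sketch in C. It assumes, as the reply above implies, that the ISO 10646 repertoire as of 200009L lies entirely within the 16-bit range, so a 16-bit wchar_t can hold every character it claims to support. The helper name wchar_claim_plausible is hypothetical and not taken from any libc.

```c
#include <limits.h>
#include <wchar.h>

/* Width of wchar_t in bits on this implementation. */
int wchar_bits(void)
{
	return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Value of __STDC_ISO_10646__, or 0 if the implementation does not define it. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
	return (long)__STDC_ISO_10646__;
#else
	return 0L;
#endif
}

/*
 * Hypothetical check: is a wchar_t of the given width compatible with
 * the given __STDC_ISO_10646__ value?
 * Assumption: everything defined up to 200009L fits into 16 bits.
 */
int wchar_claim_plausible(int bits, long version)
{
	if (version == 0L)
		return 1;	/* no claim made, nothing to contradict */
	if (bits >= 21)
		return 1;	/* wide enough for U+0000..U+10FFFF */
	return bits >= 16 && version <= 200009L;
}
```

Calling wchar_claim_plausible(wchar_bits(), iso10646_version()) checks the implementation the code is compiled on; a 16-bit wchar_t combined with a later date would be reported as implausible.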
>So the problem seems to be that binary files contain bytes that are not
>valid UTF-8 and that using tools on them that expect UTF-8 will mangle
>these files.
No. The problem is that “using tools that use the wchar_t API” will
mangle them _iff_ the locale is UTF-8.
So if your C locale is UTF-8, you *will* break all kinds of things,
since “env LC_ALL=C tr x x <binfile” is supposed to retain the binary
input unchanged.
This just means that your C locale cannot be strictly UTF-8. All other
locales can be, but the C locale exists precisely to pass arbitrary
bytes through unchanged; it is special like that.
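
To illustrate why a strictly-UTF-8 decoder cannot pass binary input through untouched, here is a rough sketch of a well-formedness check in C. The function name is_valid_utf8 is made up for this example; real tools would go through mbrtowc(3) and friends, which report an encoding error (EILSEQ) on such bytes in a UTF-8 locale instead of handing them back.

```c
#include <stddef.h>

/* Return 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = buf[i];
		size_t need, k;		/* number of continuation bytes */
		unsigned long cp, min;	/* code point and its minimum for this length */

		if (c < 0x80) {
			i++;		/* plain ASCII */
			continue;
		} else if (c >= 0xC2 && c <= 0xDF) {
			need = 1; cp = c & 0x1F; min = 0x80;
		} else if ((c & 0xF0) == 0xE0) {
			need = 2; cp = c & 0x0F; min = 0x800;
		} else if (c >= 0xF0 && c <= 0xF4) {
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return 0;	/* stray continuation, 0xC0/0xC1 or 0xF5..0xFF */
		}

		if (len - i <= need)
			return 0;	/* sequence truncated at end of input */
		for (k = 1; k <= need; k++) {
			if ((buf[i + k] & 0xC0) != 0x80)
				return 0;	/* not a continuation byte */
			cp = (cp << 6) | (buf[i + k] & 0x3F);
		}
		/* reject overlong forms, surrogates and out-of-range values */
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;
		i += need + 1;
	}
	return 1;
}
```

A typical binary file contains bytes such as 0xFF or a lone 0x80, for which this returns 0. A tool that decodes its input to wchar_t has no character to map those bytes to, which is why the C locale has to stay byte-transparent.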
bye,
//mirabilos
--
13:37⎜«Natureshadow» Deep inside, I hate mirabilos. I mean, he's a good
guy. But he's always right! In every fsckin' situation, he's right. Even
with his deeply perverted taste in software and borked ambition towards
broken OSes - in the end, he's damn right about it :(! […] works in mksh
Received on Thu Dec 26 2013 - 00:01:13 CET