On Fri, Jan 9, 2015, at 16:44, Nick wrote:
> Quoth FRIGN:
> > - UTF-8: not allowed in POSIX, but in my opinion a must. This
> > finally allows you to work with UTF-8 streams without
> > problems or unexpected behaviour.
>
> I fully agree (unsurprisingly). Anything that relies on the POSIX
> behaviour to do weird things involving multibyte characters is
> insane.
Er...
http://pubs.opengroup.org/onlinepubs/009696899/utilities/tr.html
has very little mention of the issue one way or another, but does use
the term "characters" rather than "bytes" in all relevant places, and
talks about "multi-byte characters" in a tone that suggests they should
be supported properly when LC_CTYPE has them.
The only _questionable_ bits are some of the language surrounding the
use of octal sequences:
For single characters: "Multi-byte characters require multiple,
concatenated escape sequences of this type, including the leading '\'
for each byte."
I read this as meaning that multi-byte characters are supported, and in
fact that "tr '\303\266o' 'o\303\266'" means that \303\266 [two escape
sequences representing one multi-byte character] and o will be swapped -
and that it is not possible to specify multibyte characters with octal
values in a dash-separated range specification (but they can be included
as literals).
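To make that reading concrete, here is a rough Python sketch of the two
possible interpretations. It is not how any particular tr is implemented;
tr_chars and tr_bytes are hypothetical helpers made up for illustration,
and a UTF-8 locale is assumed.

```python
# \303\266 (octal) are the two UTF-8 bytes of U+00F6, o with diaeresis.
o_umlaut = bytes([0o303, 0o266]).decode("utf-8")


def tr_chars(text, set1, set2):
    """Character-wise mapping: the reading argued for above (hypothetical helper)."""
    return text.translate(str.maketrans(set1, set2))


def tr_bytes(data, set1, set2):
    """Byte-wise mapping, shown only for contrast (hypothetical helper)."""
    return data.translate(bytes.maketrans(set1, set2))


text = "o" + o_umlaut

# Character-wise: the two characters are swapped.
swapped = tr_chars(text, o_umlaut + "o", "o" + o_umlaut)

# Byte-wise: each of the three bytes is remapped on its own, so the
# output is no longer valid UTF-8.
mangled = tr_bytes(text.encode("utf-8"), b"\303\266o", b"o\303\266")

print(swapped == o_umlaut + "o")
print(mangled)
```

Under the character-wise reading the two-escape sequence behaves as a
single set member; under the byte-wise one the result cannot even be
decoded, which is presumably the "unexpected behaviour" being complained
about.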
Or is it possible that FRIGN misinterpreted the prohibition on
"multi-character collating elements"?
Received on Fri Jan 09 2015 - 23:41:19 CET