Re: [dev] sed breaks utf8 in [ ]

From: Alex Pilon <alp_AT_alexpilon.ca>
Date: Mon, 30 Mar 2015 21:49:58 -0400

On Mon, Mar 30, 2015 at 07:09:41PM -0400, Roger wrote:
> I thought non-ASCII characters required 16 bits within UTF-8, versus
> just 8 bits for ASCII.

1. ASCII is a 7-bit encoding that we store in 8-bit bytes.
2. You don't encode non-ASCII with ASCII, yet that seems to be your
   logic: "I thought non-ASCII […] versus just 8 bits for ASCII". Beg
   your pardon? UTF-8 is variable-width: ASCII characters take one
   byte, everything else takes two to four. Try
   `printf '\xc3\xa9\n' | LC_ALL=C less`. Don't you see gibberish?
3. You're thinking of encodings like Latin-1 (ISO 8859-1). Please go
   read up. The Linux man-pages project provides charsets(7), utf-8(7),
   ascii(7), and iso_8859-1(7). I'm sure Wikipedia also has the
   information you require.
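
As a rough illustration of points 1 and 2, the sketch below counts the
bytes a few characters occupy in UTF-8. The helper name utf8_len is made
up for this example, and the bytes are written as octal escapes because
POSIX printf does not guarantee \x:

```sh
# Hypothetical helper: print how many bytes a character occupies.
# $1 is a printf format string holding the character's raw UTF-8 bytes.
utf8_len() {
    # wc -c counts bytes regardless of locale; tr strips the padding
    # that some wc implementations put in front of the number.
    printf "$1" | wc -c | tr -d ' '
}

utf8_len 'e'                  # ASCII e (U+0065): prints 1
utf8_len '\303\251'           # é (U+00E9): prints 2
utf8_len '\342\202\254'       # € (U+20AC): prints 3
utf8_len '\360\237\230\200'   # U+1F600: prints 4
```

So é takes 16 bits, but that is a property of that particular code
point, not of "non-ASCII" in general.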

Just because *you*, the anglophone American, don't use characters such
as œ (LATIN SMALL LIGATURE OE), ñ (LATIN SMALL LETTER N WITH TILDE), or
ß (LATIN SMALL LETTER SHARP S) doesn't mean that you should be allowed
to restrict the encodings that your software can handle. Having to
support multiple encodings is what brought on this mess. I, the
francophone, very much want to be able to spell œufs (eggs) and a large
number of other words properly. The 37 million Spanish speakers in your
country should be able to properly spell any word with an eñe, and the
same goes for the majority of the world, which just so happens NOT to
speak English.

So my non-ASCII characters look incorrect in your MUA because you're
stuck in an insane locale? Too bad. That pain is self-inflicted.

Received on Tue Mar 31 2015 - 03:49:58 CEST
