Re: [dev] sed breaks utf8 in [ ]

From: Evan Gates <evan.gates_AT_gmail.com>
Date: Mon, 30 Mar 2015 08:50:11 -0700

On Sat, Mar 28, 2015 at 2:04 PM, Dimitris Papastamos <sin_AT_2f30.org> wrote:
> On Sat, Mar 28, 2015 at 09:48:08PM +0100, isabella parakiss wrote:
>> Please fix
>>
>> $ sed 's/[à]/x/' <<< è
>> x¨
>
> Interestingly, sbase sed linked with musl gives the correct result.
>
> Will look into it.
>

The problem is using glibc's regex engine without first calling
setlocale to select a UTF-8 locale. The process then stays in the
C/POSIX locale, so the bracket expression is matched byte by byte
rather than character by character. The same problem affects every
tool that uses libc's regex engine (expr, grep, nl, sed). No good
clean solution comes to mind yet; I'll keep thinking about it. Any
ideas?
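
For illustration only, here is a minimal sketch of how the locale
changes what the libc regex engine matches. It is not sbase code and
not a proposed fix. The helper name matchlen and its return codes are
made up for this example, and the locale name "C.UTF-8" used below is
an assumption that may not exist on every system.

```c
#include <assert.h>
#include <locale.h>
#include <regex.h>

/*
 * Hypothetical helper: switch LC_CTYPE to `locale`, compile `pattern`
 * as a basic regular expression, and report how many bytes of `text`
 * the first match covers.
 * Returns -1 on no match, -2 if the locale is unavailable,
 * -3 if the pattern does not compile.
 */
static int
matchlen(const char *locale, const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int len = -1;

	assert(locale && pattern && text);

	/* The regex engine picks up the character encoding from LC_CTYPE. */
	if (!setlocale(LC_CTYPE, locale))
		return -2;
	if (regcomp(&re, pattern, 0))
		return -3;
	if (regexec(&re, text, 1, &m, 0) == 0)
		len = (int)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

With the UTF-8 bytes of the reported case,
matchlen("C", "[\xc3\xa0]", "\xc3\xa8") should report a 1-byte match
under glibc (the lone 0xc3 that sed then replaces), while
matchlen("C.UTF-8", "[\xc3\xa0]", "\xc3\xa8") should report no match,
provided that locale is installed.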

-emg
Received on Mon Mar 30 2015 - 17:50:11 CEST
