> from the source code. xterm has several related options (-lc to adapt the
> encoding to the locale used, -u8 to force UTF-8, and others). I glanced
As far as I know, xterm uses an external program for this: luit(1).
> st does not use termios(3), and does not seem to do anything special
> depending on the locale encoding. I'm certainly missing something, because I
> don't understand how you can have iutf8 enabled in st by default!
I think it depends on what the default stty flags are on your system.
I know that the Linux VT starts in non-Unicode mode, and the startup
scripts have to call unicode_start to switch to Unicode.
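One way to compare the defaults is to query the input flags from a program instead of reading stty output. A minimal sketch, assuming a system whose <termios.h> defines the non-POSIX IUTF8 flag (Linux does); the helper name iutf8_state is made up for this example:

```c
#include <termios.h>

/*
 * Report the state of the non-POSIX IUTF8 input flag in a termios struct.
 * Returns 1 if set, 0 if clear, -1 if this system does not define IUTF8.
 * The caller is expected to fill *t with tcgetattr(fd, t) first.
 */
int
iutf8_state(const struct termios *t)
{
#ifdef IUTF8
	return (t->c_iflag & IUTF8) ? 1 : 0;
#else
	(void)t;
	return -1;
#endif
}
```

Filling the struct with tcgetattr(STDIN_FILENO, &t) in two different terminals and passing it to the helper would show whether both start with the same flag.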
> Yes, I saw after I had sent my message that iutf8 is not POSIX. Does the
> erase character work correctly with multi-byte characters in cat or ed on
> your OpenBSD machine?
This is even stranger. If I run st from the dwm shortcut:
static const char *termcmd[] = { "/usr/local/bin/st", "-e", "utmp", NULL };
when I type a non-ASCII letter I get it in latin1 encoding. If
I run /usr/local/bin/st -e /usr/local/bin/utmp from the command
line (from an st terminal opened with dwm) I get it in UTF-8
encoding. If I launch xterm from the dmenu shortcut of dwm I get
latin1 encoding, but if I launch it from the command line I get
UTF-8 encoding. If I execute (in an st or xterm started from the command line):
$ touch f.txt
$ ed f.txt <<EOF
> a
> á
> .
> w
> q
> EOF
0
3
$ hexdump f.txt
0000000 a1c3 000a
0000003
$
That is correct.
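To read the dump: U+00E1 (á) is the two bytes c3 a1 in UTF-8, and hexdump without options prints 16-bit words in host byte order, so a little-endian machine shows them swapped as a1c3. The 3 printed by ed is the byte count: c3 a1 0a. A small sketch of both steps (the helper names utf8_encode and le_word are made up for this example, and only code points below U+10000 are handled):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode a code point below U+10000 as UTF-8 into out (at least 3 bytes).
 * Returns the number of bytes written. Surrogates are not rejected; this
 * is only a sketch.
 */
size_t
utf8_encode(uint32_t cp, unsigned char *out)
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xc0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3f));
		return 2;
	}
	out[0] = (unsigned char)(0xe0 | (cp >> 12));
	out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
	out[2] = (unsigned char)(0x80 | (cp & 0x3f));
	return 3;
}

/* Combine two consecutive file bytes the way a little-endian 16-bit dump shows them. */
unsigned
le_word(unsigned char first, unsigned char second)
{
	return ((unsigned)second << 8) | first;
}
```

Here utf8_encode(0xe1, b) yields c3 a1, and le_word(0xc3, 0xa1) gives the a1c3 seen in the first column of the dump.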
If I execute (again in command line execution):
$ stty erase ^H
$ touch f.txt
$ ed f.txt <<EOF
> a
> á^H
> .
> w
> q
> EOF
0
4
$ hexdump f.txt
0000000 a1c3 0a08
0000004
$
The ^H is not interpreted, and that is correct because
the input of ed doesn't travel through the line driver.
If I run ed without the here document, typing the same input at the
terminal (again from the command line):
$ stty erase ^H
$ touch f.txt
$ ed f.txt
0
a
.
w
2
q
$ hexdump f.txt
0000000 0ac3
0000002
$
That is incorrect: only one byte of the two-byte á was erased, leaving
a stray 0xc3. I get the same output with st and with xterm.
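The c3 left in the file suggests that the erase removed a single byte rather than the whole character. A sketch of the difference between the two strategies, assuming the line buffer holds valid UTF-8 (the function names are made up here, and this is not the code of any particular tty driver):

```c
#include <stddef.h>

/* Byte-wise erase: drop the last byte, whatever it is. */
size_t
erase_byte(const unsigned char *buf, size_t len)
{
	(void)buf;
	return len > 0 ? len - 1 : 0;
}

/*
 * UTF-8-aware erase: drop trailing continuation bytes (10xxxxxx) and then
 * the lead byte they belong to. Returns the new length of the line.
 */
size_t
erase_utf8(const unsigned char *buf, size_t len)
{
	while (len > 0 && (buf[len - 1] & 0xc0) == 0x80)
		len--;
	return len > 0 ? len - 1 : 0;
}
```

With the line c3 a1, erase_byte leaves c3 (which is what ended up in f.txt), while erase_utf8 leaves an empty line.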
If I try the same test using the terminal emulator of the OpenBSD
kernel I get:
$ hexdump f.txt
0000000 000a
0000001
$
That is correct, but this terminal emulator runs in latin1
encoding (and as far as I know, there is no way of changing it).
I am not sure what is happening, but two things are clear to me:
- dwm is doing something wrong, because terminals launched by it
get an incorrect encoding for input characters.
- The OpenBSD tty driver doesn't handle UTF-8 encoding correctly.
I will repeat these tests tomorrow on Linux.
Regards,
--
Roberto E. Vargas Caballero
Received on Mon Aug 04 2014 - 23:00:55 CEST