[dev] [PATCH] [st] Fix issues with wcwidth() returning -1 for unsupported unicode chars

From: dequis <dx_AT_dxzone.com.ar>
Date: Sat, 25 Oct 2014 23:15:24 -0300

Hi suckless! First, thanks for st. I've been using it for a long while
and I'm still impressed at how much it gets right, including stuff that
urxvt failed miserably at. There's only one issue that has been
bothering me.

The issue itself:

Unicode characters added since Unicode 5.2 (released in 2009; the
latest revision[1] is 7.0) are not supported by the wcwidth()
implementation of glibc, and as a result, they behave weirdly in st.
The man page of wcwidth() specifies that -1 is returned for wide
characters that are not printable, and that is what I get for these
characters. I found a Stack Overflow question[2] about this same issue.

How st handles it:

I made a gif[3] showing its behavior.

It just offsets the cursor column by the value returned by wcwidth(),
expecting either 1 or 2, not -1. So each unsupported Unicode character
moves the cursor back one column and behaves like a printable
backspace.
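
To make the arithmetic concrete, here is a minimal sketch of that
column update. This is not st's actual code; the function names are
made up for illustration, and the width is passed in directly instead
of calling wcwidth().

```c
#include <assert.h>

/* Hypothetical model of the cursor update: add the reported width to
 * the current column without checking for -1. */
static int advance_unchecked(int x, int width)
{
    return x + width;
}

/* Column the cursor ends up in after printing n characters that all
 * report the same width, starting from column `start`. */
static int cursor_after(int start, int n, int width)
{
    assert(n >= 0);
    int x = start;
    for (int i = 0; i < n; i++)
        x = advance_unchecked(x, width);
    return x;
}
```

With a width of 1, three characters typed at column 4 leave the cursor
at column 7. With a width of -1, the same three characters leave it at
column 1, so later output lands on top of earlier cells.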

Picked U+0524[4] for the tests. The st on the top shows the current
behavior, the st on the bottom is my patched version. The first two
lines typed in the gif are spaces followed by that character. Third
line is the letter 'a' just to show how it overlaps.

Then I used a tmux keybinding that is supposed to scan for URLs, but
the main effect here is refreshing the terminal contents, which makes
those characters vanish. That z^H is a typo, ignore that.

My patch:

Just wcwidth(...) -> abs(wcwidth(...))

In other words: if wcwidth returns -1, interpret that as a column
width of 1. It's a bit dirty and lazy, but it works wonderfully for
most characters.
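
As a rough sketch of what the abs() change does to each possible
return value (the helper name columns_for is hypothetical and only for
illustration; w stands for whatever wcwidth() returned):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: map the value returned by wcwidth() to the
 * number of columns to advance. abs() turns -1 into 1 and leaves
 * 0, 1 and 2 unchanged. */
static int columns_for(int w)
{
    assert(w >= -1);    /* assumed range of wcwidth() results */
    return abs(w);
}
```

Widths of 0 stay 0, so characters that wcwidth() already reports as
zero-width are not affected. The cost is that every unknown character
is assumed to be one column wide, which is wrong for any unsupported
character that should take two columns.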

I'm not sure what the "correct" solution would be, but it's definitely
not something as simple as this. It would mean fixing libc to support
Unicode up to 7.0, or implementing our own version of wcwidth().


[1]: http://www.fileformat.info/info/unicode/version/index.htm
[2]: http://stackoverflow.com/questions/16371418/why-does-wcwidth-return-1-with-a-sign-that-i-can-print-on-the-terminal
[3]: http://i.imgur.com/MDzMJJH.gif
[4]: http://www.fileformat.info/info/unicode/char/0524/index.htm

Received on Sun Oct 26 2014 - 03:15:24 CET
