Hi,
On Fri, Jun 27, 2014 at 04:54:08PM -0500, Eric Pruitt wrote:
> I noticed that in st, combined Unicode characters don't seem to be
> preserved in memory. For example, if I run "printf 'AB\xcd\x9dCDE\n'" in
> an Xterm then select the resulting line, the clipboard data includes
> the Unicode sequence:
>
> ~% echo $TERM
> xterm-256color
> ~% printf 'AB\xcd\x9dCDE\n'
> AB͝CDE
> ~% xclip -o | xxd
> 0000000: 4142 cd9d 4344 450a AB..CDE.
>
> However, with st, the sequence vanishes:
>
> ~% echo $TERM
> st-256color
> ~% printf 'AB\xcd\x9dCDE\n'
> ABCDE
> ~% xclip -o | xxd
> 0000000: 4142 4344 450a ABCDE.
>
> Urxvt's behaviour is also the same as Xterm with an added bonus: it
> actually renders the combined Unicode sequence whereas on Xterm and st,
> the tie character is not visible (although if you paste "AB\u035d" into
> st with no other trailing characters, the tie appears albeit glitchily).
>
> I don't have a patch or any immediate plans to look into patching it but
> perhaps improved Unicode support could be added to the TODO list.
From what I see, these kinds of characters are simply ignored by st. I
don’t know if this is by design or by default but, in tputc(), wcwidth()
returns 0 for such a character: it is copied to the current position and
the cursor then advances by that width, which is 0. The next character
therefore overwrites it.
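To make the overwrite concrete, here is a rough toy model of that
behaviour. It is not st's actual code: toy_width() is a hypothetical
stand-in for wcwidth() that only knows about the Combining Diacritical
Marks block, and toy_render() is a made-up name for the "write at the
cursor, then advance by the width" step described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for wcwidth(): 0 for the Combining Diacritical
 * Marks block (U+0300..U+036F), 1 for everything else. */
int toy_width(uint32_t cp)
{
    return (cp >= 0x0300 && cp <= 0x036F) ? 0 : 1;
}

/* Toy model of the described behaviour (an assumption, not st's code):
 * store each code point in the cell under the cursor, then advance the
 * cursor by the code point's width. Returns the final cursor column. */
size_t toy_render(const uint32_t *in, size_t n, uint32_t *cells, size_t cap)
{
    size_t col = 0;

    for (size_t i = 0; i < n; i++) {
        assert(col < cap);               /* stay inside the line buffer */
        cells[col] = in[i];              /* overwrites whatever was there */
        col += (size_t)toy_width(in[i]); /* width 0 leaves the cursor put */
    }
    return col;
}
```

Feeding it 'A', 'B', U+035D, 'C', 'D', 'E' should leave U+035D in the
third cell only until 'C' lands on the same cell, so the line ends up
holding just ABCDE, which would match the xclip output above.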
I don’t know if this has already been discussed, but these kinds of
characters really seem to be a hassle to support: terminal lines of
variable length, potentially more than twice as long as they appear
(potentially arbitrarily long?), and a huge pain to draw correctly…
So this seems to be a really good example of the kind of sucky feature
we don’t want to add. But I agree with you: if we want true and
accurate Unicode support, we should have this.
--
Ivan "Colona" Delalande
Received on Sat Jun 28 2014 - 06:08:08 CEST