Re: [dev] [libgrapheme] Some questions about libgrapheme

From: <atrtarget_AT_cock.li>
Date: Fri, 02 Sep 2022 20:04:35 -0300

Hi!


This is a really good suggestion, but I think it may add a lot of
overhead, since it would need to go through the entire buffer. And since
moving the cursor is not very frequent (it happens no more often than
changing your position or opening a new buffer), I think it would be
better to do it the "lazy" way.
However, thanks for pointing out a solution; I guess it would be really
good for some other situations.
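For reference, a minimal C sketch of what the "lazy" way could look like:
to step back, rescan from the start of the buffer and keep the last
boundary before the cursor. `next_break()` is a hypothetical stand-in that
only finds UTF-8 code point boundaries, so it is not real grapheme
segmentation; with libgrapheme you would call its segmentation function
there instead (assuming the 2.x API, `grapheme_next_character_break_utf8()`).
`prev_grapheme_start()` is likewise a made-up name.

```c
#include <stddef.h>

/* Hypothetical stand-in segmenter: length of the first UTF-8 code point
 * (lead byte plus continuation bytes). NOT a grapheme cluster; replace it
 * with a real segmenter such as libgrapheme's. */
size_t next_break(const char *s, size_t len)
{
	size_t i = 1;

	if (len == 0)
		return 0;
	while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
		i++;
	return i;
}

/* "Lazy" backward step: rescan from the start of buf and remember the last
 * boundary strictly before the cursor. O(cursor) per call, but there is no
 * extra state to keep up to date when the buffer changes. */
size_t prev_grapheme_start(const char *buf, size_t len, size_t cursor)
{
	size_t off = 0, prev = 0, n;

	if (cursor > len)
		cursor = len;
	while (off < cursor) {
		n = next_break(buf + off, len - off);
		if (n == 0)
			break;
		prev = off;
		off += n;
	}
	return prev;
}
```

The cost grows with the distance from the start of the scan, so scanning
from the start of the current line rather than the whole buffer keeps it
small in practice.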
> 1. Regarding stepping backwards through the graphemes:
>
> As Laslo explained, trying to find the starting point of the previous
> grapheme is simply not possible.
> In your situation, if scanning from the front of the string is too
> inefficient for you, you could try keeping
> a bitfield in addition to the string, with one bit for each char of the
> string.
> A 1 in the bitfield means 'this char is the start of a new grapheme',
> 0 is the opposite.
> Every time the string changes, the bitfield is recomputed.
> This way, moving the cursor left or right in a text editor is just a
> matter of finding the next
> or previous set bit in the bitfield, which is extremely cheap.
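The bitfield idea quoted above could be sketched like this, with one bit
per byte of the buffer. Again, `next_break()` is a hypothetical
code-point-only stand-in for a real grapheme segmenter, and
`build_starts()`, `next_start()` and `prev_start()` are names made up for
illustration.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in segmenter: length of the first UTF-8 code point
 * (NOT a grapheme cluster); replace with a real grapheme segmenter. */
size_t next_break(const char *s, size_t len)
{
	size_t i = 1;

	if (len == 0)
		return 0;
	while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
		i++;
	return i;
}

/* One bit per byte of buf: bit i is set iff a grapheme starts at byte i.
 * Recompute whenever buf changes. Returns NULL on allocation failure;
 * the caller frees the result. */
unsigned char *build_starts(const char *buf, size_t len)
{
	unsigned char *bits = calloc(len / CHAR_BIT + 1, 1);
	size_t off = 0;

	if (bits == NULL)
		return NULL;
	while (off < len) {
		bits[off / CHAR_BIT] |= (unsigned char)(1u << (off % CHAR_BIT));
		off += next_break(buf + off, len - off);
	}
	return bits;
}

int is_start(const unsigned char *bits, size_t i)
{
	return (bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Cursor right: next set bit after pos, or len (end of buffer) if none. */
size_t next_start(const unsigned char *bits, size_t len, size_t pos)
{
	size_t i;

	for (i = pos + 1; i < len; i++)
		if (is_start(bits, i))
			return i;
	return len;
}

/* Cursor left: previous set bit before pos, or 0 if none. */
size_t prev_start(const unsigned char *bits, size_t pos)
{
	while (pos > 0) {
		pos--;
		if (is_start(bits, pos))
			return pos;
	}
	return 0;
}
```

The bit-by-bit scan keeps the sketch short; a tuned version could skip
all-zero bytes or whole words at once.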


https://github.com/vim/vim/blob/master/src/libvterm/find-wide-chars.pl
https://github.com/vim/vim/blob/master/src/libvterm/src/fullwidth.inc

I am not 100% sure, but it looks like vim does it the old way (a
pre-generated table of wide characters). There are also some comments
about it in this file:

https://github.com/vim/vim/blob/master/src/libvterm/src/unicode.c
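Roughly, "the old way" means looking up each code point on its own in a
sorted table of wide ranges, like the one find-wide-chars.pl generates. A
minimal sketch with binary search, assuming a tiny hand-picked sample table
(NOT the actual contents of fullwidth.inc); `cp_width()` and `in_table()`
are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

struct interval {
	uint32_t first, last;
};

/* Small illustrative sample of East Asian Wide/Fullwidth ranges, sorted and
 * non-overlapping. A real table (e.g. vim's fullwidth.inc) has many more. */
static const struct interval wide[] = {
	{ 0x1100, 0x115F }, /* Hangul Jamo leading consonants */
	{ 0x4E00, 0x9FFF }, /* CJK Unified Ideographs */
	{ 0xAC00, 0xD7A3 }, /* Hangul Syllables */
	{ 0xFF01, 0xFF60 }, /* Fullwidth ASCII variants */
};

/* Binary search for cp in a sorted table of ranges. */
int in_table(uint32_t cp, const struct interval *t, size_t n)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cp < t[mid].first)
			hi = mid;
		else if (cp > t[mid].last)
			lo = mid + 1;
		else
			return 1;
	}
	return 0;
}

/* Width of a single code point: 2 if listed as wide, else 1. Real
 * implementations also return 0 for combining marks and handle controls. */
int cp_width(uint32_t cp)
{
	return in_table(cp, wide, sizeof(wide) / sizeof(wide[0])) ? 2 : 1;
}
```

The answer only depends on the table, so it is only as good as the
Unicode version the table was generated from.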


https://github.com/tmux/tmux/blob/master/utf8.c

tmux seems to be even lazier and uses `wcwidth` itself; by the way, they
also seem to have dropped support for systems that don't provide it:

https://github.com/tmux/tmux/pull/3003
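Using `wcwidth` directly boils down to summing per-code-point widths. A
sketch with only standard POSIX calls (`mbrtowc()`, `wcwidth()`);
`str_width()` is a hypothetical helper, not tmux's actual code. It decodes
in the current locale, so a program would normally call
`setlocale(LC_CTYPE, "")` first to get UTF-8:

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <wchar.h>

/* Display width of s as the sum of wcwidth() over its code points.
 * Returns -1 on an invalid/truncated sequence or a non-printable char. */
int str_width(const char *s)
{
	mbstate_t st;
	wchar_t wc;
	size_t left = strlen(s), n;
	int w, total = 0;

	memset(&st, 0, sizeof st);
	while (left > 0) {
		n = mbrtowc(&wc, s, left, &st);
		if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
			return -1; /* invalid or incomplete multibyte sequence */
		w = wcwidth(wc);
		if (w < 0)
			return -1; /* non-printable, e.g. control characters */
		total += w;
		s += n;
		left -= n;
	}
	return total;
}
```

Because every code point is measured separately, a multi-code-point
grapheme such as an emoji ZWJ sequence can come out wider than what the
terminal actually draws, and the result also depends on the libc's
width tables.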


Even neovim seems to use the hack:

https://github.com/neovim/neovim/blob/master/src/unicode/EastAsianWidth.txt


> I guess the only robust approach is to render the character on the
> terminal, and then read back by how much the
> cursor was advanced.

This looks like a good idea; the problem is that I'm not sure whether
most terminals will report the actual position in the grid or the number
of graphemes or code points, since this doesn't seem to be specified in
VT* or in xterm. But as long as it works on /most/ terminals, I think
it's fine, or at least better than wcwidth.
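For the read-back idea, the usual mechanism is the Device Status Report
request `ESC [ 6 n`, to which the terminal answers with a Cursor Position
Report `ESC [ row ; col R` (1-based). A minimal sketch, assuming the tty
is already in non-canonical, no-echo mode; `parse_cpr()`,
`query_cursor()` and `measured_width()` are hypothetical names:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Parse a Cursor Position Report "ESC [ row ; col R" (1-based).
 * Returns 0 on success, -1 if buf is not such a report. */
int parse_cpr(const char *buf, int *row, int *col)
{
	char final;

	if (sscanf(buf, "\033[%d;%d%c", row, col, &final) != 3 || final != 'R')
		return -1;
	return 0;
}

/* Send DSR "ESC [ 6 n" and read the reply byte by byte up to the final 'R'.
 * Assumes fd is a tty already in non-canonical, no-echo mode (e.g. set up
 * with tcsetattr()); otherwise the reply is echoed and line-buffered. */
int query_cursor(int fd, int *row, int *col)
{
	char buf[32];
	size_t i = 0;

	if (write(fd, "\033[6n", 4) != 4)
		return -1;
	while (i < sizeof buf - 1) {
		if (read(fd, &buf[i], 1) != 1)
			return -1;
		if (buf[i++] == 'R')
			break;
	}
	buf[i] = '\0';
	return parse_cpr(buf, row, col);
}

/* Columns the cursor advanced while printing s, or -1 on error or if the
 * row changed (the text wrapped, which this simple version gives up on). */
int measured_width(int fd, const char *s)
{
	int r0, c0, r1, c1;
	size_t len = strlen(s);

	if (query_cursor(fd, &r0, &c0) < 0)
		return -1;
	if (write(fd, s, len) != (ssize_t)len)
		return -1;
	if (query_cursor(fd, &r1, &c1) < 0)
		return -1;
	return r1 == r0 ? c1 - c0 : -1;
}
```

The reply is the cursor position as the terminal itself tracks it, so how
far it advances for a given cluster still comes from that terminal's own
width logic; the sketch just avoids guessing it locally.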

> 2. Regarding the avoidance of terminal linewrap:
>
> AFAIK there's no proper way to query the display width of a character.
> It definitely depends on the font though.
> I guess the only robust approach is to render the character on the
> terminal, and then read back by how much the
> cursor was advanced.
> So perhaps you could try to render the whole line, detect when a line
> overflow happens in the terminal based on
> the cursor position, and then react accordingly.
> It would be interesting to know how (or even if!) other software such
> as tmux or vim has solved this issue.
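Detecting the overflow could then look like this: write the line one
cluster at a time, ask for the cursor position after each one, and stop
when it jumps to a new row or back to a smaller column (the latter catches
a scroll on the bottom row). The `struct term` callbacks are a hypothetical
abstraction so the logic can be tested without a tty, `next_break()` is a
code-point-only stand-in for a real grapheme segmenter, and
`render_until_wrap()` is a made-up name:

```c
#include <stddef.h>

/* Hypothetical terminal interface: on a real tty, put() would write(2) the
 * bytes and pos() would send "ESC [ 6 n" and parse the reply. */
struct term {
	void (*put)(void *ctx, const char *s, size_t n);
	void (*pos)(void *ctx, int *row, int *col);
	void *ctx;
};

/* Stand-in segmenter: length of the first UTF-8 code point (NOT a grapheme
 * cluster); swap in a real segmenter such as libgrapheme's. */
size_t next_break(const char *s, size_t len)
{
	size_t i = 1;

	if (len == 0)
		return 0;
	while (i < len && ((unsigned char)s[i] & 0xC0) == 0x80)
		i++;
	return i;
}

/* Write line cluster by cluster; return the byte offset of the first cluster
 * that wrapped (it has already been printed, so the caller must clean up),
 * or len if everything fit. xterm-like terminals keep the cursor on the last
 * column after filling it ("pending wrap"), so a wrap only shows up once the
 * next cluster is printed. */
size_t render_until_wrap(struct term *t, const char *line, size_t len)
{
	int prow, pcol, row, col;
	size_t off = 0, n;

	t->pos(t->ctx, &prow, &pcol);
	while (off < len) {
		n = next_break(line + off, len - off);
		t->put(t->ctx, line + off, n);
		t->pos(t->ctx, &row, &col);
		/* new row, or column went back (bottom row scrolled) => wrapped */
		if (row != prow || col < pcol)
			return off;
		prow = row;
		pcol = col;
		off += n;
	}
	return len;
}
```

Each query is a round trip to the terminal, so this is noticeably slower
than computing widths locally.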


Thanks a lot for helping me!
Received on Sat Sep 03 2022 - 01:04:35 CEST
