Re: [hackers] st and combining characters from Joakim Sindholt on 2015-09-17 (hackers mail list archive)

From: Joakim Sindholt <opensource_AT_zhasha.com>
Date: Thu, 17 Sep 2015 12:20:12 +0200

On Mon, 2015-06-15 at 10:25 +0200, Roberto E. Vargas Caballero wrote:
> Hi,

Well, it's been a really long time and I thought I'd finally get around
to summarizing what was agreed upon way back when.

> I like the idea, but I think the patch needs some evolution.
> A patch of 500 lines is usually hard of reading, and in in this case
> the change is not trivial.
>
> On Wed, Jun 10, 2015 at 11:39:25PM +0200, Joakim Sindholt wrote:
> > glyph now holds a union of two combining characters and a pointer to a zero
> > terminated array. This is purely to make the most of the space. No length is
> > kept and every character added past 2 causes a realloc.
>
> And why 2 characters?. I think it is more logical to have only one, which is
> the most common case, isn't it?. Maybe I'm missing something.

Because of space. I would wager that most computers have 64-bit pointers
nowadays, so to optimally use the space before going full malloc, we can
have one base character plus two combining characters. However one base
plus one combining is by far the most common in western scripts.

> > There's currently a problem with rendering where Xft doesn't draw the
> > combining characters as a part of the character they combine to, which causes
> > things like Hangul Jamo to be completely unusable stacks of characters instead
> > of the proper combined form.
>
> Uhmm, I don't understand this point, can you explain it a bit better?,
> or better, put an example.

I used 㐻o͜ò떫 as my test characters (see http://sprunge.us/BIac if it
didn't render right for you). It should be a double width Chinese
character, followed by two o's, with a double width combining horizontal
end parenthesis under them both (not just the left one as with my
patch) and a ` over the second one, followed by a combining Hangul Jamo
character (not a box with a line through it as it renders with my
patch).

Since I really don't have time to tackle this properly, I'll leave
anyone who wants to pick up the torch with the same patch updated to git
head. Maybe it could be put on the wiki or something. Sorry for wasting
everyone's time with this.

I would say the criticisms below are spot on:

> > _AT_@ -98,6 +98,7 @@ enum glyph_attribute {
> > ATTR_WRAP = 1 << 8,
> > ATTR_WIDE = 1 << 9,
> > ATTR_WDUMMY = 1 << 10,
> > + ATTR_DECOMPPTR = 1 << 11,
> > ATTR_BOLD_FAINT = ATTR_BOLD | ATTR_FAINT,
> > };
> >
> > _AT_@ -190,6 +191,10 @@ typedef XftColor Color;
> >
> > typedef struct {
> > Rune u; /* character code */
> > + union {
> > + Rune c[2];
> > + Rune *pc;
> > + } dc;
>
> We already have the field u, so why is not the union made with u and pc?
>
> > Line *alt; /* alternate screen */
> > bool *dirty; /* dirtyness of lines */
> > XftGlyphFontSpec *specbuf; /* font spec buffer used for rendering */
> > + int specbuflen;
>
>
> > _AT_@ -419,8 +426,9 @@ static void ttywrite(const char *, size_t);
> > static void tstrsequence(uchar);
> >
> > static inline ushort sixd_to_16bit(int);
> > -static int xmakeglyphfontspecs(XftGlyphFontSpec *, const Glyph *, int, int, int);
> > -static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, int);
> > +static int xmakeglyphfontspecs(XftGlyphFontSpec *, int, const Glyph *, int, int, int);
> > +static int xmakeglyphfontspecsintermbuf(const Glyph *, int, int, int);
> > +static void xdrawglyphfontspecs(const XftGlyphFontSpec *, Glyph, int, int, int, int);
>
> Uhmmm, these names begin to seem like java names. We must find a better way to name them,
> because with names so long the only lower case style doesn't work.
>
> > /* append every set & selected glyph to the selection */
> > _AT_@ -984,6 +993,16 @@ getsel(void) {
> > continue;
> >
> > ptr += utf8encode(gp->u, ptr);
> > + if(gp->mode & ATTR_DECOMPPTR) {
> > + pc = gp->dc.pc;
> > + for(pc = gp->dc.pc; *pc; pc++)
> > + ptr += utf8encode(*pc, ptr);
> > + } else {
> > + if(gp->dc.c[0])
> > + ptr += utf8encode(gp->dc.c[0], ptr);
> > + if(gp->dc.c[1])
> > + ptr += utf8encode(gp->dc.c[1], ptr);
> > + }
> > }
>
> If we define the union with u and pc we can remove the else in this code, no?
>
> > + term.dirty[y] = 1;
> > + if(term.line[y][x].mode & ATTR_DECOMPPTR) {
> > + p = term.line[y][x].dc.pc;
> > + while(*p)
> > + p++;
> > + sz = (p - term.line[y][x].dc.pc) + 2;
> > + term.line[y][x].dc.pc = xrealloc(term.line[y][x].dc.pc, sz * sizeof(Rune));
> > + term.line[y][x].dc.pc[sz-2] = u;
> > + term.line[y][x].dc.pc[sz-1] = 0;
> > + } else {
> > + if(!term.line[y][x].dc.c[0]) {
> > + term.line[y][x].dc.c[0] = u;
> > + } else if(!term.line[y][x].dc.c[1]) {
> > + term.line[y][x].dc.c[1] = u;
> > + } else {
> > + p = xmalloc(4 * sizeof(Rune));
> > + p[0] = term.line[y][x].dc.c[0];
> > + p[1] = term.line[y][x].dc.c[1];
> > + p[2] = u;
> > + p[3] = 0;
> > + term.line[y][x].dc.pc = p;
> > + term.line[y][x].mode |= ATTR_DECOMPPTR;
> > + }
> > + }
> > +}
>
> The same. If the union is made with u almost all off this code disappear.
>
> > _AT_@ -1711,7 +1786,7 @@ tclearregion(int x1, int y1, int x2, int y2) {
> >
> > void
> > tdeletechar(int n) {
> > - int dst, src, size;
> > + int dst, src, size, i;
> > Glyph *line;
> >
> > LIMIT(n, 0, term.col - term.c.x);
> > _AT_@ -1721,13 +1796,17 @@ tdeletechar(int n) {
> > size = term.col - src;
> > line = term.line[term.c.y];
> >
> > + for(i = dst; i < src; i++) {
> > + if(line[i].mode & ATTR_DECOMPPTR)
> > + line[i].dc.pc = NULL;
> > + }
>
> ...
>
> > _AT_@ -1737,6 +1816,10 @@ tinsertblank(int n) {
> > size = term.col - dst;
> > line = term.line[term.c.y];
> >
> > + for(i = src; i < dst; i++) {
> > + if(line[i].mode & ATTR_DECOMPPTR)
> > + line[i].dc.pc = NULL;
> > + }
> > memmove(&line[dst], &line[src], size * sizeof(Glyph));
> > tclearregion(src, term.c.y, dst - 1, term.c.y);
> > }
>
>
> Maybe we should write a blank in u here, because if we have NULL and blank
> everything becomes harder.
>
>
> > _AT_@ -2774,6 +2857,19 @@ tputc(Rune u) {
> > if(sel.ob.x != -1 && BETWEEN(term.c.y, sel.ob.y, sel.oe.y))
> > selclear(NULL);
> >
> > + /* combining chars are added to the previous cell */
> > + if(width == 0) {
> > + if(term.c.x == 0) {
> > + if(term.c.y == 0)
> > + taddcchar(u, 0, 0);
> > + else
> > + taddcchar(u, term.col-1, term.c.y-1);
> > + } else {
> > + taddcchar(u, term.c.x-1, term.c.y);
> > + }
> > + return;
> > + }
> > +
>
> You repeat this logic of term.c.x and term.c.y in several places of the code,
> maybe is better to add some function whose name is clear enough to make obvious
> this logic.
>
> > _AT_@ -2834,12 +2936,19 @@ tresize(int col, int row) {
> > memmove(term.alt, term.alt + i, row * sizeof(Line));
> > }
> > for(i += row; i < term.row; i++) {
> > + for(j = 0; j < term.col; j++) {
> > + if(term.line[i][j].mode & ATTR_DECOMPPTR)
> > + free(term.line[i][j].dc.pc);
> > + if(term.alt[i][j].mode & ATTR_DECOMPPTR)
> > + free(term.alt[i][j].dc.pc);
> > + }
>
> ...
>
> > _AT_@ -2849,14 +2958,26 @@ tresize(int col, int row) {
> >
> > /* resize each row to new width, zero-pad if needed */
> > for(i = 0; i < minrow; i++) {
> > + for(j = col; j < term.col; j++) {
> > + if(term.line[i][j].mode & ATTR_DECOMPPTR)
> > + free(term.line[i][j].dc.pc);
> > + if(term.alt[i][j].mode & ATTR_DECOMPPTR)
> > + free(term.alt[i][j].dc.pc);
> > + }
>
> The same with this logic.
>
> > int
> > -xmakeglyphfontspecs(XftGlyphFontSpec *specs, const Glyph *glyphs, int len, int x, int y)
> > +xmakeglyphfontspecs(XftGlyphFontSpec *specs, int nspecs, const Glyph *glyphs, int len, int x, int y)
> > {
> > - float winx = borderpx + x * xw.cw, winy = borderpx + y * xw.ch, xp, yp;
> > + float winx = borderpx + x * xw.cw, winy = borderpx + y * xw.ch;
>
> I think the very long name of this function and the number of parameters are very good
> indicators we are doing something wrong here, and we should do some rewriting instead
> of patching the code.
>
> > _AT_@ -3326,11 +3464,12 @@ xmakeglyphfontspecs(XftGlyphFontSpec *specs, const Glyph *glyphs, int len, int x
> > /* Lookup character index with default font. */
> > glyphidx = XftCharIndex(xw.dpy, font->match, rune);
> > if(glyphidx) {
> > - specs[numspecs].font = font->match;
> > - specs[numspecs].glyph = glyphidx;
> > - specs[numspecs].x = (short)xp;
> > - specs[numspecs].y = (short)yp;
> > - xp += runewidth;
> > + if(numspecs < nspecs) {
> > + specs[numspecs].font = font->match;
> > + specs[numspecs].glyph = glyphidx;
> > + specs[numspecs].x = (short)xp;
> > + specs[numspecs].y = (short)yp;
> > + }
> > numspecs++;
> > continue;
> > }
> > _AT_@ -3401,20 +3540,43 @@ xmakeglyphfontspecs(XftGlyphFontSpec *specs, const Glyph *glyphs, int len, int x
> > FcCharSetDestroy(fccharset);
> > }
> >
> > - specs[numspecs].font = frc[f].font;
> > - specs[numspecs].glyph = glyphidx;
> > - specs[numspecs].x = (short)xp;
> > - specs[numspecs].y = (short)(winy + frc[f].font->ascent);
> > - xp += runewidth;
> > + if(numspecs < nspecs) {
> > + specs[numspecs].font = frc[f].font;
> > + specs[numspecs].glyph = glyphidx;
> > + specs[numspecs].x = (short)xp;
> > + specs[numspecs].y = (short)(winy + frc[f].font->ascent);
> > + }
> > numspecs++;
> > }
> >
> > return numspecs;
> > }
>
> Again, I think we could try some way to merge these two sections of code that are
> identical (and they are in the same function).
>
>
> I would like continue the discussion of this patch, but please, if you can find some
> way to split it in small patches it will make it easier my life.
>
> Regards,
>
>

text/x-patch attachment: st-0.6-combining-characters-first-try.patch

Received on Thu Sep 17 2015 - 12:20:12 CEST

This archive was generated by hypermail 2.3.0 : Thu Sep 17 2015 - 12:24:11 CEST