Re: [hackers] [libsl|dmenu][PATCH v2] Fix truncation issues and improve performance

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Fri, 25 Mar 2022 23:01:04 +0100

On Fri, Mar 25, 2022 at 03:34:29PM +0600, NRK wrote:
> Hi,
>
> Fixed a couple bugs/issues with the previous patches and attached the
> amended ones. I'm confident that these patches shouldn't have any
> remaining logic issues, but feel free to point them out in case there is.
>
> Some notable changes from the previous patches:
>
> * The drw related patches have separated. This is so that they can be
> cleanly applied to libsl (and dwm). Should additionally help with the
> review process.
>
> * The truncation condition as well as remaining width to override has been
> corrected.
>
> * The TEXTW_CLAMP macro has been converted to a function to avoid
> calculating width twice.
>
> * Since the `invert` parameter in drw_text() does not get used when not
> rendering, it's now being internally reused for specifying a minimum width.
>
> Additionally, there's one *new* patch attached, which fixes the
> performance issue in case we have no font match for a given codepoint.
>
> - NRK

> From 1425e94d0eb3cfbb4a9b73d433b9133bcb1b824c Mon Sep 17 00:00:00 2001
> From: NRK <nrk_AT_disroot.org>
> Date: Thu, 24 Mar 2022 00:37:55 +0600
> Subject: [PATCH 1/6] drw_text: improve both performance and correctness
>
> this patch makes some non-trivial changes, which significantly improves
> the performance of drawing large strings as well as fixes any issues
> regarding the printing of the ellipsis when string gets truncated.
>
> * performance:
>
> before there were two O(n) loops, one which finds how long we can go
> without changing font, and the second loop would (incorrectly) truncate
> the string if it's too big.
>
> this patch merges the overflow calculation into the first loop and exits
> out when overflow is detected. when dumping lots of emojies into dmenu,
> i see some noticeable startup time improvement:
>
> before -> after
> 460ms -> 360ms
>
> input latency when scrolling up/down is also noticeably better and can
> be tested with the following:
>
> for _ in $(seq 20); do
> cat /dev/urandom | base64 | tr -d '\n' | head -c 1000000
> echo
> done | ./dmenu -l 10
>
> * correctness:
>
> the previous version would incorrectly assumed single byte chars and
> would overwrite them with '.' , this caused a whole bunch of obvious
> problems, including the ellipsis not getting rendered if then font
> changed.
>
> in addition to exiting out when we detect overflow, this patch also
> keeps track of the last x-position where the ellipsis would fit. if we
> detect overflow, we simply make a recursing call to drw_text() at the
> ellipsis_x position and overwrite what was there.
>
> so now the ellipsis will always be printed properly, regardless of
> weather the font changes or if the string is single byte char or not.
>
> the idea of rendering the ellipsis on top incase of overflow was
> from Bakkeby <bakkeby_AT_gmail.com>, thanks! however the original patch had
> some issues incorrectly truncating the prompt (-p flag) and cutting off
> emojies. those have been fixed in here.
> ---
> drw.c | 56 ++++++++++++++++++++++++++++----------------------------
> 1 file changed, 28 insertions(+), 28 deletions(-)
>
> diff --git a/drw.c b/drw.c
> index 4cdbcbe..e65d069 100644
> --- a/drw.c
> +++ b/drw.c
> _AT_@ -251,12 +251,10 @@ drw_rect(Drw *drw, int x, int y, unsigned int w, unsigned int h, int filled, int
> int
> drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lpad, const char *text, int invert)
> {
> - char buf[1024];
> - int ty;
> - unsigned int ew;
> + int ty, ellipsis_x = 0;
> + unsigned int tmpw, ew, ellipsis_w = 0, ellipsis_len, ellipsis_width;
> XftDraw *d = NULL;
> Fnt *usedfont, *curfont, *nextfont;
> - size_t i, len;
> int utf8strlen, utf8charlen, render = x || y || w || h;
> long utf8codepoint = 0;
> const char *utf8str;
> _AT_@ -264,7 +262,7 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> FcPattern *fcpattern;
> FcPattern *match;
> XftResult result;
> - int charexists = 0;
> + int charexists = 0, overflow = 0;
>
> if (!drw || (render && !drw->scheme) || !text || !drw->fonts)
> return 0;
> _AT_@ -282,8 +280,9 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> }
>
> usedfont = drw->fonts;
> + drw_font_getexts(usedfont, "...", 3, &ellipsis_width, NULL);
> while (1) {
> - utf8strlen = 0;
> + ew = ellipsis_len = utf8strlen = 0;
> utf8str = text;
> nextfont = NULL;
> while (*text) {
> _AT_@ -291,9 +290,21 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> for (curfont = drw->fonts; curfont; curfont = curfont->next) {
> charexists = charexists || XftCharExists(drw->dpy, curfont->xfont, utf8codepoint);
> if (charexists) {
> - if (curfont == usedfont) {
> + drw_font_getexts(curfont, text, utf8charlen, &tmpw, NULL);
> + if (ew + ellipsis_width <= w) {
> + /* keep track where the ellipsis still fits */
> + ellipsis_x = x + ew;
> + ellipsis_w = w - ew;
> + ellipsis_len = utf8strlen;
> + }
> +
> + if (ew + tmpw > w) {
> + overflow = 1;
> + utf8strlen = ellipsis_len;
> + } else if (curfont == usedfont) {
> utf8strlen += utf8charlen;
> text += utf8charlen;
> + ew += tmpw;
> } else {
> nextfont = curfont;
> }
> _AT_@ -301,36 +312,25 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> }
> }
>
> - if (!charexists || nextfont)
> + if (overflow || !charexists || nextfont)
> break;
> else
> charexists = 0;
> }
>
> if (utf8strlen) {
> - drw_font_getexts(usedfont, utf8str, utf8strlen, &ew, NULL);
> - /* shorten text if necessary */
> - for (len = MIN(utf8strlen, sizeof(buf) - 1); len && ew > w; len--)
> - drw_font_getexts(usedfont, utf8str, len, &ew, NULL);
> -
> - if (len) {
> - memcpy(buf, utf8str, len);
> - buf[len] = '\0';
> - if (len < utf8strlen)
> - for (i = len; i && i > len - 3; buf[--i] = '.')
> - ; /* NOP */
> -
> - if (render) {
> - ty = y + (h - usedfont->h) / 2 + usedfont->xfont->ascent;
> - XftDrawStringUtf8(d, &drw->scheme[invert ? ColBg : ColFg],
> - usedfont->xfont, x, ty, (XftChar8 *)buf, len);
> - }
> - x += ew;
> - w -= ew;
> + if (render) {
> + ty = y + (h - usedfont->h) / 2 + usedfont->xfont->ascent;
> + XftDrawStringUtf8(d, &drw->scheme[invert ? ColBg : ColFg],
> + usedfont->xfont, x, ty, (XftChar8 *)utf8str, utf8strlen);
> }
> + x += ew;
> + w -= ew;
> }
> + if (render && overflow)
> + drw_text(drw, ellipsis_x, y, ellipsis_w, h, 0, "...", invert);
>
> - if (!*text) {
> + if (!*text || overflow) {
> break;
> } else if (nextfont) {
> charexists = 0;
> --
> 2.34.1
>

> From cebbfe70f25d52c290172640614cb592955ec67f Mon Sep 17 00:00:00 2001
> From: NRK <nrk_AT_disroot.org>
> Date: Thu, 24 Mar 2022 02:00:00 +0600
> Subject: [PATCH 2/6] introduce drw_fontset_getwidth_clamp()
>
> getting the width of a string is an O(n) operation, and in many cases
> users only care about getting the width upto a certain number.
>
> instead of calling drw_fontset_getwidth() and *then* clamping the
> result, this patch introduces drw_fontset_getwidth_clamp() function,
> similar to strnlen(), which will stop once we reach n.
>
> the `invert` parameter was overloaded internally to preserve the API,
> however library users should be calling drw_fontset_getwidth_clamp() and
> not depend upon internal behavior of drw_text().
> ---
> drw.c | 19 +++++++++++++++++--
> drw.h | 1 +
> 2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drw.c b/drw.c
> index e65d069..7d985b1 100644
> --- a/drw.c
> +++ b/drw.c
> _AT_@ -268,7 +268,7 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> return 0;
>
> if (!render) {
> - w = ~w;
> + w = invert ? invert : ~invert;
> } else {
> XSetForeground(drw->dpy, drw->gc, drw->scheme[invert ? ColFg : ColBg].pixel);
> XFillRectangle(drw->dpy, drw->drawable, drw->gc, x, y, w, h);
> _AT_@ -300,7 +300,13 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
>
> if (ew + tmpw > w) {
> overflow = 1;
> - utf8strlen = ellipsis_len;
> + /* called from drw_fontset_getwidth_clamp():
> + * it wants the width AFTER the overflow
> + */
> + if (!render)
> + x += tmpw;
> + else
> + utf8strlen = ellipsis_len;
> } else if (curfont == usedfont) {
> utf8strlen += utf8charlen;
> text += utf8charlen;
> _AT_@ -397,6 +403,15 @@ drw_fontset_getwidth(Drw *drw, const char *text)
> return drw_text(drw, 0, 0, 0, 0, 0, text, 0);
> }
>
> +unsigned int
> +drw_fontset_getwidth_clamp(Drw *drw, const char *text, unsigned int n)
> +{
> + unsigned int tmp = 0;
> + if (drw && drw->fonts && text && n)
> + tmp = drw_text(drw, 0, 0, 0, 0, 0, text, n);
> + return MIN(n, tmp);
> +}
> +
> void
> drw_font_getexts(Fnt *font, const char *text, unsigned int len, unsigned int *w, unsigned int *h)
> {
> diff --git a/drw.h b/drw.h
> index 4c67419..fd7631b 100644
> --- a/drw.h
> +++ b/drw.h
> _AT_@ -35,6 +35,7 @@ void drw_free(Drw *drw);
> Fnt *drw_fontset_create(Drw* drw, const char *fonts[], size_t fontcount);
> void drw_fontset_free(Fnt* set);
> unsigned int drw_fontset_getwidth(Drw *drw, const char *text);
> +unsigned int drw_fontset_getwidth_clamp(Drw *drw, const char *text, unsigned int n);
> void drw_font_getexts(Fnt *font, const char *text, unsigned int len, unsigned int *w, unsigned int *h);
>
> /* Colorscheme abstraction */
> --
> 2.34.1
>

> From 557a6ecd4aae5b8baf4f1e8f324ad17efce72df0 Mon Sep 17 00:00:00 2001
> From: NRK <nrk_AT_disroot.org>
> Date: Thu, 24 Mar 2022 02:04:04 +0600
> Subject: [PATCH 3/6] significantly improve performance on large strings
>
> this replaces inefficient pattern of `MIN(TEXTW(..), n)` with
> drw_fontset_getwidth_clamp() instead, which is far more efficient when
> we only want up to a certain width.
>
> dumping a decently sized (unicode) emoji file into dmenu, I see the
> startup time drop significantly with this patch.
>
> before -> after
> 360ms -> 160ms
>
> this should also noticeably improve input latency (responsiveness) given
> that calcoffsets() and drawmenu() are pretty hot functions.
> ---
> dmenu.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/dmenu.c b/dmenu.c
> index eca67ac..cde394b 100644
> --- a/dmenu.c
> +++ b/dmenu.c
> _AT_@ -58,6 +58,13 @@ static Clr *scheme[SchemeLast];
> static int (*fstrncmp)(const char *, const char *, size_t) = strncmp;
> static char *(*fstrstr)(const char *, const char *) = strstr;
>
> +static unsigned int
> +textw_clamp(const char *str, unsigned int n)
> +{
> + unsigned int w = drw_fontset_getwidth_clamp(drw, str, n) + lrpad;
> + return MIN(w, n);
> +}
> +
> static void
> appenditem(struct item *item, struct item **list, struct item **last)
> {
> _AT_@ -82,10 +89,10 @@ calcoffsets(void)
> n = mw - (promptw + inputw + TEXTW("<") + TEXTW(">"));
> /* calculate which items will begin the next page and previous page */
> for (i = 0, next = curr; next; next = next->right)
> - if ((i += (lines > 0) ? bh : MIN(TEXTW(next->text), n)) > n)
> + if ((i += (lines > 0) ? bh : textw_clamp(next->text, n)) > n)
> break;
> for (i = 0, prev = curr; prev && prev->left; prev = prev->left)
> - if ((i += (lines > 0) ? bh : MIN(TEXTW(prev->left->text), n)) > n)
> + if ((i += (lines > 0) ? bh : textw_clamp(prev->left->text, n)) > n)
> break;
> }
>
> _AT_@ -172,7 +179,7 @@ drawmenu(void)
> }
> x += w;
> for (item = curr; item != next; item = item->right)
> - x = drawitem(item, x, 0, MIN(TEXTW(item->text), mw - x - TEXTW(">")));
> + x = drawitem(item, x, 0, textw_clamp(item->text, mw - x - TEXTW(">")));
> if (next) {
> w = TEXTW(">");
> drw_setscheme(drw, scheme[SchemeNorm]);
> --
> 2.34.1
>

> From 2c1da37e87e219014d838aecc05bd38c4f66fd83 Mon Sep 17 00:00:00 2001
> From: NRK <nrk_AT_disroot.org>
> Date: Thu, 24 Mar 2022 00:37:55 +0600
> Subject: [PATCH 4/6] inputw: improve correctness and startup performance
>
> a massive amount of time inside readstdin() is spent trying to get the
> max input width and then put it into inputw, only for it to get clamped
> down to mw/3 inside setup().
>
> it makes more sense to calculate inputw inside setup() once we have mw
> available. similar to the last patch, i see noticeable startup
> performance improvement:
>
> before -> after
> 160ms -> 60ms
>
> additionally this will take fallback fonts into account compared to the
> previous version, so it's not only more performant but also more correct.
> ---
> dmenu.c | 19 +++++++++----------
> 1 file changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/dmenu.c b/dmenu.c
> index cde394b..d989d39 100644
> --- a/dmenu.c
> +++ b/dmenu.c
> _AT_@ -547,8 +547,7 @@ static void
> readstdin(void)
> {
> char buf[sizeof text], *p;
> - size_t i, imax = 0, size = 0;
> - unsigned int tmpmax = 0;
> + size_t i, size = 0;
>
> /* read each line from stdin and add it to the item list */
> for (i = 0; fgets(buf, sizeof buf, stdin); i++) {
> _AT_@ -560,15 +559,9 @@ readstdin(void)
> if (!(items[i].text = strdup(buf)))
> die("cannot strdup %u bytes:", strlen(buf) + 1);
> items[i].out = 0;
> - drw_font_getexts(drw->fonts, buf, strlen(buf), &tmpmax, NULL);
> - if (tmpmax > inputw) {
> - inputw = tmpmax;
> - imax = i;
> - }
> }
> if (items)
> items[i].text = NULL;
> - inputw = items ? TEXTW(items[imax].text) : 0;
> lines = MIN(lines, i);
> }
>
> _AT_@ -614,12 +607,13 @@ static void
> setup(void)
> {
> int x, y, i, j;
> - unsigned int du;
> + unsigned int du, tmp;
> XSetWindowAttributes swa;
> XIM xim;
> Window w, dw, *dws;
> XWindowAttributes wa;
> XClassHint ch = {"dmenu", "dmenu"};
> + struct item *item;
> #ifdef XINERAMA
> XineramaScreenInfo *info;
> Window pw;
> _AT_@ -677,7 +671,12 @@ setup(void)
> mw = wa.width;
> }
> promptw = (prompt && *prompt) ? TEXTW(prompt) - lrpad / 4 : 0;
> - inputw = MIN(inputw, mw/3);
> + for (item = items; item && item->text; ++item) {
> + if ((tmp = textw_clamp(item->text, mw/3)) > inputw) {
> + if ((inputw = tmp) == mw/3)
> + break;
> + }
> + }
> match();
>
> /* create menu window */
> --
> 2.34.1
>

> From 7dcadf228f2dc48aaf3e9416116e06cceca91cb8 Mon Sep 17 00:00:00 2001
> From: NRK <nrk_AT_disroot.org>
> Date: Thu, 24 Mar 2022 00:37:55 +0600
> Subject: [PATCH 5/6] drw_text: improve performance when there's no match
>
> this was the last piece of the puzzle, the case where we can't find any
> font to draw the codepoint.
>
> in such cases, we use XftFontMatch() which is INSANELY slow. but that's
> not the real problem. the real problem was we were continuously trying
> to match the same thing over and over again.
>
> this patch introduces a small cache, which keeps track a couple
> codepoints for which we know we won't find any matches.
>
> with this, i can dump lots of emojies into dmenu where some of them
> don't have any matching font, and still not have dmenu lag insanely or
> FREEZE completely when scrolling up and down.
>
> this also improves startup time, which will of course depend on the
> system and all installed fonts; but on my system and test case i see the
> following startup time drop:
>
> before -> after
> 60ms -> 34ms
> ---
> drw.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drw.c b/drw.c
> index 7d985b1..a50c9ee 100644
> --- a/drw.c
> +++ b/drw.c
> _AT_@ -251,7 +251,7 @@ drw_rect(Drw *drw, int x, int y, unsigned int w, unsigned int h, int filled, int
> int
> drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lpad, const char *text, int invert)
> {
> - int ty, ellipsis_x = 0;
> + int i, ty, ellipsis_x = 0;
> unsigned int tmpw, ew, ellipsis_w = 0, ellipsis_len, ellipsis_width;
> XftDraw *d = NULL;
> Fnt *usedfont, *curfont, *nextfont;
> _AT_@ -263,6 +263,9 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> FcPattern *match;
> XftResult result;
> int charexists = 0, overflow = 0;
> + /* keep track of a couple codepoints for which we have no match. */
> + enum { nomatches_len = 64 };
> + static struct { long codepoint[nomatches_len]; unsigned int idx; } nomatches;
>
> if (!drw || (render && !drw->scheme) || !text || !drw->fonts)
> return 0;
> _AT_@ -346,6 +349,12 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> * character must be drawn. */
> charexists = 1;
>
> + for (i = 0; i < nomatches_len; ++i) {
> + /* avoid calling XftFontMatch if we know we won't find a match */
> + if (utf8codepoint == nomatches.codepoint[i])
> + goto no_match;
> + }
> +
> fccharset = FcCharSetCreate();
> FcCharSetAddChar(fccharset, utf8codepoint);
>
> _AT_@ -374,6 +383,8 @@ drw_text(Drw *drw, int x, int y, unsigned int w, unsigned int h, unsigned int lp
> curfont->next = usedfont;
> } else {
> xfont_free(usedfont);
> + nomatches.codepoint[++nomatches.idx % nomatches_len] = utf8codepoint;
> +no_match:
> usedfont = drw->fonts;
> }
> }
> --
> 2.34.1
>

Hi NRK,

I applied all the patches. Thanks for your work on this!

Now people can test the master branch for a while or see if there are any
issues. If there are issues it's not a problem. It can be iterated on.

After that we can also sync the drw.{c,h} code to dwm, sent and libsl.

Thanks,

-- 
Kind regards,
Hiltjo
Received on Fri Mar 25 2022 - 23:01:04 CET

This archive was generated by hypermail 2.3.0 : Fri Mar 25 2022 - 23:12:34 CET