Re: [hackers] [dwm][patch] ISO 8859-1-only window name bugfix

From: Страхиња Радић <contact_AT_strahinja.org>
Date: Thu, 13 Jul 2023 08:14:22 +0200

On 23/07/12 10:28PM, Hiltjo Posthuma wrote:
> Unless I'm missing something. It seems like an application or environment
> issue.
>
> For dwm it is assumed the environment is utf-8 and application should use it.

Sorry, I forgot to list my locale-related variables. They are set up as such:

LANG=sr_RS.UTF-8
LC_ALL=
LC_CTYPE=
LC_NUMERIC=
LC_TIME=
LC_COLLATE=
LC_MONETARY=
LC_MESSAGES=

and I still experience the behavior from this issue. I use Alpine Linux, so
it's also not related to libc or distro.
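
For reference, the way these variables combine for LC_CTYPE can be sketched as
below. This only illustrates the POSIX precedence (LC_ALL, then LC_CTYPE, then
LANG, with empty values treated as unset). effective_ctype is a hypothetical
helper, not something from dwm or libc, and the "C" fallback is an assumption,
since POSIX leaves the default implementation-defined.

```c
#include <stddef.h>

/*
 * Hypothetical helper: pick the locale name that governs LC_CTYPE.
 * Empty strings count as unset, mirroring the POSIX rules.
 */
const char *
effective_ctype(const char *lc_all, const char *lc_ctype, const char *lang)
{
	if (lc_all && *lc_all)
		return lc_all;
	if (lc_ctype && *lc_ctype)
		return lc_ctype;
	if (lang && *lang)
		return lang;
	return "C"; /* assumed default; implementation-defined in POSIX */
}
```

With the values listed above (LC_ALL and LC_CTYPE empty), this yields
sr_RS.UTF-8.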


> I think it makes sense if the application uses utf-8 or the same encoding as
> the environment. It shouldn't pick some encoding and expect the window manager
> to autodetect and handle all of them.

If I understand the process correctly, there are currently two cases,
differentiated by the encoding field. If it is set to XA_STRING, no conversion
is made; otherwise the property is passed to XmbTextPropertyToTextList, which
converts it to the encoding of the current locale (UTF-8 here). From what I've
observed, the encoding field cannot be used to make this distinction. This is a
table of cases I've encountered ("Passed to Xmb...?" says whether the value is
passed to XmbTextPropertyToTextList under the current upstream, unaltered dwm):

   Actual Encoding    encoding field    Source        Passed to Xmb...?
-------------------------------------------------------------------------------
1. ISO 8859-1         31 (=XA_STRING)   LibreOffice   No
2. (COMPOUND_TEXT?)   385               LibreOffice   Yes
3. UTF-8              31 (=XA_STRING)   slstatus      No
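
Since cases #1 and #3 carry the same encoding field, one way to tell them apart
is to look at the bytes themselves. A minimal sketch, assuming a standalone
helper (is_valid_utf8 is a hypothetical name, not part of dwm or Xlib) that
checks whether a buffer is well-formed UTF-8:

```c
#include <stddef.h>

/* Return 1 if the n bytes at s are well-formed UTF-8, 0 otherwise. */
int
is_valid_utf8(const unsigned char *s, size_t n)
{
	size_t i = 0, j, len;
	unsigned long cp;

	while (i < n) {
		unsigned char c = s[i];

		if (c < 0x80) {                 /* plain ASCII */
			i++;
			continue;
		} else if ((c & 0xE0) == 0xC0) {
			len = 2; cp = c & 0x1F;
		} else if ((c & 0xF0) == 0xE0) {
			len = 3; cp = c & 0x0F;
		} else if ((c & 0xF8) == 0xF0) {
			len = 4; cp = c & 0x07;
		} else {
			return 0;               /* stray continuation or bad lead byte */
		}
		if (i + len > n)
			return 0;               /* sequence truncated */
		for (j = 1; j < len; j++) {
			if ((s[i + j] & 0xC0) != 0x80)
				return 0;       /* not a continuation byte */
			cp = (cp << 6) | (s[i + j] & 0x3F);
		}
		/* reject overlong forms, surrogates and values past U+10FFFF */
		if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
		    (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
		    (cp >= 0xD800 && cp <= 0xDFFF))
			return 0;
		i += len;
	}
	return 1;
}
```

With this, ISO 8859-1 bytes such as "this\341test" (case #1) are rejected,
while the UTF-8 form "this\303\241test" is accepted. Pure ASCII titles pass
either way, which is harmless.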

Now that I look at it again, I am not sure what the actual encoding is in case
#2. When I use od(1) to take a look at the bytes in the value field, I get
this for "thisátestњ.odt - LibreOffice Writer":

$ od -t c value.log
0000000   t   h   i   s 341   t   e   s   t 033   -   L 372   .   o   d
0000020   t       -       L   i   b   r   e   O   f   f   i   c   e
0000040   W   r   i   t   e   r  \n
0000047

Octal 341 (=225=0xE1) for "á" is ISO 8859-1, but the sequence "033 - L 372"
for "њ" is... COMPOUND_TEXT[1]? Ah--now I see[2]:


> For supported locales, existence of a converter from COMPOUND_TEXT, STRING,
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> UTF8_STRING or the encoding of the current locale is guaranteed if
  ^^^^^^^^^^^
> XSupportsLocale returns True for the current locale (but the actual text may
> contain unconvertible characters).
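
To double-check the reading of "033 - L 372" above: per the Compound Text
spec[1], ESC 2/13 4/12 (that is, ESC - L) designates the right half of
ISO 8859-5 as G1, and byte 0xFA in ISO 8859-5 is "њ" (U+045A). A small sketch
of that lookup, with hypothetical helper names, handling only this one escape
sequence and not Compound Text in general:

```c
#include <stddef.h>

/* Map one byte from the right half of ISO 8859-5 to a Unicode code point. */
unsigned long
iso8859_5_to_ucs(unsigned char b)
{
	switch (b) {
	case 0xA0: return 0x00A0;               /* no-break space */
	case 0xAD: return 0x00AD;               /* soft hyphen */
	case 0xF0: return 0x2116;               /* numero sign */
	case 0xFD: return 0x00A7;               /* section sign */
	default:   return (unsigned long)b + 0x0360; /* Cyrillic block */
	}
}

/*
 * If s starts with ESC - L followed by a right-half byte, store the
 * decoded code point in *cp and return 1; otherwise return 0.
 */
int
decode_esc_cyrillic(const unsigned char *s, size_t n, unsigned long *cp)
{
	if (n < 4 || s[0] != 0x1B || s[1] != '-' || s[2] != 'L')
		return 0;
	if (s[3] < 0xA0)
		return 0;
	*cp = iso8859_5_to_ucs(s[3]);
	return 1;
}
```

Fed the four bytes 033, '-', 'L', 0372 from the od output, this gives U+045A.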


So I guess that explains case #2 then. Still, I think case #1 should be handled
somehow. I'm not sure whether LibreOffice or X.Org is to blame for setting
WM_NAME to unconverted ISO 8859-1 bytes. As stated, the value of my LANG ends in
.UTF-8, so the "current locale" should not be ISO 8859-1, unless it is hardcoded
somewhere.
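
One conceivable way to handle case #1, sketched under the assumption that the
caller has already decided the XA_STRING value is ISO 8859-1 (for example
because it is not valid UTF-8): widen each byte to its UTF-8 form.
latin1_to_utf8 is a hypothetical helper, and this is not necessarily what the
patch in this thread does.

```c
#include <stddef.h>

/*
 * Convert the NUL-terminated ISO 8859-1 string src to UTF-8 in dst,
 * writing at most size bytes including the terminating NUL.
 * Returns the number of bytes written, not counting the NUL.
 * Output is truncated on a character boundary if dst is too small.
 */
size_t
latin1_to_utf8(const unsigned char *src, char *dst, size_t size)
{
	size_t o = 0;

	if (size == 0)
		return 0;
	for (; *src; src++) {
		if (*src < 0x80) {
			if (o + 1 >= size)      /* keep room for the NUL */
				break;
			dst[o++] = (char)*src;
		} else {
			if (o + 2 >= size)
				break;
			/* U+0080..U+00FF always encode as two bytes */
			dst[o++] = (char)(0xC0 | (*src >> 6));
			dst[o++] = (char)(0x80 | (*src & 0x3F));
		}
	}
	dst[o] = '\0';
	return o;
}
```

For the bytes "this\341test" this produces "this\303\241test", which is the
UTF-8 encoding of "thisátest".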


[1]: https://www.x.org/releases/X11R7.6/doc/xorg-docs/specs/CTEXT/ctext.html
[2]: https://linux.die.net/man/3/xmbtextpropertytotextlist

Received on Thu Jul 13 2023 - 08:14:22 CEST
