Re: [wmii] moving liblitz on from Denis Grelich on 2006-05-25 (wmii mail list archive)

From: Denis Grelich <denisg_AT_ueberl33t.info>
Date: Thu, 25 May 2006 15:22:05 +0200

On Thu, 25 May 2006 11:16:45 +0200
"Anselm R. Garbe" <garbeam_AT_wmii.de> wrote:

> On Thu, May 25, 2006 at 02:06:55AM +0200, Denis Grelich wrote:
> > On Wed, 24 May 2006 19:16:42 +0200
> > "Anselm R. Garbe" <garbeam_AT_wmii.de> wrote:
> > > On Wed, May 24, 2006 at 06:40:00PM +0200, Denis Grelich wrote:
> > > > > That is crap in my eyes. This makes it necessary that you
> > > > > parse/format your data stream all the time... My proposal is
> > > > > not duplicating the data stream, the Rune in Glyph simply
> > > > > points to the correct place within the data stream.
> > > >
> > > > Not at all. You parse it once on input and format it once on
> > > > output. While editing, manipulating and rendering, markup is not
> > > > used and is not necessary for styles and fonts.
> > >
> > > Which means I parse everything on every keypress or what? A
> > > syntax highlighting algorithm needs to be updated on keypress
> > > (it might affect a complete C file to enter /*...
> > >
> >
> > Of course not! I hope you don't think that I'm that stupid. It works
> > more like this:
> >
> > File: "RED{Some red text} BLUE{Some blue text}"
> >
> > READ File
> > PARSE File
> > We now get two structures, Text and Style
> > Text: "Some red text Some blue text"
> > Style: ((style=RED, start=0, len=13),
> > (style=BLUE, start=14, len=14))
>
> This kind of file data looks more like info that should be
> rendered as a label, not as a text widget. So I think we are
> talking about the same thing: you cannot do syntax highlighting
> by putting control chars into the text stream.

Yes, control characters are not for syntax highlighting. They are for
(pre-)defining highlighting. One could send the output of shells that
send ANSI-formatted text through a filter that translates it into the
widget's formatting language, for example.
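
A minimal sketch of the parse step from the example quoted above, assuming
a markup of the form NAME{text} as in the sample line. The names
parse_markup and Span are made up for illustration and are not part of
liblitz. The markup is read once, and what comes out is the plain text
plus a parallel list of style spans.

```python
import re
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    style: str
    start: int   # offset into the plain text, not into the markup
    length: int


# Hypothetical markup: an upper-case style name followed by a braced run.
_RUN = re.compile(r"([A-Z]+)\{([^{}]*)\}")


def parse_markup(source: str) -> Tuple[str, List[Span]]:
    """Split marked-up input into plain text plus a parallel span list."""
    parts: List[str] = []
    spans: List[Span] = []
    pos = 0       # read position in the marked-up source
    out_len = 0   # length of the plain text emitted so far
    for m in _RUN.finditer(source):
        plain = source[pos:m.start()]   # unstyled text between two runs
        parts.append(plain)
        out_len += len(plain)
        style, body = m.group(1), m.group(2)
        spans.append(Span(style, out_len, len(body)))
        parts.append(body)
        out_len += len(body)
        pos = m.end()
    parts.append(source[pos:])          # trailing unstyled text
    return "".join(parts), spans
```

After this step the markup is gone: editing and rendering only ever touch
the text and the span list.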

> > > I agree that you can save it in blocks. But I never claimed I
> > > would save format information for every character, I'd only do
> > > that for every _visible_ character. About 10 years ago, I knew
> > > someone in the BBS scene who wrote a neat ANSI editor. And it
> > > did not consume much memory to save a bunch of info for each
> > > visible character. It is much less than doing a screenshot.
> >
> > Heh. And the style information of the invisible text is lost or
> > what? Or do you keep it in ANSI-madness and parse it on scrolling?
>
> It is done dynamically. I don't think it makes much sense
> (except for bar label titles maybe) to interpret style markups
> in text files. In most cases a colorization should be done
> dynamically. In the past discussion I never expected files that
> contain extra meta-info besides text.

Okay, maybe that is not needed at all and should be made completely
dynamic. One could still use filters that write directly to the
widget's style metadata structure if one needs predefined colours and
styling.
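
As a rough sketch of such a filter, assuming the style metadata is just a
list of (style, start, length) tuples and that only a handful of ANSI SGR
colour codes matter: the function name ansi_to_spans and the SGR_STYLES
table are invented here, and a real filter would have to handle far more
escape sequences.

```python
import re
from typing import List, Optional, Tuple

# Only a small subset of SGR colour codes, chosen for illustration.
SGR_STYLES = {"31": "RED", "32": "GREEN", "34": "BLUE"}
_SGR = re.compile(r"\x1b\[([0-9;]*)m")


def ansi_to_spans(data: str) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Strip SGR escapes and record which style covered which text range."""
    chunks: List[str] = []
    spans: List[Tuple[str, int, int]] = []
    current: Optional[str] = None   # style active for the text that follows
    run_start = 0                   # plain-text offset where that style began
    out_len = 0                     # length of the plain text emitted so far
    pos = 0                         # read position in the input
    # A sentinel reset at the end closes the last run like any other.
    for m in _SGR.finditer(data + "\x1b[0m"):
        plain = data[pos:m.start()]
        chunks.append(plain)
        out_len += len(plain)
        if current is not None and out_len > run_start:
            spans.append((current, run_start, out_len - run_start))
        for code in m.group(1).split(";"):
            if code in ("", "0"):
                current = None              # reset
            elif code in SGR_STYLES:
                current = SGR_STYLES[code]  # unknown codes are ignored
        run_start = out_len
        pos = m.end()
    return "".join(chunks), spans
```

The widget itself never sees an escape sequence this way; it only receives
plain text and style ranges.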

> > > > I estimate the overhead for richtext (though I don't have
> > > > much of a clue, so to speak) at somewhere around 50–60 SLOC.
> > > > One third for
> > >
> > > I think richtext will be 5kSLOC at least.
> >
> > Depends on the implementation. There are one million ways to
> > implement it, and with line arrays and the parallel structure for
> > styles, I don't see where those 5 kSLOC might be. It's not much
> > more than loading the fonts and colour values into some
> > configuration structure, parsing the text and building the style
> > structure, reflecting changes to the text in it (several lines),
> > and Xlib does the rendering.
>
> With richtext you need to arrange your layout in sane ways,
> that's the hard job. That's why I always said not to bother with
> such stuff. A fixed-font widget suffices.

That also depends on how much richtext we allow. If we only allow
about as much richtext as terminal emulators do, we don't need to
rearrange too much to cope with that, I suppose.

> > > If you implement an editor with syntax highlighting you need to
> > > parse on each input.
> >
> > We are not talking about that. Or are we talking at cross
> > purposes /that/ badly?
>
> Actually an editor with syntax highlighting speaking the same
> language is on my TODO (which should replace the vim insanity
> one day). That's why I have this in mind already. And actually
> such an 'editor' will be more of a terminal than an editor. (OT: The
> editor should be a 9P server which serves all buffers for
> arbitrary terminal-9P clients accessing them, but those
> terminals should also be usable to run bash in them. This will
> work together with wmii like a charm.)

I thought terminals should die a bloody and painful death? That's at
least what they should do in my opinion. They are too restricted and
primitive in my eyes, and they make any sane text handling a mess.
(How many terminal emulators do you know that, for example, break text
correctly on resize?)

> > Well, Unicode /can't/ order the alphabets »correctly.« There's no
> > such thing as »correctly!« The philosophy of Unicode is to encode
> > scripts, not languages. And as many, many languages share the
> > same scripts, you simply can't put them into the right order. In
> > addition, Unicode had to stay backwards compatible with legacy
> > encodings, so it kept the orderings of national standards intact.
> > So just look at the latin scripts: you have full ASCII, then comes
> > the Latin-1 block, then come blocks with loads and loads of
> > additional variations of latin letters. So you mingle ASCII- and
> > Latin-1-non-letter-characters with the latin alphabet, and you sort
> > all those »a«'s with accents and stuff /after/ the normal »z!«
> > Next, look at combining characters. If you would only
>
> Yes, I think it is good if accented a's come after z. The
> only important thing is that it is predictable and not random.
> And Unicode's order is predictable in my eyes. Thus stick with
> it. Each alphabetical order has been randomly chosen. Are there
> any reasons to order our alphabet like those Greeks did,
> beginning with alpha beta gamma ...? Did the Greeks have reasons
> for doing so? Or was it defined by some drunk druid who saw
> the problem of teaching it to the children? He chose randomly.
> Thus I don't care if special chars appear after z in Unicode. The
> most important thing is that each alphabet is not randomly
> encoded. I dunno Russian, but I expect they have an alphabet as
> well, and my hope is that Unicode at least orders their
> alphabet with a ... z .... first cyrillic rune ... last cyrillic
> rune ... old german runes ... thai symbols ...
>
> > care about code point values, you would put things like an a-umlaut
> > hell knows where, as it is represented as:
> > U+0061 LATIN SMALL LETTER A + U+0308 COMBINING DIAERESIS
> > Furthermore, scripts like the Indic ones or Korean are /much/ more
> > complex than Latin or Cyrillic. They combine heavily, and they are
> > even displayed in a different order than they are stored! The
> > characters themselves are even divided into several code points.
> > (Yes, this all /does/ make sense, but explaining this would go
> > beyond the scope of this discussion.)
>
> Sounds like a mess, I expected this. Thus I say, don't care
> about it. If someone needs an order of thai symbols, he will
> write a thai_strcmp or invent another square wheel. But we
> should not care about it, because we only know 3-5 languages and
> cannot decide.

If we also made ordering, line breaking and the like a filter's job, it
would be absolutely no problem to implement a locale- and
language-sensitive strcmp some day.
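
To illustrate the idea, here is a small sketch in which the sort key is a
pluggable function. The default is plain code point order; the second key
is a hypothetical stand-in for a language-aware filter and merely folds
case and drops combining marks, which is far from real collation. All
names are made up for this example.

```python
import unicodedata
from typing import Callable, Iterable, List


def codepoint_key(s: str) -> str:
    """Plain code point order, i.e. what the encoding gives for free."""
    return s


def accent_folding_key(s: str):
    """Stand-in filter: order by base letters, ignoring case and accents."""
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The original string breaks ties so the order stays deterministic.
    return (base.casefold(), s)


def sort_lines(lines: Iterable[str],
               key: Callable[[str], object] = codepoint_key) -> List[str]:
    """Sort lines with whatever ordering filter the caller plugs in."""
    return sorted(lines, key=key)


words = ["zebra", "\u00e4pfel", "apple"]   # "\u00e4" is a-umlaut
by_codepoint = sort_lines(words)           # a-umlaut sorts after z
by_filter = sort_lines(words, accent_folding_key)
```

With code point order the a-umlaut word lands after "zebra"; with the
folding key it sorts among the other words starting with "a". The widget
does not need to know which of the two is in use.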

> > The C standard library string functions die when used with unicode.
> > Don't do that, it does not work. (As UTF-8 is backwards compatible
> > with ASCII, it works to some extent. But if you use some advanced
> > features of it, it breaks.) You /need/ special unicode-aware
> > functions, either by writing them yourself or using some other
> > library like ICU (there is libunicode or so for Linux, but it's
> > rather a joke.)
>
> Don't use those crappy libs. Don't care about sorting. strncmp is
> for comparing strings, you can easily compare any unicode
> string, no C function fails. If you need a Unicode-capable sort
> function, use this as function pointer for qsort, the C standard
> library does not fail. I can't imagine any function which will
> fail with unicode miserably - and I won't expect so, because
> UTF8 was invented by the C inventors.

Uhm, strncmp doesn't work either. Don't use that. Also, don't forget
that two strings that are /not/ byte-for-byte identical may still need
to compare as equal in Unicode; C's string functions only work on the
assumption that equal strings are byte-identical! It's not that simple.
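
A short sketch of the problem, using the a-umlaut from earlier in the
thread. Comparing the UTF-8 bytes models what strcmp and strncmp see; the
second function normalizes both strings to NFC first, which is one way
(not the only one) to get canonical equivalence. The function names are
invented for this example.

```python
import unicodedata


def bytes_equal(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings byte for byte, as strcmp would."""
    return a.encode("utf-8") == b.encode("utf-8")


def canonically_equal(a: str, b: str) -> bool:
    """Compare after normalizing both strings to the same form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
combining = "a\u0308"    # U+0061 LATIN SMALL LETTER A + U+0308 COMBINING DIAERESIS

different_bytes = not bytes_equal(precomposed, combining)   # encodings differ
same_text = canonically_equal(precomposed, combining)       # yet equivalent
```

Both strings render as the same character, but the first is two bytes in
UTF-8 and the second is three, so any byte-wise comparison reports them
as different.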

Greetings,
Denis


Received on Thu May 25 2006 - 15:26:15 UTC
