Re: [wmii] moving liblitz on from Anselm R. Garbe on 2006-05-25 (wmii mail list archive)

From: Anselm R. Garbe <garbeam_AT_wmii.de>
Date: Thu, 25 May 2006 11:16:45 +0200

On Thu, May 25, 2006 at 02:06:55AM +0200, Denis Grelich wrote:
> On Wed, 24 May 2006 19:16:42 +0200
> "Anselm R. Garbe" <garbeam_AT_wmii.de> wrote:
> > On Wed, May 24, 2006 at 06:40:00PM +0200, Denis Grelich wrote:
> > > > That is crap in my eyes. This makes it necessary that you
> > > > parse/format your data stream all the time... My proposal is not
> > > > duplicating the data stream, the Rune in Glyph simply points to
> > > > the correct place within the data stream.
> > >
> > > Not at all. You parse it once on input and format it once on output.
> > > While editing, manipulating, and rendering, markup is not used and not
> > > necessary for styles and font.
> >
> > Which means I parse everything on every keypress or what? A
> > syntax highlighting algorithm needs to be updated on keypress
> > (it might affect a complete C file to enter /*...
> >
>
> Of course not! I hope you don't think that I'm that stupid. It looks
> rather like that:
>
> File: "RED{Some red text} BLUE{Some blue text}"
>
> READ File
> PARSE File
> We now get two structures, Text and Style
> Text: "Some red text Some blue text"
> Style: ((style=RED, start=0, len=13),
> (style=BLUE, start=14, len=14))

This kind of file data looks more like information that should
be rendered as a label, not as a text widget. So I think we are
talking about the same thing: you cannot do syntax highlighting
by putting control characters into the text stream.
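
For illustration, the Text/Style split from the quoted example can be sketched in C roughly as follows. This is only a sketch under assumptions: the NAME{...} markup is taken literally from the example, a style name is assumed to be a run of uppercase ASCII letters, nesting is not handled, and StyleRun and parse_markup are made-up names, not part of liblitz.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical style record: one entry per styled run of plain text. */
typedef struct {
    char   name[16]; /* style name, e.g. "RED" */
    size_t start;    /* byte offset of the run in the plain text */
    size_t len;      /* length of the run in bytes */
} StyleRun;

/*
 * Split NAME{...} markup in src into plain text (dst) and style runs.
 * dst must hold at least strlen(src) + 1 bytes.
 * Returns the number of runs stored in runs[].
 */
size_t
parse_markup(const char *src, char *dst, StyleRun *runs, size_t maxruns)
{
    size_t n = 0, out = 0;
    const char *p = src;

    assert(src != NULL && dst != NULL && runs != NULL);
    while (*p) {
        const char *q = p;

        /* an uppercase word directly followed by '{' opens a run */
        while (*q >= 'A' && *q <= 'Z')
            q++;
        if (q > p && *q == '{' && n < maxruns
            && (size_t)(q - p) < sizeof runs[n].name) {
            const char *end = strchr(q + 1, '}');

            if (end != NULL) {
                size_t namelen = (size_t)(q - p);
                size_t runlen = (size_t)(end - (q + 1));

                memcpy(runs[n].name, p, namelen);
                runs[n].name[namelen] = '\0';
                runs[n].start = out;
                runs[n].len = runlen;
                memcpy(dst + out, q + 1, runlen);
                out += runlen;
                n++;
                p = end + 1;
                continue;
            }
        }
        /* anything else is unstyled text and is copied through */
        dst[out++] = *p++;
    }
    dst[out] = '\0';
    return n;
}
```

The markup is consumed once at load time; afterwards the editor only sees the plain text plus the runs[] array, which is the point made in the quoted mail.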

> I hope it is now clear. Of course it is just an example. The style

Yes

> > I agree that you can save it in blocks. But I never claimed I
> > would save format information for every character, I'd only do
> > that for every _visible_ character. About 10 years ago, I knew
> > someone in the BBS scene who wrote a neat ANSI editor. And it
> > did not consume much memory to save a bunch of info for each
> > visible character. It is much less than doing a screenshot.
>
> Heh. And the style information of the invisible text is lost or what?
> Or do you keep it in ANSI-madness and parse it on scrolling?

It is done dynamically. I don't think it makes much sense
(except maybe for bar label titles) to interpret style markup
in text files. In most cases colorization should be done
dynamically. In the discussion so far I never expected files
that contain extra meta-info besides the text.

> In any case, the gapped array is never worse than the Glyph array. You
> have to move only small blocks on cursor movement (and no, not each
> time you move the cursor, but each time you start editing after cursor
> movement!), and you have to reallocate the whole block if the gap is used
> up. Subsequent changes in the file need no movements and no
> reallocations at all (until the gap is full.)

Ok then.
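
For reference, the gapped array described in the quote can be sketched like this. It is a minimal sketch, not liblitz code: GapBuf and the gb_* functions are hypothetical names, the buffer holds plain bytes, and growth simply doubles the storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer: text before the gap, the gap, text after it. */
typedef struct {
    char  *buf;    /* storage */
    size_t gap;    /* index of the first free byte (edit position) */
    size_t gaplen; /* number of free bytes in the gap */
    size_t cap;    /* total size of buf */
} GapBuf;

int
gb_init(GapBuf *g, size_t cap)
{
    assert(cap > 0);
    g->buf = malloc(cap);
    if (g->buf == NULL)
        return -1;
    g->gap = 0;
    g->gaplen = cap;
    g->cap = cap;
    return 0;
}

size_t
gb_len(const GapBuf *g)
{
    return g->cap - g->gaplen;
}

/* Move the gap to text position pos; only the bytes between the old
 * and the new position are moved. */
void
gb_move(GapBuf *g, size_t pos)
{
    assert(pos <= gb_len(g));
    if (pos < g->gap)
        memmove(g->buf + pos + g->gaplen, g->buf + pos, g->gap - pos);
    else if (pos > g->gap)
        memmove(g->buf + g->gap, g->buf + g->gap + g->gaplen, pos - g->gap);
    g->gap = pos;
}

/* Insert one byte at pos; reallocates only when the gap is used up. */
int
gb_insert(GapBuf *g, size_t pos, char c)
{
    if (g->gaplen == 0) {
        size_t ncap = g->cap * 2;
        char *nb;

        gb_move(g, gb_len(g)); /* put the (empty) gap at the end */
        nb = realloc(g->buf, ncap);
        if (nb == NULL)
            return -1;
        g->buf = nb;
        g->gaplen = ncap - g->cap;
        g->cap = ncap;
    }
    gb_move(g, pos);
    g->buf[g->gap++] = c;
    g->gaplen--;
    return 0;
}

/* Copy the text without the gap into out (gb_len(g) + 1 bytes). */
void
gb_text(const GapBuf *g, char *out)
{
    size_t tail = g->cap - g->gap - g->gaplen;

    memcpy(out, g->buf, g->gap);
    memcpy(out + g->gap, g->buf + g->gap + g->gaplen, tail);
    out[g->gap + tail] = '\0';
}
```

Consecutive inserts at the same place touch no other bytes; a memmove happens only when editing resumes at a different position, and a realloc only when the gap is full.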

> > > The overhead for richtext I would estimate (though I don't have much
> > > of a clue, so to speak) at somewhere around 50–60 SLOC. One third for
> >
> > I think richtext will be 5kSLOC at least.
>
> Depends on the implementation. There are one million ways to implement
> it, and with line arrays and the parallel structure for styles, I don't
> see where those 5 kSLOC might be. It's not much more than loading the
> fonts and colour values into some configuration structure, parsing the
> text and building the style structure, reflect changes on the text in
> it (several lines), and rendering does Xlib.

With richtext you need to arrange your layout in sane ways, and
that's the hard job. That's why I have always said not to bother
with such stuff. A fixed-font widget suffices.

> > If you implement an editor with syntax highlighting you need to
> > parse on each input.
>
> We are not talking about that. Or are we talking at cross
> purposes /that/ badly?

Actually an editor with syntax highlighting that speaks the same
language is on my TODO (it should replace the vim insanity one
day). That's why I already have this in mind. And actually such
an 'editor' will be more of a terminal than an editor. (OT: The
editor should be a 9P server which serves all buffers to
arbitrary terminal 9P clients accessing them, but those
terminals should also be usable to run bash in them. This will
work together with wmii like a charm.)

> Well, Unicode /can't/ order the alphabets »correctly.« There's no such
> thing as »correctly!« The philosophy of Unicode is to encode scripts,
> not languages. And as many, many scripts share subsets of languages,
> you simply can't put them into the right order. In addition, Unicode
> had to stay backwards compatible with legacy encodings, so it kept the
> orderings of national standards intact. So just look at the latin
> scripts: you have full ASCII, then comes the Latin-1 block, then come
> blocks with loads and loads of additional variations of latin letters.
> So you mingle ASCII- and Latin-1-non-letter-characters with the latin
> alphabet, and you sort all those »a«'s with accents and stuff /after/
> the normal »z!« Next, look at combining characters. If you would only

Yes, I think it is fine if accented a's come after z. The
only important thing is that the order is predictable and not
random. And Unicode's order is predictable in my eyes. So stick
with it. Every alphabetical order has been chosen arbitrarily.
Are there any reasons to order our alphabet like the Greeks did,
beginning with alpha, beta, gamma ...? Did the Greeks have
reasons for doing so? Or was it defined by some drunk druid who
faced the problem of teaching it to the children? He chose
arbitrarily. So I don't care if special chars appear after z in
Unicode. The most important thing is that each alphabet is not
encoded in random order. I don't know Russian, but I expect they
have an alphabet as well, and my hope is that Unicode at least
keeps each alphabet in order: a ... z ... first cyrillic rune
... last cyrillic rune ... old german runes ... thai symbols ...

> care about code point values, you would put things like an a-umlaut
> hell knows where, as it is represented as:
> U+0061 LATIN SMALL LETTER A + U+0308 COMBINING DIAERESIS
> Furthermore, scripts like the indic ones or korean are /much/ more
> complex than latin or cyrillic. They combine heavily, and they are even
> displayed in a different order than they are stored! The characters by
> themselves are even divided into several code points. (Yes, this
> all /does/ make sense, but explaining this would go beyond the scope of
> this discussion.)

Sounds like a mess; I expected this. So I say: don't care
about it. If someone needs an ordering of Thai symbols, he will
write a thai_strcmp or invent another square wheel. But we
should not care about it, because we only know 3-5 languages and
cannot decide.
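
To make the a-umlaut case from the quote concrete, here is a small C sketch. It assumes valid UTF-8 input; utf8_codepoints and the two constant names are invented for this example. The same visible letter has two encodings, and plain byte-wise functions treat them as different strings of different lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (10xxxxxx). Assumes the input is valid UTF-8. */
size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* The same visible letter in two encodings (hypothetical names). */
static const char A_UMLAUT_PRECOMPOSED[] = "\xC3\xA4";  /* U+00E4 */
static const char A_UMLAUT_DECOMPOSED[]  = "a\xCC\x88"; /* U+0061 U+0308 */
```

strlen reports 2 and 3 bytes for the two forms, utf8_codepoints reports 1 and 2 code points, and strcmp says the strings differ. Whether that matters depends on whether the widget ever has to treat the two forms as equal.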

> The C standard library string functions die when used with unicode.
> Don't do that, it does not work. (As UTF-8 is backwards compatible with
> ASCII, it works to some extent. But if you use some advanced features
> of it, it breaks.) You /need/ special unicode-aware functions, either
> by writing them yourself or using some other library like ICU (there is
> libunicode or so for Linux, but it's rather a joke.)

Don't use those crappy libs. Don't care about sorting. strncmp is
for comparing strings; you can easily compare any Unicode
strings with it, and no C function fails. If you need a
Unicode-capable sort, pass your comparison function as the
function pointer to qsort; the C standard library does not fail.
I can't imagine any function which will fail miserably with
Unicode, and I wouldn't expect one to, because UTF-8 was
invented by Ken Thompson and Rob Pike at Bell Labs, where C
itself came from.
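
As a sketch of the qsort approach: strcmp compares bytes as unsigned char, and UTF-8 is designed so that byte order equals code point order, so a plain strcmp comparator sorts UTF-8 strings by code point. The names cmp_utf8 and sort_utf8 are made up here, and this is code point order only, not language-aware collation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of pointers to UTF-8 strings.
 * Byte-wise comparison yields code point order for valid UTF-8. */
static int
cmp_utf8(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Sort n UTF-8 strings in place by code point order. */
void
sort_utf8(const char **v, size_t n)
{
    assert(v != NULL || n == 0);
    qsort(v, n, sizeof *v, cmp_utf8);
}
```

With this, "apple" and "zebra" sort before a word starting with a precomposed a-umlaut, which in turn sorts before Cyrillic text: predictable, and exactly the "accented a's after z" order discussed above.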

Regards,

-- 
 Anselm R. Garbe  ><><  www.ebrag.de  ><><  GPG key: 0D73F361

Received on Thu May 25 2006 - 11:16:45 UTC
