Re: [wmii] moving liblitz on

From: Anselm R. Garbe <garbeam_AT_wmii.de>
Date: Wed, 24 May 2006 19:16:42 +0200

On Wed, May 24, 2006 at 06:40:00PM +0200, Denis Grelich wrote:
> > That is crap in my eyes. This makes it necessary that you
> > parse/format your data stream all the time... My proposal is not
> > duplicating the data stream, the Rune in Glyph simply points to
> > the correct place within the data stream.
>
> Not at all. You parse it once on input and format it once on output.
> While editing, manipulating, rendering, markup is not used and not
> necessary for styles and font.

Which means I parse everything on every keypress, or what? A
syntax highlighting algorithm needs to be updated on every
keypress (entering /* might affect a complete C file)...
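
To make the /* point concrete, here is a minimal sketch. The
function name open_comment_after is hypothetical, and the scanner
ignores string literals and // comments for brevity. It only shows
that typing two characters at the top of a buffer flips the
highlighting state of everything after them, so some re-scan is
needed on input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a C block comment is still open
 * after scanning the first n bytes of buf, 0 otherwise.
 * Simplification: string literals and // comments are not handled. */
static int open_comment_after(const char *buf, size_t n)
{
    int inside = 0;
    size_t i = 0;

    assert(buf != NULL);
    while (i < n) {
        if (!inside && i + 1 < n && buf[i] == '/' && buf[i + 1] == '*') {
            inside = 1;   /* comment opens */
            i += 2;
        } else if (inside && i + 1 < n && buf[i] == '*' && buf[i + 1] == '/') {
            inside = 0;   /* comment closes */
            i += 2;
        } else {
            i++;
        }
    }
    return inside;
}
```

A real highlighter would likely cache this state per line and
re-scan only from the edited line onward, rather than from the
start of the buffer.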

> > What? A rune will take up to 21bit at maximum (which means
> > 32bit, same as an unsigned int). On average it will only need
> > 2 bytes like wchar_t or Rune from P9 (though in P9 it is possible
> > to encode a single glyph with two Runes, but that is rare).
>
> A /glyph/ can take as many bytes as it wants. You can add a dozen
> combining characters to a character. The letter ä for example would
> consist of two Runes. On the other side, you save format information
> for every character. This is crazy! It is enough to save it for blocks
> of text, as it is the case with a parallel structure. It is very
> lightweight, as it only had about seven entries for the example from
> above.

I agree that you can save it in blocks. But I never claimed I
would save format information for every character, I'd only do
that for every _visible_ character. About 10 years ago, I knew
someone in the BBS scene who wrote a neat ANSI editor. And it
did not consume much memory to save a bunch of info for each
visible character. It is much less than taking a screenshot.
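
As a rough illustration of the per-visible-character idea, here is
a sketch with a hypothetical Cell record. The field layout is an
assumption for illustration only. It is neither the Glyph struct
discussed in this thread nor the format of any particular ANSI
editor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cell record: one entry per visible character. */
typedef struct {
    uint32_t rune;  /* code point shown in this cell (21 bits used) */
    uint8_t  fg;    /* foreground colour index */
    uint8_t  bg;    /* background colour index */
    uint8_t  attr;  /* bold/underline/... flags */
} Cell;

/* Bytes needed to hold one Cell for every visible position. */
static size_t grid_bytes(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    return rows * cols * sizeof(Cell);
}
```

On common platforms sizeof(Cell) is 8 after padding, so a window
of 70 rows by 120 columns would need 70 * 120 * 8 = 67200 bytes
with this layout. A richer record costs proportionally more.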

> > This costs some memory, yes, the more memory you use, the faster
> > your rendering. My rendering would be very fast and allow
> > everything, without reallocations. Why should one need to
> > reallocate? If your window is 120 cols x 70 rows, you just fill
> > all glyphs which don't represent a symbol with blanks (like in
> > your terminal). Your widget_data would be of size
> > Glyph[70][120], this is 70*120*(sizeof(Glyph)) = 445200 bytes
> > (~450kB) - not much if you run top and check out that xterm
> > uses 4MB.
>
> Are you nuts? oO
> Every time you /insert a character/ at the beginning of your text, you
> move 4 MB of data! On every key press! With a gapped array, you would
> move nothing at all, except for 250 bytes /once/ when starting to type.

That is a cheap operation with a sane MMU, because you would use
memcpy with a size of < 128kB, for example - and you would never
need to memcpy more than the visible data - and only in the
case of a line break. If you want to do it with a list, no
problem, you can also dynamically allocate/reallocate blocks or
lines. I doubt this will be faster though, because a memcpy of
< 128kB is known to be faster nowadays than allocating small
chunks of memory all the time. You shouldn't care about either
case, though; your OS should know better than any app programmer.
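
A minimal sketch of the copy in question, assuming a fixed-width
row of 120 cells. The names COLS and row_insert are hypothetical.
Since source and destination overlap, the sketch uses memmove
rather than memcpy, which is only defined for non-overlapping
regions.

```c
#include <assert.h>
#include <string.h>

enum { COLS = 120 };  /* assumed row width, as in the 120-column example */

/* Insert ch at column col of one fixed-width row, shifting the tail
 * right by one cell. The last cell is dropped here; real code would
 * have to wrap it onto the next row (the "line break" case). */
static void row_insert(unsigned int row[COLS], size_t col, unsigned int ch)
{
    assert(col < COLS);
    memmove(&row[col + 1], &row[col], (COLS - 1 - col) * sizeof row[0]);
    row[col] = ch;
}
```

In this sketch an insertion touches at most one row, and it only
spills into following rows when the last cell has to wrap.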

> It's no big deal with a gapped array. Random access can be made to be
> O(1) with a similar large constant (with some buffering) as with the
> Glyph array; worst case would be somewhat near O(log n). As already
> stated several times, handling the array would mean updating everything
> on insertion/deletion. This is just insane!

No, only on line breaks... in special cases.

> With a line array you get a much faster, /WORKING/ widget with free
> support for non-fixed fonts.
> The overhead for richtext I estimate at (but I don't have much of a
> clue, so to say) somewhere around 50–60 SLOC. One third for

I think richtext will be 5kSLOC at least.

> > I'm not sure I fully understand your gapped array structure. But
> > if so, it is pretty much the same I told you about all the time.
>
> It is explained in for example
> http://www.bluemug.com/research/text.pdf beginning from page 8.

OK, I will have a look at it.
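
For readers unfamiliar with the term, here is a minimal sketch of
a gapped array (gap buffer) as the technique is commonly
described. It is not taken from the paper linked above, all names
(GapBuf, gb_*) are hypothetical, and growing the buffer is left
out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *buf;
    size_t cap;        /* total bytes allocated */
    size_t gap_start;  /* first byte of the gap (the cursor) */
    size_t gap_end;    /* one past the last byte of the gap */
} GapBuf;

static int gb_init(GapBuf *g, size_t cap)
{
    g->buf = malloc(cap);
    if (g->buf == NULL)
        return -1;
    g->cap = cap;
    g->gap_start = 0;
    g->gap_end = cap;  /* the whole buffer is gap at first */
    return 0;
}

/* Number of text bytes stored (capacity minus gap size). */
static size_t gb_len(const GapBuf *g)
{
    return g->cap - (g->gap_end - g->gap_start);
}

/* Move the gap so that it starts at text position pos. Only the
 * bytes between the old and the new cursor are copied. */
static void gb_move(GapBuf *g, size_t pos)
{
    assert(pos <= gb_len(g));
    if (pos < g->gap_start) {
        size_t n = g->gap_start - pos;
        memmove(g->buf + g->gap_end - n, g->buf + pos, n);
        g->gap_start -= n;
        g->gap_end -= n;
    } else if (pos > g->gap_start) {
        size_t n = pos - g->gap_start;
        memmove(g->buf + g->gap_start, g->buf + g->gap_end, n);
        g->gap_start += n;
        g->gap_end += n;
    }
}

/* Insert one byte at the cursor; no text is moved at all. */
static int gb_insert(GapBuf *g, char c)
{
    if (g->gap_start == g->gap_end)
        return -1;  /* gap exhausted; reallocation omitted */
    g->buf[g->gap_start++] = c;
    return 0;
}

/* Random access by text position, skipping over the gap. */
static char gb_at(const GapBuf *g, size_t i)
{
    assert(i < gb_len(g));
    if (i < g->gap_start)
        return g->buf[i];
    return g->buf[i + (g->gap_end - g->gap_start)];
}
```

Typing at one position costs nothing beyond the store itself.
Text is copied only when the cursor jumps, and then only the
stretch between the old and the new position.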

> You don't have to parse anything if you don't want to. That is the
> beauty of this technique. Those characters have no width, no glyph,
> they are ignored by any unicode-aware algorithm. Firstly, we strip them
> off when parsing. It's a matter of less than half-a-dozen SLOC. Then
> they're gone. The information is preserved in a parallel structure.
> Very fast and efficient. NO PARSING ANYMORE FROM HERE ON. When we
> output the text, for example writing it to a file or to STDOUT, we
> format the data again with them. We don't even need to reformat for
> rendering.

If you implement an editor with syntax highlighting you need to
parse on each input.

> Yes, paging is necessary in any case, with the gapped array too. The
> reallocations and copying occur on text editing and insertion, not on
> scrolling.

Copying must occur, at least in the form of XMapArea, which is
nothing else than a memcpy (of about 4MB at 1024x768@24bit)...

> I don't see where it could be slow and flaky. It is about as fast as
> your approach. And even if it /was/, it is better to have flaky text
> selection than flaky text editing.

No, flaky text selection is not an option.

> As I stated several times, it's not. Maybe there are some
> misunderstandings between us, but it is definitely not.

Well, might be, I think we agree on most things already.

> > I don't care. UTF-8 is 7bit ASCII compliant, if other alphabets
> > are not sorted within UTF-8 that is not our problem. Each value
> > is a number between 0 and 1.xM, so why do you care?
>
> I have had enough of computers that behave like they were built in
> the 60s. Text is such a basic thing in our world that it is ridiculous
> that most programs still can't handle it properly.

Yes, but this has nothing to do with ordering. If UTF-8 does not
order the alphabets it supports correctly, then UTF-8 is broken.
No one should need to implement sorting for all Unicode
languages... otherwise Unicode is broken.

> > not need to sort. And I don't plan to implement any sorting
> > algorithm which needs a specific mapping or reorder of UTF-8
> > glyphs.
>
> Yes, it is irrelevant for a text widget, but not for a list widget, for
> example.

It is even irrelevant for a list widget, because the widget
would call qsort and strcmp, or whatever equivalents, as
interface functions.
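
A minimal sketch of that arrangement, using only the standard
qsort and strcmp. The names cmp_items and list_sort are
hypothetical and stand in for whatever the widget interface
would look like.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Comparator in the shape qsort expects: each argument points at
 * one element of the array, i.e. at a (const char *). */
static int cmp_items(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sa, sb);
}

/* Sort the list items in place, in strcmp (byte-wise) order. */
static void list_sort(const char **items, size_t n)
{
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof items[0], cmp_items);
}
```

Because strcmp compares bytes as unsigned char, UTF-8 strings end
up in code point order, so an item starting with ä sorts after
one starting with z. A caller who wants locale-aware ordering
could pass a comparator built on strcoll through the same
interface.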

> > I don't give a shit whether an expression matches or not. Regexp
> > implementations don't need anything else than using ranges
> > between start and end char. They should use the order defined by
> > UTF-8.
>
> Sorry, but this is totally ridiculous.

No, it is not; see libregexp9, it has been UTF-8-compliant for
about 15 years. And it works quite well. I also doubt that
Unicode does not order the alphabets it supports correctly. Even
if it doesn't, I won't care unless Latin is ordered incorrectly.
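
To illustrate what a range between start and end char means at the
code point level, here is a minimal sketch. It is not how
libregexp9 is implemented. utf8_decode and in_range are
hypothetical helpers, and the decoder does not reject overlong
forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into *r.
 * Returns the number of bytes consumed, or 0 on malformed or
 * truncated input. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *r)
{
    size_t len, i;

    assert(s != NULL && r != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { *r = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *r = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *r = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *r = s[0] & 0x07; len = 4; }
    else
        return 0;               /* stray continuation or invalid lead byte */
    if (len > n)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        *r = (*r << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Character class test for a range like [lo-hi], by code point. */
static int in_range(uint32_t r, uint32_t lo, uint32_t hi)
{
    return lo <= r && r <= hi;
}
```

With this definition ä (U+00E4) is outside [a-z], which is the
behaviour the two sides of this thread disagree about.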

Regards,

-- 
 Anselm R. Garbe  ><><  www.ebrag.de  ><><  GPG key: 0D73F361
Received on Wed May 24 2006 - 19:16:42 UTC
