On Thu, May 07, 2015 at 10:15:34AM +0200, Silvan Jegen wrote:
> On Wed, May 6, 2015 at 9:47 PM, Marc André Tanner <mat_AT_brain-dump.org> wrote:
> > Thanks, this will require more time to properly review than I currently
> > have. What follows are therefore only a few general remarks.
> >
> >> The code for the read/write loops communicating through a pipe with the
> >> external process is very ugly. There must be a better way to implement
> >> the pipe communication but I could not figure it out.
> >
> > This is where a proper libuv-based main loop would come in handy. Ideally
> > the editor core should remain responsive while the asynchronous I/O is being
> > done. There should be some kind of progress indication and a way to cancel
> > those long-running tasks. Also, while doing so, the text should essentially
> > be marked read-only.
>
> I agree that having a responsive editor is of utmost importance and a
> progress indication would be nice to have. What I am not so sure about
> are the following points:
>
> * If the text is marked read-only anyway, is it actually useful to
> have a "responsive" editor at all?
> * Measuring progress is hard when you do not know the goal. In the sed
> case you can only guess how much progress has been made by comparing
> the count of input and output bytes, for example. Even then, since you
> can do something like 's/a/AAAAAAAAAAAA/g', there is no direct
> correspondence. That said, there is the 'pv'[0] program that does
> something similar. I assume it just checks the amount of data sent
> through the pipe (but depending on the buffering and processing speed
> of the output end of the pipe, predictions of running time will
> probably be quite inaccurate).
Maybe progress indication is the wrong word. I wasn't thinking along the
lines of percentage completed but rather an indication that there is a
background task running and a way to cancel it.
> * Is it worth adding the libuv dependency in that case?
If I ever find the time to experiment with a client/server architecture
then a proper mainloop will be needed. Using something like libuv doesn't
seem like the worst idea ...
> >> This approach uses an unnecessary amount of memory too, since it keeps a lot
> >> of the input data for the external process
> >
> > This should not be necessary, it should be possible to repeatedly call
> > text_range_write (ignoring for now that it always takes a snapshot) with
> > a reasonable range which is adjusted each time.
> >
> >> as well as its output data in memory at the same time
> >
> > Similarly it should be possible to insert the data from the external process
> > as it becomes available. One idea is to write the new data after the
> > specified range which is being written out.
>
> This is of course true. At the same time that would mean that the
> external command and text_range_write have to be called several
> times, which has some overhead (especially with big ranges) while
> requiring more elaborate logic. It's a tradeoff.
I'm not sure why you think the external command would be called multiple
times. Instead of your current approach of first copying the data into
a contiguous memory block, you would simply call text_range_write
repeatedly in a loop with different ranges, similar to how you are
currently doing it with write(2).
> > Assuming we want to process the range [10, 100] it would work as follows:
> >
> > 1) text_snapshot
> >
> > 2) while (data to write or data to read)
> > if (data to write)
> > write range [10, 20] (on next iteration [20, 30] etc ...)
> > if (data to read)
> > read range and insert into [100, 110] ([110, 120], ...)
> >
> > 3) read the command's exit status and depending on success
> >    - either delete the original range [10, 100] and take a snapshot
> >    - or revert to the last snapshot (text_undo)
>
> I think that would work, but in case there is some text after
> position 100 already, you would have to either
>
> 1. handle the existing text by moving it on each insertion
This is of course done implicitly. However, no data is moved around.
Upon the first insertion a new piece is created and linked in at the
respective position. Since the inserts are contiguous, the piece is
then simply expanded by appending the new data to the existing buffer.
That is, further inserts basically amount to a memcpy(3) (ok, that is
a bit simplified, but true in principle).
The main overhead of multiple text_inserts is the mapping of a file
position, i.e. a byte offset, to the corresponding place in the piece
chain. Currently this has linear complexity in the number of
non-consecutive editing operations in the file. Since this mapping is
used by almost everything, it is a natural candidate for optimization
(if it ever turns out to be a bottleneck).
--
Marc André Tanner >< http://www.brain-dump.org/ >< GPG key: CF7D56C0
Received on Thu May 14 2015 - 00:30:54 CEST