Re: [hackers] A better mailing list web archiver for suckless.org ... ?

From: Thomas Oltmann <thomas.oltmann.hhg_AT_gmail.com>
Date: Fri, 12 Aug 2022 17:46:46 +0200

Hi Storkman,

I've now looked through your patches. In order:

1: I agree that the current code is ugly, but I don't want to run the
   preprocessing code on all the fields that we don't even look at.
2: Applied!
3: Applied!
4: Both my original code and your patch were able to overrun the given
   memory area. Using memchr() instead of strchr() should fix this.
   Does that clear up your confusion, or is there still a logic error
   in there that I'm not seeing?
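
To illustrate the point about memchr() in item 4, here is a minimal
sketch of a bounded search. The helper name find_byte() and its
signature are made up for this example and are not taken from the
mailarchiver code; it only assumes that the caller knows the length of
the region it is allowed to read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mailarchiver: find the first
 * occurrence of byte `c` in the `len` bytes starting at `buf`.
 *
 * strchr() keeps scanning until it sees a NUL byte, so it can run past
 * the end of a region that is not NUL-terminated. memchr() takes an
 * explicit length and never reads beyond buf + len.
 *
 * Returns the offset of the match, or -1 if `c` is not in the region.
 */
static long
find_byte(const char *buf, size_t len, int c)
{
	const char *p;

	assert(buf != NULL);
	p = memchr(buf, c, len);
	return p ? (long)(p - buf) : -1;
}
```

With this shape, a search for '=' in a two-byte slice of "ab=c=" reports
no match instead of wandering into the bytes that follow the slice.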

Thanks a lot!

On Fri, Aug 12, 2022 at 7:35 AM Storkman <storkman_AT_storkman.nl> wrote:
>
> On Wed, Aug 10, 2022 at 09:29:43PM +0200, Thomas Oltmann wrote:
> > Hi all!
> >
> > I think we can all agree that the current web archive over at
> > lists.suckless.org isn't all that great;
> > Author names get mangled, the navigation is terrible, some messages
> > are duplicated, some missing.
> >
> > That's why I've started looking into #3 of the 'Project Ideas' page
> > (https://suckless.org/project_ideas/) -- "Write a decent mailing list
> > Web archive system".
> > I see lots of potential to build something better than hypermail:
> >
> > - We could take text encodings more seriously.
> > hypermail just copies the 'charset' notice over into the HTML
> > file, which doesn't work when listing multiple messages.
> >
> > - We could use maildir instead of the really brittle mbox format for mailboxes.
> > This might also help avoid message dropping/duplication, but I'm not
> > sure about that.
> >
> > - We could try a different navigation scheme. Perhaps flat threads
> > instead of a hierarchy?
> > I don't really know how people here feel about this, but it's
> > mentioned on the 'Project Ideas' page
> > and I'm in favour of it. Navigating message trees is really confusing.
> >
> > - Bonus: We can ignore CGI, uuencode, HTML mail and all that cruft.
> >
> > Is there currently any interest in such a project here?
> >
> > So far, I've gone ahead and implemented a sort of proof-of-concept (at
> > https://www.github.com/tomolt/mailarchiver).
> > Of course I can't guarantee that this will go anywhere, as I only have
> > limited time and patience myself, but I can give it a try.
> >
> > Cheers,
> > Thomas Oltmann
> >
>
> Hi!
>
> When you list all these features, it sounds like everything a mailing list
> archive front-end does just replicates things our mail clients already
> do better, and without going through a web browser.
>
> So I thought, why not just serve the maildir files as-is, with monthly
> and yearly tarballs, and perhaps metadata files so you don't need to
> download everything just to make sure you've got an entire thread?
> But then, that would require additional instrumentation and would make e.g.
> referencing mailing list threads in commit messages slightly less convenient.
>
> In any case, I messed with the code a bit, running it on my own archive
> maildir. I've constructed a very crude threaded view[1], and came up with a
> few fixes in the process.
>
> Patch 2 is a rewrite of collapse_ws(), because I found it really hard to
> figure out what exactly it does and how. Your mileage may vary, but I
> think the original code would overflow the buffer backwards when given
> an empty input.
>
> For patch 3, I've found some e-mails in the wild that used a lowercase
> encoding in encoded-words, and the RFC says it's okay.
>
> Patch 4 might not be correct, because I'm not sure how decode_qprintable()
> can ever return without error when parsing an encoded-word in a header.
> It seems that it would just find the last "=" in "?=", set length to -2,
> and return NULL. Maybe I'm just not getting it. It did manage to process
> a few dozen more e-mails in my test runs, though.
>
> Hopefully I did this correctly and you can cherry-pick these commits
> to your taste.
>
> -- Storkman
>
> [1]: https://imgur.com/a/EbOblHt
Received on Fri Aug 12 2022 - 17:46:46 CEST