Re: [hackers] A better mailing list web archiver for suckless.org ... ?

From: Storkman <storkman_AT_storkman.nl>
Date: Fri, 12 Aug 2022 04:17:34 +0200

On Wed, Aug 10, 2022 at 09:29:43PM +0200, Thomas Oltmann wrote:
> Hi all!
>
> I think we can all agree that the current web archive over at
> lists.suckless.org isn't all that great;
> Author names get mangled, the navigation is terrible, some messages
> are duplicated, some missing.
>
> That's why I've started looking into #3 of the 'Project Ideas' page
> (https://suckless.org/project_ideas/) -- "Write a decent mailing list
> Web archive system".
> I see lots of potential to build something better than hypermail:
>
> - We could take text encodings more seriously.
> hypermail just copies the 'charset' notice over into the HTML
> file, which doesn't work when listing multiple messages.
>
> - We could use maildir instead of the really brittle mbox format for mailboxes.
> This might also help avoid message dropping/duplication, but I'm not
> sure about that.
>
> - We could try a different navigation scheme. Perhaps flat threads
> instead of a hierarchy?
> I don't really know how people here feel about this, but it's
> mentioned on the 'Project Ideas' page
> and I'm in favour of it. Navigating message trees is really confusing.
>
> - Bonus: We can ignore CGI, uuencode, HTML mail and all that cruft.
>
> Is there currently any interest in such a project here?
>
> So far, I've gone ahead and implemented a sort of proof-of-concept (at
> https://www.github.com/tomolt/mailarchiver).
> Of course I can't guarantee that this will go anywhere, as I only have
> limited time and patience myself, but I can give it a try.
>
> Cheers,
> Thomas Oltmann
>

Hi!

When you list all these features, it sounds like everything a mailing list
archive front-end does just replicates things our mail clients already
do better, and without going through a web browser.

So I thought, why not just serve the maildir files as-is, with monthly
and yearly tarballs, and perhaps metadata files so you don't need to
download everything just to make sure you've got an entire thread?
But then, that would require additional tooling on the reader's side and would
make e.g. referencing mailing list threads in commit messages slightly less
convenient.

In any case, I messed with the code a bit, running it on my own archive
maildir. I've constructed a very crude threaded view[1], and came up with a
few fixes in the process.

Patch 2 is a rewrite of collapse_ws(), because I found it really hard to
figure out what exactly it does and how. Your mileage may vary, but I
think the original code would run off the front of the buffer (access
memory before its start) when given an empty input.
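
For illustration only (this is not the code from patch 2, and the real
collapse_ws() may have a different signature and contract): a minimal sketch
of an in-place whitespace collapser that only ever moves forward through the
buffer, so an empty string cannot make it step before the start. The name
collapse_ws_sketch and the choice to trim leading and trailing whitespace are
assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the project's collapse_ws().
 * Collapses every run of whitespace in s to a single space, in place,
 * and drops leading and trailing whitespace. Returns the new length.
 */
size_t
collapse_ws_sketch(char *s)
{
	char *r = s;     /* read cursor */
	char *w = s;     /* write cursor, never ahead of r */
	int pending = 0; /* whitespace seen since the last output byte */

	for (; *r; r++) {
		if (isspace((unsigned char)*r)) {
			/* Only remember the gap if something was written before it. */
			pending = (w != s);
		} else {
			if (pending)
				*w++ = ' ';
			pending = 0;
			*w++ = *r;
		}
	}
	*w = '\0';
	return (size_t)(w - s);
}
```

Because both cursors start at s and only advance, the empty string falls
straight through the loop and the function just writes the terminator at
s[0].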

For patch 3, I've found some e-mails in the wild that used a lowercase
encoding letter ("q" or "b") in encoded-words, and the RFC (2047) says
that's okay.
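
As a sketch of the idea (not the actual change in patch 3): the encoding field
of an encoded-word, the letter between the second and third "?" in
"=?charset?Q?text?=", can be matched without regard to case. The enum and
function names below are hypothetical.

```c
#include <ctype.h>

/* Hypothetical names, chosen only for this example. */
enum ew_encoding { EW_INVALID, EW_BASE64, EW_QPRINTABLE };

/*
 * Map the encoding letter of an encoded-word to an encoding,
 * accepting both upper and lower case ("Q"/"q", "B"/"b").
 */
enum ew_encoding
ew_encoding_from_char(char c)
{
	switch (tolower((unsigned char)c)) {
	case 'b':
		return EW_BASE64;
	case 'q':
		return EW_QPRINTABLE;
	default:
		return EW_INVALID;
	}
}
```

The cast to unsigned char matters because passing a negative char value to
tolower() is undefined behaviour.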

Patch 4 might not be correct, because I'm not sure how decode_qprintable()
can ever return without error when parsing an encoded-word in a header.
It seems that it would just find the last "=" in "?=", set length to -2,
and return NULL. Maybe I'm just not getting it. It did manage to process
a few dozen more e-mails in my test runs, though.
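
To illustrate the failure mode described above (this is a sketch under
assumptions, not decode_qprintable() and not necessarily what patch 4 does):
if the decoder is handed an explicit length that stops before the closing
"?=", the "=" of the terminator is never mistaken for the start of a hex
escape. The function name q_decode_sketch and its calling convention are
hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int
hexval(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	return -1;
}

/*
 * Hypothetical helper: decode the "Q"-encoded bytes in[0..len) into out,
 * which must have room for len + 1 bytes. "_" becomes a space and "=XX"
 * becomes the byte with hex value XX. Returns the decoded length, or -1
 * if an escape is truncated or not valid hex.
 */
long
q_decode_sketch(const char *in, size_t len, char *out)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (in[i] == '_') {
			out[n++] = ' ';
		} else if (in[i] == '=') {
			int hi, lo;
			if (len - i < 3)
				return -1; /* "=" too close to the end */
			hi = hexval((unsigned char)in[i + 1]);
			lo = hexval((unsigned char)in[i + 2]);
			if (hi < 0 || lo < 0)
				return -1;
			out[n++] = (char)(hi * 16 + lo);
			i += 2;
		} else {
			out[n++] = in[i];
		}
	}
	out[n] = '\0';
	return (long)n;
}
```

A caller would first locate the terminator, e.g. with strstr(text, "?="), and
pass only the bytes before it. Passing the terminator along reproduces the
kind of error described above: the trailing "=" looks like a truncated
escape.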

Hopefully I did this correctly and you can cherry-pick these commits
to your taste.

-- Storkman

  [1]: https://imgur.com/a/EbOblHt

Received on Fri Aug 12 2022 - 04:17:34 CEST
