Re: [hackers] A better mailing list web archiver for ... ?

From: Thomas Oltmann <>
Date: Thu, 11 Aug 2022 13:41:00 +0200


On Thu, Aug 11, 2022 at 11:10 AM NRK <> wrote:
> On Wed, Aug 10, 2022 at 09:29:43PM +0200, Thomas Oltmann wrote:
> > I think we can all agree that the current web archive over at
> > isn't all that great;
> > Author names get mangled, the navigation is terrible, some messages
> > are duplicated, some missing.
> I've noticed the missing mails too.
> > Is there currently any interest in such a project here?
> If it'd be an improvement over the current system then I don't see why
> not.
> > So far, I've gone ahead and implemented a sort of proof-of-concept (at
> >
> Hmm, interesting source code. A couple observations:
> 0. `.POSIX` needs to be the first non-comment line in the Makefile

That one always trips me up.

> 1. L277: pointer arithmetic is only valid as long as the result is
> within the array or just 1 past it.

This concern is probably completely esoteric, but I can see
how the pointer could theoretically overflow on some weird system
where the kernel doesn't sit in the higher half of the address space ...
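
To illustrate the concern, here is a minimal sketch of a bounds check that never forms an out-of-range pointer. It assumes a cursor `p` and an `end` pointer one past the last byte of the buffer; both names and the helper `has_room()` are hypothetical and not taken from the code under discussion.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if n bytes can be consumed starting at p
 * without running past end. Assumes p <= end and that both point into
 * (or one past the end of) the same array. */
static int
has_room(const char *p, const char *end, size_t n)
{
	/* `p + n > end` would compute a pointer that may lie more than one
	 * element past the array, which is undefined. `end - p` is always
	 * well defined under the assumption above. */
	return (size_t)(end - p) >= n;
}
```

Comparing lengths instead of pointers sidesteps the overflow question entirely, no matter where the buffer sits in the address space.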

> 2. L36: `mail` should be declared `static` as it's not used outside of
> the TU.

A lot of the functions are also non-static right now.
I haven't fully decided whether to split the code into multiple TUs yet,
so it doesn't matter for now.

> Usage of memcpy for string copying is good to see. I think more C
> programmers should start thinking of strings as buffers and tracking
> their length as necessary, which can both improve efficiency and reduce
> the chances of buffer mishandling.
> But in the case of `encode_html()`, stpcpy is probably the proper
> function to use.

Good idea. I've never used stpcpy() before but it looks useful for this project.
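
For reference, a rough sketch of how stpcpy() could fit into an escaping routine. The signature below is an assumption made for illustration and may not match the actual `encode_html()`; the caller is assumed to provide a destination of at least 6 * len + 1 bytes.

```c
#define _POSIX_C_SOURCE 200809L /* stpcpy() is POSIX.1-2008 */
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: writes the HTML-escaped form of the first len
 * bytes of src into dst and returns a pointer to the terminating NUL.
 * dst must have room for 6 * len + 1 bytes (worst case: every byte is '"'). */
static char *
encode_html(char *dst, const char *src, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		switch (src[i]) {
		/* stpcpy() returns a pointer to the NUL it wrote, so dst
		 * always points at the next position to write to. */
		case '<': dst = stpcpy(dst, "&lt;"); break;
		case '>': dst = stpcpy(dst, "&gt;"); break;
		case '&': dst = stpcpy(dst, "&amp;"); break;
		case '"': dst = stpcpy(dst, "&quot;"); break;
		default:  *dst++ = src[i]; break;
		}
	}
	*dst = '\0';
	return dst;
}
```

Since the return value marks the end of the output, the caller gets the encoded length as `end - dst_start` without an extra strlen().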

> Anyways, I've attached patches for all the above. The stpcpy change is
> opinionated, so feel free to reject that.
> And one more thing:
> /* TODO we should probably handle EINTR and partial reads */
> The best thing to do here is to not use `read()` to begin with. Instead, use
> `mmap()` to map the file into a private buffer. Otherwise, I think using
> `fread` is also an (inferior) option; I don't think you need to worry
> about EINTR with fread.

Yeah, I was already considering mmap() because it *might* behave a bit
nicer with large inputs.
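
A minimal sketch of what the mmap() route might look like, assuming the whole file is mapped read-only in one go. The helper name `map_file()` is hypothetical; switching to `PROT_READ | PROT_WRITE` with the same `MAP_PRIVATE` flag would allow in-place edits that never reach the file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole file at path as a private, read-only
 * mapping. Returns the mapping and stores its size in *len. Returns NULL on
 * failure, which includes empty files since a zero-length mapping is not
 * allowed. */
static char *
map_file(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd;

	if ((fd = open(path, O_RDONLY)) < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return p;
}
```

The caller releases the buffer with `munmap(p, len)`. There is no read loop at all, so the EINTR and partial-read cases from the TODO do not arise.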

> - NRK
