Re: [dev] sites linkrot from Christoph Lohmann on 2013-01-13 (dev mail list archive)

From: Christoph Lohmann <20h_AT_r-36.net>
Date: Sun, 13 Jan 2013 10:12:40 +0100

Greetings.

On Sun, 13 Jan 2013 10:12:40 +0100 Kai Hendry <hendry_AT_iki.fi> wrote:
> Hi guys,
>
> Please rip this to shreds https://github.com/kaihendry/linkrot and
> perhaps guide me to a better script. Something that can do the http
> requests in parallel and hence much faster?
>
> I ran it over sites/
> for i in *; do test -d "$i" || continue; linkrot $i > $i.linkrot; done
>
> and the output is over here:
> http://s.natalian.org/2013-01-13/
>
> 000 means the domain didn't resolve. There are definitely some false
> negatives, e.g. on cat-v. I guess sites are sometimes temporarily down,
> so the failures need to be counted/recorded, and only when a link hits
> a threshold (e.g. 10 consecutive failures over 10 daily checks) does
> an admin need to intervene manually?
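
The threshold idea quoted above could be sketched roughly as follows.
This is only an illustration, not part of linkrot: the function name
record_result, the state directory and the per-URL counter files are
all assumptions made for the example.

```sh
#!/bin/sh
# record_result STATEDIR URL STATUS [THRESHOLD]
# Hypothetical helper: keeps a per-URL counter of consecutive failures
# and prints the URL once the counter reaches THRESHOLD (default 10).
record_result() {
	statedir=$1 url=$2 status=$3 threshold=${4:-10}
	# One counter file per URL, named after the URL's CRC (assumption:
	# collisions are acceptable for a sketch).
	file=$statedir/$(printf '%s' "$url" | cksum | cut -d' ' -f1)
	case $status in
	2??|3??)
		rm -f "$file"	# a working link resets the streak
		return 0 ;;
	esac
	count=0
	[ -f "$file" ] && count=$(cat "$file")
	count=$((count + 1))
	printf '%s\n' "$count" > "$file"
	if [ "$count" -ge "$threshold" ]; then
		printf '%s: %s consecutive failures\n' "$url" "$count"
	fi
}
```

Called once per link from a daily run, this stays silent until a link
has failed on the given number of checks in a row, so a site that is
down for a day or two never reaches the admin.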

Could you please make the output of your script more readable?

Something like

$domain$path: Link to %s is not found.
$domain$path: Link to %s does redirect to %s.

The repeated sed strings force the human reader to parse the same text
over and over again, which makes the output tiresome to follow. In
conjunction with the unnatural error codes, this is the same as if you
had output long XML subtrees that take many lines of reading just to
grasp simple metadata.

Could you please adapt your script so that its output is more readable?
The sed commands are useless to the reader, so don’t output them.
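
A minimal sketch of such output, assuming the script already knows the
page, the link and the HTTP status of each check. The function name
report and its arguments are hypothetical and not taken from linkrot;
the meaning of 000 follows the quoted message.

```sh
#!/bin/sh
# report PAGE LINK STATUS [TARGET]
# Hypothetical helper: prints one readable line per problematic link
# and stays silent for links that are fine.
report() {
	page=$1 link=$2 status=$3 target=${4:-}
	case $status in
	2??)	;;	# link works, nothing to report
	3??)	printf '%s: Link to %s does redirect to %s.\n' \
			"$page" "$link" "$target" ;;
	404)	printf '%s: Link to %s is not found.\n' "$page" "$link" ;;
	000)	printf '%s: Link to %s does not resolve.\n' "$page" "$link" ;;
	*)	printf '%s: Link to %s failed with status %s.\n' \
			"$page" "$link" "$status" ;;
	esac
}
```

For example, `report example.org/page http://example.org/gone 404`
prints a single line naming the page and the dead link, with no sed
command and no bare status code to decode.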

Thanks for your efforts.

Sincerely,

Christoph Lohmann
Received on Sun Jan 13 2013 - 10:12:40 CET
