Re: [dev] Some questions about sdhcp

From: Michael Forney <mforney_AT_mforney.org>
Date: Thu, 11 Oct 2018 11:00:55 -0700

On 2018-10-11, Markus Wichmann <nullplan_AT_gmx.net> wrote:
> Hi all,
>
> I recently read the source code of sdhcp, and it does seem to be a nice
> tool, but I couldn't help but notice a few things. Now, maybe I'm dense,
> and none of the things I'm about to bring up matter for some reason, and
> if so, that would be really cool. But some of these really do seem like
> major issues, so here goes:

Hey Markus,

I have a branch of sdhcp[0] I'm using for oasis that fixes some of
these issues. I didn't send them to the ML though since they were
linux-specific (timerfd, CLOCK_BOOTTIME, ...), and I wasn't sure if
sdhcp was meant to work on other systems as well.

It's been a while since I looked at it, but sdhcp has mostly been
working fine for me since then. Though, there have been a couple of
occasions where I wasn't able to connect for whatever reason, so I
keep a build of ISC dhclient as a backup. It'd be nice to get any
remaining issues resolved.

[0] https://github.com/michaelforney/sdhcp

> 1. Timeout based on alarm()
> ---------------------------
>
> That is, in principle, not a bad way to do it, but unsuitable for a
> laptop application, which might see extended time spent in a sleep
> state. This is mostly an issue during the Bound state, as I have no idea
> if the alarms get adjusted for the time slept, and what happens if the
> alarm times out during sleep.
>
> That said, I also have no idea what happens to monotonic clocks, so what
> the heck.

> 2. Signal handlers established with signal()
> --------------------------------------------
>
> Unfortunately, the BSD-v-SysV war has destroyed that interface beyond
> usability. On Linux, it depends on the libc, and for glibc also on the
> compiler flags, whether you get oneshot behavior or persistent behavior.
> So now we have to use sigaction() to establish signal handlers, or
> always re-establish the handlers from within, just in case.
>
> musl gives the BSD behavior always, BTW.
>
> 3. Broken timing model
> ----------------------
>
> The code reads the value from the Lease Time option into a variable
> called "t1", and proceeds to use that variable as T1 (i.e. the time
> after which the lease should be renewed). That is not what the lease
> time is!
>
> The RFC states that after the lease time has elapsed, the client must no
> longer use the address granted to it, and failing to do so was the
> reason why many Apple devices where banned from some Ivy League campus
> network or other, some years back; which was a great source of humour
> for me and others. Yet here we are, doing the same thing wrong.
>
> The RFC states that T1 is specified in option 58, and T2 in option 59.
> Failing these, T1 is 1/2 lease time, and T2 is 7/8 lease time. Apropos
> failing:
>
> 4. Christian search routines
> ----------------------------
>
> The Bible states "Seek and you will find" but it might not be a
> great idea to implement this in a computer program. Unfortunately,
> optget() has absolutely no way of communicating failure to find a given
> option, and its callers don't seem to care much, either way. "Be lenient
> in what you accept" is all well and good, but not when the most
> important pieces of data are missing.
>
> 5. Absolutely no timeout in Renewing and Rebinding states
> ---------------------------------------------------------
>
> I couldn't find info on what happens after an alarm fires, but the only
> behaviors that make any sense are (a) alarm() is oneshot, and (b)
> alarm() is cyclical. Once sdhcp has made it into Bound state, an alarm()
> for the lease time is registered. When that fires, no further alarm is
> registered as sdhcp sends out a single request for renewal. If that
> packet, or the acknowledgment of it, drops for any reason, we're stuck
> in Renewing with either no timeout, if (a) is true, or one lease time
> (typically something like 1 day) as timeout, if (b) is true.
>
> The RFC states that the client should set the packet timeout in these
> states to half the remaining time (until T2 or lease time), down to one
> minute.
>
> 6. No XID reroll or comparison
> ------------------------------
>
> The XID is rolled once, at the start of the process, and never rolled
> again. Shouldn't a different XID be used for each separate transaction?
> That might be down to taste, but this one is not: XID is never compared
> to the XID of a received packet. And neither is the CID. Thus, sdhcp
> ends up stealing people's leases! I mean, the whole protocol is
> broadcast, we're bound to see a packet that wasn't meant for us.
>
> 7. Use of the realtime clock for time measurement
> -------------------------------------------------
>
> For some reason I don't quite get, DHCP clients are required to send the
> time they spent working in the packet header. Well, not required; we
> could just null the field. But since we do send something, we might as
> well do it right. Currently, the time is measured by way of the time()
> function, which returns the current realtime clock value. Thus, if we
> were starting up on a system without battery backup, we would get a time
> in 1970. If after getting a lease, the clock is then corrected with NTP,
> suddenly we will tell the DHCP server that we spent almost 50 years
> talking to it, which it might take exception to.
>
> All of this could be solved by way of clock_gettime(CLOCK_MONOTONIC).
> Just wrap it with an interface similar to time() first.

I think what you want here is CLOCK_BOOTTIME, which is like
CLOCK_MONOTONIC, but also counts time sleeping. This way, if you
resume your computer from sleep long after the lease expired, sdhcp
will immediately attempt to get a new lease.

> 8. Race condition for timeouts
> ------------------------------
>
> If an alarm should fire while we are not blocked in a syscall, it will
> be ignored, and the program will hang. That could be solved with a
> global flag that is set from the signal handler, but that will
> significantly increase complexity (since we need to reset the flag after
> taking notice of it) and is not without its pitfalls (cf. ppoll()).
> Might be better just to use a self-pipe.
>
> 9. poll() with 1 descriptor?
> ----------------------------
>
> That one I just don't get. Wouldn't it be simpler to park the program in
> a blocking recv()? Or am I missing something. I mean, as written, the
> poll is conceptually useless. Add another descriptor, that's another
> matter.
>
> 10. UDP recvfrom() misunderstanding
> -----------------------------------
>
> udprecv() seems to suffer from the misconception that the socket address
> supplied to recvfrom() is an input parameter. It is not. A UDP socket
> will receive every packet sent to the right port, and the socket address
> in recvfrom() is an output parameter, identifying the source of that
> particular datagram.
>
> 11. Failing to set "up" flag at start
> -------------------------------------
>
> Most other popular Linux DHCP clients set the "up" flag on the interface
> before starting. I'm not 100% on what happens if it is missing, but I
> guess that would mean no sending and no receiving packets. And I am all
> but certain the flag is missing on startup. Might be worth documenting
> at least.
>
> Want a patch? Might take a while, though.
>
> Ciao,
> Markus
Received on Thu Oct 11 2018 - 20:00:55 CEST

This archive was generated by hypermail 2.3.0 : Thu Oct 11 2018 - 20:12:07 CEST