Re: [dev] URI Parser

From: Laslo Hunhold <dev_AT_frign.de>
Date: Thu, 13 Jun 2019 08:49:58 +0200

On Thu, 13 Jun 2019 01:22:48 +0300
Adrian Grigore <adrian.emil.grigore_AT_gmail.com> wrote:

Dear Adrian,

> I started writing a RFC3986 compliant URI parser. It's not done yet.
> Can I get some feedback?
>
> https://adi.tilde.institute/tmp/uri.h
> https://adi.tilde.institute/tmp/parseuri.c

please find my feedback and suggestions below:

1) First off, you might want to think about hosting your own git or (if
that is not possible) at least upload it to GitHub or Sourcehut. Given
such links tend to die and the files themselves aren't too large, I've
attached them to this response so they are preserved for posterity.

2) Don't forget to add a LICENSE.

3) Don't typedef non-opaque structs!!! Remove the typedefs and
rename the structs to lowercase equivalents, e.g. "struct Uri -> struct
uri" etc. (uppercase names are reserved for opaque types, and your
structs are not opaque)

4) For parsers, it always makes sense to write a test-program that
parses a number of test-cases. Take a look at [0] for a set of
test-cases (especially uris.xslt). Judging from the filename, you are
also probably planning to write a resolver. [0] also provides
test-cases for that.

5) If you are not planning to write a resolver. better rename
parseuri.c to uri.c, so it matches the header. Even if you are in fact
planning to write a resolver, I'd just put it all inside the single
uri.c so it is easier to handle.

6)
#define EURINOSCHEME 96
#define EURIBADSCHEME 97
#define EURIBADUSERINFO 98
#define EURIBADHOST 99
#define EURIBADIPLIT 100

This can be written as an enum, starting with 96. Also maybe the names
could be a bit better to read, as follows. You can then use this enum
as a type as well, in case you pass these errors around as arguments.

enum uri_error {
        EURI_NOSCHEME = 96,
        EURI_BADSCHEME,
        EURI_BADUSERINFO,
        EURI_BADHOST,
        EURI_BADIPLIT,
}

7) I don't see any reason why you would allocate memory anywhere in the
code. There is no need for that. Also, if you insist on allocating
memory, check your malloc()'s and strdup()'s.

8) Regarding the memory management, the way you do it is questionable.
Your function syntax for the main function is

        Uri *parseuri(const char *uri);

and you proceed to allocate a struct Uri within parseuri() that you
later return as a pointer. This is very very bad practice, and you
should rather make it such that the user passes a pointer to an
allocated struct to your parseuri()-function. Then you can change the
return value to an int and reflect error-conditions as follows:

        0 -> all good
        1,2,3 -> error, could be matched with the uri_error enum above

9) Most importantly, don't you think this could be done simpler? Your
code has a lot of commented out stuff. I don't know why you would send
your code to the mailing list for feedback when it contains so much
commented out stuff. Try to clean it up and refactor for more
simplicity.

I hope this feedback is helpful.

With best regards

Laslo Hunhold

-- 
Laslo Hunhold <dev_AT_frign.de>




Received on Thu Jun 13 2019 - 08:49:58 CEST

This archive was generated by hypermail 2.3.0 : Thu Jun 13 2019 - 09:00:09 CEST