[dev] Some misc tools that could help others

From: Hadrien Lacour <hadrien.lacour_AT_posteo.net>
Date: Wed, 21 Sep 2022 16:24:28 +0000

Hello,
I've decided to post about some miscellaneous tools (C, sh, AWK) I made to ease
my day-to-day computing/programming; some of them are finally in a shape I
won't be too embarrassed by.

┌────────────────────────────────────────────────────────────────────────────┐
│ https://git.sr.ht/~q3cpma/misc-tools                                       │
│                                                                            │
│ * genhtab     Generate static C99 hash tables                              │
│ * htmldecode  HTML decoding to UTF-8                                       │
│ * htmlencode  HTML encoding from UTF-8                                     │
│ * mbcut       Multibyte aware string trimming                              │
│ * natsort     Natural sorting for UTF-8                                    │
│ * urldecode   URL decoding                                                 │
│ * urlencode   URL encoding                                                 │
│ * wcswidth    wcswidth(3) wrapper                                          │
│                                                                            │
│ Note: for simplicity, Unicode handling is limited to code points, treating │
│ combining characters and emoji as a sequence of code points instead of a   │
│ complete grapheme.                                                         │
└────────────────────────────────────────────────────────────────────────────┘
The most useful one is probably natsort, given the constant need for it and the
lack of alternatives barring scripting-language packages like Python (natsort),
Perl (Sort::Naturally) or Tcl (lsort -dictionary, built-in).
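
For anyone wondering what natural sorting amounts to, here is a rough sh/AWK
sketch of one common approach. It is not how natsort itself works (that one is
a C program): every run of digits is zero-padded to a fixed width to build a
sort key, so a plain byte-wise sort puts file2 before file10. The name
natsort_sketch and the 20-digit width are made up for the example, and it
assumes input lines contain no tabs and no digit runs longer than 15 digits or
so.

```sh
# natsort_sketch: read lines on stdin, print them in "natural" order.
# Assumption: no tabs in the input (tab separates the sort key from the line).
natsort_sketch() {
	awk '{
		key = ""; rest = $0
		# Replace each digit run by its zero-padded form to build the key
		while (match(rest, /[0-9]+/)) {
			key = key substr(rest, 1, RSTART - 1) \
				sprintf("%020d", substr(rest, RSTART, RLENGTH))
			rest = substr(rest, RSTART + RLENGTH)
		}
		print key rest "\t" $0
	}' | LC_ALL=C sort | cut -f2-
}
```

Something like `ls | natsort_sketch` would then list file1, file2, file10 in
that order, where a plain sort gives file1, file10, file2.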

The other "interesting" one is genhtab, an alternative to gperf that is much
simpler to use and produces smaller binaries for big tables, but is slower (cf.
genhtab_bench). It generates a simple chained hash table (with some tricks due
to being built AOT) hashed with fnv1a; I'll probably try XXH3 soon, to see if
the speed issue can be mitigated without too much binary bloat.
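
As a reminder of how little there is to FNV-1a, the 32-bit variant can be
sketched in a few lines of sh. This is only an illustration: genhtab is C and
may well use another width or variant, and fnv1a32 is a name made up here. The
sketch assumes shell arithmetic that is at least 64 bits wide, so that the
multiplication does not overflow before masking.

```sh
# fnv1a32 STRING: print the 32-bit FNV-1a hash of STRING's bytes as 8 hex digits.
# Assumption: $(( )) arithmetic is at least 64 bits wide.
fnv1a32() {
	h=2166136261    # FNV offset basis (0x811c9dc5)
	# od prints each input byte as an unsigned decimal number
	for b in $(printf '%s' "$1" | od -An -v -tu1); do
		# xor the byte in, multiply by the FNV prime, keep the low 32 bits
		h=$(( ((h ^ b) * 16777619) & 0xFFFFFFFF ))
	done
	printf '%08x\n' "$h"
}
```

With this, `fnv1a32 a` prints e40c292c, and the empty string hashes to the
offset basis, 811c9dc5.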

┌──────────────────────────────────────────────────────────────────────────────┐
│ https://git.sr.ht/~q3cpma/scripts                                            │
│                                                                              │
│ Collection of sh, AWK scripts with an exhaustive description of dependencies │
│ and POSIX compliance; a few bash scripts too, when arrays are needed.        │
└──────────────────────────────────────────────────────────────────────────────┘
I'm sure everyone here has built up a script collection over their hacker life,
so here's mine, along with the stuff that might interest people:
* archive_*.sh
    Wrappers for common operations on archives.

* bwrap.bash, bwrap_auto.bash
    Only Bash scripts of the repo, a much saner alternative to Firejail.

* drop_priv.sh
    Nugget I found somewhere to avoid sudo/doas dependencies, but still limit
    privileged areas in scripts.

* map.sh, filter.sh
    Anyone who programs a bit knows what these are.

* hotplug_[u]mount.sh
    Since I don't want any udev automounting crap, these are what I use for
    the usual USB stick/drive.

* mass_rename.sh, rename.sh (should find better names)
    Command and file component based renaming. Simple, but the dry run feature
    is really useful.

* tabulate.sh
    Exactly what it says on the tin, a configurable AWK based table viewer.

* web_man.sh (see .web_man.conf too)
    A man that fetches from the web to get the same page for different OSes.
    Used mainly to check the portability of certain tools/options.

* util.sh
    My enormous collection of sh functions. I suggest you check these:
        https://git.sr.ht/~q3cpma/scripts/tree/master/item/util.sh#L29
            Pure sh fallback for readlink -f

        https://git.sr.ht/~q3cpma/scripts/tree/master/item/util.sh#L406
            Portable head -n#

        https://git.sr.ht/~q3cpma/scripts/tree/master/item/util.sh#L413
            Help text formatting used for all my script help messages.

        https://git.sr.ht/~q3cpma/scripts/tree/master/item/util.sh#L479
            Portable "rand"

        https://git.sr.ht/~q3cpma/scripts/tree/master/item/util.sh#L591
            Like atexit(3), allows stacking trap '...' EXIT

        https://git.sr.ht/~q3cpma/scripts/tree/master/item/util.sh#L600
            Portable and simple mktemp [-d] fallback, which is just a C
            program built and aliased if needed
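
The idea of stacking trap '...' EXIT handlers can be sketched as follows. This
is a guess at the general technique, not the code from util.sh; the names
atexit_sketch and _atexit_cmds are made up. Each registered snippet is
prepended to the list so that handlers run in reverse order of registration,
as atexit(3) does.

```sh
# Accumulated handler code, one snippet per line, most recent first
_atexit_cmds=''

# atexit_sketch CODE: register shell code to be run when the shell exits
atexit_sketch() {
	_atexit_cmds="$1
$_atexit_cmds"
	# Reinstall the EXIT trap with the whole stack each time
	trap "$_atexit_cmds" EXIT
}
```

A script would call, say, `atexit_sketch 'rm -f "$tmpfile"'` right after
creating each temporary resource, instead of maintaining one big trap by hand.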

And this one from another repo: https://git.sr.ht/~q3cpma/posix-build/tree/master/item/build_util.sh#L1300
    Adds a C preprocessor directive of the form `#embed_as_string "path" var`
    which creates a `static const char var[]` containing the content of path as
    a proper C string literal.

    If that file isn't fully ASCII, it is then considered UTF-8 encoded and
    embedded via a C11 u8"..." string literal (TODO: automatic conversion if
    uchardet/iconv are available, C23/C++20 char8_t handling).
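
To make the output of such a directive concrete, here is a much simplified sh
sketch of the ASCII case. It is not the code in build_util.sh: the function
name embed_as_string_sketch is hypothetical, and it assumes a non-empty ASCII
text file ending in a newline, with no handling of trigraphs, control
characters or the UTF-8 case described above.

```sh
# embed_as_string_sketch PATH VAR: print a C definition of VAR holding PATH's text.
# Assumption: PATH is a non-empty ASCII text file ending in a newline.
embed_as_string_sketch() {
	printf 'static const char %s[] =\n' "$2"
	# Escape backslashes and double quotes, then wrap each line as "...\n"
	sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
	    -e 's/^/    "/' -e 's/$/\\n"/' "$1"
	printf ';\n'
}
```

Each input line becomes one adjacent string literal, which the C compiler then
concatenates into a single array.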


And that's all. Hope it wasn't spam to you. If you too have some small-potatoes
stuff that doesn't warrant a project announcement but certainly makes your life
easier, don't hesitate to follow suit.

Regards,
Hadrien Lacour
Received on Wed Sep 21 2022 - 18:24:28 CEST
