* Silvan Jegen <s.jegen_AT_gmail.com> [2014-01-15 20:43:54 +0100]:
> Note, though, that GNU's tr does not seem to handle Unicode at all[1]
> while this version of tr, according to "perf record/report", seems to
> spend most of its running time in the Unicode handling functions of glibc.
multi-byte string decoding is known to be slow in glibc,
e.g. see the utf8 decoding benchmark in
http://www.etalabs.net/compare_libcs.html
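
for reference, a rough sketch of the kind of per-character decoding loop
such a benchmark exercises. this assumes the tr in question decodes with
mbrtowc() or something similar, which i have not checked; countchars is
a made-up name for this example.

```c
#include <string.h>
#include <wchar.h>

/* hypothetical helper: count the characters in s by decoding them one
 * at a time in the current locale's encoding.
 * returns (size_t)-1 on an invalid or truncated sequence */
size_t
countchars(const char *s)
{
	mbstate_t st;
	wchar_t wc;
	size_t n, len = strlen(s), count = 0;

	memset(&st, 0, sizeof st);
	while (len > 0) {
		/* one library call per decoded character */
		n = mbrtowc(&wc, s, len, &st);
		if (n == (size_t)-1 || n == (size_t)-2)
			return (size_t)-1;
		if (n == 0)
			break;
		s += n;
		len -= n;
		count++;
	}
	return count;
}
```

the cost per call is what differs between libcs. without a setlocale()
call this runs in the "C" locale, so it only says something about utf8
after switching to a utf8 locale.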
> By no means was this any serious benchmarking but eliminating the function
> pointer did not seem to make an obvious difference.
note that recent gcc (4.7?) can inline calls made through a function pointer
if it can infer that the target function is in the same translation unit (tu)
(and with lto it can probably do cross-tu inlining)
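
a minimal sketch of the situation i mean, not code from the patch: all
names here (map_identity, map_upper, apply, upcase, passthrough) are made
up. whether a given gcc version really removes the indirect call depends
on the version and optimization level, so compare the output of
"gcc -O2 -S" before relying on it.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical per-byte mappers, both static and in the same tu */
static int
map_identity(int c)
{
	return c;
}

static int
map_upper(int c)
{
	return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/* the call through "map" is indirect in the source, but every caller
 * passes a function that is visible in this tu, so the optimizer may
 * specialize apply() per caller and inline the target */
static void
apply(char *s, size_t n, int (*map)(int))
{
	size_t i;

	for (i = 0; i < n; i++)
		s[i] = map((unsigned char)s[i]);
}

void
upcase(char *s)
{
	apply(s, strlen(s), map_upper);
}

void
passthrough(char *s)
{
	apply(s, strlen(s), map_identity);
}
```

if the compiler does this, it would explain why removing the pointer by
hand made no measurable difference.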
> +void
> +handleescapes(char *s)
> +{
> + switch(*s) {
> + case 'n':
> + *s = '\x0A';
> + break;
> + case 't':
> + *s = '\x09';
> + break;
> + case '\\':
> + *s = '\x5c';
what's wrong with '\n' etc here?
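
i.e. something like the sketch below. it only covers the three cases
visible in the quoted hunk, the rest of the function is not reproduced
here. on an ascii system '\n' is 0x0A and '\t' is 0x09, so the behaviour
is the same as with the hex escapes, it is just easier to read.

```c
/* sketch: same switch as in the quoted hunk, written with character
 * escapes instead of hex escapes (only the quoted cases are shown) */
void
handleescapes(char *s)
{
	switch (*s) {
	case 'n':
		*s = '\n';
		break;
	case 't':
		*s = '\t';
		break;
	case '\\':
		/* a backslash maps to itself, as '\x5c' did in the quote */
		*s = '\\';
		break;
	}
}
```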
btw a fully posix conformant tr implementation is available here:
http://git.musl-libc.org/cgit/noxcuse/tree/src/tr.c
(but this is probably outside of the scope of sbase)
Received on Wed Jan 15 2014 - 21:36:07 CET