Re: [dev][sbase][RFC] "tr" with -d option or without?

From: Hiltjo Posthuma <hiltjo_AT_codemadness.org>
Date: Sat, 12 Apr 2014 19:19:45 +0200

On Sat, Apr 12, 2014 at 6:58 PM, Silvan Jegen <s.jegen_AT_gmail.com> wrote:
>
>> I'll also probably rewrite the mmap code to use malloc since it causes
>> issues on some machines.
>
> The reason we used mmap was that it allocates memory only on use. So
> even if we mmap space for 1'114'112 ints (one for each Unicode code
> point) to do the mapping, we do not use
>
> (1'114'112 * 4) / (1024 * 1024) = 4.25MB
>
> of memory but only what is needed for the few characters we are
> actually mapping to set2.
>
> Using malloc would result in the memory being allocated regardless
> of whether we actually map all Unicode characters or not, leading to
> 4.25MB of memory being used in any case (additionally we would need to
> initialise the memory to zero to make the current algorithm work IIRC,
> since mmap already does that automatically on access).
>
> If there are issues that cannot be worked around though we may not have
> another choice. What kind of issues are you experiencing?
>

Yeah, I'm aware of how mmap works, and in that way it's quite clever :).
In the future we might want to support character classes ([:lower:],
[:alnum:] etc). I'm not sure how well that would fit in with the current
parsemapping code. For now I'll leave it as is and focus on more
important things. Please feel free to help and improve it :)

Offtopic but related to tr: the return value of mbtowc() isn't checked
at the moment, but it can be -1 if an invalid multibyte sequence is
given as an argument.
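
A checked call could be wrapped roughly as follows. This is only a
sketch, not a patch against tr: decodechar() is a hypothetical name, and
how the caller should react to an error (skip the byte, abort with a
message, ...) is left open.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper around mbtowc(): decode one character from s,
 * looking at no more than n bytes, and store it in *wc.
 * Returns the number of bytes consumed, 0 if s points at a NUL byte,
 * or -1 if the bytes do not form a valid character in the current
 * locale.
 */
int
decodechar(const char *s, size_t n, wchar_t *wc)
{
	int len;

	len = mbtowc(wc, s, n);
	if (len < 0) {
		/* Reset the internal shift state before the next call. */
		mbtowc(NULL, NULL, 0);
		return -1;
	}
	return len;
}
```

The caller would then test for -1 before using *wc, instead of assuming
that every argument decodes cleanly.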

Kind regards,
Hiltjo
Received on Sat Apr 12 2014 - 19:19:45 CEST