Re: [dev] stderr: unnecessary?

From: David Tweed <david.tweed_AT_gmail.com>
Date: Sat, 12 Jun 2010 15:18:57 +0100

On Sat, Jun 12, 2010 at 12:06 PM, Kris Maglione <maglione.k_AT_gmail.com> wrote:
> On Sat, Jun 12, 2010 at 12:53:27PM +0200, pancake wrote:
>>
>> On Jun 12, 2010, at 9:27 AM, Connor Lane Smith <cls_AT_lubutu.com> wrote:
>>>
>>> On 12 June 2010 08:00, Kris Maglione <maglione.k_AT_gmail.com> wrote:
>>> Except it can actually fetch as much data as is addressable in memory
>>> in a single call, if the kernel and library are tailored to.
>>
>> That's why mmap is for. Using read is just stupid.
>
> mmap is silly. If you want that much data mapped, it's because you want fast
> access to it. If you just want random access to it, you read it as you need
> it. mmap doesn't offer any performance advantage. When you touch a page that
> wasn't already there, the kernel has to fault it in, which is already as
> expensive as the read system call, and even more so because of the coarse
> granularity. It needs to read in an entire page, even if all you need is a
> byte. And if you need a dword across a page boundary, you get two faults and
> two pages read in. There's really just no point.

I just know I'm going to regret getting involved in this but...

My understanding is that, on Linux at least, a read first brings the
data into the kernel's page cache (which I believe has page-level
granularity even if you "read only a byte"), and then a copy is made
from the page cache into the process's memory space. Mmapping the file
means your process gets the page cache page itself mapped into its
address space, so the data is in memory only once rather than an
average of 1.x times, where x depends on the page cache discard
policy. So IF you are genuinely moving unpredictably around a truly
huge file, mmapping it means that you can fit more of it in memory,
rather than having both your program and the page cache trying to
figure out which bits to discard in an attempt to keep memory usage
down. This effect matters much more with huge files than with smaller
ones, where the page cache duplication has little effect on system
memory usage as a whole.
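
To make the two access paths concrete, here is a minimal sketch in
Python on a Unix-like system. The helper names (make_sample_file,
byte_via_read, byte_via_mmap) are made up for illustration, and the
sample file is tiny, so this only shows the shape of the calls; it
does not measure how much memory stays resident in either case.

```python
import mmap
import os
import tempfile


def make_sample_file(pages):
    """Hypothetical helper: write `pages` pages of known bytes to a temp file."""
    data = bytes(i % 251 for i in range(pages * mmap.PAGESIZE))
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(data)
        return f.name


def byte_via_read(path, offset):
    """Fetch one byte with pread(): the kernel copies it out of the page
    cache into a buffer owned by this process."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, 1, offset)
    finally:
        os.close(fd)


def byte_via_mmap(path, offset):
    """Fetch one byte through a read-only mapping: touching the page makes
    the kernel map the page cache page into this process."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
            # Slicing copies just this one byte into a Python bytes object.
            return m[offset:offset + 1]
    finally:
        os.close(fd)


if __name__ == "__main__":
    path = make_sample_file(4)
    try:
        offset = mmap.PAGESIZE + 7  # a byte somewhere in the second page
        print(byte_via_read(path, offset), byte_via_mmap(path, offset))
    finally:
        os.remove(path)
```

Both helpers return the same byte. The difference I am describing is
in where the surrounding page lives afterwards: with the mapping, the
page cache page is the only copy, while a program that reads into its
own buffers keeps a second copy for as long as it holds them.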

-- 
cheers, dave tweed__________________________
computer vision researcher: david.tweed_AT_gmail.com
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot
Received on Sat Jun 12 2010 - 14:18:57 UTC
