From: Minchan Kim on
On Wed, Apr 7, 2010 at 4:47 PM, Wu Fengguang <fengguang.wu(a)intel.com> wrote:
> On Wed, Apr 07, 2010 at 03:33:52PM +0800, Minchan Kim wrote:
>> On Wed, Apr 7, 2010 at 4:14 PM, Wu Fengguang <fengguang.wu(a)intel.com> wrote:
>> > On Wed, Apr 07, 2010 at 12:06:07PM +0800, Minchan Kim wrote:
>> >> On Wed, Apr 7, 2010 at 11:54 AM, Taras Glek <tglek(a)mozilla.com> wrote:
>> >> > On 04/06/2010 07:24 PM, Wu Fengguang wrote:
>> >> >>
>> >> >> Hi Taras,
>> >> >>
>> >> >> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
>> >> >>
>> >> >>>
>> >> >>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
>> >> >>>
>> >> >>>>
>> >> >>>> Hello,
>> >> >>>> I am working on improving Mozilla startup times. It turns out that page
>> >> >>>> faults(caused by lack of cooperation between user/kernelspace) are the
>> >> >>>> main cause of slow startup. I need some insights from someone who
>> >> >>>> understands linux vm behavior.
>> >> >>>>
>> >> >>
>> >> >> How about improve Fedora (and other distros) to preload Mozilla (and
>> >> >> other apps the user run at the previous boot) with fadvise() at boot
>> >> >> time? This sounds like the most reasonable option.
>> >> >>
>> >> >
>> >> > That's a slightly different usecase. I'd rather have all large apps startup
>> >> > as efficiently as possible without any hacks. Though until we get there,
>> >> > we'll be using all of the hacks we can.
>> >> >>
>> >> >> As for the kernel readahead, I have a patchset to increase default
>> >> >> mmap read-around size from 128kb to 512kb (except for small memory
>> >> >> systems).  This should help your case as well.
>> >> >>
>> >> >
>> >> > Yes. Is the current readahead really doing read-around(ie does it read pages
>> >> > before the one being faulted)? From what I've seen, having the dynamic
>> >> > linker read binary sections backwards causes faults.
>> >> > http://sourceware.org/bugzilla/show_bug.cgi?id=11447
>> >> >>
>> >> >>
>> >> >>>>
>> >> >>>> Current Situation:
>> >> >>>> The dynamic linker mmap()s  executable and data sections of our
>> >> >>>> executable but it doesn't call madvise().
>> >> >>>> By default page faults trigger 131072byte reads. To make matters worse,
>> >> >>>> the compile-time linker + gcc lay out code in a manner that does not
>> >> >>>> correspond to how the resulting executable will be executed(ie the
>> >> >>>> layout is basically random). This means that during startup 15-40mb
>> >> >>>> binaries are read in basically random fashion. Even if one orders the
>> >> >>>> binary optimally, throughput is still suboptimal due to the puny
>> >> >>>> readahead.
>> >> >>>>
>> >> >>>> IO Hints:
>> >> >>>> Fortunately when one specifies madvise(WILLNEED) pagefaults trigger 2mb
>> >> >>>> reads and a binary that tends to take 110 page faults(ie program stops
>> >> >>>> execution and waits for disk) can be reduced down to 6. This has the
>> >> >>>> potential to double application startup of large apps without any clear
>> >> >>>> downsides.
>> >> >>>>
>> >> >>>> Suse ships their glibc with a dynamic linker patch to fadvise()
>> >> >>>> dynamic libraries(not sure why they switched from doing madvise
>> >> >>>> before).
>> >> >>>>
>> >> >>
>> >> >> This is interesting. I wonder how SuSE implements the policy.
>> >> >> Do you have the patch or some strace output that demonstrates the
>> >> >> fadvise() call?
>> >> >>
>> >> >
>> >> > glibc-2.3.90-ld.so-madvise.diff in
>> >> > http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0:
>> >> >
>> >> > As I recall they just fadvise the filedescriptor before accessing it.
>> >> >>
>> >> >>
>> >> >>>>
>> >> >>>> I filed a glibc bug about this at
>> >> >>>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
>> >> >>>> with his concern about wasting memory resources. What is the impact of
>> >> >>>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
>> >> >>>> pressure? Does the kernel simply start ignoring these hints?
>> >> >>>>
>> >> >>>
>> >> >>> It will throttle based on memory pressure.  In idle situations it will
>> >> >>> eat your file cache, however, to satisfy the request.
>> >> >>>
>> >> >>> Now, the file cache should be much bigger than the amount of unneeded
>> >> >>> pages you prefault with the hint over the whole library, so I guess the
>> >> >>> benefit of prefaulting the right pages outweighs the downside of evicting
>> >> >>> some cache for unused library pages.
>> >> >>>
>> >> >>> Still, it's a workaround for deficits in the demand-paging/readahead
>> >> >>> heuristics and thus a bit ugly, I feel.  Maybe Wu can help.
>> >> >>>
>> >> >>
>> >> >> Program page faults are inherently random, so the straightforward
>> >> >> solution would be to increase the mmap read-around size (for desktops
>> >> >> with reasonable large memory), rather than to improve program layout
>> >> >> or readahead heuristics :)
>> >> >>
>> >> >
>> >> > Program page faults may exhibit random behavior once they've started.
>> >> >
>> >> > During startup page-in pattern of over-engineered OO applications is very
>> >> > predictable. Programs are laid out based on compilation units, which have no
>> >> > relation to how they are executed. Another problem is that any large old
>> >> > application will have lots of code that is either rarely executed or
>> >> > completely dead. Random sprinkling of live code among mostly unneeded code
>> >> > is a problem.
>> >> > I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB with
>> >> > proper binary layout. Even if one lays out a program wrongly, the worst-case
>> >> > pagein pattern will be pretty similar to what it is by default.
>> >> >
>> >> > But yes, I completely agree that it would be awesome to increase the
>> >> > readahead size proportionally to available memory. It's a little silly to be
>> >> > reading tens of megabytes in 128kb increments :)  You rock for trying to
>> >> > modernize this.
>> >>
>> >> Hi, Wu and Taras.
>> >>
>> >> I have been watching this thread, because I have experience with
>> >> reducing application startup latency on embedded systems.
>> >>
>> >> I think a larger readahead size isn't always good on embedded systems.
>> >> Many of them use NAND storage together with a compressed file system.
>> >> On NAND, as you know, the random-read penalty is much smaller than on
>> >> an HDD. And with a compressed file system, a large readahead can slow
>> >> startup (big block reads plus decompression).
>> >> We had to hack the kernel to disable readahead for code pages.
>> >> That made the application slower over time, but at the time we
>> >> considered startup latency more important than steady-state
>> >> performance for our application.
>> >>
>> >> Of course, this depends on the file system and compression ratio in
>> >> use, so increasing the readahead size may not always be a win.
>> >>
>> >> Please also consider embedded systems when you have a plan to tweak
>> >> readahead. :)
>> >
>> > Minchan, glad to know that you have experiences on embedded Linux.
>> >
>> > While increasing the general readahead size from 128kb to 512kb, I
>> > also added a limit for mmap read-around: if system memory size is less
>> > than X MB, then limit read-around size to X KB. For example, do only
>> > 128KB read-around for a 128MB embedded box, and 32KB ra for 32MB box.
>> >
>> > Do you think it's a reasonable safety guard? Patch attached.
>>
>> Thanks for the reply, Wu.
>>
>> I haven't looked at your attachment yet, because memory size wasn't
>> the issue in my case.
>
> In general, the more memory a system has, the less we care about
> possible readahead misses :)
>
>> It was the only application on the system, and the first main
>> application to start, so we had enough memory.
>>
>> I guess many embedded systems are like this.
>> At that time, although I could have disabled readahead entirely with
>> read_ahead_kb, I didn't want to, because I still wanted readahead for
>> regular file I/O and for the program's data sections. So, at a loss,
>> I hacked the kernel to disable readahead for code sections only.
>
> I would like to auto tune readahead size based on the device's
> IO throughput and latency estimation, however that's not easy..

Indeed.

> Other than that, if we can assert "this class of devices won't benefit
> from large readahead", then we can do some static assignment.

A few months ago, I saw your patch enhancing readahead.
At that time, many people tested various readahead sizes on USB sticks
and SSDs, which are built from NAND devices.
The results were good as long as readahead stayed below some crossover
point, so I think we need readahead for file I/O on non-rotational
devices, too.

But on some machines startup latency matters more than file I/O
performance. Our analysis at the time showed that readahead of
application code pages contributed to slow startup, and during bootup
the cache hit ratio was very low.

So I hoped we could disable readahead for code sections only (i.e.,
roughly, filemap faults on executable VMAs). :)

I'm not asking you to solve this problem right now; I just want you to
be aware of this embedded-system issue when enhancing readahead in
the future. :)

> Thanks,
> Fengguang
>



--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Wu Fengguang on
Minchan,

> A few months ago, I saw your patch enhancing readahead.
> At that time, many people tested various readahead sizes on USB sticks
> and SSDs, which are built from NAND devices.
> The results were good as long as readahead stayed below some crossover
> point, so I think we need readahead for file I/O on non-rotational
> devices, too.
>
> But on some machines startup latency matters more than file I/O
> performance. Our analysis at the time showed that readahead of
> application code pages contributed to slow startup, and during bootup
> the cache hit ratio was very low.
>
> So I hoped we could disable readahead for code sections only (i.e.,
> roughly, filemap faults on executable VMAs). :)
>
> I'm not asking you to solve this problem right now; I just want you to
> be aware of this embedded-system issue when enhancing readahead in
> the future. :)

Yeah, I'd never heard of such a demand before; definitely good to know!

Thanks,
Fengguang
From: Taras Glek on
On 04/07/2010 12:38 AM, Wu Fengguang wrote:
> On Wed, Apr 07, 2010 at 10:54:58AM +0800, Taras Glek wrote:
>
>> On 04/06/2010 07:24 PM, Wu Fengguang wrote:
>>
>>> Hi Taras,
>>>
>>> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
>>>
>>>
>>>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
>>>>
>>>>
>>>>> Hello,
>>>>> I am working on improving Mozilla startup times. It turns out that page
>>>>> faults(caused by lack of cooperation between user/kernelspace) are the
>>>>> main cause of slow startup. I need some insights from someone who
>>>>> understands linux vm behavior.
>>>>>
>>>>>
>>> How about improve Fedora (and other distros) to preload Mozilla (and
>>> other apps the user run at the previous boot) with fadvise() at boot
>>> time? This sounds like the most reasonable option.
>>>
>>>
>> That's a slightly different usecase. I'd rather have all large apps
>> startup as efficiently as possible without any hacks. Though until we
>> get there, we'll be using all of the hacks we can.
>>
> Boot time user space readahead can do better than kernel heuristic
> readahead in several ways:
>
> - it can collect better knowledge on which files/pages will be used
> which lead to high readahead hit ratio and less cache consumption
>
> - it can submit readahead requests for many files in parallel,
> which enables queuing (elevator, NCQ etc.) optimizations
>
> So I won't call it dirty hack :)
>
>
Fair enough.
>>> As for the kernel readahead, I have a patchset to increase default
>>> mmap read-around size from 128kb to 512kb (except for small memory
>>> systems). This should help your case as well.
>>>
>>>
>> Yes. Is the current readahead really doing read-around(ie does it read
>> pages before the one being faulted)? From what I've seen, having the
>>
> Sure. It will do read-around from current fault offset - 64kb to +64kb.
>
That's excellent.
>
>> dynamic linker read binary sections backwards causes faults.
>> http://sourceware.org/bugzilla/show_bug.cgi?id=11447
>>
> There are too many data in
> http://people.mozilla.com/~tglek/startup/systemtap_graphs/ld_bug/report.txt
> Can you show me the relevant lines? (wondering if I can ever find such lines..)
>
The first part of the file lists sections in a file and their hex
offset+size.

Lines like "0 512 offset(#1)" mean a read at position 0 of 512 bytes.
Incidentally, this first read comes from vfs_read, so the log doesn't
take readahead into account (unlike the other reads, which are caused
by mmap page faults).

So
15310848 131072 offset(#2)=====================
eaa73c 1523c .bss
eaa73c 19d1e .comment

15142912 131072 offset(#3)=====================
e810d4 200 .dynamic
e812d4 470 .got
e81744 3b50 .got.plt
e852a0 2549c .data

This shows two reads where the dynamic linker first seeks to the end
of the file (to zero out .bss, causing IO via COW) and then backtracks
to read in .dynamic. However, you are right: all of the backtracking
reads are over 64K.
Thanks for explaining that. I'm guessing your change to boost
read-around will fix this issue nicely for Firefox.

>>>
>>>
>>>>> Current Situation:
>>>>> The dynamic linker mmap()s executable and data sections of our
>>>>> executable but it doesn't call madvise().
>>>>> By default page faults trigger 131072byte reads. To make matters worse,
>>>>> the compile-time linker + gcc lay out code in a manner that does not
>>>>> correspond to how the resulting executable will be executed(ie the
>>>>> layout is basically random). This means that during startup 15-40mb
>>>>> binaries are read in basically random fashion. Even if one orders the
>>>>> binary optimally, throughput is still suboptimal due to the puny readahead.
>>>>>
>>>>> IO Hints:
>>>>> Fortunately when one specifies madvise(WILLNEED) pagefaults trigger 2mb
>>>>> reads and a binary that tends to take 110 page faults(ie program stops
>>>>> execution and waits for disk) can be reduced down to 6. This has the
>>>>> potential to double application startup of large apps without any clear
>>>>> downsides.
>>>>>
>>>>> Suse ships their glibc with a dynamic linker patch to fadvise()
>>>>> dynamic libraries(not sure why they switched from doing madvise
>>>>> before).
>>>>>
>>>>>
>>> This is interesting. I wonder how SuSE implements the policy.
>>> Do you have the patch or some strace output that demonstrates the
>>> fadvise() call?
>>>
>>>
>> glibc-2.3.90-ld.so-madvise.diff in
>> http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0:
>>
> 550 Can't open
> /pub/linux/distributions/suse/pub/suse/update/10.1/rpm/src/glibc-2.4-31.12.3.src.rpm:
> No such file or directory
>
> OK I give up.
>
>
>> As I recall they just fadvise the filedescriptor before accessing it.
>>
> Obviously this is a bit risky for small memory systems..
>
>
>>>>> I filed a glibc bug about this at
>>>>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
>>>>> with his concern about wasting memory resources. What is the impact of
>>>>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
>>>>> pressure? Does the kernel simply start ignoring these hints?
>>>>>
>>>>>
>>>> It will throttle based on memory pressure. In idle situations it will
>>>> eat your file cache, however, to satisfy the request.
>>>>
>>>> Now, the file cache should be much bigger than the amount of unneeded
>>>> pages you prefault with the hint over the whole library, so I guess the
>>>> benefit of prefaulting the right pages outweighs the downside of evicting
>>>> some cache for unused library pages.
>>>>
>>>> Still, it's a workaround for deficits in the demand-paging/readahead
>>>> heuristics and thus a bit ugly, I feel. Maybe Wu can help.
>>>>
>>>>
>>> Program page faults are inherently random, so the straightforward
>>> solution would be to increase the mmap read-around size (for desktops
>>> with reasonable large memory), rather than to improve program layout
>>> or readahead heuristics :)
>>>
>>>
>> Program page faults may exhibit random behavior once they've started.
>>
> Right.
>
>
>> During startup page-in pattern of over-engineered OO applications is
>> very predictable. Programs are laid out based on compilation units,
>> which have no relation to how they are executed. Another problem is that
>> any large old application will have lots of code that is either rarely
>> executed or completely dead. Random sprinkling of live code among mostly
>> unneeded code is a problem.
>>
> Agreed.
>
>
>> I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB
>> with proper binary layout. Even if one lays out a program wrongly, the
>> worst-case pagein pattern will be pretty similar to what it is by default.
>>
> That's great. When will we enjoy your research fruits? :)
>
Released it yesterday. Hopefully other bloated binaries will benefit
from this too.

http://blog.mozilla.com/tglek/2010/04/07/icegrind-valgrind-plugin-for-optimizing-cold-startup/

Thanks a lot Wu, I feel I understand the kernel side of what's happening
now.

Taras
From: Wu Fengguang on
On Fri, Apr 09, 2010 at 01:44:41AM +0800, Taras Glek wrote:
> On 04/07/2010 12:38 AM, Wu Fengguang wrote:
> > On Wed, Apr 07, 2010 at 10:54:58AM +0800, Taras Glek wrote:
> >
> >> On 04/06/2010 07:24 PM, Wu Fengguang wrote:
> >>
> >>> Hi Taras,
> >>>
> >>> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
> >>>
> >>>
> >>>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
> >>>>
> >>>>
> >>>>> Hello,
> >>>>> I am working on improving Mozilla startup times. It turns out that page
> >>>>> faults(caused by lack of cooperation between user/kernelspace) are the
> >>>>> main cause of slow startup. I need some insights from someone who
> >>>>> understands linux vm behavior.
> >>>>>
> >>>>>
> >>> How about improve Fedora (and other distros) to preload Mozilla (and
> >>> other apps the user run at the previous boot) with fadvise() at boot
> >>> time? This sounds like the most reasonable option.
> >>>
> >>>
> >> That's a slightly different usecase. I'd rather have all large apps
> >> startup as efficiently as possible without any hacks. Though until we
> >> get there, we'll be using all of the hacks we can.
> >>
> > Boot time user space readahead can do better than kernel heuristic
> > readahead in several ways:
> >
> > - it can collect better knowledge on which files/pages will be used
> > which lead to high readahead hit ratio and less cache consumption
> >
> > - it can submit readahead requests for many files in parallel,
> > which enables queuing (elevator, NCQ etc.) optimizations
> >
> > So I won't call it dirty hack :)
> >
> >
> Fair enough.
> >>> As for the kernel readahead, I have a patchset to increase default
> >>> mmap read-around size from 128kb to 512kb (except for small memory
> >>> systems). This should help your case as well.
> >>>
> >>>
> >> Yes. Is the current readahead really doing read-around(ie does it read
> >> pages before the one being faulted)? From what I've seen, having the
> >>
> > Sure. It will do read-around from current fault offset - 64kb to +64kb.
> >
> That's excellent.
> >
> >> dynamic linker read binary sections backwards causes faults.
> >> http://sourceware.org/bugzilla/show_bug.cgi?id=11447
> >>
> > There are too many data in
> > http://people.mozilla.com/~tglek/startup/systemtap_graphs/ld_bug/report.txt
> > Can you show me the relevant lines? (wondering if I can ever find such lines..)
> >
> The first part of the file lists sections in a file and their hex
> offset+size.

> Lines like "0 512 offset(#1)" mean a read at position 0 of 512 bytes.
> Incidentally, this first read comes from vfs_read, so the log doesn't
> take readahead into account (unlike the other reads, which are caused
> by mmap page faults).

Yes, every binary/library starts with this 512b read. It is requested
by ld.so/ld-linux.so, and will trigger a 4-page readahead. This is not
good readahead. I wonder if ld.so can switch to mmap read for the
first read, in order to trigger a larger 128kb readahead. However this
will introduce a little overhead on VMA operations.

> So
> 15310848 131072 offset(#2)=====================
> eaa73c 1523c .bss
> eaa73c 19d1e .comment
>
> 15142912 131072 offset(#3)=====================
> e810d4 200 .dynamic
> e812d4 470 .got
> e81744 3b50 .got.plt
> e852a0 2549c .data
>
> This shows two reads where the dynamic linker first seeks to the end
> of the file (to zero out .bss, causing IO via COW) and then backtracks
> to read in .dynamic. However, you are right: all of the backtracking
> reads are over 64K.

This is an interesting finding to me. Thanks for the explanation :)

> Thanks for explaining that. I'm guessing your change to boost
> read-around will fix this issue nicely for Firefox.

You are welcome.

> >>>
> >>>
> >>>>> Current Situation:
> >>>>> The dynamic linker mmap()s executable and data sections of our
> >>>>> executable but it doesn't call madvise().
> >>>>> By default page faults trigger 131072byte reads. To make matters worse,
> >>>>> the compile-time linker + gcc lay out code in a manner that does not
> >>>>> correspond to how the resulting executable will be executed(ie the
> >>>>> layout is basically random). This means that during startup 15-40mb
> >>>>> binaries are read in basically random fashion. Even if one orders the
> >>>>> binary optimally, throughput is still suboptimal due to the puny readahead.
> >>>>>
> >>>>> IO Hints:
> >>>>> Fortunately when one specifies madvise(WILLNEED) pagefaults trigger 2mb
> >>>>> reads and a binary that tends to take 110 page faults(ie program stops
> >>>>> execution and waits for disk) can be reduced down to 6. This has the
> >>>>> potential to double application startup of large apps without any clear
> >>>>> downsides.
> >>>>>
> >>>>> Suse ships their glibc with a dynamic linker patch to fadvise()
> >>>>> dynamic libraries(not sure why they switched from doing madvise
> >>>>> before).
> >>>>>
> >>>>>
> >>> This is interesting. I wonder how SuSE implements the policy.
> >>> Do you have the patch or some strace output that demonstrates the
> >>> fadvise() call?
> >>>
> >>>
> >> glibc-2.3.90-ld.so-madvise.diff in
> >> http://www.rpmseek.com/rpm/glibc-2.4-31.12.3.src.html?hl=com&cba=0:G:0:3732595:0:15:0:
> >>
> > 550 Can't open
> > /pub/linux/distributions/suse/pub/suse/update/10.1/rpm/src/glibc-2.4-31.12.3.src.rpm:
> > No such file or directory
> >
> > OK I give up.
> >
> >
> >> As I recall they just fadvise the filedescriptor before accessing it.
> >>
> > Obviously this is a bit risky for small memory systems..
> >
> >
> >>>>> I filed a glibc bug about this at
> >>>>> http://sourceware.org/bugzilla/show_bug.cgi?id=11431 . Uli commented
> >>>>> with his concern about wasting memory resources. What is the impact of
> >>>>> madvise(WILLNEED) or the fadvise equivalent on systems under memory
> >>>>> pressure? Does the kernel simply start ignoring these hints?
> >>>>>
> >>>>>
> >>>> It will throttle based on memory pressure. In idle situations it will
> >>>> eat your file cache, however, to satisfy the request.
> >>>>
> >>>> Now, the file cache should be much bigger than the amount of unneeded
> >>>> pages you prefault with the hint over the whole library, so I guess the
> >>>> benefit of prefaulting the right pages outweighs the downside of evicting
> >>>> some cache for unused library pages.
> >>>>
> >>>> Still, it's a workaround for deficits in the demand-paging/readahead
> >>>> heuristics and thus a bit ugly, I feel. Maybe Wu can help.
> >>>>
> >>>>
> >>> Program page faults are inherently random, so the straightforward
> >>> solution would be to increase the mmap read-around size (for desktops
> >>> with reasonable large memory), rather than to improve program layout
> >>> or readahead heuristics :)
> >>>
> >>>
> >> Program page faults may exhibit random behavior once they've started.
> >>
> > Right.
> >
> >
> >> During startup page-in pattern of over-engineered OO applications is
> >> very predictable. Programs are laid out based on compilation units,
> >> which have no relation to how they are executed. Another problem is that
> >> any large old application will have lots of code that is either rarely
> >> executed or completely dead. Random sprinkling of live code among mostly
> >> unneeded code is a problem.
> >>
> > Agreed.
> >
> >
> >> I'm able to reduce startup pagefaults by 2.5x and mem usage by a few MB
> >> with proper binary layout. Even if one lays out a program wrongly, the
> >> worst-case pagein pattern will be pretty similar to what it is by default.
> >>
> > That's great. When will we enjoy your research fruits? :)
> >
> Released it yesterday. Hopefully other bloated binaries will benefit
> from this too.
>
> http://blog.mozilla.com/tglek/2010/04/07/icegrind-valgrind-plugin-for-optimizing-cold-startup/

It sounds painful to produce the valgrind log; fortunately, the end
user won't suffer.

Is it viable to turn on the "-ffunction-sections -fdata-sections"
options distribution wide? If so, you may sell it to Fedora :)

Thanks,
Fengguang
From: Minchan Kim on
Hi, Wu.

On Mon, Apr 12, 2010 at 11:27 AM, Wu Fengguang <fengguang.wu(a)intel.com> wrote:
> On Fri, Apr 09, 2010 at 01:44:41AM +0800, Taras Glek wrote:
>> On 04/07/2010 12:38 AM, Wu Fengguang wrote:
>> > On Wed, Apr 07, 2010 at 10:54:58AM +0800, Taras Glek wrote:
>> >
>> >> On 04/06/2010 07:24 PM, Wu Fengguang wrote:
>> >>
>> >>> Hi Taras,
>> >>>
>> >>> On Tue, Apr 06, 2010 at 05:51:35PM +0800, Johannes Weiner wrote:
>> >>>
>> >>>
>> >>>> On Mon, Apr 05, 2010 at 03:43:02PM -0700, Taras Glek wrote:
>> >>>>
>> >>>>
>> >>>>> Hello,
>> >>>>> I am working on improving Mozilla startup times. It turns out that page
>> >>>>> faults(caused by lack of cooperation between user/kernelspace) are the
>> >>>>> main cause of slow startup. I need some insights from someone who
>> >>>>> understands linux vm behavior.
>> >>>>>
>> >>>>>
>> >>> How about improve Fedora (and other distros) to preload Mozilla (and
>> >>> other apps the user run at the previous boot) with fadvise() at boot
>> >>> time? This sounds like the most reasonable option.
>> >>>
>> >>>
>> >> That's a slightly different usecase. I'd rather have all large apps
>> >> startup as efficiently as possible without any hacks. Though until we
>> >> get there, we'll be using all of the hacks we can.
>> >>
>> > Boot time user space readahead can do better than kernel heuristic
>> > readahead in several ways:
>> >
>> > - it can collect better knowledge on which files/pages will be used
>> >    which lead to high readahead hit ratio and less cache consumption
>> >
>> > - it can submit readahead requests for many files in parallel,
>> >    which enables queuing (elevator, NCQ etc.) optimizations
>> >
>> > So I won't call it dirty hack :)
>> >
>> >
>> Fair enough.
>> >>> As for the kernel readahead, I have a patchset to increase default
>> >>> mmap read-around size from 128kb to 512kb (except for small memory
>> >>> systems).  This should help your case as well.
>> >>>
>> >>>
>> >> Yes. Is the current readahead really doing read-around(ie does it read
>> >> pages before the one being faulted)? From what I've seen, having the
>> >>
>> > Sure. It will do read-around from current fault offset - 64kb to +64kb.
>> >
>> That's excellent.
>> >
>> >> dynamic linker read binary sections backwards causes faults.
>> >> http://sourceware.org/bugzilla/show_bug.cgi?id=11447
>> >>
>> > There are too many data in
>> > http://people.mozilla.com/~tglek/startup/systemtap_graphs/ld_bug/report.txt
>> > Can you show me the relevant lines? (wondering if I can ever find such lines..)
>> >
>> The first part of the file lists sections in a file and their hex
>> offset+size.
>
>> lines like 0 512 offset(#1) mean a read at position 0 of 512 bytes.
>> Incidentally this first read is coming from vfs_read, so the log doesn't
>> take account readahead (unlike the other reads caused by mmap page faults).
>
> Yes, every binary/library starts with this 512b read.  It is requested
> by ld.so/ld-linux.so, and will trigger a 4-page readahead. This is not
> good readahead. I wonder if ld.so can switch to mmap read for the
> first read, in order to trigger a larger 128kb readahead. However this
> will introduce a little overhead on VMA operations.

AFAIK, for the main binary the kernel itself reads the first
BINPRM_BUF_SIZE bytes (the ELF header and so on). In fs/exec.c:

prepare_binprm()
{
	...
	return kernel_read(bprm->file, 0, bprm->buf, BINPRM_BUF_SIZE);
}

But the dynamic loader uses a plain libc read() for a shared library's
header. So you may have a chance to increase the readahead size for
the main binary, but it's hard for shared libraries. Most applications
pull in lots of shared libraries, so a binary-only solution wouldn't
help performance much. :(

--
Kind regards,
Minchan Kim