From: Theo Markettos on
Nix <nix-razor-pit(a)esperi.org.uk> wrote:
> On 24 Feb 2010, Jim A. stated:
>
> > There are (at least) two issues with flash drives. One is that write
> > speeds can be really slow. The other issue is whether excessive
> > writes will eventually wear out the ssd.
>
> IIRC modern high-end Flash disks are starting to render this academic
> (as in 'you need to write to it at top speed for fifty years before it
> starts to wear out'). This definitely hasn't got down to the low end,
> though, and there's some evidence that even at the high end they do this
> by throttling writes even more (thus changing the definition of 'top
> speed' to make their prediction come out true).

I've been running USB flash as root+swap on my router for about 3 years now.
The router only has 32MB RAM and runs full Debian/mips, so it swaps quite a
lot.

I replaced the disk recently as the router was behaving oddly, though the
old stick seems to be OK when I use it as a generic USB stick. So wear doesn't
seem to have been causing trouble after about 2 years of constant use. These
are 10-20 pound USB flash sticks, nothing special.

(Upgrading was a very good idea: I picked a stick that was 'ReadyBoost'
capable and saw an immediate speed improvement. Logging in was about twice
as fast - bash being very slow to start up. I assumed this was due to the
faster swap, though ICBW)

Theo
From: Martin Gregorie on
On Sat, 27 Feb 2010 22:42:09 +0000, Theo Markettos wrote:

> If I do:
>
> char i[2*1024*1024*1024];
>
> that will surely fail, full stop?
>
Eventually, yes, but probably not immediately. AFAIK that declaration will
grow the process virtual memory space but not actually soak up memory
until you start to fill the array, whereupon it grabs pages as needed.
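A quick way to watch that happen on Linux (a minimal sketch, assuming a /proc
filesystem; the 256MiB size and one-byte-per-page touch are arbitrary choices
for illustration) is to compare VmSize against VmRSS in /proc/self/status
before and after touching a large allocation:

/* Sketch: virtual size (VmSize) grows at malloc time, resident set
   (VmRSS) only grows once the pages are actually touched. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void show_vm (const char *when)
{
  char line[256];
  FILE *f = fopen ("/proc/self/status", "r");
  if (!f)
    return;
  while (fgets (line, sizeof line, f))
    if (!strncmp (line, "VmSize:", 7) || !strncmp (line, "VmRSS:", 6))
      printf ("%s: %s", when, line);
  fclose (f);
}

int main (void)
{
  size_t size = 256UL * 1024 * 1024;   /* 256MiB: big enough to see, small
                                          enough for a 32-bit box */
  char *p;
  size_t i;

  show_vm ("before malloc");
  if ((p = malloc (size)) == NULL)
    return 1;
  show_vm ("after malloc ");           /* VmSize jumps, VmRSS barely moves */

  for (i = 0; i < size; i += 4096)
    p[i] = 1;                          /* touch one byte per page */
  show_vm ("after touch  ");           /* now VmRSS catches up */

  free (p);
  return 0;
}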


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
From: Nix on
On 27 Feb 2010, Theo Markettos uttered the following:

> Nix <nix-razor-pit(a)esperi.org.uk> wrote:
>> On 24 Feb 2010, Theo Markettos outgrape:
>> > 3844 atm26 20 0 627m 352m 27m R 66 35.5 618:00.14 epiphany-browse
>>
>> !!!!!
>
> The machine ran out of battery on me, but I think in that configuration I
> had a dozen or so tabs open, possibly a few more. There was no doubt some
> Flash doing pointless things in invisible windows too.

That memory consumption is still pretty awful.

>> > 2911 root 20 0 344m 68m 13m S 23 6.9 107:06.22 Xorg
>>
>> ow.
>
> I think that's Flash pointlessly animating things. Oh for the day HTML5 is
> useful.

344MB caused by a couple of animations? I'd blame epiphany first. Gecko
*loves* uploading crazy numbers of pixmaps to the server.

>> I'm not surprised you're running really slowly. Your fundamental problem
>> is that you're running appalling memory hogs and the system really
>> *wants* to swap, but can't. It's *possible* that the compressed
>> in-memory swapping code in 2.6.33 may help, but I suspect this is simply
>> a job too far for this machine.
>
> That's why I was wondering about adding RAM. But folks seemed to suggest it
> was disc that's the bottleneck, which is why I'm confused.

Well, if you run out of RAM you start paging, which involves a lot of
disk reads. With no swap you won't write, though, and it's writing
that's slow on an SSD...

> I do use Netsurf as a trim browser, but it's not that useful for
> interactive sites (it has no Javascript). I use Konqueror elsewhere and find it
> a bit primitive. Opera after about version 7 seems to be a CPU hog (on the
> 'bigger' Centrino machine).
>
> I suppose there's Chrome(ium) and Webkit-based browsers to try. Any
> particularly worth looking at?

I have no idea how big Safari is, sorry. Konqi may be primitive but it
does everything I want (I've never been into any FF extensions, so I
don't miss them).

>> Xorg isn't much of an eater of CPU cycles, and your problem is really RAM,
>> not CPU: I'd actually suggest trading off CPU for RAM, hence compressed in-
>> memory swapping.
>
> Hmm... that's not my experience. On the 'other' machine (1.6GHz Centrino,
> 1.5GB RAM, 2GB swap):
>
> top - 22:34:25 up 24 days, 12:25, 7 users, load average: 0.87, 0.94, 0.95
> Tasks: 198 total, 2 running, 195 sleeping, 1 stopped, 0 zombie
> Cpu(s): 60.5%us, 7.7%sy, 0.0%ni, 31.2%id, 0.0%wa, 0.2%hi, 0.4%si,
> 0.0%st
> Mem: 1544104k total, 1486452k used, 57652k free, 20920k buffers
> Swap: 1991504k total, 735416k used, 1256088k free, 185372k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 15932 atm26 20 0 922m 732m 24m R 61.1 48.6 1004:37 firefox
> 3929 root 20 0 276m 101m 8032 S 2.1 6.7 374:39.55 Xorg
> 5084 atm26 20 0 25100 4944 3992 S 0.7 0.3 75:49.17 multiload-apple
> 17597 atm26 20 0 124m 20m 6488 S 0.5 1.3 21:56.19 gnome-terminal
>
> That's a ratio of 1:3 Firefox to Xorg execution time.

Bear in mind that X is rendering *everything* for *every* application: if e.g.
you ask it to do 3D stuff and you don't have any hardware support, it'll have
to render *that* in software as well, which is a real CPU pig. This is bad
for all applications, not just X, because when X is using CPU it's not
servicing requests from other apps.

Regarding X, there are a lot of variables that can enormously affect your
CPU consumption.

What's the video card, what acceleration method is X using, and is the
render extension accelerated? Older X installations will be using XAA,
which doesn't accelerate the render extension at all (except on Radeons,
and even there that code recently bitrotted and was removed). EXA has
substantially lower CPU requirements. If you can move to kernel
modesetting that will enable further acceleration that should push CPU
consumption further down (both new Intel and new Radeon cards only
accelerate 2D to any great degree when kernel modesetting is enabled).

If X's CPU consumption remains problematic, install sysprof (or some
other decent systemwide profiler), profile it under load, and send the
profiles to the xorg mailing list. Someone is sure to be interested in
making X run faster :)

(Still, ~375min of CPU time doesn't seem particularly appalling. X has
used 67min here, and the machine's only been up for four days, and I've
done all the stuff above to push CPU consumption down. X has a lot to
do: doing it takes CPU.)

> And yes, Firefox is
> taking that much CPU just sitting in the background. Probably Flash again,

Guess why I don't run FF. (It's probably bombarding the X server with
requests as well.)

> despite having Flashblock active (did you notice I'm not Flash's greatest
> fan?)

Join the club.

>> You don't get out of memory errors on most Linux systems. You get slow
>> systems, then really slow systems, then things start to get killed and
>> messages about it land in the *system log*, not on the screen. (But
>> things need to get really bad by then.)
>
> Even if there's no swap to resort to? I can understand thrashing swap to
> death, but if there's nothing to swap to and you try to run a zillion
> applications at once, what happens?

It gets slow because, though it is unable to swap out dirty data, it
*can* throw out clean file-backed pages (largely text pages from
binaries and shared libraries). Unfortunately because it *can't* swap
out dirty data, it has to leave it all in memory, even the hundreds of
kB of gettext-related rubbish mapped into virtually every app, and the
relocations for every single app you're running, even if they're never
accessed: and in their place it may have to throw out very popular pages
from binaries, only to have to read them back in again a fraction of a
second later.


Oh yes, that's another memory usage saver: run prelink.

> One option is to leave all the binaries on disc and only mmap() in small
> chunks of code. This will make things very slow.

That's what's already happening :)

> But that doesn't help for
> data. If I do:
>
> char i[2*1024*1024*1024];
>
> that will surely fail, full stop?

No. Well, yes, but only because you've exceeded the stack ulimit. If you
stuck it on the heap, you could expect it to work fine, up to the bounds
of contiguous address space. e.g.

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
  char *foo;
  size_t i;
  size_t size = 2UL*1024*1024*1024;   /* 2GiB; a plain 2*1024*1024*1024
                                         would overflow a 32-bit int */

  if ((foo = malloc (size)) == NULL)
    {
      puts ("On a 64-bit machine, or a 32-bit machine with 2GiB of");
      puts ("contiguous address space to hand, this will never fail");
      puts ("with overcommit on.");
      return 1;
    }

  for (i = 0; i < size; i += 1024)
    foo[i] = 1;   /* touching the pages is what eventually fails: with
                     overcommit on you get OOM-killed here, not a NULL
                     return from malloc */

  return 0;
}

With overcommit on, storage is only allocated when accessed. The
advantage of this is that large sparse arrays and so on take no space at
all (fairly common in some classes of app). The disadvantage is that
when you *do* run out of memory, the OS has to do a lot of work to
figure out who to kill (and sometimes guesses wrong), and the killed app
doesn't get a nice NULL return, it gets a SIGKILL. With overcommit off,
the system operates like Solaris, and demands that every allocated page
must be backed by a page of RAM or swap at all times. Obviously this
uses a lot more memory, and a lot of that is liable to be wasted
(fork() a 2GiB program and you need 2GiB of swap, even if it's just
about to exec() "/bin/ls" and will never use any of that space).

Even with overcommit on you sometimes might get spontaneously killed for
being out of memory. The fork()/exec() case will kill the child with no
way to stop it, and if you run out of memory expanding the stack (which
is always lazily allocated), you're as dead with overcommit off as with
it on. As such, I consider overcommit-off to bring a false sense of
security at huge cost.
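
(For reference, a minimal sketch, assuming a Linux /proc filesystem, that
reports which of those overcommit modes the kernel is currently in: 0 is the
heuristic default, 1 is always-overcommit, 2 is the strict Solaris-like
accounting described above.)

/* Sketch: print the kernel's overcommit policy from
   /proc/sys/vm/overcommit_memory. */
#include <stdio.h>

int main (void)
{
  int mode = -1;
  FILE *f = fopen ("/proc/sys/vm/overcommit_memory", "r");

  if (!f)
    {
      perror ("/proc/sys/vm/overcommit_memory");
      return 1;
    }
  if (fscanf (f, "%d", &mode) != 1)
    mode = -1;
  fclose (f);

  switch (mode)
    {
    case 0: puts ("heuristic overcommit (the default)"); break;
    case 1: puts ("always overcommit"); break;
    case 2: puts ("strict accounting: every page needs RAM or swap behind it"); break;
    default: printf ("couldn't determine mode (%d)\n", mode); break;
    }
  return 0;
}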
From: Nix on
On 27 Feb 2010, Martin Gregorie said:

> On Sat, 27 Feb 2010 22:42:09 +0000, Theo Markettos wrote:
>
>> If I do:
>>
>> char i[2*1024*1024*1024];
>>
>> that will surely fail, full stop?
>>
> Eventually, yes, but probably not immediately. AFAIK that declaration will
> grow the process virtual memory space but not actually soak up memory
> until you start to fill the array, whereupon it grabs pages as needed.

It's a stack variable. The stack will grow automatically (via the guard page),
and will probably exceed the ulimit. Pages between the function frame and stack
end will be allocated lazily, but that doesn't help if you're already
dead.

(Hm, it may be file-scope global, in which case the rules are a bit
different and to a first approximation you're right.)
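
A minimal sketch using POSIX getrlimit() shows why the stack case dies before
lazy allocation can help (the "around 8MiB" in the comment is only a typical
default, not a guarantee):

/* Sketch: compare the stack ulimit with the 2GiB that
   `char i[2*1024*1024*1024]` would need as an automatic variable.
   Typical defaults are around 8MiB, so the declaration blows the
   limit long before any pages would be lazily allocated. */
#include <stdio.h>
#include <sys/resource.h>

int main (void)
{
  struct rlimit rl;
  unsigned long long wanted = 2ULL * 1024 * 1024 * 1024;

  if (getrlimit (RLIMIT_STACK, &rl) != 0)
    {
      perror ("getrlimit");
      return 1;
    }

  printf ("stack limit: %llu bytes, array wants: %llu bytes\n",
          (unsigned long long) rl.rlim_cur, wanted);
  if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < wanted)
    puts ("the array can never fit on the stack with this ulimit");
  return 0;
}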
From: Gordon Henderson on
In article <DAy*WFM4s(a)news.chiark.greenend.org.uk>,
Theo Markettos <theom+news(a)chiark.greenend.org.uk> wrote:

>That's why I was wondering about adding RAM. But folks seemed to suggest it
>was disc that's the bottleneck, which is why I'm confused.

Probably mostly me (suggesting disk) - as that's what I see on my own
(and wifey's) AAO netbooks with SSD - however it's now become apparent
from other posts that you actually "use" this box! As in use it for day
to day work rather than just for quick little jobs on customer sites -
which is primarily what mine is for - boot it up, run firefox and a few
xterms/ssh sessions and that's mostly it.

It's almost as if it's running in old-fashioned PIO mode...

However, even with more RAM, I still think it'll go slow - on my AAO it
stalls totally when it writes to its SSD - and I have an access LED to
prove it! (I'm somewhat surprised the Dells didn't provide this)

I can load a page in firefox and it stalls half way through - disk LED
comes on - then it loads, then stalls again - disk LED on again and so
on. My thoughts are that firefox is writing cache, history & cookies to
disk and doing an fflush to make sure they're there, so when it crashes
it has more on disk to pick up from where it left off.

Once a page is loaded it's usually remarkably quick - however I have a
custom compiled kernel and took care to remove as many unused userland
utilities, etc. as I could.

A quick & dirty test:

On the AAO:

# time dd if=/dev/zero of=testfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 104.777 s, 10.0 MB/s

real 1m44.817s
user 0m0.008s
sys 0m7.104s

and on my Atom based desktop with a spinny drive:

# time dd if=/dev/zero of=testfile bs=1M count=4000
4000+0 records in
4000+0 records out
4194304000 bytes (4.2 GB) copied, 119.902 s, 35.0 MB/s

real 2m0.219s
user 0m0.040s
sys 0m27.828s

I used a file double the RAM size in both cases - 1GB on the AAO and 4GB
on the desktop to negate RAM buffering...

Raw disk speed doesn't tell us everything though - 35MB/sec on my desktop
is slow by today's standards, and the 1990s are calling for that SSD... Doing
an fflush is probably going to cause a lot of small writes to take place -
probably the worst thing for an SSD to have to cope with.
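
To illustrate that pattern (a sketch only: the file name and record count are
invented, and whether firefox does exactly this is a guess on my part),
forcing each tiny record to disk individually looks like:

/* Sketch of the "many tiny synchronous writes" pattern: each small
   record is flushed from stdio and then fsync()ed, so the drive sees
   lots of little writes instead of one big sequential one. */
#include <stdio.h>
#include <unistd.h>

int main (void)
{
  FILE *f = fopen ("testlog.dat", "w");
  int i;

  if (!f)
    {
      perror ("testlog.dat");
      return 1;
    }

  for (i = 0; i < 1000; i++)
    {
      fprintf (f, "record %d\n", i);
      fflush (f);              /* push stdio's buffer to the kernel */
      fsync (fileno (f));      /* force the kernel's copy onto the disk */
    }

  fclose (f);
  return 0;
}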

Something a bit more sophisticated - bonnie:

Desktop:

Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
yakko            4G           33679  20 16231   7           31971   7 160.7   0

That more or less echoes the raw write speed - 33.7MB/sec - in the test
above; rewrites are a bit slow, though.

However the same on the AAO:

Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pinky            1G            5646   3  2495   1           29142  10 690.2   7

It struggled with writes and, worse, re-writes are really slow -
2.5MB/sec. Maybe this is the crux of it - a web browser/application
re-writing or updating an existing file - cookies, cache, history -
firefox seems to have moved to using SQLite databases and I bet that's
trying really hard to make sure data is flushed to disk...

It does show that seeks to the SSD are fast though - but I suspect that
in cases other than boot, we're not really getting any benefit from this
at all.

When searching for ways to speed up the AAO, I came across several
articles complaining about the slow SSDs - and it turns out Acer used
2 brands - a cheaper and slower one in the early days and a slightly
faster one in later days... There is even an article about replacing
the SSD with a much faster CF card!

So... Not sure where I'd go from here - if this was a day to day workhorse,
personally, I'd probably get something with a much faster disk - or look
into installing a faster SSD or CF type of thing - or even move to a spinny
disk, if possible...

Gordon