From: Noob on
[ NB: X-posted to comp.arch and comp.unix.programmer ]

Within 1-2 years, "mainstream" desktop PCs will probably come
equipped with a "small" (32-128 GB) solid-state drive (SSD) for
the operating system and applications, and (possibly) an additional,
larger (500+ GB) hard-disk drive (HDD) for a user's media (mostly
compressed audio and video).

In the SSD+HDD scenario, I was wondering whether it would be
"better" (for some metric) to have the OS swap to the SSD or to
the HDD?

It might even make sense to be able to define a swap "hierarchy" as
in e.g. 1 GB on the SSD as level 1 and 4 GB on the HDD as level 2?
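
For what it's worth, Linux already offers something close to this through swap priorities: areas with a higher priority are used before lower ones (set with `swapon -p` or the `pri=` option in /etc/fstab). A minimal Python sketch, assuming a /proc/swaps listing in the usual five-column layout; the device names and sizes in the sample are hypothetical, picked to match the 1 GB + 4 GB example:

```python
# Hypothetical /proc/swaps contents: a 1 GB SSD area at priority 10
# and a 4 GB HDD area at priority 5 (sizes are in KB).
SAMPLE_PROC_SWAPS = """\
Filename   Type       Size     Used  Priority
/dev/sdb1  partition  4194300  0     5
/dev/sda2  partition  1048572  0     10
"""


def swap_order(proc_swaps_text):
    """Return (name, size_kb, priority) tuples, highest priority first."""
    rows = []
    for line in proc_swaps_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        name, _type, size_kb, _used_kb, priority = fields[:5]
        rows.append((name, int(size_kb), int(priority)))
    # The kernel fills higher-priority areas before lower-priority ones.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    for level, (name, size_kb, prio) in enumerate(swap_order(SAMPLE_PROC_SWAPS), 1):
        print(f"level {level}: {name} ({size_kb // 1024} MB, priority {prio})")
```

On a real Linux box the same function could be fed the contents of /proc/swaps instead of the sample string.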

Supposing the apps reside on the SSD, and we define a large swap
partition on the HDD, does it even make sense to page executable
code to the HDD, instead of just discarding it, and loading it
again from the SSD when needed?

However, I'm not sure SSDs are the perfect match for a swap partition,
as the typical memory page is only 4 KB, whereas (AFAIU) SSDs are
optimized to store data in larger chunks (128 KB ??).
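
To put a number on that worry, here is a back-of-the-envelope sketch in Python. It assumes the naive worst case, where writing one 4 KB page forces the drive to rewrite a whole chunk; the 128 KB figure is only my guess from above, and real controllers may well remap writes to avoid much of this cost:

```python
PAGE_KB = 4  # typical memory page size


def worst_case_amplification(chunk_kb, page_kb=PAGE_KB):
    """KB physically rewritten per KB of page data, if each page write
    forces a rewrite of the whole chunk (a pessimistic assumption)."""
    return chunk_kb / page_kb


if __name__ == "__main__":
    # 128 KB is the guessed chunk size, not a measured value.
    print(worst_case_amplification(128))
```

Under those assumptions each page-out would cost 32 times its own size in flash writes.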

Given the typical Unix directory structure:
http://en.wikipedia.org/wiki/Unix_directory_structure
which directories should go to the SSD and which to the HDD?

bin and sbin => SSD
usr => SSD probably
home => HDD
etc ?? => not modified often ?? SSD perhaps
var ??

In short, will widely-available SSDs require OS designers to make
large changes, or is the current infrastructure generic enough?

Regards.
From: Scott Lurndal on
Noob <root(a)127.0.0.1> writes:
>[ NB: X-posted to comp.arch and comp.unix.programmer ]
>
>Within 1-2 years, "mainstream" desktop PCs will probably come
>equipped with a "small" (32-128 GB) solid-state drive (SSD) for
>the operating system and applications, and (possibly) an additional,
>larger (500+ GB) hard-disk drive (HDD) for a user's media (mostly
>compressed audio and video).
>
>In the SSD+HDD scenario, I was wondering whether it would be
>"better" (for some metric) to have the OS swap to the SSD or to
>the HDD?

It's "Better" (for all metrics) to not swap.

>
>It might even make sense to be able to define a swap "hierarchy" as
>in e.g. 1 GB on the SSD as level 1 and 4 GB on the HDD as level 2?
>
>Supposing the apps reside on the SSD, and we define a large swap
>partition on the HDD, does it even make sense to page executable
>code to the HDD, instead of just discarding it, and loading it
>again from the SSD when needed?

No. It never makes sense to page executable code.


My current test system has 112 processors, 1TB memory and 64 Intel
SSD drives (attached to 16 LSI hardware raid controllers). It never
swaps :-). Best I/O throughput to date has been about 11 Gigabytes/second,
for workloads consisting predominately of random reads.

scott
From: Stephen Fuld on
On 3/4/2010 10:35 AM, Scott Lurndal wrote:
> Noob<root(a)127.0.0.1> writes:
>> [ NB: X-posted to comp.arch and comp.unix.programmer ]
>>
>> Within 1-2 years, "mainstream" desktop PCs will probably come
>> equipped with a "small" (32-128 GB) solid-state drive (SSD) for
>> the operating system and applications, and (possibly) an additional,
>> larger (500+ GB) hard-disk drive (HDD) for a user's media (mostly
>> compressed audio and video).
>>
>> In the SSD+HDD scenario, I was wondering whether it would be
>> "better" (for some metric) to have the OS swap to the SSD or to
>> the HDD?
>
> It's "Better" (for all metrics) to not swap.
>
>>
>> It might even make sense to be able to define a swap "hierarchy" as
>> in e.g. 1 GB on the SSD as level 1 and 4 GB on the HDD as level 2?
>>
>> Supposing the apps reside on the SSD, and we define a large swap
>> partition on the HDD, does it even make sense to page executable
>> code to the HDD, instead of just discarding it, and loading it
>> again from the SSD when needed?
>
> No. It never makes sense to page executable code.

While, in general, I agree, there may be one exception. If the
application executable resides on the HDD, it might make sense to swap,
once, the pages to SSD, so you could do any future page-ins from the SSD
instead of the HDD.


> My current test system has 112 processors,

112 is an interesting number. What drove that, as opposed to a more
"divisible" number like 128, or even 96?

--
- Stephen Fuld
(e-mail address disguised to prevent spam)
From: Eric Sosman on
On 3/4/2010 11:48 AM, Noob wrote:
> [ NB: X-posted to comp.arch and comp.unix.programmer ]
>
> Within 1-2 years, "mainstream" desktop PCs will probably come
> equipped with a "small" (32-128 GB) solid-state drive (SSD) for
> the operating system and applications, and (possibly) an additional,
> larger (500+ GB) hard-disk drive (HDD) for a user's media (mostly
> compressed audio and video).
>
> In the SSD+HDD scenario, I was wondering whether it would be
> "better" (for some metric) to have the OS swap to the SSD or to
> the HDD?

While at first blush it might seem attractive to page to
the fastest device available, the notion doesn't stand much
scrutiny. The only processes that should page at all are idle
or mostly idle, the kind of thing that sleeps for ten minutes
between bursts of activity. Shaving a few milliseconds off
the reawakening time will not improve your system's overall
performance much; you could probably get a bigger boost by
devoting fast storage to oft-used files.

Look at it another way: If you're paging enough to notice,
you should try not to page instead of trying to page faster.

> [...]
> In short, will widely-available SSDs require OS designers to make
> large changes, or is the current infrastructure generic enough?

Hierarchical file systems with notions of "nearer/faster"
and "further/slower" storage already exist, and I don't see any
obvious reason why their frameworks wouldn't handle SSD. YMMV.

Looking a little more widely, I suspect that flash storage
packaged in the form of a fast pseudo-disk will turn out to be
a temporary stopgap. The argument (not original with me) is that
the delays introduced by host bus adapters, controllers, interrupt
service, and so on are more significant for fast flash than for
slower mechanical disks. It doesn't matter (much) if all the
glue adds a millisecond to each use of a 50-100 IOPS device. But
when the device can do 8000-15000 IOPS, that millisecond becomes
a much larger portion of the total time -- and you can expect
designers to look for ways to shrink it. By this argument, the
"natural" place for flash is further from the CPU than RAM but
nearer than the I/O bus.
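
A rough way to see the effect, as a Python sketch. It assumes the quoted IOPS figures describe the bare device serving one request at a time (so service time is 1/IOPS) and that the glue adds a fixed 1 ms per request; both are simplifications for illustration, not measurements:

```python
def overhead_share(device_iops, overhead_s=0.001):
    """Fraction of each request's total time spent in fixed overhead.

    Assumes one outstanding request, so device service time = 1 / IOPS.
    """
    service_s = 1.0 / device_iops
    return overhead_s / (overhead_s + service_s)


if __name__ == "__main__":
    for iops in (50, 100, 8000, 15000):
        share = overhead_share(iops)
        print(f"{iops:>6} IOPS device: overhead is {share:.0%} of each request")
```

With these assumptions the millisecond is under a tenth of each request on the mechanical disk but close to nine tenths on the flash device.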

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
From: Scott Lurndal on
Stephen Fuld <SFuld(a)alumni.cmu.edu.invalid> writes:
>On 3/4/2010 10:35 AM, Scott Lurndal wrote:
>> Noob<root(a)127.0.0.1> writes:
>>> [ NB: X-posted to comp.arch and comp.unix.programmer ]
>>>
>>> Within 1-2 years, "mainstream" desktop PCs will probably come
>>> equipped with a "small" (32-128 GB) solid-state drive (SSD) for
>>> the operating system and applications, and (possibly) an additional,
>>> larger (500+ GB) hard-disk drive (HDD) for a user's media (mostly
>>> compressed audio and video).
>>>
>>> In the SSD+HDD scenario, I was wondering whether it would be
>>> "better" (for some metric) to have the OS swap to the SSD or to
>>> the HDD?
>>
>> It's "Better" (for all metrics) to not swap.
>>
>>>
>>> It might even make sense to be able to define a swap "hierarchy" as
>>> in e.g. 1 GB on the SSD as level 1 and 4 GB on the HDD as level 2?
>>>
>>> Supposing the apps reside on the SSD, and we define a large swap
>>> partition on the HDD, does it even make sense to page executable
>>> code to the HDD, instead of just discarding it, and loading it
>>> again from the SSD when needed?
>>
>> No. It never makes sense to page executable code.
>
>While, in general, I agree, there may be one exception. If the
>application executable resides on the HDD, it might make sense to swap,
>once, the pages to SSD, so you could do any future page-ins from the SSD
>instead of the HDD.
>
>
>> My current test system has 112 processors,
>
>112 is an interesting number. What drove that, as opposed to a more
>"divisible" number like 128, or even 96?

There are 16 nodes, each with 8 shanghai or 12 istanbul cores. One core
on each node is reserved for control purposes. That leaves 112 in the
shanghai case, 176 for istanbul systems.
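
The arithmetic, as a small Python sketch (the function and constant names are just for illustration):

```python
NODES = 16             # nodes in the system
RESERVED_PER_NODE = 1  # one core per node held back for control purposes


def usable_cores(cores_per_node, nodes=NODES, reserved=RESERVED_PER_NODE):
    """Cores left for workloads once each node gives up its control core."""
    return nodes * (cores_per_node - reserved)


if __name__ == "__main__":
    print(usable_cores(8))   # 8 cores per node: 16 * 7 = 112
    print(usable_cores(12))  # 12 cores per node: 16 * 11 = 176
```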

scott