From: Moi on
On Mon, 05 Apr 2010 22:26:50 +0000, Nicolas George wrote:

> Moi wrote in message
> <455de$4bba44e4$5350c024$9115(a)cache110.multikabel.net>:
>> I've once done a similar thing for a disk exerciser: grab each disk
>> block exactly once,
>
> Grab each block exactly once: you said it. Peter's method does not do
> that, it probably only accesses a very small fraction of the blocks.

This is not that important.
You only need to reference one byte to cause the whole slot to
be drawn in.
Subsequent references to addresses in the same slot come
virtually for free.
Ergo: the best way to measure is by grabbing (and writing) small
chunks of memory.
(you could do the same for disk caching by referencing/dirtying
only one byte per page)


>> but in pseudo-random order, and sufficiently far apart to keep
>> read-ahead from being effective.
>
> Such things make sense for disks because disks require mechanical
> movements and thus are much more efficient for contiguous reads. That is
> why filesystem drivers try to avoid fragmentation and implement
> readahead.

My example was not about disks, but about avoiding short cycles (loops)
in permutations.
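For illustration only (not the actual exerciser), a stride that is
coprime to the block count gives you one full-length cycle and no
short loops:

    /* Visit each of nblocks exactly once in a scattered order.
       Any stride with gcd(stride, nblocks) == 1 produces a single
       cycle covering all blocks; pick it large enough that
       consecutive references defeat read-ahead. */
    void visit_all(unsigned long nblocks, unsigned long stride,
                   void (*grab)(unsigned long block))
    {
        unsigned long b = 0;
        for (unsigned long i = 0; i < nblocks; i++) {
            grab(b);
            b = (b + stride) % nblocks;
        }
    }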

>
> And for the same reason, software and file formats try to make reads as
> sequential as possible (for example interleaving audio and video data in
> multimedia container formats). Which in turn makes readahead useful even
> for flash-based devices.
>
> RAM has no mechanical parts, and gives no bonus for sequential reads.
> Furthermore, the read patterns are typically much more random: access a
> pointer, then another, read until a 0 byte, access another pointer, etc.
> The RAM cache system does not try to do simple readahead for that;
> instead, it follows the flow of the program and analyzes and predicts
> the memory accesses.
>
> Therefore, if the algorithm does not achieve cache locality at the scale
> of a cache line (typically 32 to 128 octets), then the distance between
> items is irrelevant.

That is all correct.

Apart from sizes and parameters, the only differences between
memory and disk wrt caching are:

1) disks (sometimes) favor sequential reads and memory does not.

2) for memory, all cache misses are created equal.

3) for memory, the mapping to/from cache slots is mostly fixed,
causing multiple stretches of addresses to compete for the same
cache slot.
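To make (3) concrete (sizes invented for the example): with a
direct-mapped cache the slot is a fixed function of the address, so
any two addresses a multiple of the cache size apart always fight
over the same slot.

    #define LINE_SIZE  64
    #define CACHE_SIZE (32 * 1024)

    /* Direct-mapped: the address alone decides the slot, so
       addr and addr + CACHE_SIZE collide on every access. */
    unsigned long slot_of(unsigned long addr)
    {
        return (addr % CACHE_SIZE) / LINE_SIZE;
    }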

HTH,
AvK


From: Scott Lurndal on
Moi <root(a)invalid.address.org> writes:

>That is all correct.
>
>Apart from sizes and parameters, the only differences between
>memory and disk wrt caching are:
>
>1) disks (sometimes) favor sequential reads and memory does not.

Memory (sometimes) favors sequential (or striding) accesses. Modern
memory controllers will prefetch cache lines if stride accesses
are detected.

>
>2) for memory, all cache misses are created equal.

Definitely incorrect. A cache miss that hits on the TLB
is one thing, a cache miss that misses the TLB, but hits the
page table cache is another thing, and a cache miss that misses
the TLB and misses the page table cache is yet a third thing.

Each of the above has different latency characteristics. A
worst case TLB fill requires 22 (yes, twenty two!) memory accesses
to resolve (when using SVM and nested page tables, with 4k mappings
in all levels). When not using virtualization, a TLB fill can
still cost 3 or 4 memory accesses (one for each level of the page table
hierarchy), whereas a simple cache line fill is a single memory
access (possibly two to handle the eviction).
>
>3) for memory, the mapping to/from cache slots is mostly fixed,
> causing multiple stretches of addresses to compete for the same
> cache slot.

Actually, an address will compete for one of N ways at a particular
cache slot; typical caches are organized with 4 or 8 ways.
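Roughly (sizes assumed for the example): the address selects a set,
and within that set it competes with at most N_WAYS resident lines.

    #define LINE_SIZE 64
    #define N_WAYS    8
    #define N_SETS    64    /* e.g. a 32 KiB, 8-way cache: 32768 / (64 * 8) */

    /* Set-associative: N_WAYS lines can share a set before one of
       them has to be evicted. */
    unsigned long set_of(unsigned long addr)
    {
        return (addr / LINE_SIZE) % N_SETS;
    }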

scott
From: Ian Collins on
On 04/ 6/10 09:27 AM, Peter Olcott wrote:

I know you are using an awful client, but please fix your quoting!

> I was going to do the thread priority thing very simply.
> Only one thread can run at a time, and the purpose of the
> multiple threads was to make fast context switching using
> thread local data.

How do you intend doing that? It looks like an attempt at premature
micro-optimisation.

> As soon as a higher priority job arrives
> activity in the current lower priority thread is suspended.

You should heed David's sage advice! This may well work, but using
multiple thread priorities more often ends in tears.

--
Ian Collins
From: Peter Olcott on

"Ian Collins" <ian-news(a)hotmail.com> wrote in message
news:81vfqpFa9rU3(a)mid.individual.net...
> On 04/ 6/10 09:27 AM, Peter Olcott wrote:
>
> I know you are using an awful client, but please fix your
> quoting!
>
>> I was going to do the thread priority thing very simply.
>> Only one thread can run at a time, and the purpose of the
>> multiple threads was to make fast context switching using
>> thread local data.
>
> How do you intend doing that? It looks like an attempt at
> premature micro-optimisation.
>
>> As soon as a higher priority job arrives
>> activity in the current lower priority thread is
>> suspended.
>
> You should heed David's sage advice! This may well work,
> but using multiple thread priorities more often ends in
> tears.
>
> --
> Ian Collins

I have to have some way of implementing multiple priorities.
I am aiming for 500 ms response time for very small
transactions from paying customers. I already know that
there are many things that are out of my control that could
easily screw this up. One thing that is not out of my
control is how I handle differing priorities.

If one of these high priority jobs takes at most 100 ms of
processing time (I am allowing 400 ms for HTTP transport),
then a 3.5 minute job must be immediately put on hold. The
complex part that David is talking about is having multiple
concurrent threads with differing priority. I can't imagine
how priority inversion could occur, but, then actually
working with this stuff is new to me.

What I am proposing is much simpler: multiple threads where
the higher priority one halts the next lower priority one,
and there is never a time when more than one of these
threads is running concurrently with the other threads. I
don't see any pitfalls with this, but then I have never
done this before.
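Something along these lines is what I have in mind (just a sketch,
all of these names are made up): the lower priority thread checks a
flag between small units of work and parks on a condition variable
while a higher priority job is active.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock   = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  resume = PTHREAD_COND_INITIALIZER;
    static bool high_prio_active  = false;

    /* Called by the low priority thread between small units of work. */
    void maybe_yield(void)
    {
        pthread_mutex_lock(&lock);
        while (high_prio_active)
            pthread_cond_wait(&resume, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Called when a high priority job arrives (true) or finishes (false). */
    void set_high_prio(bool active)
    {
        pthread_mutex_lock(&lock);
        high_prio_active = active;
        if (!active)
            pthread_cond_broadcast(&resume);
        pthread_mutex_unlock(&lock);
    }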

David's proposal will not work because the 3.5 minute thread
would have to finish before the 50 ms thread could begin.
This would provide intolerably long response time for the 50
ms thread.


From: David Schwartz on
On Apr 5, 6:49 pm, "Peter Olcott" <NoS...(a)OCR4Screen.com> wrote:

> If one of these high priority jobs takes at most 100 ms of
> processing time (I am allowing 400 ms for HTTP transport),
> then a 3.5 minute job must be immediately put on hold. The
> complex part that David is talking about is having multiple
> concurrent threads with differing priority. I can't imagine
> how priority inversion could occur, but, then actually
> working with this stuff is new to me.

Like this:

1) Low priority job is running.

2) High priority job starts, pre-empts low priority job.

3) High priority job needs a lock low priority job was holding when it
was pre-empted.

4) High priority job cannot run now until low priority job gets the
CPU, which it won't do for quite some time because higher-priority
tasks are running.

The point is, creating threads with low priority can actually cause
high priority threads to act as if their priority was lowered. This
can result in worse performance, even for the high priority tasks,
than if you didn't mess with thread priorities.
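If you do insist on real priorities, the usual way to keep steps 3
and 4 from stalling the high-priority thread is a priority-inheritance
mutex; a minimal sketch (error checking omitted, and it needs
_POSIX_THREAD_PRIO_INHERIT support):

    #include <pthread.h>

    static pthread_mutex_t shared_lock;

    /* With PTHREAD_PRIO_INHERIT, the low-priority holder is boosted
       to the priority of the highest-priority waiter, so it can run
       long enough to release the lock. */
    void init_shared_lock(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&shared_lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }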

> David's proposal will not work because the 3.5 minute thread
> would have to finish before the 50 ms thread could begin.
> This would provide intolerably long response time for the 50
> ms thread.

Only if your design is such that the 3.5 minute thread would have to
finish before the 50 ms thread began. Forget about which threads run
-- make sure threads are only doing the work *you* want done (or, at
worst, don't do work that's harmful to the forward progress you need
to make). Then it won't matter much what the scheduler does.
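For example (just a sketch, every name here is invented): keep
pending jobs in a priority-ordered queue and let equal-priority
worker threads always pull the most urgent one; then the OS
scheduler has nothing important left to decide.

    #include <pthread.h>
    #include <stddef.h>

    struct job {
        int priority;               /* higher number = more urgent */
        void (*run)(void *arg);
        void *arg;
        struct job *next;
    };

    static struct job *head;        /* kept sorted, most urgent first */
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qwait = PTHREAD_COND_INITIALIZER;

    void submit(struct job *j)
    {
        pthread_mutex_lock(&qlock);
        struct job **p = &head;
        while (*p && (*p)->priority >= j->priority)
            p = &(*p)->next;        /* insert behind equal/higher priority */
        j->next = *p;
        *p = j;
        pthread_cond_signal(&qwait);
        pthread_mutex_unlock(&qlock);
    }

    struct job *take(void)          /* worker: block until a job is ready */
    {
        pthread_mutex_lock(&qlock);
        while (head == NULL)
            pthread_cond_wait(&qwait, &qlock);
        struct job *j = head;
        head = j->next;
        pthread_mutex_unlock(&qlock);
        return j;
    }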

DS