Cache line list handling [Computer Architecture]

From: Terje Mathisen on 8 Jul 2010 10:50

jacko wrote:
> On 7 July, 03:32, MitchAlsup<MitchAl...(a)aol.com> wrote:
>> Why don't you code up several examples, craft interesting datasets and
>> benchmarks using same, and come back to us with your conclusions?
>>
>> Mitch
>
> Not today, I'm frying multi precision integer arithmetic, and possible

Extended precision is fun: I took a few days to write a 128-bit library
back in 94/95 in order to verify our SW workaround for the FDIV (and
FPATAN) bug.

Arbitrary precision is both simpler and harder, particularly if you know
you have to beat GMP at its own game. :-)
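
For readers unfamiliar with the extended-precision case, here is a minimal
sketch of 128-bit addition built from 64-bit words with explicit carry
propagation. The names MASK64 and add128 are hypothetical and chosen only
for illustration; this is not the library mentioned above.

```python
MASK64 = (1 << 64) - 1  # all ones in a 64-bit word


def add128(a, b):
    """Add two 128-bit values given as (hi, lo) pairs of 64-bit words.

    Returns (hi, lo, carry_out), where carry_out is 0 or 1.
    """
    lo = a[1] + b[1]
    carry = lo >> 64            # carry out of the low word
    hi = a[0] + b[0] + carry    # propagate it into the high word
    return (hi & MASK64, lo & MASK64, hi >> 64)


# The low word overflows and the carry ripples into the high word.
print(add128((0, MASK64), (0, 1)))
```

An arbitrary-precision version would loop the same carry step over a list of
words instead of a fixed pair.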

> virtual machine language specific opcodes.

Opcodes doing what?

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: jacko on 8 Jul 2010 11:16

> Opcodes doing what?

Well, I'm making a simple VM, on the assumption that opcodes for VMs fall
into three categories.

1) Foundational - definitely needed.
2) Structural - help implement certain common structures; not strictly
needed, but included.
3) Optimizable - not needed, since a JIT compiler could generate them
automatically, and so not included.

Examples are 1) add_double, 2) add_to_return_address, 3) increment.

It is a somewhat fuzzy boundary between types 2 and 3.

I am just thinking about which opcodes of classes 1 and 2 could be useful
for supporting languages other than the VM's default.
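
The three categories can be sketched with a toy stack VM. Everything here is
an assumption made for illustration: the run function, the tuple encoding of
instructions, the push_double, call, ret and halt opcodes, and in particular
the semantics given to add_to_return_address (it bumps the saved return
address so the callee can skip instructions after the call site). It is not
taken from the acebforth code.

```python
def run(program):
    """Interpret a list of (opcode, *args) tuples; return the data stack."""
    stack, ret_stack, pc = [], [], 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "push_double":              # foundational
            stack.append(float(args[0]))
        elif op == "add_double":             # foundational
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "call":
            ret_stack.append(pc)             # save address of next instruction
            pc = args[0]
        elif op == "ret":
            pc = ret_stack.pop()
        elif op == "add_to_return_address":  # structural (assumed semantics)
            ret_stack[-1] += args[0]
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack


# "increment" is category 3: no opcode of its own, just a sequence of
# foundational opcodes that a JIT could recognise and fuse.
INCREMENT = [("push_double", 1.0), ("add_double",)]

program = [
    ("push_double", 41.0),         # 0
    ("call", 4),                   # 1
    ("push_double", 1000.0),       # 2: skipped by the callee
    ("halt",),                     # 3
    ("add_to_return_address", 1),  # 4: return to 3 instead of 2
    *INCREMENT,                    # 5, 6
    ("ret",),                      # 7
]
print(run(program))
```

The fuzzy boundary shows up immediately: add_to_return_address could itself
be rebuilt from more primitive return-stack opcodes if the VM exposed them.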

Cheers Jacko

http://acebforth.googlecode.com - not yet complete

From: Terje Mathisen on 8 Jul 2010 11:46

Brett Davis wrote:
> In article<3l9eg7-j98.ln1(a)laptop.reistad.name>,
> Morten Reistad<first(a)last.name> wrote:
>> Seems to give large speedups to some fundamental algorithms.
>
> "You are Doing It Wrong"
> http://queue.acm.org/detail.cfm?id=1814327

Nice article, and "N times faster than squid" will probably be useful to
me quite soon.

>
> Ten times faster than B-Tree with B-Heap.

The problem here is that phk had two sorts of cached data: Web articles
with images and the index, and they were both allowed to fight for the
same limited resource (real DRAM pages).

It seems to me that you should start by making sure that the index is
always in RAM, then you allow the web data to fight for the space that
remains.

It still makes sense to pack the index so that all the higher levels
will fit in the lower cache levels.
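
The idea of keeping the index resident while the web data competes for
whatever is left can be sketched as a two-tier cache. This is a minimal
illustration under stated assumptions, not how any real proxy is
implemented: the TwoTierCache class, its method names and the byte budget
are all hypothetical, and the "always in RAM" part is only modelled by never
evicting index entries.

```python
from collections import OrderedDict


class TwoTierCache:
    """Index entries are never evicted; object bodies share an LRU budget."""

    def __init__(self, body_budget_bytes):
        self.index = {}              # key -> metadata, always resident
        self.bodies = OrderedDict()  # key -> bytes, least recently used first
        self.body_budget = body_budget_bytes
        self.body_bytes = 0

    def put(self, key, meta, body):
        self.index[key] = meta
        if key in self.bodies:       # replacing an existing body
            self.body_bytes -= len(self.bodies.pop(key))
        self.bodies[key] = body
        self.body_bytes += len(body)
        # Only bodies are evicted; the index is left untouched.
        while self.body_bytes > self.body_budget and self.bodies:
            _, evicted = self.bodies.popitem(last=False)
            self.body_bytes -= len(evicted)

    def get_body(self, key):
        body = self.bodies.get(key)
        if body is not None:
            self.bodies.move_to_end(key)  # mark as most recently used
        return body


cache = TwoTierCache(body_budget_bytes=10)
cache.put("a", {"size": 6}, b"aaaaaa")
cache.put("b", {"size": 6}, b"bbbbbb")    # pushes "a" out of the body tier
print("a" in cache.index, cache.get_body("a"))
```

A miss in the body tier still finds the index entry, so the lookup itself
never has to wait for a page to come back from disk.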

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

From: Morten Reistad on 13 Jul 2010 17:46

In article <4f8hg7-mcg2.ln1(a)ntp.tmsw.no>,
Terje Mathisen <"terje.mathisen at tmsw.no"> wrote:
>Brett Davis wrote:
>> In article<3l9eg7-j98.ln1(a)laptop.reistad.name>,
>> Morten Reistad<first(a)last.name> wrote:
>>> Seems to give large speedups to some fundamental algorithms.
>>
>> "You are Doing It Wrong"
>> http://queue.acm.org/detail.cfm?id=1814327
>
>Nice article, and "N times faster than squid" will probably be useful to
>me quite soon.

The trick is to be choosy about which locality is important. Suddenly
a vertical vs horizontal organisation of B-trees becomes very important
for performance.
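
A rough way to see the effect is to count how many distinct pages a
root-to-leaf walk touches under two layouts. The sketch below is an
idealized model, not the B-heap code from the article: the function names,
the 512 entries per page (8-byte entries in a 4 KiB page) and the 9 tree
levels packed per page are all assumptions picked for illustration.

```python
def path_indices(leaf):
    """1-based heap indices from the root down to `leaf`."""
    path = []
    i = leaf
    while i >= 1:
        path.append(i)
        i //= 2
    return path[::-1]


def pages_classic(leaf, items_per_page):
    """Level-by-level ("horizontal") array layout of a binary heap."""
    return len({(i - 1) // items_per_page for i in path_indices(leaf)})


def pages_packed(leaf, levels_per_page):
    """Idealized "vertical" layout: each page holds one small subtree."""
    pages = set()
    for i in path_indices(leaf):
        level = i.bit_length() - 1       # depth of node i, root is 0
        pos = i - (1 << level)           # position within its level
        group = level // levels_per_page
        within = level % levels_per_page
        # Page is identified by the subtree root at the top of this group.
        pages.add((group, pos >> within))
    return len(pages)


leaf = 2**20 - 1                         # last leaf of a 20-level heap
print(pages_classic(leaf, 512), pages_packed(leaf, 9))  # prints "12 3"
```

In the classic layout every level below the first page lands on a different
page, while the packed layout crosses a page boundary only once per nine
levels.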

>> Ten times faster than B-Tree with B-Heap.
>
>The problem here is that phk had two sorts of cached data: Web articles
>with images and the index, and they were both allowed to fight for the
>same limited resource (real DRAM pages).
>
>It seems to me that you should start by making sure that the index is
>always in RAM, then you allow the web data to fight for the space that
>remains.
>
>It still makes sense to pack the index so that all the higher levels
>will fit in the lower cache levels.

This isn't so much about RAM, it is about cache. We see this effect
creep in all over the place now.

>Terje
>--
>- <Terje.Mathisen at tmsw.no>
>"almost all programming can be viewed as an exercise in caching"

And the actual cache effects can be pretty surprising at times.

Watch an 8-processor (@ 2.4 GHz) HP Xeon with 4 GB RAM outperform a
36-processor (@ 2.6 GHz) (*) Sun with 40 GB RAM by a factor of 6.

It was all in the L2/L3 cache design.

-- mrr

(*) The Linux kernel we tested on only saw 32 processors.

From: Andy Glew on 13 Jul 2010 22:10

On 7/13/2010 2:46 PM, Morten Reistad wrote:
> In article<4f8hg7-mcg2.ln1(a)ntp.tmsw.no>,
> Terje Mathisen<"terje.mathisen at tmsw.no"> wrote:
>> Brett Davis wrote:
>>> In article<3l9eg7-j98.ln1(a)laptop.reistad.name>,
>>> Morten Reistad<first(a)last.name> wrote:
>>>> Seems to give large speedups to some fundamental algorithms.
>>>
>>> "You are Doing It Wrong"
>>> http://queue.acm.org/detail.cfm?id=1814327
>>
> The trick is to be choosy about which locality is important. Suddenly
> a vertical vs horizontal organisation of B-trees becomes very important
> for performance.
>
>>> Ten times faster than B-Tree with B-Heap.
>>
>> The problem here is that phk had two sorts of cached data: Web articles
>> with images and the index, and they were both allowed to fight for the
>> same limited resource (real DRAM pages).
>
> This isn't so much about RAM, it is about cache. We see this effect
> creep in all over the place now.

I could not help but wonder if the B-Heap speedups might also be in some
part due to TLB locality.
