From: Bill Todd on
Anton Ertl wrote:
> "kroger(a)princeton.edu" <kroger(a)princeton.edu> writes:
>> Thought folks here might have a good handle on this. I've seen
>> conflicting reports that AMD's chip due this summer will exceed
>> current Intel chips. Are they talking about better performance CPU for
>> CPU, or just the aggregate performance of four cores?
>
> The stuff I have seen seemed to talk about aggregate performance
> (SpecFP-rate, and some TCP benchmark).
>
> BTW, I wonder why AMD is not pulling the same trick as Intel to get a
> quad-core in the meantime: put two dual-cores in one package.

Possibly because it wouldn't be worth the effort (and any potential risk
of souring people on '4-core' AMD products before the real ones
appeared) - just to close a window that's scheduled to be only 6 months
wide. Intel, by contrast, plans to continue its multi-chip 'quad-core'
products through the next (45 nm) generation, which makes its own
efforts in that area much more amortizable (as well as possibly being
easier to mate to its existing bus-oriented architecture, in contrast to
the asymmetry that David already mentioned for an Opteron implementation
of that ilk).

Finally, AMD arguably just doesn't need it: they've already got systems
that scale up as far as their current HT performance can take them (to 8
or 16 cores, depending upon the nature of the workload), and creating
such pseudo-quad-core beasts might not increase the total usable core
count at all (just reduce the socket count while increasing heat
dissipation challenges).

Intel scored a major win with Core2Duo, but Core2Quad (or whatever
they're calling it today) seems largely fanboy service (how many games
can actually *use* more than two cores to real advantage?). As Intel's
chipsets increase in bandwidth their 'quad cores' may become more
genuinely useful - just as Barcelona will when the next HT generation
debuts.

....

> Should not be more effort than the Athlon FX-72 nonsense,

I haven't looked at all closely, but had the impression that the FX-7x
setups were nothing more than normal two-socket Opteron systems - in
which case that effort was virtually nil.

- bill
From: Bill Todd on
Quadibloc wrote:
> Bill Todd wrote:
>> Finally, AMD arguably just doesn't need it: they've already got systems
>> that scale up as far as their current HT performance can take them (to 8
>> or 16 cores, depending upon the nature of the workload), and creating
>> such pseudo-quad-core beasts might not increase the total usable core
>> count at all (just reduce the socket count while increasing heat
>> dissipation challenges).
>
> That's not an argument - given Microsoft's licensing policies.

Correction: it's not an argument for people who run multi-threaded
Microsoft products on servers with more than two cores (I guess there
are some who do, but as the core count increases the number of
Microsoft-based servers plummets), and even then is mitigated by the
fact that for multiple reasons a top-of-the-line quad-core package
generates less (in some cases *far* less) than twice the performance of
two top-of-the-line dual-core packages.

- bill
From: Terje Mathisen on
Morten Reistad wrote:
> One example is codec transforms, mostly audio. These work in
> 20 millisecond samples with from 12 to 160 bytes per sample.
> The transformation code is in the low hundreds of K in size.
> Some must be performed in sequence, others do not have this
> restriction.
>
> It gets interesting when you are to transform thousands of streams.
>
> The data for several thousand streams plus the code will fit in
> cache.

That is nice. :-)

> Likewise, VPN streams are cpu-burners, even with well thought
> out stuff like AES.

Well, it shouldn't be (a cpu-burner)!

A 1996-era 200 MHz PentiumPro could handle AES encryption/decryption for
a 100 Mbit/s full duplex link, which means that _very_ few current
servers need a single full core to handle all the available bandwidth
for VPN traffic. Multiple Gbit/s streams?
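
A rough sanity check of that arithmetic, as a sketch rather than a measurement: the function below computes how many clock cycles a core can spend per byte if it has to keep up with a link at line rate. The name cycles_per_byte_budget and the 3 GHz / 1 Gbit/s figures are illustrative assumptions; only the 200 MHz and 100 Mbit/s full-duplex numbers come from the text above.

```python
def cycles_per_byte_budget(clock_hz, link_bits_per_s, full_duplex=True):
    """Cycles available per byte if one core must process the link at line rate."""
    # Full duplex means traffic in both directions has to be handled.
    directions = 2 if full_duplex else 1
    bytes_per_s = link_bits_per_s * directions / 8
    return clock_hz / bytes_per_s


# Figures from the post: 200 MHz core, 100 Mbit/s full-duplex link.
ppro_budget = cycles_per_byte_budget(200e6, 100e6)

# Hypothetical comparison (not from the post): 3 GHz core, 1 Gbit/s full duplex.
modern_budget = cycles_per_byte_budget(3e9, 1e9)

print(f"200 MHz core, 100 Mbit/s FD: {ppro_budget:.1f} cycles/byte")
print(f"3 GHz core,   1 Gbit/s FD:   {modern_budget:.1f} cycles/byte")
```

Whether a given AES implementation fits inside that per-byte budget is a separate question; the sketch only shows how the budget scales with clock rate and link speed.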

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
From: Terje Mathisen on
Nick Maclaren wrote:
> In article <a66rqe.uro.ln(a)via.reistad.name>,
> Morten Reistad <first(a)last.name> writes:
> |> The data for a frame varies from 320 bytes in slin, to 160 in
> |> u/a law, to 80 in g726/adpcm down to 33 bytes in gsm and even
> |> less in g729. The "class 2" codecs need the last frame and
> |> some digested information available. 500 bytes for the worst case
> |> transcoding data has room to spare.
>
> From the above, I would suggest employing someone like Terje (or even me,

Morten can't employ me, but I'd still love to have a quick look at what
his code is doing. From all his informed writings I really doubt it can
be nearly as bad as it seems. I.e. there _must_ be some really good
reasons for why it is taking a long time.

One such reason is encodings similar to CABAC (Blu-ray/HD DVD), which
like to generate several mostly-unpredictable branches per _bit_ of
decoded data.

Morten, my inbox is waiting!

> though I doubt I am available!) to look at the code, rather than spending
> time parallelising. 41 milliseconds to convert 160 bytes, especially as
> it is much less on the slower systems, smacks of seriously sub-optimal
> code.
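
To put the quoted numbers side by side, here is a small sketch. It assumes the frame sizes listed above each cover the 20 ms interval mentioned elsewhere in the thread, and that the 41 ms figure is the time to convert one 160-byte frame; both readings are assumptions, and the function names are made up for illustration.

```python
FRAME_MS = 20  # frame interval, as stated elsewhere in the thread

# Frame sizes in bytes, as quoted above.
FRAME_BYTES = {"slin": 320, "u/a law": 160, "g726/adpcm": 80, "gsm": 33}


def bitrate_kbit_s(frame_bytes, frame_ms=FRAME_MS):
    """Payload bitrate; bits per millisecond is the same as kbit/s."""
    return frame_bytes * 8 / frame_ms


def realtime_streams_per_core(convert_ms, frame_ms=FRAME_MS):
    """How many streams one core can convert without falling behind."""
    return frame_ms / convert_ms


for codec, size in FRAME_BYTES.items():
    print(f"{codec:12s} {size:4d} bytes/frame  {bitrate_kbit_s(size):6.1f} kbit/s")

# Assumed reading: 41 ms of CPU time per 20 ms frame.
print(f"streams per core at 41 ms/frame: {realtime_streams_per_core(41):.2f}")
```

Under that reading the conversion takes roughly twice as long as the audio it covers, so one core could not keep up with even a single stream in real time.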

Terje
--
- <Terje.Mathisen(a)hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
From: Morten Reistad on
In article <9mp6a4-jaq.ln1(a)osl016lin.hda.hydro.com>,
Terje Mathisen <terje.mathisen(a)hda.hydro.com> wrote:
>Morten Reistad wrote:
>> One example is codec transforms, mostly audio. These work in
>> 20 millisecond samples with from 12 to 160 bytes per sample.
>> The transformation code is in the low hundreds of K in size.
>> Some must be performed in sequence, others do not have this
>> restriction.
>>
>> It gets interesting when you are to transform thousands of streams.
>>
>> The data for several thousand streams plus the code will fit in
>> cache.
>
>That is nice. :-)
>
>> Likewise, VPN streams are cpu-burners, even with well thought
>> out stuff like AES.
>
>Well, it shouldn't be (a cpu-burner)!
>
>A 1996-era 200 MHz PentiumPro could handle AES encryption/decryption for
>a 100 Mbit/s full duplex link, which means that _very_ few current
>servers need a single full core to handle all the available bandwidth
>for VPN traffic. Multiple Gbit/s streams?

And small packets.

VoIP tends to generate streams of 100 pps per call, as calls
are quantized in 20 ms intervals.
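
A minimal sketch of that packet-rate arithmetic, with some assumptions spelled out: one packet per 20 ms frame in each direction, and RTP over UDP over IPv4 as the transport (12 + 8 + 20 header bytes, no options). The transport choice and the function names are assumptions for illustration; VPN encapsulation would add further per-packet overhead that is not modeled here.

```python
FRAME_MS = 20                      # quantization interval from the post
HEADER_BYTES = 12 + 8 + 20         # assumed RTP + UDP + IPv4 headers


def packets_per_second(calls, frame_ms=FRAME_MS, directions=2):
    """Total packet rate; assumes frame_ms divides 1000 evenly."""
    return calls * directions * (1000 // frame_ms)


def header_fraction(payload_bytes, header_bytes=HEADER_BYTES):
    """Share of each packet taken up by headers rather than audio."""
    return header_bytes / (header_bytes + payload_bytes)


print("pps for 1 call:    ", packets_per_second(1))
print("pps for 1000 calls:", packets_per_second(1000))

# Payload sizes quoted elsewhere in the thread: 160 bytes (u/a law), 33 (gsm).
print(f"header share, 160-byte payload: {header_fraction(160):.0%}")
print(f"header share,  33-byte payload: {header_fraction(33):.0%}")
```

The point of the sketch is that the per-packet work grows with the call count while the bytes per packet stay small, so throughput in Mbit/s alone understates the load.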

-- mrr