From: General Schvantzkoph on
On Sun, 07 Mar 2010 06:11:07 -0800, Michael S wrote:

> On Mar 7, 12:25 pm, John Adair <g...(a)enterpoint.co.uk> wrote:
>> If you can get it, the T9900 is better than the T9800, but they are fairly
>> rare, with most companies seeming to push the quad core instead.
>>
>> I have not got a mobile I7 yet but we do have desktop I7 and they have
>> been very good.
>
> Sure, desktop I7 are fast. With 8MB of cache and not so heavy reliance
> on turbo-boost one can expect them to be fast. On the other hand, 35W
> mobile variants have 4MB or smaller cache and are critically dependent
> on turbo-boost, since relative to mobile C2D their "normal" clock
> frequency is slow. Still, it's just my gut feeling. I never benchmarked
> mobile i7 vs mobile C2D, so I could be wrong about their relative
> merits.
>
>> Laptops using the desktop I7 have been a definite no, with a battery
>> lifetime of 1hr being typical, but when I get the chance I will try the
>> mobile I7 as it promises much. Parallel processors will be of more use in
>> a couple of years when tools make better use of them.
>>
>> On OS I think there are X64 drivers but I would only go that way if I
>> had a really large design to deal with. Bugs and problems are far more
>> common in X64 and Linux versions of the tools and with the relatively
>> tiny user base bugs can take a while to surface and dare I say it get
>> fixed. Life is busy enough without adding unnecessary problems.
>>
>> John Adair
>> Enterpoint Ltd.
>>
>>
> For the last year or so we have done nearly all our FPGA development on
> Ws2003/x64. So far, no problems. Even officially deprecated Rainbow (now
> SafeNet) USB Software Guards work fine. XP64 is derived from the same
> code base.
> We almost never use 64-bit tools, but very much appreciate the ability
> to launch numerous instances of memory-hungry 32-bit tools. More a
> matter of convenience than necessity? In a single-user environment, yes.
> But why should we give up a convenience that costs so little?

I have benchmarked Core2s vs Core i7s. 6M cache Core2s are faster on a
clock-for-clock basis than the 8M cache Core i7 when running NCVerilog.
The Core i7 is a little faster on a clock-for-clock basis when running
Xilinx place and route tools. The cache architecture of the Core i7 sucks:
it's a three-level cache vs a two-level cache on the Core2. Also there is
less cache per processor on the Core i7 (2M) than on the Core2 (3M), so the
degradation in performance is greater. Finally, the absolute clock rate
of the Core2s is higher than that of the Core i7. Combine that with the
faster clock-for-clock simulation performance and the Core2 is the clear
winner for FPGA development.

From: Michael S on
On Mar 7, 4:28 pm, General Schvantzkoph <schvantzk...(a)yahoo.com>
wrote:
> I have benchmarked Core2s vs Core i7s. 6M cache Core2s are faster on a
> clock-for-clock basis than the 8M cache Core i7 when running NCVerilog.
> The Core i7 is a little faster on a clock-for-clock basis when running
> Xilinx place and route tools.

My experience with Altera tools (synthesis and p&r; I never benchmarked
a simulation) is quite different.
The i7-920 (8 MB/2.66 GHz) is about 1.5 times faster than the E6750
(4 MB/2.66 GHz) and only marginally slower than the E8400 (6 MB/3.00 GHz).
Taking into account that the fastest i7 variant that is not extraordinarily
expensive (i7-960, 3.2 GHz) runs at almost the same frequency as the fastest
C2D (E8600, 3.33 GHz), I'd say that in absolute terms core-i7 is faster.
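
The extrapolation above can be sketched as a small calculation. This is only a rough illustration, not a benchmark: the 1.5 figure is the one quoted above, while the 1.55 for the E8400 is an assumed stand-in for "marginally faster than the i7-920", and linear scaling with clock frequency is an optimistic assumption.

```python
# Relative throughput on the Altera flow, with the E6750 as the 1.0 baseline.
# 1.50 is the figure from the post; 1.55 is an ASSUMED placeholder for
# "the E8400 is marginally faster than the i7-920".
measured = {
    "i7-920": {"ghz": 2.66, "throughput": 1.50},
    "E8400": {"ghz": 3.00, "throughput": 1.55},
}


def per_clock(cpu):
    """Throughput per GHz, i.e. the clock-for-clock figure."""
    return measured[cpu]["throughput"] / measured[cpu]["ghz"]


def projected(cpu, target_ghz):
    """Project throughput at another clock, assuming linear scaling."""
    return per_clock(cpu) * target_ghz


i7_960 = projected("i7-920", 3.20)  # same family as the i7-920, higher clock
e8600 = projected("E8400", 3.33)    # same family as the E8400, higher clock
print(f"i7-960 ~ {i7_960:.2f}, E8600 ~ {e8600:.2f}")
```

With these placeholder inputs the i7-960 comes out slightly ahead, but the result flips easily if the E8400's real margin over the i7-920 is larger than assumed.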

> The cache architecture of the Core i7 sucks:
> it's a three-level cache vs a two-level cache on the Core2.

Nehalem's L2 cache is much smaller, yes, but 1.5 times faster. Seems
like a fair trade-off.
The slower L1D cache (4 clocks instead of 3 clocks in C2D) sounds like a
bigger problem.

> Also there is
> less cache per processor on the Core i7 (2M) than on the Core2 (3M), so the
> degradation in performance is greater.

Only when running multiple threads.
But we are talking about FPGA development, which is still mostly
single-threaded. Core-i7 has all 8 MB available for a single thread. In
C2D/C2Q a single core has access to 6 MB.


> Finally, the absolute clock rate
> of the Core2s is higher than that of the Core i7. Combine that with the
> faster clock-for-clock simulation performance and the Core2 is the clear
> winner for FPGA development.

Only when measured by price/performance.
In an absolute sense the i7-960 (or the i7-975 for the rich kids among us)
should be faster.
And, of course, in a multi-user environment core-i7 (or, for bigger
shops, Xeon-55xx) wins by a very wide margin.

Don't get me wrong, I'd very much like a CPU that combines the cache
hierarchy of C2D with the IMC, turbo-boost and fast unaligned access of
core-i7, but that's not going to happen. With AMD as weak as it is
right now we have no choice but to grab the whole package that Intel
wants to sell us. And as a package core-i7 is not bad, especially for
multi-user.



From: General Schvantzkoph on
On Sun, 07 Mar 2010 07:41:41 -0800, Michael S wrote:

> On Mar 7, 4:28 pm, General Schvantzkoph <schvantzk...(a)yahoo.com> wrote:
>> I have benchmarked Core2s vs Core i7s. 6M cache Core2s are faster on a
>> clock-for-clock basis than the 8M cache Core i7 when running NCVerilog.
>> The Core i7 is a little faster on a clock-for-clock basis when running
>> Xilinx place and route tools.
>
> My experience with Altera tools (synthesis and p&r; I never benchmarked a
> simulation) is quite different.
> The i7-920 (8 MB/2.66 GHz) is about 1.5 times faster than the E6750
> (4 MB/2.66 GHz) and only marginally slower than the E8400 (6 MB/3.00 GHz).
> Taking into account that the fastest i7 variant that is not extraordinarily
> expensive (i7-960, 3.2 GHz) runs at almost the same frequency as the fastest
> C2D (E8600, 3.33 GHz), I'd say that in absolute terms core-i7 is faster.
>
>> The cache architecture of the Core i7 sucks: it's a three-level cache vs
>> a two-level cache on the Core2.
>
> Nehalem's L2 cache is much smaller, yes, but 1.5 times faster. Seems
> like a fair trade-off.
> The slower L1D cache (4 clocks instead of 3 clocks in C2D) sounds like a
> bigger problem.
>
>> Also there is
>> less cache per processor on the Core i7 (2M) than on the Core2 (3M), so the
>> degradation in performance is greater.
>
> Only when running multiple threads.
> But we are talking about FPGA development, which is still mostly
> single-threaded. Core-i7 has all 8 MB available for a single thread. In
> C2D/C2Q a single core has access to 6 MB.
>
>
>> Finally, the absolute clock rate
>> of the Core2s is higher than that of the Core i7. Combine that with
>> the faster clock-for-clock simulation performance and the Core2 is the
>> clear winner for FPGA development.
>
> Only when measured by price/performance. In an absolute sense the i7-960
> (or the i7-975 for the rich kids among us) should be faster.
> And, of course, in a multi-user environment core-i7 (or, for bigger shops,
> Xeon-55xx) wins by a very wide margin.
>
> Don't get me wrong, I'd very much like a CPU that combines the cache
> hierarchy of C2D with the IMC, turbo-boost and fast unaligned access of
> core-i7, but that's not going to happen. With AMD as weak as it is right
> now we have no choice but to grab the whole package that Intel wants to
> sell us. And as a package core-i7 is not bad, especially for multi-user.

Simulation performance is much more important than place and route
performance. The Core i7 is a little faster than the Core2 when doing
place and routes, but the 6M Core2 wins hands down when doing simulations
(4M Core2s are much slower than 6M Core2s). I spend 100X as much time
doing simulations as I spend doing place and routes, so the small
advantage that Core i7s have doing synthesis/place and routes is dwarfed
by the simulation advantage that the Core2 has.
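
To put rough numbers on that weighting, here is a small sketch. The 100:1 split between simulation and place and route is the one stated above; the per-task ratios (1.10 and 0.90) are made-up placeholders for illustration, not measurements.

```python
def relative_total_time(sim_share, pnr_share, sim_ratio, pnr_ratio):
    """Total workload time on machine B relative to machine A.

    sim_share / pnr_share: time spent on each task on machine A.
    sim_ratio / pnr_ratio: how long each task takes on B relative to A
    (above 1.0 means B is slower at that task).
    """
    total_a = sim_share + pnr_share
    total_b = sim_share * sim_ratio + pnr_share * pnr_ratio
    return total_b / total_a


# A = Core2, B = Core i7. HYPOTHETICAL ratios: simulation 10% slower on B,
# place and route 10% faster on B.
ratio = relative_total_time(100, 1, sim_ratio=1.10, pnr_ratio=0.90)
print(f"Core i7 total time relative to Core2: {ratio:.3f}")
```

With a 100:1 split the overall figure tracks the simulation ratio almost exactly, so whatever the place and route gain is, it barely moves the total.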