From: Phil Tomson on
In article <1146981253.226901.102660(a)i39g2000cwa.googlegroups.com>,
JJ <johnjakson(a)gmail.com> wrote:
>I always hated that the PCI cores were so heavily priced compared to
>the FPGA they might go into. The pricing seemed to reflect the value
>they once added to ASICs some 10 or 15 years ago and not the potential
>of really low cost low volume applications. A $100 FPGA in small vol
>applications doesn't support $20K IP for a few $ worth of fabric it
>uses. It might be a bargain compared to the cost of rolling your own
>though, just as buying an FPGA is a real bargain compared to rolling my
>own FPGA/ASIC too.

That's why OpenCores is so important. (http://opencores.org) As FPGAs
become cheaper we're going to need an open source ecosystem of cores.
They've got a PCI bridge design at OpenCores, for example.

BTW: it would also be nice to have an open source ecosystem of FPGA
design tools... but that's a bit tougher at this point.

Phil
From: JJ on

Phil Tomson wrote:
> In article <1146981253.226901.102660(a)i39g2000cwa.googlegroups.com>,
> JJ <johnjakson(a)gmail.com> wrote:
> >I always hated that the PCI cores were so heavily priced compared to
> >the FPGA they might go into. The pricing seemed to reflect the value
> >they once added to ASICs some 10 or 15 years ago and not the potential
> >of really low cost low volume applications. A $100 FPGA in small vol
> >applications doesn't support $20K IP for a few $ worth of fabric it
> >uses. It might be a bargain compared to the cost of rolling your own
> >though, just as buying an FPGA is a real bargain compared to rolling my
> >own FPGA/ASIC too.
>
> That's why OpenCores is so important. (http://opencores.org) As FPGAs
> become cheaper we're going to need an open source ecosystem of cores.
> They've got a PCI bridge design at OpenCores, for example.
>
> BTW: it would also be nice to have an open source ecosystem of FPGA
> design tools... but that's a bit tougher at this point.
>
> Phil

Yes, but open source and closed source are also like oil and water,
especially together in a commercial environment. If I were doing
commercial work I doubt I'd ever use OpenCores, though I might peek at
it for an understanding of how something might be done, or ask someone
else to. On a hobbyist level, well, I have mixed feelings about the
GPL too. I suspect the software world does far better with it, since
enough people support the GPL movement and there is a large user base
for it. Hardware ultimately can't be made for free, so it can't follow
the same model.

John Jakson

From: JJ on

Phil Tomson wrote:
> In article <1146975146.177800.163180(a)g10g2000cwb.googlegroups.com>,
> JJ <johnjakson(a)gmail.com> wrote:
> >

snipping

> >
> >FPGAs and standard cpus are a bit like oil & water, don't mix very well,
> >very parallel or very sequential.
>
> Actually, that's what could make it the perfect marriage.
>
> General purpose CPUs for the things they're good at like data IO,
> displaying information, etc. FPGAs for applications where parallelism is
> key.
>

On c.a another Transputer fellow suggested the term "impedance
mismatch" to describe the idea of mixing low speed, extremely parallel
logic with high speed sequential CPUs, in regard to the Cray systems
that have a bunch of Virtex-II Pro parts with Opterons on the same
board, a rich man's version of DRC (but long before DRC). I suggest
tweening them: put lots of softcore Transputer-like nodes into the
FPGA and customize them locally, so you can put software and hardware
much closer to each other. One can even model the whole thing in a
common language designed to run as code or be synthesized as hardware
with suitable partitioning, starting perhaps with occam or Verilog+C.
Write it as parallel and sequential code, and later move parts between
hardware and software as needs change.

> I think the big problem right now is conceptual: we've been living in a
> serial, Von Neumann world for so long we don't know how to make effective
> use of parallelism in writing code - we have a hard time picturing it.

I think the software guys have a huge problem with parallel, but not
the old schematic guys. I have more problems with serial, much of it
unnecessary but forced on us by a lack of language features, which
makes me order statements that the OoO CPU will then try to unorder.
Why not let the language state "no order", or just plain "par" where
there is no communication between the statements?
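
In modern C++ terms the effect can at least be approximated. A minimal
sketch, assuming the two statements genuinely share no data (the
values are made up for illustration):

    // Two statements with no ordering between them: roughly what a
    // "par" with no communication would express. Sketch only.
    #include <iostream>
    #include <thread>

    int main() {
        int a = 0, b = 0;
        std::thread t1([&] { a = 1 + 2; });  // branch one
        std::thread t2([&] { b = 3 + 4; });  // branch two, no shared data
        t1.join();
        t2.join();
        std::cout << a + b << "\n";          // prints 10 in either order
        return 0;
    }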

> Read some software engineering blogs:
> with the advent of things like multi-core processors, the Cell, etc. (and
> most of them are blissfully unaware of the existence of FPGAs) they're
> starting to wonder about how they are going to be able to model their
> problems to take advantage of that kind of parallelism. They're looking

The problem with the Cell and other multicore CPUs is that the CPU is
all messed up to start with. AFAIK the Transputer is the only credible
architecture that considers how to describe parallel processes and run
them based on formal techniques. These serial multi-CPUs have the
Memory Wall problem as well as no real support for concurrency except
at a very crude level; context switches need to cost closer to 100
instruction cycles to work well, not 1M. The Memory Wall only makes
threading much worse than it already was, and adds more pressure to
the cache design as more thread contexts try to share it.
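
A rough back-of-envelope shows why the switch cost dominates. Here the
1000-cycle work quantum between communications is an assumed, purely
illustrative figure:

    // Fraction of time spent on useful work for Transputer-like
    // (~100 cycle) versus OS-like (~1M cycle) context switch costs.
    // The 1000-cycle work quantum is an assumption for illustration.
    #include <initializer_list>
    #include <iostream>

    int main() {
        const double work = 1000.0;        // cycles between communications
        for (double sw : {100.0, 1.0e6})   // switch cost in cycles
            std::cout << "switch cost " << sw << " -> efficiency "
                      << 100.0 * work / (work + sw) << "%\n";
        return 0;                          // ~91% versus ~0.1%
    }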

> for new abstractions (remember, software engineering [and even hardware
> engineering these days] is all about creating and managing abstractions).
> They're looking for and creating new languages (Erlang is often mentioned
> in these sorts of conversations). Funny thing is that it's the hardware
> engineers who hold part of the key: HDLs are very good at modelling
> parallelism and dataflow. Of course HDLs as they are now would be pretty
> crappy for building software, but it's pretty easy to see that some of the
> ideas inherent in HDLs could be usefully borrowed by software engineers.
>
>

Yeah, try taking your parallel expertise to the parallel software
world; they seem to scorn the idea that hardware guys might actually
know more than they do about concurrency, while they happily reinvent
parallel languages that have some features we have had for decades yet
still cling to semaphores and spinlocks. I came across one such
parallel language from UT Austin that even had always, initial and
assign constructs, but no mention of Verilog or hardware HDLs.

But there are more serious researchers in Europe, from the Transputer
days and grounded in CSP, who are quite comfortable with concurrency
expressed as parallel processes, like hardware; see wotug.org. The
Transputer's native language occam, based on CSP, later got used to do
FPGA design and was then modified into Handel-C, so clearly some
people are happy to be in the middle.

I have proposed taking a C++ subset and adding live signal ports to a
class definition, as well as always, assign, etc. It starts to look a
lot like a Verilog subset with C syntax, but it builds processes as
communicating objects (or module instances), which are nestable of
course, just like hardware. The runtime for it would look just like a
simulator with an event driven time wheel or scheduler. Of course in a
modern Transputer the event wheel or process scheduler is in the
hardware, so it runs such a language quite naturally; well, that's the
plan. Looking like Verilog means RTL-type code could be "cleaned" and
synthesized with off the shelf tools rather than having to build those
as well, and the language could be open. SystemVerilog is going in the
opposite direction.
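
For flavor only, a compilable sketch of the shape such code might
take; the port template here is invented for illustration, not an
existing library, and the live signals and event wheel are elided:

    // A module-like C++ class with signal-ish ports and an
    // always-style method, mirroring the Verilog:
    //   always @(posedge clk) count <= count + 1;
    #include <iostream>

    template <typename T> struct port { T val{}; };  // stand-in for a live signal

    struct Counter {               // like a Verilog module, nestable as an object
        port<bool>     clk;
        port<unsigned> count;
        void on_posedge() { count.val = count.val + 1; }
    };

    int main() {
        Counter c;                                   // one module instance
        for (int i = 0; i < 4; ++i) c.on_posedge();  // the event wheel would drive this
        std::cout << c.count.val << "\n";            // prints 4
        return 0;
    }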

snipping

> >That PCI bus is way too slow to be of much use except for problems that
> >do a lot of compute on relatively little data, but then you could use
> >distributed computing instead. PCIe will be better but then again you
> >have to deal with new PCIe interfaces or using a bridge chip if you are
> >building one.
>
> Certainly there are classes of problems which require very little data
> transfer between FPGA and CPU that could work acceptably even in a PCI
> environment.
>

The real money, I think, is in the problem space where the data rates
are enormous with modest processing between data points, such as
bioinformatics. If you have lots of operations on little data, you can
do better with distributed computing and clusters.
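
As a back-of-envelope, take classic 32-bit/33 MHz PCI (~133 MB/s
theoretical peak) and assume, purely for illustration, an FPGA
datapath sustaining 10 Gops/s; the bus alone dictates how much work
per byte is needed to keep the fabric busy:

    // Ops per byte the FPGA must spend just to keep pace with the bus.
    // The 10 Gops/s FPGA figure is an assumed, illustrative number.
    #include <iostream>

    int main() {
        const double pci_peak = 133.0e6;  // bytes/s, 32-bit/33 MHz PCI peak
        const double fpga_ops = 10.0e9;   // ops/s, assumed datapath throughput
        std::cout << fpga_ops / pci_peak << " ops/byte\n";  // ~75
        return 0;
    }

Below roughly 75 ops per byte the fabric starves on the bus, which is
exactly the enormous-data-rate, modest-compute regime above, hence the
interest in PCIe.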

> >

snipping

>
> One wonders how different history might be now if instead of the serial
> Von Neumann architectures (that are now ubiquitous) we would have instead
> started out with say, cellular automata-like architectures. CAs
> are one computing architecture that are perfectly suited for the
> parallelism of FPGAs. (there are others like neural nets and their
> derivatives). Our thinking is limited by our 'legos', is it not?
> If all you know is a general purpose serial CPU then everything starts
> looking very serial.
>

I was just reading up on the man, a far bigger "giant" in history than
the serial Von Neumann computer gives him credit for, which to my
shame I never knew. The legacy stands because the WWII era didn't have
too many tubes to play with, so serial was the only practical way.

> (if I recall correctly, before he died Von Neumann himself was looking
> into things like CAs and NNs because he wanted more of a parallel architecture)
>
> There are classes of biologically inspired algorithms like GAs, ant
> colony optimization, particle swarm optimization, etc. which could greatly
> benefit from being mapped into FPGAs.
>
> Phil

Indeed

John Jakson
transputer guy

Transputers & FPGAs: two sides of the same process coin

From: pbdelete on
>> A while back, Tom's Hardware did a comparison of 3 GHz P4s vs the
>> P100, the first Pentium, and all the in-betweens, and the plot was
>> basically linear

>Interesting. In fact I don't care about P4, as its architecture is one
>big mistake, but linear speedup would be a shame for a Pentium 3...

What in particular do you think is wrong with the P4?

From: JJ on

pbdelete(a)spamnuke.ludd.luthdelete.se.invalid wrote:
> >> A while back, Tom's Hardware did a comparison of 3 GHz P4s vs the
> >> P100, the first Pentium, and all the in-betweens, and the plot was
> >> basically linear
>
> >Interesting. In fact I don't care about P4, as its architecture is one
> >big mistake, but linear speedup would be a shame for a Pentium 3...
>
> What in particular do you think is wrong with the P4?

Well, how about power/heat, or even cost versus AMD, a constant issue
for Intel over the last few years.

It was the return to the P3 core that allowed them to move forward
again with the Centrino, then the dual core parts; not sure what this
new direction is really all about, though. But NetBurst and maximum
clock frequency at any cost for marketing's sake is dead.

The only good things about the P4 I ever heard of were its memory
throughput benchmarks and maybe the media codecs, which makes sense
given the deeper pipelines it used.

John Jakson
transputer guy
