From: rickman on
Sym,

You write without knowing anything about me. I have a very high rate
of success on the board because of the extensive simulations I run.
But you can never eliminate the need to probe real hardware.

As to the SI issues, sure, you can get very high speed signals out of
an FPGA. You can also drive static signals and everything in
between. Many of my designs work very well in QFP packages and have
no need for special SI approaches. When I am running a 32 MHz clock,
1 ns edge rates are not needed, so I slow them down to help prevent SI
problems.

Oh, yeah, I *do* have a logic analyzer and it comes in very useful. I
was able to use it recently to find a configuration problem where my
customer had missed an aspect of how to properly initialize the
digital PLL used in the FPGA. No amount of simulation would have
caught that!

Rick
From: rickman on
On Jun 1, 9:11 pm, -jg <jim.granvi...(a)gmail.com> wrote:
> On Jun 2, 7:25 am, rickman <gnu...(a)gmail.com> wrote:
>
>
>
> > The software would have about 100 ns from the last address "chunk"
> > being clocked, 60 ns from the command flag going high and much less
> > than 30 ns from the command being clocked to driving the first output
> > bit.  I doubt it can be done at 62.5 MIPs.
>
> What's the data rate ?
>
> We have done a number of systems, where a smallish CPLD takes the ns-
> level stuff, and dual-edge etc and converts it into a more micro-
> compatible form.
>
> Sometimes that has been parallel, and sometimes SPI/SSC

The strobe that clocks the data is 33 ns high and 67 ns low. So the
clock rate is 10 MHz, with two data/addr bits input or 1 data bit output
on each strobe.
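
As a quick check of those numbers, here is a minimal Python sketch. The function name and the "bits per strobe" parameters are only illustrative; the 33 ns and 67 ns figures and the 2-in/1-out bit counts are the ones stated above.

```python
def strobe_rates(high_ns, low_ns, bits_in=2, bits_out=1):
    """Return (period in ns, clock in MHz, input Mbit/s, output Mbit/s)."""
    period_ns = high_ns + low_ns          # one full strobe cycle
    clock_mhz = 1000.0 / period_ns        # 1000 ns per microsecond
    return period_ns, clock_mhz, clock_mhz * bits_in, clock_mhz * bits_out


period, clock, rate_in, rate_out = strobe_rates(33, 67)
print(period, clock, rate_in, rate_out)
```

So the raw rates work out to 20 Mbit/s on input and 10 Mbit/s on output, under the assumption that every strobe carries bits.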

The problem with using a small CPLD is that the register set is up to
32 registers of 8 bits each. With a 100 ns address to output time, there is
little chance of the read being done unless a copy of all registers
exists in the CPLD. Also, some of the bits to be read are real time
status bits. If the processor can get an interrupt, read the address
and write the readback data to the CPLD, then it could work, but it
has to happen in 100 ns. If they had just used a standard SPI
interface it would have been a lot easier...

Rick
From: Marc Jet on
The description is somewhat vague about the timing. From what I
understand, the main problem is the short response latency. In fact
the problem sounds very much like a job for a small CPLD.

Your micro runs at 62.5 MIPS (16ns instruction cycle)? If it's a fast
small micro with no pipeline, then that's a good start.

Given that the protocol samples on the falling edge, does it offer you
valid inputs already at the falling edge too? If so, then you seem to
have <130ns from when the LSB of address is known to data out. And
you seem to have <190ns from when ADR<MSB to 2> is known. 190ns would
be 11 instructions, which looks okay.
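
The instruction budget can be checked with a short Python sketch. It assumes one instruction per 16 ns cycle (no pipeline stalls, no interrupt latency) and treats the "<130ns" and "<190ns" figures as upper bounds; the function name is hypothetical.

```python
def instruction_budget(window_ns, mips=62.5):
    """Whole instructions that fit in window_ns at the given MIPS rating.

    Assumes exactly one instruction per cycle, which is optimistic.
    """
    cycle_ns = 1000.0 / mips              # 62.5 MIPS -> 16 ns per instruction
    return int(window_ns // cycle_ns)


for window in (100, 130, 190):
    print(window, "ns ->", instruction_budget(window), "instructions")
```

With those assumptions the 190 ns window gives 11 instructions, the 130 ns window gives 8, and the 100 ns figure mentioned earlier in the thread gives only 6.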

In a 74xx type solution, I'd dedicate 4 micro outputs to offer all 4
possible data bits to a 74 MUX which uses the 2 address LSBs to select
the correct data bit. This relaxes the software latency constraint
considerably. I'd store the "memory content" in an array indexed by
ADR<MSB to 2> and store the 4 possible data LSBs in that byte
(correctly ordered for the output port). Then the software loop has
11 instructions to assemble ADR<MSB to 2>, do the 1 byte lookup, and
write it to the output port.
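
To make the lookup idea concrete, here is a small behavioural model in Python. It is only a sketch: the function names are made up, and it assumes the first bit shifted out is bit 0 of the register (the `first_bit` parameter), which the protocol description here does not confirm. The software side does one table lookup on ADR<MSB to 2>; the 74-series MUX is modelled as a plain bit select on the 2 address LSBs.

```python
def build_mux_table(regs, first_bit=0):
    """Pack the first output bit of each group of 4 registers into a nibble.

    regs: list of 8-bit register values, length a multiple of 4.
    first_bit: which register bit is sent first (assumption, default LSB).
    """
    table = []
    for base in range(0, len(regs), 4):
        nibble = 0
        for i in range(4):
            bit = (regs[base + i] >> first_bit) & 1
            nibble |= bit << i            # port pin i feeds MUX input i
        table.append(nibble)
    return table


def mux_select(nibble, addr_lsbs):
    """Model of the external 4:1 MUX: pick one of the 4 port pins."""
    return (nibble >> addr_lsbs) & 1


def first_output_bit(table, addr):
    port_value = table[addr >> 2]         # software: one lookup, one port write
    return mux_select(port_value, addr & 3)  # hardware: select by the 2 LSBs


regs = [(7 * n + 3) & 0xFF for n in range(32)]   # arbitrary test contents
table = build_mux_table(regs)
print(all(first_output_bit(table, a) == (regs[a] & 1) for a in range(32)))
```

The table has to be rebuilt (or patched) whenever a register changes, and real-time status bits would need to be refreshed into it, so this only covers the first-bit latency.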

This solves only one detail of the problem. Maybe it inspires you to
find a solution for it all.

Best regards
From: rickman on
On Jun 2, 12:01 pm, Marc Jet <jetm...(a)hotmail.com> wrote:
> The description is somewhat vague about the timing.  From what I
> understand, the main problem is the short response latency. In fact
> the problem sounds very much like a job for a small CPLD.
>
> Your micro runs at 62.5 MIPS (16ns instruction cycle)?  If it's a fast
> small micro with no pipeline, then that's a good start.
>
> Given that the protocol samples on the falling edge, does it offer you
> valid inputs already at the falling edge too?  If so, then you seem to
> have <130ns from when the LSB of address is known to data out.  And
> you seem to have <190ns from when ADR<MSB to 2> is known.  190ns would
> be 11 instructions, which looks okay.
>
> In a 74xx type solution, I'd dedicate 4 micro outputs to offer all 4
> possible data bits to a 74 MUX which uses the 2 address LSBs to select
> the correct data bit.  This relaxes the software latency constraint
> considerably.  I'd store the "memory content" in an array indexed by
> ADR<MSB to 2> and store the 4 possible data LSBs in that byte
> (correctly ordered for the output port).  Then the software loop has
> 11 instructions to assemble ADR<MSB to 2>, do the 1 byte lookup, and
> write it to the output port.
>
> This solves only one detail of the problem.   Maybe it inspires you to
> find a solution for it all.
>
> Best regards

Thanks for the advice. That's an interesting approach. There is a
bit more to it than that, but it sounds potentially doable depending
on the instructions required. The protocol was never intended for
software, so no provision was made for the response time of software.
In fact, register 0 only uses a 4 bit address while the others can be
longer depending on the target. All this is pretty easy in hardware,
but not so much in software. So this is not the only issue.

The other issue is that the XMOS device is only an improvement over an
FPGA in that it can include a lot more logic in a small package. The
power consumption is higher if all eight threads are running. I don't
know what happens to power if only some are idling. Since you can't
slow the clock while any one thread has to run at high speed, I should
think that would limit the power reduction you can achieve.

Rick
From: -jg on
On Jun 3, 3:59 am, rickman <gnu...(a)gmail.com> wrote:
>
> The problem with using a small CPLD is that the register set is up to
> 32, 8-bit registers.  With a 100 ns address to output time, there is
> little chance of the read being done unless a copy of all registers
> exists in the CPLD.  Also, some of the bits to be read are real time
> status bits.  If the processor can get an interrupt, read the address
> and write the readback data to the CPLD, then it could work, but it
> has to happen in 100 ns.  If they had just used a standard SPI
> interface it would have been a lot easier...

If this interface is so incompatible with SPI that you need 32 bytes
of local memory, then you are bumped into 'smallest CPLD with RAM'
territory, and the choice there is not great. Maybe Actel or
SiliconBlue?

I've hit this wall myself, and it raises a point:

Rather than the uC+CPLD the marketing types are chasing, I would find
a CPLD+RAM more useful, as there are LOTS of uC out there already, and
if they can make 32KB SRAM for sub $1, they should be able to include
it almost for free, in a medium CPLD.

-jg