From: KJ on

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44dfa33e$0$8024$3b214f66(a)tunews.univie.ac.at...
> >
>>> You almost never want to have a fixed number of wait states but want to
>>> simply have the Avalon slave provide a wait request output and tell
>>> Avalon that by specifying that in the PTF file.
>>
>> Completely agree. When not writing and reading too many posts
>> I'm working on that version of the SRAM interface. It was just
>> a quick start as shown in the Quartus manual.
>
> BTW (to KJ): Do you have this type of Avalon slave
> for an SRAM? Would save some time and errors for me ;-)
>
No, over the past several years my use of async SRAMs has gone to 0 even
though I used to use them quite heavily. They've been replaced by internal
FPGA memory in the Stratix, Stratix II, Cyclone II parts. Any external
memory has tended to need to be much larger than async SRAM could affordably
provide so DDR has been used.

I'm assuming that you've checked and that Altera didn't toss one in as a
MegaCore? Too bad.

Oh well, I'll stop posting and let you get back to work.
KJ


From: Martin Schoeberl on
>> To avoid this little issue and the additional cycle I do usually (with
>> my SimpCon SRAM controller) clock the nwr with the inverted clock to
>> shift it after address setup.
> But now what about the trailing edge of the write? The address could start changing while the write signal is still active.

The nwe clocked on the negative edge is set to '1' before the address
changes. That's the reason for using the negative clock edge: nwe is
back at '1' without an additional cycle.
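The falling-edge trick can be sketched roughly like this (the signal names wr_req and nwe are my assumptions, not necessarily SimpCon's actual ones):

```vhdl
-- Hypothetical sketch of the inverted-clock write strobe.
-- Address and data are registered on the rising edge as usual; only
-- nwe is generated on the falling edge, so it asserts half a cycle
-- after address setup and deasserts half a cycle before the address
-- can change again.
process(clk)
begin
   if falling_edge(clk) then
      if reset = '1' then
         nwe <= '1';            -- active low, idle high
      else
         nwe <= not wr_req;     -- '0' only while the write is pending
      end if;
   end if;
end process;
```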

>> Not for the read (in my case) as I'm waiting for the read data in the
>> processor. In some cases I can hide the latency by executing additional
>> code. However, in this case I need the data registered in the slave, which
>> is again not possible....
>
> It is. If you write some code for your component and use the 'readdatavalid' Avalon signal it will work. Once you have the address
> and command safely

Still not for my case. The slave data is valid only for the single cycle
when readdatavalid is set. And that one is controlled by the slave.
I cannot force the slave to hold the read data valid for the master
for several cycles.
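If the master must cope with such a slave anyway, about the only option is a holding register on the master side; a rough sketch (register name readdata_held is assumed):

```vhdl
-- Hypothetical sketch: capture readdata in the one cycle that
-- readdatavalid is asserted and hold it until the next valid read,
-- so the rest of the master logic can consume it later.
process(clk)
begin
   if rising_edge(clk) then
      if readdatavalid = '1' then
         readdata_held <= readdata;
      end if;
   end if;
end process;
```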

> assert wait request at all. Since wait request is not asserted then the master device is free to go off and start up another
> transaction with any device (i.e. it has not been stalled).

My master, a processor, cannot issue just any other transaction when
a read is issued. I need a.) low latency on read and b.) pipelined
read for an efficient cache fill. That's it. No write transaction
during an outstanding read.

> Unfortunately if a second read is started (even if it is not to the SRAM, even if it is to a device that has 0 wait state reads) that
> read will be greeted by a wait request, because Avalon needs to ensure that read data is supplied back in the order in which the
> master requested it. In order to do this it

Yes, for a more flexible system we would need out-of-order completion.
However, this is a completely different story.

Martin


From: Martin Schoeberl on
Can't resist answering ;-)

>>>> You can do it when your template 'controls' the master logic but not
>>>> the other way round.
>>>>
>>> Not sure what you mean by 'not the other way around'. This template is only for the master side control logic.
>>
>> Yes, but you trigger the transaction 'within' your Avalon master
>> template. However, for me the Avalon interface is just an
>> interface. It has to react to the request from the CPU. And
>> the CPU requests the transaction from 'outside' of the
>> template/interface.
>>
> OK, lost again I think. Now it sounds like the CPU, even though embedded within the FPGA, doesn't have a native Avalon interface
> and you're talking about a bridge to get you from the CPU interface over to Avalon. Such a bridge though would typically not be
> terribly application specific but would instead be tailored to the signals on the CPU and Avalon. Just like you can make a bridge
> between Wishbone and Avalon. If the CPU design is your homebrew though, a simpler approach is to simply make it have an Avalon
> compatible interface. When you get to writing that code is where my template would be placed.

It is a general case - not about my homebrew CPU:

You have a component that (deep inside, not knowing it is connected
to Avalon) triggers a read request. With your template this trigger
gets registered when waitrequest is '0', and this registering of
the read request adds one cycle of latency.

Another point: If the waitrequest condition gets deeply embedded
in the component it would

BTW: for my CPU design - as Avalon is Altera specific I would
never make Avalon the native interface. JOP runs quite well on
Xilinx devices ;-)

Martin


From: Tommy Thorn on
I can only afford a short reply, but ...

Martin Schoeberl wrote:
>>> Another point is, in my opinion, the wrong role of who has to hold data
>>> for more than one cycle. This is true for several busses (e.g. also
>>> Wishbone). For these busses the master has to hold address and write
>>> data till the slave is ready. This is a result of backplane-bus
>>> thinking. In an SoC the slave can easily register those signals
>>> when they are needed longer, and the master can continue.
>> What happens when you issue another request to a slave which hasn't finished processing the first? Any queue will be finite
>> and eventually you'd have to deal with stalling anyway. The issue is that there are generally many more slaves than masters, so it
>> makes sense to move the complication to the master.
>
> I disagree ;-)
> How hard is it for a slave to hold the read data for more than one cycle?
> Until the next read data is requested and available? That comes almost
> for free. It's a single register, trivial logic. OK, it is a little overhead
> for an on-chip peripheral. However, you usually need a MUX in the
> peripheral to select the IO registers (now using 'register' with a different
> meaning). Making this MUX registered is almost free.

Focusing on the overhead for one slave supporting one outstanding
command is missing the point.

Non-trivial slaves can support multiple simultaneous outstanding
requests (say N), so they would need at least a queue N deep. Not a
problem. Now, I have multiple slaves and multiple masters on the
interconnect. Each master must be able to have at least M outstanding
requests. Any one slave can only accept one request per cycle, so the
interconnect (the arbitration) needs to buffer the requests in lots of
FIFOs, and _they_ add significant latency, logic, and complication (pick
two).

I actually love decoupled interfaces like these (and they are not a new
invention) as they remove the handshaking from the critical paths, but as
a general purpose interconnect fabric the approach just doesn't scale.


I'll need to study SimpCon more to understand what you mean by its
support for multiple outstanding requests. Just to clarify, I'm talking
about completely independent requests, not bursts. Different masters may
issue multiple of these (up to some limit) while previously issued
requests are still incomplete. I do insist that the requests complete in
the order they were issued, mostly to simplify things (such as the
arbitration). Really just a subset of Avalon.

Tommy


From: KJ on

"Martin Schoeberl" <mschoebe(a)mail.tuwien.ac.at> wrote in message
news:44dfb517$0$12384$3b214f66(a)tunews.univie.ac.at...
> Can't resist answering ;-)
>
>>>
>> OK, lost again I think. Now it sounds like the CPU even though embedded
>> within the FPGA doesn't have a native Avalon interface and you're talking
>> about a bridge to get you from the CPU interface over to Avalon. Such a
>> bridge though would typically not be terribly application specific but
>> instead is tailored to the signals on the CPU and Avalon. Just like you
>> can make a bridge between Wishbone and Avalon. If the CPU design is your
>> homebrew though a simpler approach is to simply make it have an Avalon
>> compatible interface. When you get to writing that code is where my
>> template would be placed.
>
> It is a general case - not about my homebrew CPU:
>
> You have a component that (deeply inside not knowing it is connected
> to Avalon) triggers a read request. With your template this trigger
> gets registered when waitrequest is '0'. And this registering of
> the read request adds one cycle latency.
>
> Another point: If the waitrequest condition gets deeply embedded
> in the component it would
>
OK, for the homebrew (or anything where you own the 'master' side code), if
you want to bring out read/write and address combinatorially, the template
would then be:

process(Put your signals here in place of Clock)
begin
   -- if rising_edge(Clock) then  -- Not wanted for the combinatorial version
   if (Reset = '1') then
      Read  <= '0';
      Write <= '0';
      -- Address, WriteData, Next_State initializations go here also.
      -- Note: For the synchronous version of the template, Address
      -- and WriteData inits are not required since there is no Avalon
      -- requirement for such. For the combinatorial version they
      -- are needed either here or (more safely) outside the entire
      -- if statement to provide a default and avoid latches.
   elsif (WaitRequest = '0') then
      -- Put your code here for whenever it is you want to read/write.
      -- When writing you would also set WriteData here.
      -- Also set Address of course.
      -- Next state of state machines (if any) would be put here also.
   else
      -- Read, Write, Address and WriteData get set to hold their
      -- current state. The registers for this would be in the
      -- 'second' process of the two-process approach.
      -- If any state machines are in here then they would also have
      -- the equivalent of 'Next_State <= Current_State' here.
   end if;
   -- end if;  -- Not wanted since this is no longer combinatorial
end process;

Besides the obligatory differences in the sensitivity list and 'if
rising_edge', all I've added is the 'else' branch to hold the current state.
This same code could also have been added to the synchronous template
version but would not be necessary there. With the combinatorial version it
is required, of course, to avoid latches.

Again, using either template one can:
- Be assured of meeting Avalon compatibility, since it will be painfully
obvious that the Avalon signals will not change state on a clock cycle when
wait request has been set.
- Keep the guts of the state machine (i.e. the section between 'elsif
(WaitRequest = '0') then' and the 'else') from being littered with checks
on wait request, with the potential that one forgotten check leads to an
Avalon incompatibility that may be difficult to track down.
- Let the code for the guts of the state machine concentrate on doing what
it needs to be doing, which is reading/writing the Avalon bus upon requests
from the CPU side.
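As a concrete illustration, here is one hedged way the synchronous version of the template might be filled in for a simple CPU-driven master; all the cpu_* names are invented for the example:

```vhdl
-- Hypothetical filled-in synchronous template. The hold-state 'else'
-- branch is implicit here because registers keep their value when not
-- assigned; cpu_rd_req/cpu_wr_req/cpu_addr/cpu_wdata are assumed
-- request signals from the CPU side.
process(Clock)
begin
   if rising_edge(Clock) then
      if (Reset = '1') then
         Read  <= '0';
         Write <= '0';
      elsif (WaitRequest = '0') then
         Read      <= cpu_rd_req;
         Write     <= cpu_wr_req;
         Address   <= cpu_addr;
         WriteData <= cpu_wdata;
      end if;
      -- When WaitRequest = '1' nothing is assigned, so Read, Write,
      -- Address and WriteData hold their state, as Avalon requires.
   end if;
end process;
```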

Again, these templates would be for master side code where the native
interface is Avalon.

> BTW: for my CPU design - as Avalon is Altera specific I would
> never make Avalon the native interface. JOP runs quite well on
> Xilinx devices ;-)

That being the case then the Altera specific code is performing the function
of a bridge between the CPU bus and the Avalon bus and would be segregated
as such. The ease/difficulty of that bridge design would then be a function
of how close/different those two busses are.

In any case, you certainly know your design far better than I and obviously
have been able to master Avalon enough to put together a working JOP design.

KJ

