From: Tommy Thorn on
Martin Schoeberl wrote:
.... snip
>>> Or does the Avalon switch fabric, when registered, take this
>>> information into account for the waitrequest of the master?
>> It does.
>
> That's a reason to go with fixed wait states!
>
> Or a bus specification that counts down the number of
> wait states ;-)

The problem with both is that it would impose undesirable constraints on
slaves. Imagine the case of a good SDRAM controller which tracks open
pages. Read latency depends on what it's currently doing (idle,
refreshing, busy with another request, etc.) and whether the access hit
an open page. In this case it _could_ perhaps give an early
"read-data-in-2-cycles" strobe, but dealing with that would be a nasty
complication for the interconnect structure, which has to be able to
handle any combination of multiple masters and a random assortment of slaves.

It is feasible for internal protocols, tailor-made for the parts they
connect, but IMO not for a generalist interconnect like Avalon.

FWIW, I used to do exactly what you suggest (latch and mux), but it
didn't scale for me as the interconnect grew, and it was expensive in
both resources and cycle time (not to mention the added complication and
opportunity for bugs). Adopting the Avalon-style template (which is
obvious IMnsHO) freed so many resources (mine and the FPGA's), enabling
performance improvements elsewhere. Having spent way too much time and
energy on exactly this point, I'd be very interested in others' insights.

> BTW: Did you take a look into the SimpCon idea?

I just did, after writing the above :-), but my point still stands.
AFAICT, SimpCon doesn't support multiple outstanding requests from any
one master and therefore doesn't distinguish between being able to
accept a new request and being done with a request.

I'm sure SimpCon works for you, but for me it would not be much better
than Wishbone.

> Dreaming a little bit: Would be cool to write an
> open-source system generator (like SOPC builder) for
> it. Including your suggestion of an open and documented
> specification file format.

Sure, though I suspect writing all this would be trivial compared to
coming to an agreement on how the interface should work :-)

>>> Again, one more cycle latency ;-)
>> Again, nope not if done correctly.
>
> I think we finally agreed, did we?

I'm not KJ, but I think the issue is clear. Following the template
would incur an extra cycle for you (assuming JOP couldn't be fixed in a
different way). It's definitely possible to do what you suggest and avoid
that cycle at the expense of extra overhead and logic (the glitch
doesn't matter for internal deployment).

Tommy
From: Martin Schoeberl on
>> Or a bus specification that counts down the number of
>> wait states ;-)
>
> The problem with both is that it would impose undesirable constraints on slaves. Imagine the case of a good SDRAM controller which
> tracks open pages. Read latency depends on what it's currently doing (idle, refreshing, busy with another request, etc) and
> whether the access hit on open page. In this case it _could_ perhaps give an early "read-data-in-2-cycles" strobe, but dealing
> with that would be a nasty complication for the interconnect structure that has to be able to handle any combination of
> multi-masters and random assortment of slaves.

If the interconnection of multiple masters gets too complicated,
the information can still be ignored. Then it's the same as with
a single ready signal. However, there is a benefit for a single
master accessing a variable-latency slave (your SDRAM example).

> FWIW, I used to do exactly what you suggest (latch and mux), but it didn't scale for me as the interconnection grew and was
> expensive in both resources and cycle time (not to mention the added complication and

One more argument for a single-cycle command with the register in
the slave ;-)

>> BTW: Did you take a look into the SimpCon idea?
>
> I just did, after writing the above :-), but my point still stands. AFAICT, SimpCon doesn't support multiple outstanding requests
> from any one master and therefore doesn't distinguish between being able to accept a new request and being done with a
> request.

It can do the same pipelining of requests as Avalon, with a small
difference. Take the SDRAM as an example: it has a long initial
latency, and then incremental words can follow fast. With Avalon
you can issue several requests (depending on the slave pipeline)
and then you have to wait. In SimpCon you issue the first request,
then you have to wait for the latency, and the following requests
are faster. For this example it's just a question of when the
latency wait occurs: at the beginning (SimpCon) or later (Avalon).
The first-word latency is the same and the following reads are the
same.

> I'm sure SimpCon works for you, but for me it would be not much better than Wishbone.

No, both Avalon and SimpCon can handle pipelined (in-order)
requests; Wishbone cannot. Out-of-order completion of requests
is a completely different story. I don't see a use for it
at the moment for a 'normal' CPU master.

>
>> Dreaming a little bit: Would be cool to write an
>> open-source system generator (like SOPC builder) for
>> it. Including your suggestion of an open and documented
>> specification file format.
>
> Sure, though I suspect writing all this would be trivial compared to coming to an agreement on how the interface should work :-)

I agree! As Tanenbaum said, "The nice thing about standards
is that there are so many to choose from".

Martin


From: KJ on

>
>>> Yes, but e.g. for an SRAM interface there are some timings in ns. And
>>> it's not that clear how this translates to wait states.
>>
>> Since Avalon is not directly compatible with typical SRAMs, this implies
>> that
>
> Again disagree ;-) The Avalon specification also covers asynchronous
> peripherals. That adds a little bit to the complexity of the
> specification.
>
No, the Avalon specification is completely synchronous and is not directly
compatible with any of the garden-variety asynchronous SRAMs that I'm aware
of. Just because you can add a controller that is compatible with the
synchronous Avalon bus on one side and an asynchronous SRAM on the other
does not imply anything at all about Avalon; it just says that one can make
a controller that will provide that interface. The same can be said about
interfacing Avalon to any other peripheral, so there is nothing special
about SRAMs and Avalon.

>> Assuming for the moment that you wanted to write the code for such a
>> component, one would likely define the component to have the
>> following:
>> - A set of Avalon bus signals
>> - SRAM Signals that are defined as Avalon 'external' (i.e. they will get
>> exported to the top level) so that they can be brought out of the FPGA.
>> - Generic parameters so that the actual design code does not need to hard
>> code any of the specific SRAM timing requirements.
>
> Yes, that's the way it is described in the Quartus manual. I did my
> SRAM interface in this way. Here is a part of the .ptf that describes
> the timing of the external SRAM:
>
> SLAVE sram_tristate_slave
> {
> SYSTEM_BUILDER_INFO
> {
> ....
> Setup_Time = "0ns";
> Hold_Time = "2ns";
> Read_Wait_States = "18ns";
> Write_Wait_States = "10ns";
> Read_Latency = "0";
> ....
>
>> Given that, the VHDL code inside the SRAM controller would set it's
>> Avalon side wait request high as appropriate while it physically performs
>> the
>
> There is no VHDL code associated with this SRAM. All is done by the
> SOPC builder.
>
Are we talking about interfacing with a synchronous SRAM or an async SRAM?
If it's a synchronous SRAM then I agree the Avalon signal set is likely
compatible, but if it's the garden-variety async SRAM where timings are
measured relative to edges of WR and RD, then what you have won't work
reliably in the real world.

>> read/write to the external SRAM. The number of wait states would be
>> roughly equal to the SRAM cycle time divided by the Avalon clock cycle
>> time.
>
> The SOPC builder will translate the timing from ns to clock cycles for
> me. However, this is a kind of iterative process as the timing of the
> component depends on tco and tsu of the FPGA pins of the compiled design.
> Input pin th can usually be ignored as it is covered by the minimum tco
> of the output pins. The same is true for the SRAM write th.
>
The above is again making me think that we're talking about interfacing to
an async SRAM. If that's the case, then from your description it sounds
like Avalon address/read/write basically become the corresponding signals on
the SRAM. If that's the case, then how are you guaranteeing that the
address is stable at the SRAM prior to write being asserted and after it has
been de-asserted? The way the Avalon address and write signals work, they
will both be transitioning some Tco after the rising edge of the clock.
There is absolutely no guarantee of any timing relationship between address
and write on the Avalon side, so if those are brought out unmodified to the
SRAM you have no guarantee there either... but for an async SRAM you
absolutely have to have that. If address and write are nominally
transitioning at the 'same' time then you won't get reliable operation (or
if you build enough of these they will 'erratically' fail) because you can't
guarantee that you've met the timing requirements of the SRAM.
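
To make the point concrete, here is a sketch of the kind of controller I
have in mind (entity and signal names are made up, not from any Altera
component): address and data are driven first, the write strobe falls a
full clock later and rises a full clock before the address may change, so
the async SRAM sees guaranteed setup and hold regardless of pin-to-pin Tco
skew:

-- Sketch only: a registered async-SRAM write sequence that guarantees
-- address setup before nwr falls and address hold after nwr rises.
library ieee;
use ieee.std_logic_1164.all;

entity sram_wr_fsm is
  port (
    clk, reset : in  std_logic;
    start      : in  std_logic;                      -- one-cycle write request
    addr_in    : in  std_logic_vector(17 downto 0);
    data_in    : in  std_logic_vector(15 downto 0);
    busy       : out std_logic;                      -- drives Avalon waitrequest
    sram_addr  : out std_logic_vector(17 downto 0);
    sram_data  : out std_logic_vector(15 downto 0);
    sram_nwr   : out std_logic
  );
end sram_wr_fsm;

architecture rtl of sram_wr_fsm is
  type state_t is (idle, setup, strobe, hold);
  signal state : state_t;
begin
  process(clk, reset)
  begin
    if reset = '1' then
      state    <= idle;
      sram_nwr <= '1';
      busy     <= '0';
    elsif rising_edge(clk) then
      case state is
        when idle =>
          sram_nwr <= '1';
          if start = '1' then
            sram_addr <= addr_in;   -- drive address and data first
            sram_data <= data_in;
            busy      <= '1';
            state     <= setup;
          end if;
        when setup =>               -- address stable one full cycle before nwr falls
          sram_nwr <= '0';
          state    <= strobe;
        when strobe =>              -- nwr low; add a counter here for slower SRAMs
          sram_nwr <= '1';
          state    <= hold;
        when hold =>                -- address held one cycle after nwr rises
          busy  <= '0';
          state <= idle;
      end case;
    end if;
  end process;
end rtl;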

>> Although maybe it sounds like a lot of work and you may think it results
>> in some sort of 'inefficient bloat' it really isn't. Any synthesizer
>> will quickly reduce the logic to what is needed based on the usage of the
>> design. What you get in exchange is very portable and reusable
>> components.
>
> No, it's really not much work. Just a few mouse clicks (no VHDL) and the
> synthesized result is not big. The SRAM tristate bridge contains just
> the address and control output registers. I assume the input registers
> are somewhere buried in the arbitrator.
>

Actually I was referring to what I had described as not being that much
work. I agree that what you've done doesn't take much work but I also don't
think that your 'sram_tristate_slave' component will work reliably if used
to interface with an external asynchronous SRAM. It probably will work if
you're interfacing it to a synchronous SRAM and you also bring out the
Avalon clock as the SRAM clock.

I guess this also answers why you're seeing that the master device is
basically 'stuck' with wait requests until the SRAM operation has been
completed. The reason is that you, as the designer of the
'sram_tristate_slave' component, did not provide any code to register the
control signals yourself inside your component (inside the VHDL that
doesn't exist). Had you done this, you would've been able to design a
component that would allow the master to continue on while the SRAM
operation is still in progress. The only time the master would then need to
be stalled is if it performed a subsequent access to the SRAM while the
previous one was still in progress.
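
A sketch of what I mean (the internal names are made up; the Avalon-side
names follow the usual waitrequest convention): the slave latches the
access into its own registers and releases the master right away, and
waitrequest is only raised when a new access shows up while the previous
SRAM cycle is still running:

-- Sketch of a posted-write Avalon-MM slave front end. The SRAM back end
-- is abstracted as a start/done handshake.
library ieee;
use ieee.std_logic_1164.all;

entity posted_wr_slave is
  port (
    clk             : in  std_logic;
    avs_address     : in  std_logic_vector(17 downto 0);
    avs_writedata   : in  std_logic_vector(15 downto 0);
    avs_write       : in  std_logic;
    avs_read        : in  std_logic;
    avs_waitrequest : out std_logic;
    sram_done       : in  std_logic;                 -- back end finished
    sram_start      : out std_logic;
    sram_addr       : out std_logic_vector(17 downto 0);
    sram_data       : out std_logic_vector(15 downto 0)
  );
end posted_wr_slave;

architecture rtl of posted_wr_slave is
  signal busy : std_logic := '0';
begin
  process(clk)
  begin
    if rising_edge(clk) then
      sram_start <= '0';
      if busy = '0' then
        if avs_write = '1' then       -- latch the access, release the master
          sram_addr  <= avs_address;
          sram_data  <= avs_writedata;
          sram_start <= '1';
          busy       <= '1';
        end if;
      elsif sram_done = '1' then
        busy <= '0';
      end if;
    end if;
  end process;

  -- The master is stalled only when it issues a new access while the
  -- previous SRAM cycle is still in progress.
  avs_waitrequest <= (avs_write or avs_read) and busy;
end rtl;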

KJ


From: Martin Schoeberl on
>> Yes, that's the way it is described in the Quartus manual. I did my
>> SRAM interface in this way. Here is a part of the .ptf that describes
>> the timing of the external SRAM:
>>
>> SLAVE sram_tristate_slave
>> {
>> SYSTEM_BUILDER_INFO
>> {
>> ....
>> Setup_Time = "0ns";
>> Hold_Time = "2ns";
>> Read_Wait_States = "18ns";
>> Write_Wait_States = "10ns";
>> Read_Latency = "0";
>> ....
>>
>>> Given that, the VHDL code inside the SRAM controller would set it's Avalon side wait request high as appropriate while it
>>> physically performs the
>>
>> There is no VHDL code associated with this SRAM. All is done by the
>> SOPC builder.
>>
> Are we talking about interfacing with a synchronous SRAM or an async SRAM? If it's a synchronous SRAM then I agree the Avalon
> signal set is likely compatible, but if it's the garden-variety async SRAM where timings are measured relative to edges of WR and
> RD, then what you have won't work reliably in the real world.

It's an async SRAM. And I just did what is described in the Quartus
manual. If that is not reliable, then Altera should update the manual.

> The above is again making me think that we're talking about interfacing to an async SRAM. If that's the case, then from your
> description it sounds like Avalon address/read/write basically become the corresponding signals on the SRAM. If that's the case,
> then how are you guaranteeing that the address is stable at the SRAM prior to write being asserted and after it has been
> de-asserted. The way the Avalon address and write signals work they will both be transitioning some Tco after the rising edge of
> the clock. There is absolutely no guarantee of any timing relationship between address and write on the Avalon side, so if those
> are brought out unmodified to the SRAM you have no guarantee there either....but for an async SRAM you absolutely have to have
> that. If address and write are nominally transitioning at the 'same' time then you won't get reliable operation (or if you build
> enough of these they will 'erratically' fail) because you can't guarantee that you've met the timing requirements of the SRAM.

As I set the setup time to 0 ns, you're right. There is a little issue
(depending on the tco of the different pins) when wrn goes low before the
address is stable. That's against the SRAM timing spec (the minimum wrn
low after address is 0 ns). However, I 'assume' that this does not matter.
Setting Setup_Time to something >0 ns would add one additional cycle.

To avoid this little issue and the additional cycle, I usually (with
my SimpCon SRAM controller) clock nwr with the inverted clock to
shift it after the address setup.

> Actually I was referring to what I had described as not being that much work. I agree that what you've done doesn't take much
> work but I also don't think that your 'sram_tristate_slave' component will work reliably if used to interface with an external
> asynchronous SRAM. It probably will work if you're interfacing it to a synchronous SRAM if you also then bring out the Avalon
> clock as the SRAM clock.

OK, again - it was a first try, done as described by Altera. However,
the next step will be a 'real' SRAM slave, with the nwr trick as described
and with the timing (wait states) given in clock cycles as a parameter.
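
Roughly what I have in mind (just a sketch with my own names, not the
finished controller): the wait states come in as generics in clock cycles
- with Read_Wait_States = 18ns and, say, a 100 MHz clock that would be
ceil(18/10) = 2 cycles - and nwr is re-registered on the falling clock
edge so it changes half a cycle after the address:

-- Sketch: SRAM controller with wait states as generics and the 'nwr trick'.
-- Data-bus tristating and noe/ncs handling are left out.
library ieee;
use ieee.std_logic_1164.all;

entity sram_ctrl is
  generic (
    rd_ws : integer := 2;   -- read wait states in clock cycles
    wr_ws : integer := 1    -- write wait states in clock cycles
  );
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;
    rd, wr   : in  std_logic;                        -- one-cycle strobes
    address  : in  std_logic_vector(17 downto 0);
    wr_data  : in  std_logic_vector(15 downto 0);
    rd_data  : out std_logic_vector(15 downto 0);
    rdy      : out std_logic;
    ram_addr : out std_logic_vector(17 downto 0);
    ram_dout : out std_logic_vector(15 downto 0);
    ram_din  : in  std_logic_vector(15 downto 0);
    ram_nwr  : out std_logic
  );
end sram_ctrl;

architecture rtl of sram_ctrl is
  signal cnt   : integer range 0 to 31;
  signal nwr_i : std_logic := '1';
begin
  process(clk, reset)
  begin
    if reset = '1' then
      cnt   <= 0;
      nwr_i <= '1';
    elsif rising_edge(clk) then
      if rd = '1' then
        ram_addr <= address;
        cnt      <= rd_ws;
      elsif wr = '1' then
        ram_addr <= address;
        ram_dout <= wr_data;
        nwr_i    <= '0';
        cnt      <= wr_ws;
      elsif cnt /= 0 then
        cnt <= cnt - 1;
        if cnt = 1 then
          nwr_i   <= '1';
          rd_data <= ram_din;   -- capture from the SRAM (meaningful after a read)
        end if;
      end if;
    end if;
  end process;

  rdy <= '1' when cnt = 0 else '0';

  -- The 'nwr trick': re-register the write strobe on the falling edge so it
  -- changes half a clock after the address, giving setup and hold margin.
  process(clk)
  begin
    if falling_edge(clk) then
      ram_nwr <= nwr_i;
    end if;
  end process;
end rtl;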

> 'sram_tristate_slave' component did not provide any code to support registering the control signals yourself inside your component
> (inside the VHDL that doesn't exist). Had you done this, you would've been able to

That's an option in the tristate bridge ;-) However, I assume that this means
only registering the output to get a low tco.

> design a component that would allow the master to continue on while the SRAM operation is still in progress. The only time the
> master would then need to be stalled is if it performed a subsequent access to the SRAM while the previous one was still in
> progress.

Not for the read (in my case), as I'm waiting for the read data in the
processor. In some cases I can hide the latency by executing additional
code. However, in that case I need the data registered in the slave, which
is again not possible...

Martin


From: Martin Schoeberl on
> Martin Schoeberl wrote:
>> What helps is to know in advance (one or two cycles) when the result
>> will be available. That's the trick with the SimpCon interface.
>
> That approach is common internally in real cores, but adds a lot of complication while it's an open question how many Avalon
> applications could benefit from it.

It's not that complicated to handle. Even one cycle in advance is
nice to know. However, it's just additional information; you
can still ignore it.

>> There is not a single ack or waitrequest signal, but a counter that
>> will say how many cycles it will take to provide the result. In this
>> case I can restart the pipeline earlier.
>
> AFAIR, Avalon _does_ support slaves with fixed number of latency cycles, but an SDRAM controller by nature won't be fixed cycles.

Exactly in this case the counter approach helps.
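
A minimal sketch of the counter idea (generic names, not the exact SimpCon
signals): the slave loads the counter with the latency it already knows at
issue time - e.g. short for an open-page hit, longer for a miss - and counts
down, so the master knows one or two cycles in advance when the data will
arrive and can restart its pipeline early:

-- Sketch: the slave announces how many cycles remain until rd_data is
-- valid, instead of a single ack/waitrequest.
library ieee;
use ieee.std_logic_1164.all;

entity countdown_slave is
  port (
    clk, reset : in  std_logic;
    rd         : in  std_logic;                      -- one-cycle read strobe
    page_hit   : in  std_logic;                      -- e.g. SDRAM open-page hit
    rdy_cnt    : out integer range 0 to 7;           -- 0 = rd_data is valid
    rd_data    : out std_logic_vector(31 downto 0);
    mem_data   : in  std_logic_vector(31 downto 0)
  );
end countdown_slave;

architecture rtl of countdown_slave is
  signal cnt : integer range 0 to 7 := 0;
begin
  process(clk, reset)
  begin
    if reset = '1' then
      cnt <= 0;
    elsif rising_edge(clk) then
      if rd = '1' then
        -- Load with the latency the slave already knows at issue time.
        if page_hit = '1' then
          cnt <= 2;
        else
          cnt <= 6;
        end if;
      elsif cnt /= 0 then
        cnt <= cnt - 1;
        if cnt = 1 then
          rd_data <= mem_data;   -- data registered in the slave
        end if;
      end if;
    end if;
  end process;

  rdy_cnt <= cnt;

  -- A pipelined master can watch rdy_cnt and restart its pipeline when the
  -- count reaches 1 or 2, i.e. one or two cycles before the data arrives.
end rtl;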

>> Another point is, in my opinion, the wrong choice of who has to hold data
>> for more than one cycle. This is true for several busses (e.g. also
>> Wishbone). For these busses the master has to hold address and write
>> data till the slave is ready. This is a result of backplane-bus
>> thinking. In an SoC the slave can easily register those signals
>> when they are needed longer, and the master can continue.
>
> What happens then when you issue another request to a slave which hasn't finished processing the first? Any queue will be finite
> and eventually you'd have to deal with stalling anyway. An issue is that there are generally many more slaves than masters, so it
> makes sense to move the complication to the master.

I disagree ;-)
How hard is it for a slave to hold the read data for more than one cycle,
until the next read data is requested and available? That comes almost
for free. It's a single register, trivial logic. OK, it is a little overhead
for an on-chip peripheral. However, you usually need a MUX in the
peripheral to select the IO registers (now using 'register' with a different
meaning). Making this MUX registered is almost free (see the sketch below).
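
Something like this, as a sketch (the register names are made up): the mux
that selects between the IO registers is itself registered, so the selected
value is captured into rd_data and held until the next read, and the master
does not have to pick it up in a single cycle:

-- Sketch: registered read mux in a peripheral. The selected IO register is
-- captured into rd_data and held there until the next read overwrites it.
library ieee;
use ieee.std_logic_1164.all;

entity io_slave is
  port (
    clk         : in  std_logic;
    rd          : in  std_logic;
    address     : in  std_logic_vector(1 downto 0);
    status_reg  : in  std_logic_vector(31 downto 0);
    rx_data_reg : in  std_logic_vector(31 downto 0);
    counter_reg : in  std_logic_vector(31 downto 0);
    rd_data     : out std_logic_vector(31 downto 0)
  );
end io_slave;

architecture rtl of io_slave is
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if rd = '1' then
        case address is
          when "00"   => rd_data <= status_reg;
          when "01"   => rd_data <= rx_data_reg;
          when "10"   => rd_data <= counter_reg;
          when others => rd_data <= (others => '0');
        end case;
      end if;
    end if;
  end process;
end rtl;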

And you win on the master side. To put it informally:
give the master more freedom to move. You usually have fewer
masters than slaves in a system; therefore, the master(s) will
be the bottleneck.

> ...
>> Wishbone and Avalon specify just a single cycle data valid.
>
> Again, simplify the slave (and the interconnect) and burden the master.

I argue the other way round - now it becomes almost political ;-)

> Avalon is IMO the best balance between complexity, performance and features in all the (few) interconnect I've seen yet (I haven't
> seen SimpCon yet). In particular I found Wishbone severely lacking for my needs. Avalon is proprietary though, so I roll my own
> portable implementation inspired by Avalon with just the features I needed:
> - all reads are pipelined with variable latency (accept of request is distinct from delivery of data, thus inherently supporting
> multiple outstanding requests)
> - multi master support
> - burst support (actually not implemented yet, but not that hard)

OK, as you roll your own (like I do), perhaps we can agree on
one interface. In that case - two of us using the same interface -
it's almost a standard...

> It's nearly as trivial as Wishbone, though offers much higher performance. Latency is entirely up to the slave which can deliver
> data as soon as the cycle after the request was posted. (Though, arriving at this simplicity took a few false starts).

The same in SimpCon.

Martin

