From: dgreig on
On Apr 27, 11:05 pm, Jonathan Bromley <s...(a)oxfordbromley.plus.com>
wrote:
> On Tue, 27 Apr 2010 03:41:01 -0700 (PDT), dgreig wrote:
> >Unfortunately, unsigned to signed requires zero padding; adding the
> >extra bit infers an 18*18 block rather than 9*9. In the case of 18
> >bit inputs the unsigned to signed requires one more bit than the block
> >actually has.
>
> What about Kolja Sulimma's suggestion of a conditional adder
> after a 17x18 multiply?  This is only a sketch, but shows
> that it is quite neat both in VHDL code and in hardware:
>
> subtype S36 is signed(35 downto 0);
>
> function U18xS18 (
>   U: unsigned(17 downto 0);
>   S: signed(17 downto 0)
> ) return S36 is
>   variable product: S36;
> begin
>   -- Type conversion reinterprets U's bits as signed: U - 2**18 if U(17) = '1'.
>   product := signed(U) * S;
>   if (U(17) = '1') then
>     -- Add back S * 2**18, i.e. add S into the upper half of the product.
>     product(35 downto 18) :=
>       product(35 downto 18) + S;
>   end if;
>   return product;
> end;
>
> Disclaimer: I haven't tried synthesising this, and I suspect you
> may need to play with the code some more to get the best
> synthesis results.
> --
> Jonathan Bromley

The problem is more logic, routing and clock cycle latency. The dirty
method, altmult_accum, at least makes best use of the resources, and its
2 cycle latency is key to system throughput. The system uses 282.5 out of
288 18*18 multipliers, 91% of block RAM and ~70% of logic elements. It
still achieves a 25% timing margin on a mid speed device (slowest max is
~200MHz) and I am reluctant to push the boat out. The niggle is that,
apart from PLLs and DDR2 specifics, the code would otherwise be
portable.
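
For anyone wanting to sanity-check the conditional-adder arithmetic before spending synthesis time on it, here is a minimal bit-accurate model in Python. It is only a sketch: the names to_signed and u18_x_s18 are made up for illustration, and it says nothing about how a tool would map the adder or what the latency would be. The point is just that reading an 18-bit unsigned U as signed gives U - 2**18 when the top bit is set, so the signed product is short by S * 2**18 and the fix-up adds S into the upper 18 bits.

```python
# Bit-accurate model of unsigned(18) x signed(18) built from a signed
# 18x18 multiply plus a conditional add. All names here are illustrative.
W = 18


def to_signed(x, bits=W):
    """Reinterpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x & (1 << (bits - 1)) else x


def u18_x_s18(u, s):
    """u in [0, 2**18), s in [-2**17, 2**17); returns the exact product u*s."""
    product = to_signed(u) * s   # what the signed multiplier block computes
    if u & (1 << (W - 1)):       # top bit set: u was read as u - 2**18
        product += s << W        # add back s * 2**18 (upper half only)
    return product
```

The result always fits in 36 signed bits, since the magnitude of U*S is at most (2**18 - 1) * 2**17, which is below 2**35.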