From: glen herrmannsfeldt on
John_H <newsgroup(a)johnhandwork.com> wrote:
(snip)

> Great advice. You can also perform several stages of the division per
> clock cycle reducing a 16-bit division to 4 clock cycles, for
> instance. There are better ways to perform pipelined division but
> you'll need to consider this approach regardless.

If the clock rate is fixed, and you don't need throughput, then yes.
Most FPGAs have an FF for each LUT, so pipelining is free.
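
For illustration, here is a rough software model of the "several stages
per clock" idea from the quoted text. It is only a sketch, assuming
unsigned operands and plain restoring division; the function name
restoring_divide and the parameters width and steps_per_clock are made
up for this example and do not come from any vendor library.

```python
def restoring_divide(dividend, divisor, width=16, steps_per_clock=4):
    """Unsigned restoring division, grouping several bit steps per clock.

    Hypothetical model: each pass of the outer loop stands for one clock
    cycle in which steps_per_clock compare/subtract stages are chained
    combinationally.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = 0
    quotient = 0
    clocks = 0
    for start in range(width - 1, -1, -steps_per_clock):
        # One simulated clock: up to steps_per_clock stages back to back.
        for bit in range(start, max(start - steps_per_clock, -1), -1):
            # Shift the next dividend bit into the partial remainder.
            remainder = (remainder << 1) | ((dividend >> bit) & 1)
            quotient <<= 1
            # Subtract only if the divisor fits, so the partial
            # remainder never goes negative and no restore is needed.
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
        clocks += 1
    return quotient, remainder, clocks
```

With the defaults, restoring_divide(50000, 123) gives the same quotient
and remainder as divmod(50000, 123) and counts 4 simulated clocks for
the 16 bit steps. In hardware the cost is a longer combinational path
per clock, which is why this fits best when the clock rate is fixed.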

(snip of multiply instead of divide)

> To understand a faster way to divide compared to "determining whether
> you can subtract the divisor for each stage of the pipeline then
> subtracting from or passing the previous value" you can instead
> *always* subtract from a positive value and *always* add to a
> negative value, simply appending a little arithmetic to the signs from
> the intermediate stages to get the final result. Try doing some
> binary long division by hand with the two approaches and you may see
> how you can come up with the same results with better optimized
> hardware.

I haven't thought of it in terms of current FPGA hardware.
That works if you can easily switch between add and subtract
based on the previous cycle. I forget which FPGAs do that
and which don't. Look up non-restoring division in any
computer arithmetic book.
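
As a minimal sketch of the non-restoring scheme, assuming an unsigned
dividend that fits in width bits and a positive divisor (the name
nonrestoring_divide is hypothetical, and this models the arithmetic
only, not any particular FPGA's add/subtract logic):

```python
def nonrestoring_divide(dividend, divisor, width=16):
    """Unsigned non-restoring division (software model).

    Every step does exactly one add or one subtract, chosen by the sign
    of the previous partial remainder. There is no compare-then-restore.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    remainder = 0  # signed partial remainder
    quotient = 0
    for bit in range(width - 1, -1, -1):
        next_bit = (dividend >> bit) & 1
        if remainder >= 0:
            # Previous result was non-negative: shift and subtract.
            remainder = 2 * remainder + next_bit - divisor
        else:
            # Previous result was negative: shift and add.
            remainder = 2 * remainder + next_bit + divisor
        # The quotient bit is just the inverted sign of the new remainder.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    # Single correction at the end if the last remainder is negative.
    if remainder < 0:
        remainder += divisor
    return quotient, remainder
```

Working 7 / 3 by hand with width=4, the partial remainders go
-3, -2, 0, -2, the quotient bits are 0, 0, 1, 0, and the final
correction turns -2 into a remainder of 1, so the result is 2 with
remainder 1. The add-or-subtract choice depends only on the sign bit
from the previous step, which is the switch mentioned above.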

-- glen