From: Bruce Tomlin on
In article <qhwtch92jx.fsf(a)ruckus.brouhaha.com>,
Eric Smith <eric(a)brouhaha.com> wrote:

> "John Selck" <gpjiweg(a)t-online.de> writes:
> > And then we would have ended up like the Z80: That a copy loop is faster
> > than the actual block copy instruction :D
>
> Can you cite an example? The block move instructions on the Z80 are
> actually fairly efficient. They use three memory cycles to move a byte,
> vs. the theoretical minimum of two. Doing a block copy via a software
> loop is going to require at least five memory cycles per byte moved, and
> probably more.

That was my understanding as well, having actually done machine language
on a Z-80 (typing in hex like ED B0 because it was faster than using an
assembler).
From: heuser.marcus on
Be careful or Captain Zilog crushes you!

http://www.milehighcomics.com/cgi-bin/backissue.cgi?action=fullsize&issue=14882664687%201

;o)

bye
Marcus

From: Linards Ticmanis on
Michael J. Mahon wrote:

> Wow--they needed more than a megabyte/second of video data? I guess
> if you have about the same resolution as an Apple II, but twice the
> color depth, then you do... The //c and following Apple II's got
> around this by providing another memory bank in parallel with the
> main memory bank.

The C64 mainly needed it for its text mode. Unlike on the Apple, the
character shapes reside within both the video chip's and the CPU's normal
memory map, are read over the normal address bus, and can be redefined
by pointing a register to RAM instead of ROM.

So once per row of characters the CPU has to stop for 40 cycles so
that the character numbers can be read and stored. Then during the
normal video cycles only the character shapes are read. Color numbers
for the characters, on the other hand, are read in parallel from a
special SRAM with its own addressing lines going to the video chip.
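
To put a rough number on that stall, here is a minimal sketch of the
arithmetic. The 40 cycles per character row come from the paragraph
above; the 63 cycles per raster line, 312 lines per frame and 25 text
rows are my assumptions for a PAL C64, and the sketch ignores any extra
setup cycles and sprite fetches.

```python
# Rough estimate of CPU time lost to character-number fetches.
# Assumed (not from the post): PAL timing and a 25-row text screen.
CYCLES_PER_LINE = 63   # assumption: PAL C64
LINES_PER_FRAME = 312  # assumption: PAL C64
TEXT_ROWS = 25         # assumption: standard text screen
STALL_CYCLES = 40      # from the post: one stall per character row


def stall_overhead(rows=TEXT_ROWS, stall=STALL_CYCLES,
                   cycles_per_line=CYCLES_PER_LINE,
                   lines_per_frame=LINES_PER_FRAME):
    """Return (stolen cycles, cycles per frame, fraction stolen)."""
    frame = cycles_per_line * lines_per_frame
    stolen = rows * stall
    return stolen, frame, stolen / frame


if __name__ == "__main__":
    stolen, frame, fraction = stall_overhead()
    print(f"{stolen} of {frame} cycles per frame ({fraction:.1%})")
```

Under those assumptions the stalls cost about 5% of the CPU's cycles
per frame.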

The Apple reads the character numbers on every line during its normal
video access, and reads the character shapes in parallel from a ROM
that's not visible to the CPU. Thus you have fixed character shapes.

In multicolor hires mode, the same system is used to read a larger set
of color registers (instead of the character numbers). Sprites (movable
object blocks) are created with a similar process.

The C64 design is an improvement on the VIC 20. The VIC also uses
changeable character shapes, but because it doesn't stop the CPU it gets
only those 22 (or 23?) super-broad characters per line. No time to read
more if you need to read both character numbers and character shapes
with no more than 1 MHz of bandwidth.
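
The bandwidth argument can be sketched as a simple fetch count. This
assumes the video chip gets one memory fetch per CPU cycle, and it uses
63 cycles per raster line as the budget for the C64 case (my assumption
for a PAL machine). The function names are made up for illustration,
and the VIC 20's own line timing is not modelled.

```python
# Fetch budget per raster line, assuming one video fetch per CPU cycle.
C64_LINE_CYCLES = 63  # assumption: PAL C64, whole line including borders


def video_fetches_needed(chars_per_line, fetches_per_char):
    """Video fetches required to display one raster line of text."""
    return chars_per_line * fetches_per_char


def fits_in_line(chars_per_line, fetches_per_char, line_cycles):
    """True if the fetches fit without stopping the CPU."""
    return video_fetches_needed(chars_per_line, fetches_per_char) <= line_cycles


if __name__ == "__main__":
    # VIC 20 style: number + shape fetched for each of 22 characters.
    print(video_fetches_needed(22, 2))
    # 40 columns with both fetches on every line: too many for one line.
    print(fits_in_line(40, 2, C64_LINE_CYCLES))
    # 40 columns with only the shape fetch on normal lines: fits.
    print(fits_in_line(40, 1, C64_LINE_CYCLES))
```

With two fetches per character, 40 columns would need 80 fetches, more
than a whole line offers. Moving the character-number fetches into the
once-per-row stall leaves one fetch per character on every line.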

Of course redefinable characters made for very nice (and fast) building
blocks of game backgrounds, and were used extensively in scroller type
games. That's one of the reasons why there are so many scrollers for the
Commodores and so few for the Apple.

--
Linards Ticmanis
From: John Selck on
Am 19.05.2006, 22:34 Uhr, schrieb Eric Smith <eric(a)brouhaha.com>:

> "John Selck" <gpjiweg(a)t-online.de> writes:
>> And then we would have ended up like the Z80: That a copy loop is faster
>> than the actual block copy instruction :D
>
> Can you cite an example? The block move instructions on the Z80 are
> actually fairly efficient. They use three memory cycles to move a byte,
> vs. the theoretical minimum of two. Doing a block copy via a software
> loop is going to require at least five memory cycles per byte moved, and
> probably more.

LDIR eats 21 clock cycles per iteration; doing the same with other Z80
instructions can be way faster.
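
For a rough comparison, here is a sketch of the per-byte cost in
T-states, using the figures from the standard Z80 timing tables. The
unrolled-LDI loop (16 LDIs followed by JP PE) is one common
alternative, not necessarily the code used here, and the 4096-byte
block size is an arbitrary choice. A naive load/store loop is included
to show that not every software loop beats LDIR.

```python
# Z80 T-state counts from the standard instruction timing tables.
T_LDIR_REPEAT = 21  # LDIR, each iteration that repeats
T_LDIR_LAST = 16    # LDIR, final iteration (BC reaches 0)
T_LDI = 16          # single LDI
T_JP_CC = 10        # JP cc,nn

# LD A,(HL) / LD (DE),A / INC HL / INC DE / DJNZ (taken)
T_NAIVE_PER_BYTE = 7 + 7 + 6 + 6 + 13


def ldir_tstates(n):
    """T-states for LDIR copying n bytes (n >= 1)."""
    return (n - 1) * T_LDIR_REPEAT + T_LDIR_LAST


def unrolled_ldi_tstates(n, unroll=16):
    """T-states for a loop of `unroll` LDIs plus JP PE,loop.

    Sketch only: assumes n is a positive multiple of `unroll`.
    """
    if n <= 0 or n % unroll:
        raise ValueError("n must be a positive multiple of unroll")
    return (n // unroll) * (unroll * T_LDI + T_JP_CC)


if __name__ == "__main__":
    n = 4096
    print("LDIR        ", ldir_tstates(n) / n)
    print("16x LDI loop", unrolled_ldi_tstates(n) / n)
    print("naive loop  ", T_NAIVE_PER_BYTE)
```

That works out to just under 21 T-states per byte for LDIR, about 16.6
for the unrolled LDI loop, and 39 for the naive loop.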

I tried to use the Z80 in the C128 for block copy/fill because I thought
"hey, it has a block copy command, so I guess it is fast" but it wasn't.
Then I tried normal opcodes, it was way faster than using LDIR but still
not faster than copying the stuff with the 8502.

Here's a list of Z80 instructions and their clock cycles:

http://www.ftp83plus.net/Tutorials/z80inset1A.htm
From: John Selck on
Am 12.05.2006, 04:21 Uhr, schrieb Michael J. Mahon <mjmahon(a)aol.com>:

> And in any case, depending on the peculiarities of a particular chip
> implementation is just asking to be locked out of future improvements.

As I've stated several times now: you can easily do a processor check and
use different code. On a plain 6502 you use the faster routine with illegal
opcodes, and on a 65816 etc. you use a normal routine. It's five minutes'
work to do that.
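
The shape of that idea, sketched in Python rather than 6502 code:
detect the processor once, then pick the routine. Everything here is a
stand-in. The Cpu enum, the two copy functions and
select_copy_routine() are hypothetical names, and the actual detection
step is left out.

```python
from enum import Enum


class Cpu(Enum):
    """Hypothetical result of a one-time processor check."""
    NMOS_6502 = "6502"
    CMOS_65C02 = "65C02"
    WDC_65816 = "65816"


def copy_with_illegals(src):
    """Stand-in for the fast routine that relies on illegal opcodes."""
    return bytes(src)


def copy_documented(src):
    """Stand-in for the routine using only documented opcodes."""
    return bytes(src)


def select_copy_routine(cpu):
    """Pick the fast routine only where the illegal opcodes exist."""
    if cpu is Cpu.NMOS_6502:
        return copy_with_illegals
    return copy_documented


if __name__ == "__main__":
    routine = select_copy_routine(Cpu.WDC_65816)
    print(routine.__name__, routine(b"\x01\x02\x03"))
```

Selecting once at startup keeps the check out of the inner loop.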