From: Paul Wallich on
kenney(a)cix.compulink.co.uk wrote:
> In article <4BE72A38.1040100(a)patten-glew.net>, ag-news(a)patten-glew.net
> (Andy 'Krazy' Glew) wrote:
>
>> There are advantages to I/O ports over memory mapped I/O. Basically,
>> it tells the processor earlier, before address decode, that it is
>> likely to involve a serialization.
>
> Just as important in the 8 bit days was that it saved memory. Also if
> the number of I/O ports was limited the address bus did not need to be
> fully decoded which saved lines on an external bus. The Z80 and I think
> the 8080 had a control line which distinguished between I/O port and
> memory access.

I'm not sure this historical argument really holds water. You saved
memory addressing, sure, but at the cost of a whole separate set of
wires going to the I/O devices. (Or a control line, as above, that
caused the same wires to have a different use). And you cost yourself
instruction space, which was also at a premium. What you really saved,
it seems to me, is complexity in the I/O devices, because they didn't
have to meet any of the requirements of living on the same bus as RAM.

In later years, people also did things like bank-swapping, so that a
control line or special register could give you access to memory that
lived at I/O addresses; they also didn't bother to do full decoding of
addresses, so the same I/O registers might be visible at a whole pile of
different locations.
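
A minimal sketch of what partial decode does, written as C; the mask, select value, and register count here are invented purely for illustration:

#include <stdint.h>

/* Hypothetical device that decodes only the top 4 of 16 address bits.
 * Any address whose top nibble is 0xC selects the device; the low two
 * bits pick one of its 4 registers.                                   */
#define DECODE_MASK    0xF000u   /* the only address bits the decoder compares */
#define DEVICE_SELECT  0xC000u   /* the value that selects this device         */
#define REG_INDEX_MASK 0x0003u   /* low bits pick one of 4 registers           */

/* Bits 2..11 are ignored entirely, so the same 4 registers show up
 * mirrored 1024 times across 0xC000-0xCFFF.                          */
static int decode(uint16_t addr, unsigned *reg)
{
    if ((addr & DECODE_MASK) != DEVICE_SELECT)
        return 0;
    *reg = addr & REG_INDEX_MASK;
    return 1;
}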

paul
From: Andy 'Krazy' Glew on
On 5/10/2010 5:13 AM, Quadibloc wrote:
> On May 9, 3:33 pm, Andy 'Krazy' Glew<ag-n...(a)patten-glew.net> wrote:
>> So all memory accesses are burdened with the cost of
>> determining that a location is or is not a memory mapped I/O location, with possible side effects.
>
> I thought there was some kind of table inside the microprocessor that
> gets set by the BIOS during bootup to tell it that this range of
> memory addresses is I/O, please don't cache it.

Yes. Apart from the page tables, which have had PCD and PWT bits since they were added (in the i386?), there are the
MTRRs, which were added in the P6. I take the blame for them.

However, this is exactly my point:

NOW, WITH MEMORY MAPPED I/O: we have to generate the physical address, so that we can look up the memory type.

BEFORE THE MTRRs: Not only did we have to generate the physical address, but we had to send the request out to the
microprocessor bus. Chipset logic would decode the address, and tell us what the PCD and PWT settings were. I.e. the MTRRs
sped things up wrt the prior x86 art.
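
To make the lookup concrete, here is a minimal sketch in C of the kind of range check an MTRR performs; the struct layout, names, and types are mine for illustration, not the actual register format:

#include <stdint.h>

/* Illustrative memory types, loosely following the x86 names. */
enum memtype { MT_UC, MT_WC, MT_WT, MT_WB };

/* One variable-range register: base, mask of significant bits, type. */
struct mtrr {
    uint64_t base;
    uint64_t mask;
    enum memtype type;
};

/* Walk the small, fixed set of ranges; anything not covered falls back
 * to a default type.  Note the lookup can only happen once the physical
 * address is known - which is exactly the point being made here.        */
static enum memtype lookup_memtype(const struct mtrr *regs, int n,
                                   uint64_t phys, enum memtype dflt)
{
    for (int i = 0; i < n; i++)
        if ((phys & regs[i].mask) == (regs[i].base & regs[i].mask))
            return regs[i].type;
    return dflt;
}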

HOWEVER, there is still quite a long time between the instruction being fetched, decoded, and being able to compute the
linear (virtual) address and look up the physical address. In particular, subsequent memory access instructions may already have
started to execute. Therefore, if you want a memory mapped I/O location to have side effects that affect subsequent
instructions, the accesses to it must themselves be uncached and strongly ordered.

I think this has actually led us to a better design point - at least one where SpMT is allowed, because the rule is that
you can speculatively access anything that isn't explicitly marked UC uncacheable.

But it leads to annoyances such as UC being always slow and worst case. As we have discussed on comp.arch before,
there really need to be several flavors of UC: one suitable for memory mapped I/O, and one suitable for normal memory that
you happen to want to keep uncached. ...



> And in modern ones,
> there would even be another table that says this range of addresses
> are 16 bits wide, so just use 16 out of the 64 bits of your data bus,
> and start using the lower bits of your address bus.

There is no such table inside the processor itself.

There are routing tables in the chipset(s).
From: Andy 'Krazy' Glew on
On 5/10/2010 12:24 PM, Jeremy Linton wrote:
> On 5/9/2010 4:33 PM, Andy 'Krazy' Glew wrote:
>> However, by now I/O ports are mainly just legacy. Memory mapped I/O
>> rules, largely because of the ease of writing C device drivers to access
>> the control registers of such devices. So all memory accesses are
>> burdened with the cost of determining that a location is or is not a
>> memory mapped I/O location, with possible side effects.
>
> As someone who writes a fair number of drivers in C, I fail to see how
> memory mapped register windows provide me any benefit over IO mapped
> ones. The extra pain of fencing far outweighs the advantage of writing
> code which obscures the fact I'm writing a device register (aka pointer
> offset with casts). In code I write I almost always have macros of the
> form WRITE_REGISTER_32BITS(dev,REGISTERNAME,value) etc. Those could
> just as well be doing IN/OUT.
>
> I really never understood the push for memory mapping everything
> (especially registers). Plus, I can't remember when I last mapped
> something that wasn't a register window. DMA seems to be the preferred
> method of transferring data to/from nearly every modern device. The
> mappings exist only to control the device.
> Even graphics cards which traditionally map a fair amount into the
> address space, apparently work better using double buffering methods.
> Plus, the most common I/O mechanisms often don't exactly lend themselves
> to direct device mapping into a user buffer. So in the end, I end up
> doing a lot of copy_in/out if the device cannot DMA directly into a
> pinned user allocated buffer. This kills performance and is probably my
> number one gripe about a couple of devices I've used recently.

I agree with you, Jeremy.

Anybody who cares about writing portable code accesses device registers with macros such as

WRITE_CONTROL_BIT_IN_REGISTER(ctladdr,bitnum,bitval)

or the like - for reasons of endianness, if nothing else.
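
For concreteness, a minimal sketch of such a wrapper in C; the names and the volatile-pointer approach are mine for illustration (endianness handling and any hardware fencing are omitted), not anything from Jeremy's post:

#include <stdint.h>

/* Assumes the device's registers are memory mapped starting at dev->base.
 * The volatile accesses keep the compiler from caching or reordering the
 * loads and stores at the source level; fencing is still a separate step. */
struct device { volatile uint8_t *base; };

#define REG_STATUS  0x00u   /* hypothetical register offsets */
#define REG_CONTROL 0x04u

static inline void write_register_32bits(struct device *dev,
                                          uint32_t reg, uint32_t value)
{
    *(volatile uint32_t *)(dev->base + reg) = value;
}

static inline uint32_t read_register_32bits(struct device *dev, uint32_t reg)
{
    return *(volatile uint32_t *)(dev->base + reg);
}

The same two wrappers could just as easily be implemented with IN/OUT on a port-mapped device, which is Jeremy's point.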



You can't reliably, in C or C++, do something like:

struct IO_Device {
    unsigned flag : 15;
    unsigned rest : 17;
};

because the languages make almost no guarantees about where the bits get put, and whether there is padding.


Some ABIs do, however, make sufficient guarantees.
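
The portable alternative is to keep the register value as a plain integer and spell out the bit positions yourself with shifts and masks. A minimal sketch, with the field layout invented to match the struct above:

#include <stdint.h>

/* Invented layout: bits 0..14 are "flag", bits 15..31 are "rest".
 * The positions are fixed by these macros rather than by the
 * compiler's bitfield rules, so they mean the same on every ABI. */
#define FLAG_SHIFT  0
#define FLAG_MASK   0x7FFFu
#define REST_SHIFT  15
#define REST_MASK   0x1FFFFu

static inline uint32_t get_flag(uint32_t reg)
{
    return (reg >> FLAG_SHIFT) & FLAG_MASK;
}

static inline uint32_t set_rest(uint32_t reg, uint32_t rest)
{
    return (reg & ~(REST_MASK << REST_SHIFT))
         | ((rest & REST_MASK) << REST_SHIFT);
}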





However, as recently as last week I was noodling around and found somebody in an embedded processor magazine or forum
recommending code such as the above, using structs.

Which, I suppose, is good enough if you think all of the world is an x86.




Ada had enough language features to make declaring records for I/O safe.

I hate much of Ada, but I regret the loss of this.


From: Andy 'Krazy' Glew on
On 5/10/2010 3:25 PM, MitchAlsup wrote:
> On May 10, 2:24 pm, Jeremy Linton<reply-to-l...(a)nospam.org> wrote:
>> On 5/9/2010 4:33 PM, Andy 'Krazy' Glew wrote:
>>
>>> However, by now I/O ports are mainly just legacy. Memory mapped I/O
>>> rules, largely because of the ease of writing C device drivers to access
>>> the control registers of such devices. So all memory accesses are
>>> burdened with the cost of determining that a location is or is not a
>>> memory mapped I/O location, with possible side effects.
>>
>> As someone who writes a fair number of drivers in C, I fail to see how
>> memory mapped register windows provide me any benefit over IO mapped
>> ones.
>
> Just try to read a 64-bit quantity into RAX with an IN instruction or
> the converse with an OUT instruction.

I rather liked Sun's load/store alternate address space.

Basically any memory operation. 64 bits, whatever. IIRC

But the fact that you were operating on something that was not ordinary memory was visible at the decoder.

From: Andy 'Krazy' Glew on
On 5/10/2010 3:25 PM, MitchAlsup wrote:

> However, I expect that it is a lot harder to virtualize In and Out
> than the memory mapped device stuff, even if it is detected later in
> the pipeline and causes all sorts of miscreant activities.

I don't recall any truly fundamental issues. Except...

You have to virtualize memory. Virtualizing x86 legacy I/O is just another thing to do. Perhaps straightforward, but
nevertheless annoying.

I thought Itanium had a good idea here. They ran x86 I/O instructions through the page tables, kluging things so that
any access to a 16 bit address ABCD(hex) appeared to be to the linear address ABCDABCD(hex), i.e. they multiplied by
2^16+1, i.e. 10001 hex. (I call this the multiply by 11 trick.) Each I/O port thus got a different page table entry, so
that you could hang protection off it using the existing virtual memory address translation hardware.
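
Reduced to arithmetic, the mapping is just the port number times 10001 hex; a one-function sketch in C (the function name is mine):

#include <stdint.h>

/* Map a 16-bit x86 I/O port to the linear address described above:
 * port ABCD(hex) -> ABCDABCD(hex), i.e. port * (2^16 + 1) = port * 0x10001.
 * Consecutive ports land 0x10001 bytes apart, so each port falls in its
 * own page and therefore gets its own page table entry for protection.   */
static inline uint32_t x86_io_port_to_linear(uint16_t port)
{
    return (uint32_t)port * 0x10001u;
}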


> {However**2--SPARC's model of Address Spaces is even worse.}

Ooops. I just said I liked SPARC alternate address spaces. But, I have never implemented them, whereas you have,
Mitch. Tell us why they are worse, eh?