From: Casey Leedom on
| From: Andy Gospodarek <andy(a)greyhouse.net>
| Date: Wednesday, July 21, 2010 08:07 am
|
| Agreed. The subtle difference between a locally assigned address that
| is persistent and one that is random would be helpful.

And just to point out that this case _does_ exist: the igb/igbvf drivers use
random_ether_addr() to generate random, locally assigned MAC Addresses for
their PCI-E SR-IOV Virtual Functions, while the cxgb4/cxgb4vf drivers use
persistent, non-random, locally assigned MAC Addresses.

Note that I am neither arguing for nor against the proposal. I'm just
pointing out an existence case for the distinction. And yes, setting bit 1 of
the first octet of a MAC address to mark it as locally assigned is part of the
IEEE 802 specification, just as setting bit 0 of the same octet marks it as a
multi-station (group) address.
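
For readers who want the two bits spelled out, here is a minimal sketch in
Python. The helper names (parse_mac, is_locally_assigned, is_multicast,
random_local_mac) are made up for illustration and are not kernel APIs; the
last one only mirrors the idea behind random_ether_addr() (random bytes, bit 0
cleared, bit 1 set).

```python
import os

LOCAL_BIT = 0x02      # bit 1 of the first octet: locally assigned
MULTICAST_BIT = 0x01  # bit 0 of the first octet: multi-station (group) address


def parse_mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw octets."""
    octets = bytes(int(part, 16) for part in text.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six octets")
    return octets


def is_locally_assigned(mac):
    """True if bit 1 of the first octet is set."""
    return bool(mac[0] & LOCAL_BIT)


def is_multicast(mac):
    """True if bit 0 of the first octet is set."""
    return bool(mac[0] & MULTICAST_BIT)


def random_local_mac():
    """Random, unicast, locally assigned address (illustrative only)."""
    raw = bytearray(os.urandom(6))
    raw[0] &= 0xFE  # clear bit 0: not a group address
    raw[0] |= 0x02  # set bit 1: locally assigned
    return bytes(raw)
```

Note that is_locally_assigned() returns True both for a random address and for
a persistent locally assigned one, which is exactly why the bit alone cannot
express the distinction being discussed.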

Casey
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Casey Leedom on
| From: David Miller <davem(a)davemloft.net>
| Date: Wednesday, July 21, 2010 10:32 am
|
| From: Stephen Hemminger <shemminger(a)vyatta.com>
| Date: Wed, 21 Jul 2010 10:28:16 -0700
|
| > IMHO no local assigned address should be used by udev. The cxgb4 driver
| > should be using random value.
| >
| > Does anyone have an example of locally assigned address that has
| > persistence so that udev could use it.
|
| The cxgb4 vf addresses are not random because they are fetched from the
| card's NVRAM/EEPROM/firmware/whatever and thus are persistent.
|
| We definitely want udev to use persistent rules for them.
|
| This whole issue only exists because of the Intel VF case, where it
| lacks persistent addresses but somehow we want to assign persistent
| names to its VF interfaces.

Yes, we _explicitly_ wanted to have persistent MAC Addresses for our PCI-E
SR-IOV Virtual Functions for a whole raft of reasons. The two most important
were:

1. Linux' model for persistent device naming today seems to be
oriented around persistent network device addresses.

2. Lots of data centers use MAC addresses for things like DHCP/BOOTP,
security/filtering, etc.

Our design goal was to look as much like a normal Ethernet MAC as possible in
order to reduce the need for software/behavior changes.

| One idea I've proposed in other discussions about this is that if the
| address is not persistent (either via the MAC address bit or the sysfs
| value we're thinking of providing here) we use the device's geographic
| location ("device path") as the key for udev stuff.
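
The fallback described in the quoted paragraph can be sketched as a small
Python function. This is only an illustration: naming_key and the
address_is_persistent flag are hypothetical names standing in for whatever
indication (MAC address bit or sysfs value) ends up being provided, and the
"mac:"/"path:" prefixes are an assumption, not an existing udev convention.

```python
def naming_key(mac: str, device_path: str, address_is_persistent: bool) -> str:
    """Pick the key a persistent-naming rule would match on.

    A persistent address is a stable identifier, so it is used directly.
    A non-persistent (e.g. random) address changes between boots, so the
    device's location is used instead.
    """
    if address_is_persistent:
        return f"mac:{mac.lower()}"
    return f"path:{device_path}"
```

For example, naming_key("02:AA:BB:CC:DD:EE", "pci-0000:03:10.0", False) keys
on the path string and ignores the address entirely.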

Another option might be to have a new Net Device Operations call to ask the
adapter for a Unique Key. This could be formed for most devices via a tuple of
the {PCI Vendor ID, PCI Device ID, Adapter Serial Number, Port Number, [and if
applicable] Adapter Function ID}. Of course this could be a fairly long string
... :-)
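
As a rough sketch of what such a tuple could look like, here it is in Python.
The AdapterKey type, its field names and the colon-separated string form are
all assumptions made for illustration; no such Net Device Operations call
exists in the text above.

```python
from typing import NamedTuple, Optional


class AdapterKey(NamedTuple):
    """Hypothetical Unique Key built from the tuple listed above."""
    vendor_id: int                     # PCI Vendor ID
    device_id: int                     # PCI Device ID
    serial: str                        # Adapter Serial Number
    port: int                          # Port Number
    function_id: Optional[int] = None  # Adapter Function ID, if applicable

    def as_string(self) -> str:
        """Render the key as one colon-separated string."""
        parts = [
            f"{self.vendor_id:04x}",
            f"{self.device_id:04x}",
            self.serial,
            str(self.port),
        ]
        if self.function_id is not None:
            parts.append(str(self.function_id))
        return ":".join(parts)
```

With made-up values, AdapterKey(0x1234, 0xabcd, "SN0001", 0, 3).as_string()
yields "1234:abcd:SN0001:0:3", which shows how quickly the string grows past
the six octets of a MAC address.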

Casey
From: Casey Leedom on
| From: David Miller <davem(a)davemloft.net>
| Date: Wednesday, July 21, 2010 11:39 am
|
| From: Casey Leedom <leedom(a)chelsio.com>
| Date: Wed, 21 Jul 2010 11:29:47 -0700
|
| > Another option might be to have a new Net Device Operations call to ask
| > the adapter for a Unique Key. This could be formed for most devices via a
| > tuple of the {PCI Vendor ID, PCI Device ID, Adapter Serial Number, Port
| > Number, [and if applicable] Adapter Function ID}. Of course this could
| > be a fairly long string ... :-)
|
| If a unique key were available, it could be used to generate a persistent
| MAC address.

True but ... the Unique Key name space is probably a lot larger (in bits) than
the MAC Address name space (~(48-2) bits) ... :-)

| And this sort of means that these drivers could use bits of the
| device's geographic ID to construct persistent MAC addresses, but
| only if done in a MAC namespace that could be guaranteed unique on the
| local system.

Yep. That's the problem of trying to construct a Unique Locally Assigned MAC
Address from a Unique Name in a larger name space.
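
To make the squeeze concrete, here is a minimal Python sketch of one way to
fold a larger key into a locally assigned MAC Address. Hashing with SHA-256
and truncating to six octets is an assumption chosen for illustration, not
something any driver in this thread does, and mac_from_key/assign_macs are
made-up names. Only 46 bits survive, so two keys can collide; the second
helper checks for that on the local system rather than pretending it cannot
happen.

```python
import hashlib


def mac_from_key(key: str) -> str:
    """Derive a persistent, locally assigned unicast MAC from a key string."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    octets = bytearray(digest[:6])
    octets[0] &= 0xFE  # clear bit 0: not a group address
    octets[0] |= 0x02  # set bit 1: locally assigned
    return ":".join(f"{octet:02x}" for octet in octets)


def assign_macs(keys):
    """Map each key to a MAC, refusing to continue if two keys collide."""
    owner = {}  # MAC -> key that claimed it
    for key in keys:
        mac = mac_from_key(key)
        if mac in owner and owner[mac] != key:
            raise ValueError(f"{key!r} and {owner[mac]!r} both map to {mac}")
        owner[mac] = key
    return {key: mac for mac, key in owner.items()}
```

The same key always produces the same address, which gives persistence, but
uniqueness is only checked among the keys passed to assign_macs(), not across
the LAN.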

| From: "Rose, Gregory V" <gregory.v.rose(a)intel.com>
| Date: Wednesday, July 21, 2010 11:43 am
|
| I'm curious, what happens when the VM using the VF migrates to a new
| machine and has another VF assigned to it with a different MAC address?

To be honest, I still haven't wrapped my head around how Virtual Machines are
ever going to be able to migrate when they have arbitrary PCI Devices "assigned"
(KVM Terminology) to them (AKA "PCI Pass Through"). Allowing VMs to directly
touch real and arbitrary hardware devices means that some abstraction of "saving
the device state" and "restoring the device state" must be successfully
negotiated ... which would be hard even if you quiesce the device and you
migrate to another Physical Host with identical PCI Hardware which is then
"assigned" to the migrated VM. Hard, hard, hard.

This is why most of the Hypervisor systems have used synthetic Pseudo Devices
to allow the state of those Virtual Devices to be migrated (including the MAC
Addresses which we've been talking about). I actually think that the Microsoft
Hyper-V approach of Virtual Ingress Queues may be a better solution. You still
need to make the VM-to-Hypervisor transitions but you get to avoid the Free List
memory copy costs which are actually the dominant cost in the RX path to VMs
using software vNICs.

But that's straying far from the topic at hand. The short answer is pretty
much what David suggests: the _hardware_ PCI-E SR-IOV Virtual Function provides
persistent, non-random MAC Addresses for use by the VF Driver -- if it wants to
use them. A VF Driver running in a VM is capable of specifying arbitrary MAC
Addresses for use with the VF and may ignore the hardware MAC Addresses provided
by the VF. This is little different from the current situation with software
vNICs which use manufactured MAC Addresses (which are persistent in all of the
Hypervisor systems at which I've looked).

| From: David Miller <davem(a)davemloft.net>
| Date: Wednesday, July 21, 2010 11:48 am
|
| If the VM itself is the "physical entity" of the system, the logical
| conclusion I come to is that some kind of key should be obtained
| through the VM to uniquely give the device a persistent MAC.

Which is, as above, what all Hypervisor systems which I've looked at do.

| You could do things like have the PF controller use the root filesystem
| ID label to construct the VF's MAC address, or something like that.

The MAC Address is actually stored in the VM's meta-data. When a VM migrates
from one Physical Host to another, all of the VM's transient and persistent
state must be available to the new Physical Host. Xen, for instance, has the
concept of a Physical Host Pool where all of the Physical Hosts have common
access to shared resources like Network Attached Storage, LAN/VLANs and the
shared VM meta-data.
Casey
From: Casey Leedom on
On Jul 21, 2010, at 11:53 PM, Stefan Assmann wrote:

> Using the VF in the host is a feature and I'm sure people will think of
> ways to make good use of it. However the actual problem we've seen is a
> more practical one. So to pass-through a VF to a VM the host has to be
> aware that the VF exists. Therefore you usually have to enable the VF in
> the host (i.e. specify the max_vfs parameter). The device will be
> discovered by the system and because of the random MAC address udev
> ignores the new device. With the additional information we provide with
> our solution udev will be able to recognize the device by its "device
> path" and handle it properly (until you decide to pass it to a VM or
> just be happy with it in the host).

Or you simply don't have the VF Driver loaded in the "Domain 0" Control OS.
When we install the cxgb4 PF Driver with "num_vf=..." this enables the PCI-E
SR-IOV Capabilities within the various PFs and the corresponding VF PCI Devices
are instantiated and discovered by the Domain 0 Linux OS. But without a cxgb4vf
VF Driver loaded, those devices just sit there -- available for "Device
Assignment" to VMs.

> Remember the issue that led to the proposal of renaming VFs to vfeth?
> That's exactly the problem we try to fix. Additional benefit of an
> "address assignment type" as Ben likes to call it would be the handling
> of MAC address stealing NICs.

The above was mostly to cope with some SR-IOV Drivers using random MAC addresses for the VFs.

Casey
From: Casey Leedom on
| From: Stefan Assmann <sassmann(a)redhat.com>
| Date: Friday, July 23, 2010 01:08 am
|
| On 23.07.2010 02:26, Casey Leedom wrote:
| > Or you simply don't have the VF Driver loaded in the "Domain 0" Control
| > OS. When we install the cxgb4 PF Driver with "num_vf=..." this enables
| > the PCI-E SR-IOV Capabilities within the various PFs and the
| > corresponding VF PCI Devices are instantiated and discovered by the
| > Domain 0 Linux OS. But without a cxgb4vf VF Driver loaded, those
| > devices just sit there available for "Device Assignment" to VMs.
|
| Just out of curiosity, how do you prevent the VF driver from getting
| loaded in the host? Except from blacklisting it.

I don't install them. :-)

I'm actually fairly unfamiliar with the details of managing/administering
Linux systems so I'm guessing that there are much better ways of controlling for
which devices a Linux system will attempt to load drivers. For instance, I
didn't know about the concept of "blacklisting" a driver.

Casey