From: David Miller
From: Stephen Hemminger <shemminger(a)vyatta.com>
Date: Wed, 30 Sep 2009 17:39:23 -0700

> Why not use NETIF_F_LRO and ethtool to control LRO support?

In fact, you must, in order to handle bridging and routing
correctly.

Bridging and routing are illegal with LRO enabled, so the kernel
automatically issues the necessary ethtool commands to disable
LRO on the relevant devices.

Therefore you must support the ethtool LRO operation in order to
support LRO at all.
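
For reference, a minimal sketch of the ethtool flags hooks a
2.6.32-era driver wires up for this (example_update_lro and its
wiring are hypothetical; the actual vmxnet3 hooks may differ):

static u32 example_get_flags(struct net_device *netdev)
{
        return ethtool_op_get_flags(netdev);
}

static int example_set_flags(struct net_device *netdev, u32 data)
{
        /* toggle NETIF_F_LRO only when the requested state differs */
        if (!!(data & ETH_FLAG_LRO) != !!(netdev->features & NETIF_F_LRO)) {
                netdev->features ^= NETIF_F_LRO;
                example_update_lro(netdev); /* hypothetical: notify device */
        }
        return 0;
}

With .get_flags/.set_flags filled in struct ethtool_ops, both userspace
ethtool and the kernel's dev_disable_lro() path can flip the feature.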
From: David Miller
From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
Date: Wed, 30 Sep 2009 14:34:57 -0700 (PDT)

> +{
> + struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> + u8 *base;
> + int i;
> +
> + VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> VMXNET3_CMD_GET_STATS);
> +
> + /* this does assume each counter is 64-bit wide */
> +
> + base = (u8 *)&adapter->tqd_start->stats;
> + for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_dev_stats); i++)
> + *buf++ = *(u64 *)(base + vmxnet3_tq_dev_stats[i].offset);
> +
> + base = (u8 *)&adapter->tx_queue.stats;
> + for (i = 0; i < ARRAY_SIZE(vmxnet3_tq_driver_stats); i++)
> + *buf++ = *(u64 *)(base + vmxnet3_tq_driver_stats[i].offset);
> +
> + base = (u8 *)&adapter->rqd_start->stats;

There's a lot of code like this that isn't indented properly. Either
that or your email client has corrupted the patch by breaking up long
lines or similar.

Another example:

> +static int
> +vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
> +{
> + struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> +
> + if (adapter->rxcsum != val) {
> + adapter->rxcsum = val;
> + if (netif_running(netdev)) {
> + if (val)
> + adapter->shared->devRead.misc.uptFeatures |=
> + UPT1_F_RXCSUM;
> + else
> + adapter->shared->devRead.misc.uptFeatures &=
> + ~UPT1_F_RXCSUM;
> +
> + VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> + VMXNET3_CMD_UPDATE_FEATURE);
> + }
> + }
> + return 0;
> +}

Yikes! :-)
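
(For comparison, the same quoted function with its kernel-style
indentation intact reads:)

static int
vmxnet3_set_rx_csum(struct net_device *netdev, u32 val)
{
        struct vmxnet3_adapter *adapter = netdev_priv(netdev);

        if (adapter->rxcsum != val) {
                adapter->rxcsum = val;
                if (netif_running(netdev)) {
                        if (val)
                                adapter->shared->devRead.misc.uptFeatures |=
                                        UPT1_F_RXCSUM;
                        else
                                adapter->shared->devRead.misc.uptFeatures &=
                                        ~UPT1_F_RXCSUM;

                        VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
                                               VMXNET3_CMD_UPDATE_FEATURE);
                }
        }
        return 0;
}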

From: Shreyas Bhatewara

Hello all,

I do not mean to be bothersome, but this thread has been unusually silent.
Could you please review the patch and reply with your comments/acks?

Thanks.
->Shreyas


On Tue, 6 Oct 2009, Shreyas Bhatewara wrote:

>
>
> Ethernet NIC driver for VMware's vmxnet3
>
> From: Shreyas Bhatewara <sbhatewara(a)vmware.com>
>
> This patch adds driver support for VMware's virtual Ethernet NIC: vmxnet3.
> Guests running on VMware hypervisors that support the vmxnet3 device will
> thus have access to improved network functionality and performance.
>
> Signed-off-by: Shreyas Bhatewara <sbhatewara(a)vmware.com>
> Signed-off-by: Bhavesh Davda <bhavesh(a)vmware.com>
> Signed-off-by: Ronghua Zhang <ronghua(a)vmware.com>
>
> ---
>
> VMware virtual Ethernet NIC Driver : vmxnet3 - v3
>
> Changelog (v3-v2)
> - use ethtool instead of a module param to control hw LRO feature
> - rebase to 2.6.32-rc3
>
> Changelog (v2-v1)
> - Rebased the patch to v2.6.32-rc1
> - Changed all uint32_t types to u32 and friends
> - Removed duplicate max queue size from upt1_defs.h
> - Replaced #defines by enum
> - uniform spacing between datatype and membername in structures
> - removed some noisy printks, eliminated some BUG_ONs
> - corrected arguments of kcalloc
> - used pci_request_selected_regions, pci_enable_device_mem
> - eliminated not-so-useful wrapper functions, used ethtool_op_ functions instead
> - used strlcpy
> - used get_sset_count instead of get_stats_count
> - used net_device_stats from struct net_device
>
>
> Please review the patch and provide your feedback/comments for
> upstreaming.
>
> Thanking you
> ->Shreyas
>
> ---
>
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 09a2028..0509f26 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5628,6 +5628,13 @@ S: Maintained
> F: drivers/vlynq/vlynq.c
> F: include/linux/vlynq.h
>
> +VMWARE VMXNET3 ETHERNET DRIVER
> +M: Shreyas Bhatewara <sbhatewara(a)vmware.com>
> +M: VMware, Inc. <pv-drivers(a)vmware.com>
> +L: netdev(a)vger.kernel.org
> +S: Maintained
> +F: drivers/net/vmxnet3/
> +
> VOLTAGE AND CURRENT REGULATOR FRAMEWORK
> M: Liam Girdwood <lrg(a)slimlogic.co.uk>
> M: Mark Brown <broonie(a)opensource.wolfsonmicro.com>
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 7127760..9789de2 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -3230,4 +3230,12 @@ config VIRTIO_NET
> This is the virtual network driver for virtio. It can be used with
> lguest or QEMU based VMMs (like KVM or Xen). Say Y or M.
>
> +config VMXNET3
> + tristate "VMware VMXNET3 ethernet driver"
> + depends on PCI && X86
> + help
> + This driver supports VMware's vmxnet3 virtual ethernet NIC.
> + To compile this driver as a module, choose M here: the
> + module will be called vmxnet3.
> +
> endif # NETDEVICES
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index d866b8c..d3a0418 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -26,6 +26,7 @@ obj-$(CONFIG_TEHUTI) += tehuti.o
> obj-$(CONFIG_ENIC) += enic/
> obj-$(CONFIG_JME) += jme.o
> obj-$(CONFIG_BE2NET) += benet/
> +obj-$(CONFIG_VMXNET3) += vmxnet3/
>
> gianfar_driver-objs := gianfar.o \
> gianfar_ethtool.o \
> diff --git a/drivers/net/vmxnet3/Makefile b/drivers/net/vmxnet3/Makefile
> new file mode 100644
> index 0000000..880f509
> --- /dev/null
> +++ b/drivers/net/vmxnet3/Makefile
> @@ -0,0 +1,35 @@
> +################################################################################
> +#
> +# Linux driver for VMware's vmxnet3 ethernet NIC.
> +#
> +# Copyright (C) 2007-2009, VMware, Inc. All Rights Reserved.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms of the GNU General Public License as published by the
> +# Free Software Foundation; version 2 of the License and no later version.
> +#
> +# This program is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> +# NON INFRINGEMENT. See the GNU General Public License for more
> +# details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program; if not, write to the Free Software
> +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> +#
> +# The full GNU General Public License is included in this distribution in
> +# the file called "COPYING".
> +#
> +# Maintained by: Shreyas Bhatewara <pv-drivers(a)vmware.com>
> +#
> +#
> +################################################################################
> +
> +#
> +# Makefile for the VMware vmxnet3 ethernet NIC driver
> +#
> +
> +obj-$(CONFIG_VMXNET3) += vmxnet3.o
> +
> +vmxnet3-objs := vmxnet3_drv.o vmxnet3_ethtool.o
> diff --git a/drivers/net/vmxnet3/upt1_defs.h b/drivers/net/vmxnet3/upt1_defs.h
> new file mode 100644
> index 0000000..37108fb
> --- /dev/null
> +++ b/drivers/net/vmxnet3/upt1_defs.h
> @@ -0,0 +1,96 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Maintained by: Shreyas Bhatewara <pv-drivers(a)vmware.com>
> + *
> + */
> +
> +#ifndef _UPT1_DEFS_H
> +#define _UPT1_DEFS_H
> +
> +struct UPT1_TxStats {
> + u64 TSOPktsTxOK; /* TSO pkts post-segmentation */
> + u64 TSOBytesTxOK;
> + u64 ucastPktsTxOK;
> + u64 ucastBytesTxOK;
> + u64 mcastPktsTxOK;
> + u64 mcastBytesTxOK;
> + u64 bcastPktsTxOK;
> + u64 bcastBytesTxOK;
> + u64 pktsTxError;
> + u64 pktsTxDiscard;
> +};
> +
> +struct UPT1_RxStats {
> + u64 LROPktsRxOK; /* LRO pkts */
> + u64 LROBytesRxOK; /* bytes from LRO pkts */
> + /* the following counters are for pkts from the wire, i.e., pre-LRO */
> + u64 ucastPktsRxOK;
> + u64 ucastBytesRxOK;
> + u64 mcastPktsRxOK;
> + u64 mcastBytesRxOK;
> + u64 bcastPktsRxOK;
> + u64 bcastBytesRxOK;
> + u64 pktsRxOutOfBuf;
> + u64 pktsRxError;
> +};
> +
> +/* interrupt moderation level */
> +enum {
> + UPT1_IML_NONE = 0, /* no interrupt moderation */
> + UPT1_IML_HIGHEST = 7, /* least intr generated */
> + UPT1_IML_ADAPTIVE = 8, /* adaptive intr moderation */
> +};
> +/* values for UPT1_RSSConf.hashFunc */
> +enum {
> + UPT1_RSS_HASH_TYPE_NONE = 0x0,
> + UPT1_RSS_HASH_TYPE_IPV4 = 0x01,
> + UPT1_RSS_HASH_TYPE_TCP_IPV4 = 0x02,
> + UPT1_RSS_HASH_TYPE_IPV6 = 0x04,
> + UPT1_RSS_HASH_TYPE_TCP_IPV6 = 0x08,
> +};
> +
> +enum {
> + UPT1_RSS_HASH_FUNC_NONE = 0x0,
> + UPT1_RSS_HASH_FUNC_TOEPLITZ = 0x01,
> +};
> +
> +#define UPT1_RSS_MAX_KEY_SIZE 40
> +#define UPT1_RSS_MAX_IND_TABLE_SIZE 128
> +
> +struct UPT1_RSSConf {
> + u16 hashType;
> + u16 hashFunc;
> + u16 hashKeySize;
> + u16 indTableSize;
> + u8 hashKey[UPT1_RSS_MAX_KEY_SIZE];
> + u8 indTable[UPT1_RSS_MAX_IND_TABLE_SIZE];
> +};
> +
> +/* features */
> +enum {
> + UPT1_F_RXCSUM = 0x0001, /* rx csum verification */
> + UPT1_F_RSS = 0x0002,
> + UPT1_F_RXVLAN = 0x0004, /* VLAN tag stripping */
> + UPT1_F_LRO = 0x0008,
> +};
> +#endif
> diff --git a/drivers/net/vmxnet3/vmxnet3_defs.h b/drivers/net/vmxnet3/vmxnet3_defs.h
> new file mode 100644
> index 0000000..dc8ee44
> --- /dev/null
> +++ b/drivers/net/vmxnet3/vmxnet3_defs.h
> @@ -0,0 +1,535 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Maintained by: Shreyas Bhatewara <pv-drivers(a)vmware.com>
> + *
> + */
> +
> +#ifndef _VMXNET3_DEFS_H_
> +#define _VMXNET3_DEFS_H_
> +
> +#include "upt1_defs.h"
> +
> +/* all registers are 32 bit wide */
> +/* BAR 1 */
> +enum {
> + VMXNET3_REG_VRRS = 0x0, /* Vmxnet3 Revision Report Selection */
> + VMXNET3_REG_UVRS = 0x8, /* UPT Version Report Selection */
> + VMXNET3_REG_DSAL = 0x10, /* Driver Shared Address Low */
> + VMXNET3_REG_DSAH = 0x18, /* Driver Shared Address High */
> + VMXNET3_REG_CMD = 0x20, /* Command */
> + VMXNET3_REG_MACL = 0x28, /* MAC Address Low */
> + VMXNET3_REG_MACH = 0x30, /* MAC Address High */
> + VMXNET3_REG_ICR = 0x38, /* Interrupt Cause Register */
> + VMXNET3_REG_ECR = 0x40 /* Event Cause Register */
> +};
> +
> +/* BAR 0 */
> +enum {
> + VMXNET3_REG_IMR = 0x0, /* Interrupt Mask Register */
> + VMXNET3_REG_TXPROD = 0x600, /* Tx Producer Index */
> + VMXNET3_REG_RXPROD = 0x800, /* Rx Producer Index for ring 1 */
> + VMXNET3_REG_RXPROD2 = 0xA00 /* Rx Producer Index for ring 2 */
> +};
> +
> +#define VMXNET3_PT_REG_SIZE 4096 /* BAR 0 */
> +#define VMXNET3_VD_REG_SIZE 4096 /* BAR 1 */
> +
> +#define VMXNET3_REG_ALIGN 8 /* All registers are 8-byte aligned. */
> +#define VMXNET3_REG_ALIGN_MASK 0x7
> +
> +/* I/O Mapped access to registers */
> +#define VMXNET3_IO_TYPE_PT 0
> +#define VMXNET3_IO_TYPE_VD 1
> +#define VMXNET3_IO_ADDR(type, reg) (((type) << 24) | ((reg) & 0xFFFFFF))
> +#define VMXNET3_IO_TYPE(addr) ((addr) >> 24)
> +#define VMXNET3_IO_REG(addr) ((addr) & 0xFFFFFF)
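
(Worked example: VMXNET3_IO_ADDR(VMXNET3_IO_TYPE_VD, VMXNET3_REG_CMD)
evaluates to (1 << 24) | 0x20 = 0x1000020; VMXNET3_IO_TYPE() and
VMXNET3_IO_REG() then recover the type 1 and register 0x20.)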
> +
> +enum {
> + VMXNET3_CMD_FIRST_SET = 0xCAFE0000,
> + VMXNET3_CMD_ACTIVATE_DEV = VMXNET3_CMD_FIRST_SET,
> + VMXNET3_CMD_QUIESCE_DEV,
> + VMXNET3_CMD_RESET_DEV,
> + VMXNET3_CMD_UPDATE_RX_MODE,
> + VMXNET3_CMD_UPDATE_MAC_FILTERS,
> + VMXNET3_CMD_UPDATE_VLAN_FILTERS,
> + VMXNET3_CMD_UPDATE_RSSIDT,
> + VMXNET3_CMD_UPDATE_IML,
> + VMXNET3_CMD_UPDATE_PMCFG,
> + VMXNET3_CMD_UPDATE_FEATURE,
> + VMXNET3_CMD_LOAD_PLUGIN,
> +
> + VMXNET3_CMD_FIRST_GET = 0xF00D0000,
> + VMXNET3_CMD_GET_QUEUE_STATUS = VMXNET3_CMD_FIRST_GET,
> + VMXNET3_CMD_GET_STATS,
> + VMXNET3_CMD_GET_LINK,
> + VMXNET3_CMD_GET_PERM_MAC_LO,
> + VMXNET3_CMD_GET_PERM_MAC_HI,
> + VMXNET3_CMD_GET_DID_LO,
> + VMXNET3_CMD_GET_DID_HI,
> + VMXNET3_CMD_GET_DEV_EXTRA_INFO,
> + VMXNET3_CMD_GET_CONF_INTR
> +};
> +
> +struct Vmxnet3_TxDesc {
> + u64 addr;
> +
> + u32 len:14;
> + u32 gen:1; /* generation bit */
> + u32 rsvd:1;
> + u32 dtype:1; /* descriptor type */
> + u32 ext1:1;
> + u32 msscof:14; /* MSS, checksum offset, flags */
> +
> + u32 hlen:10; /* header len */
> + u32 om:2; /* offload mode */
> + u32 eop:1; /* End Of Packet */
> + u32 cq:1; /* completion request */
> + u32 ext2:1;
> + u32 ti:1; /* VLAN Tag Insertion */
> + u32 tci:16; /* Tag to Insert */
> +};
> +
> +/* TxDesc.OM values */
> +#define VMXNET3_OM_NONE 0
> +#define VMXNET3_OM_CSUM 2
> +#define VMXNET3_OM_TSO 3
> +
> +/* fields in TxDesc we access w/o using bit fields */
> +#define VMXNET3_TXD_EOP_SHIFT 12
> +#define VMXNET3_TXD_CQ_SHIFT 13
> +#define VMXNET3_TXD_GEN_SHIFT 14
> +
> +#define VMXNET3_TXD_CQ (1 << VMXNET3_TXD_CQ_SHIFT)
> +#define VMXNET3_TXD_EOP (1 << VMXNET3_TXD_EOP_SHIFT)
> +#define VMXNET3_TXD_GEN (1 << VMXNET3_TXD_GEN_SHIFT)
> +
> +#define VMXNET3_HDR_COPY_SIZE 128
> +
> +
> +struct Vmxnet3_TxDataDesc {
> + u8 data[VMXNET3_HDR_COPY_SIZE];
> +};
> +
> +
> +struct Vmxnet3_TxCompDesc {
> + u32 txdIdx:12; /* Index of the EOP TxDesc */
> + u32 ext1:20;
> +
> + u32 ext2;
> + u32 ext3;
> +
> + u32 rsvd:24;
> + u32 type:7; /* completion type */
> + u32 gen:1; /* generation bit */
> +};
> +
> +
> +struct Vmxnet3_RxDesc {
> + u64 addr;
> +
> + u32 len:14;
> + u32 btype:1; /* Buffer Type */
> + u32 dtype:1; /* Descriptor type */
> + u32 rsvd:15;
> + u32 gen:1; /* Generation bit */
> +
> + u32 ext1;
> +};
> +
> +/* values of RXD.BTYPE */
> +#define VMXNET3_RXD_BTYPE_HEAD 0 /* head only */
> +#define VMXNET3_RXD_BTYPE_BODY 1 /* body only */
> +
> +/* fields in RxDesc we access w/o using bit fields */
> +#define VMXNET3_RXD_BTYPE_SHIFT 14
> +#define VMXNET3_RXD_GEN_SHIFT 31
> +
> +
> +struct Vmxnet3_RxCompDesc {
> + u32 rxdIdx:12; /* Index of the RxDesc */
> + u32 ext1:2;
> + u32 eop:1; /* End of Packet */
> + u32 sop:1; /* Start of Packet */
> + u32 rqID:10; /* rx queue/ring ID */
> + u32 rssType:4; /* RSS hash type used */
> + u32 cnc:1; /* Checksum Not Calculated */
> + u32 ext2:1;
> +
> + u32 rssHash; /* RSS hash value */
> +
> + u32 len:14; /* data length */
> + u32 err:1; /* Error */
> + u32 ts:1; /* Tag is stripped */
> + u32 tci:16; /* Tag stripped */
> +
> + u32 csum:16;
> + u32 tuc:1; /* TCP/UDP Checksum Correct */
> + u32 udp:1; /* UDP packet */
> + u32 tcp:1; /* TCP packet */
> + u32 ipc:1; /* IP Checksum Correct */
> + u32 v6:1; /* IPv6 */
> + u32 v4:1; /* IPv4 */
> + u32 frg:1; /* IP Fragment */
> + u32 fcs:1; /* Frame CRC correct */
> + u32 type:7; /* completion type */
> + u32 gen:1; /* generation bit */
> +};
> +
> +/* fields in RxCompDesc we access via Vmxnet3_GenericDesc.dword[3] */
> +#define VMXNET3_RCD_TUC_SHIFT 16
> +#define VMXNET3_RCD_IPC_SHIFT 19
> +
> +/* fields in RxCompDesc we access via Vmxnet3_GenericDesc.qword[1] */
> +#define VMXNET3_RCD_TYPE_SHIFT 56
> +#define VMXNET3_RCD_GEN_SHIFT 63
> +
> +/* csum OK for TCP/UDP pkts over IP */
> +#define VMXNET3_RCD_CSUM_OK (1 << VMXNET3_RCD_TUC_SHIFT | \
> + 1 << VMXNET3_RCD_IPC_SHIFT)
> +
> +/* value of RxCompDesc.rssType */
> +enum {
> + VMXNET3_RCD_RSS_TYPE_NONE = 0,
> + VMXNET3_RCD_RSS_TYPE_IPV4 = 1,
> + VMXNET3_RCD_RSS_TYPE_TCPIPV4 = 2,
> + VMXNET3_RCD_RSS_TYPE_IPV6 = 3,
> + VMXNET3_RCD_RSS_TYPE_TCPIPV6 = 4,
> +};
> +
> +
> +/* a union for accessing all cmd/completion descriptors */
> +union Vmxnet3_GenericDesc {
> + u64 qword[2];
> + u32 dword[4];
> + u16 word[8];
> + struct Vmxnet3_TxDesc txd;
> + struct Vmxnet3_RxDesc rxd;
> + struct Vmxnet3_TxCompDesc tcd;
> + struct Vmxnet3_RxCompDesc rcd;
> +};
> +
> +#define VMXNET3_INIT_GEN 1
> +
> +/* Max size of a single tx buffer */
> +#define VMXNET3_MAX_TX_BUF_SIZE (1 << 14)
> +
> +/* # of tx desc needed for a tx buffer size */
> +#define VMXNET3_TXD_NEEDED(size) (((size) + VMXNET3_MAX_TX_BUF_SIZE - 1) / \
> + VMXNET3_MAX_TX_BUF_SIZE)
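
(Worked example: VMXNET3_MAX_TX_BUF_SIZE is 1 << 14 = 16384, so a
40000-byte linear buffer needs VMXNET3_TXD_NEEDED(40000) =
(40000 + 16383) / 16384 = 3 tx descriptors.)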
> +
> +/* max # of tx descs for a non-tso pkt */
> +#define VMXNET3_MAX_TXD_PER_PKT 16
> +
> +/* Max size of a single rx buffer */
> +#define VMXNET3_MAX_RX_BUF_SIZE ((1 << 14) - 1)
> +/* Minimum size of a type 0 buffer */
> +#define VMXNET3_MIN_T0_BUF_SIZE 128
> +#define VMXNET3_MAX_CSUM_OFFSET 1024
> +
> +/* Ring base address alignment */
> +#define VMXNET3_RING_BA_ALIGN 512
> +#define VMXNET3_RING_BA_MASK (VMXNET3_RING_BA_ALIGN - 1)
> +
> +/* Ring size must be a multiple of 32 */
> +#define VMXNET3_RING_SIZE_ALIGN 32
> +#define VMXNET3_RING_SIZE_MASK (VMXNET3_RING_SIZE_ALIGN - 1)
> +
> +/* Max ring size */
> +#define VMXNET3_TX_RING_MAX_SIZE 4096
> +#define VMXNET3_TC_RING_MAX_SIZE 4096
> +#define VMXNET3_RX_RING_MAX_SIZE 4096
> +#define VMXNET3_RC_RING_MAX_SIZE 8192
> +
> +/* a list of reasons for queue stop */
> +
> +enum {
> + VMXNET3_ERR_NOEOP = 0x80000000, /* cannot find the EOP desc of a pkt */
> + VMXNET3_ERR_TXD_REUSE = 0x80000001, /* reuse TxDesc before tx completion */
> + VMXNET3_ERR_BIG_PKT = 0x80000002, /* too many TxDesc for a pkt */
> + VMXNET3_ERR_DESC_NOT_SPT = 0x80000003, /* descriptor type not supported */
> + VMXNET3_ERR_SMALL_BUF = 0x80000004, /* type 0 buffer too small */
> + VMXNET3_ERR_STRESS = 0x80000005, /* stress option firing in vmkernel */
> + VMXNET3_ERR_SWITCH = 0x80000006, /* mode switch failure */
> + VMXNET3_ERR_TXD_INVALID = 0x80000007, /* invalid TxDesc */
> +};
> +
> +/* completion descriptor types */
> +#define VMXNET3_CDTYPE_TXCOMP 0 /* Tx Completion Descriptor */
> +#define VMXNET3_CDTYPE_RXCOMP 3 /* Rx Completion Descriptor */
> +
> +enum {
> + VMXNET3_GOS_BITS_UNK = 0, /* unknown */
> + VMXNET3_GOS_BITS_32 = 1,
> + VMXNET3_GOS_BITS_64 = 2,
> +};
> +
> +#define VMXNET3_GOS_TYPE_LINUX 1
> +
> +
> +struct Vmxnet3_GOSInfo {
> + u32 gosBits:2; /* 32-bit or 64-bit? */
> + u32 gosType:4; /* which guest */
> + u32 gosVer:16; /* gos version */
> + u32 gosMisc:10; /* other info about gos */
> +};
> +
> +
> +struct Vmxnet3_DriverInfo {
> + u32 version;
> + struct Vmxnet3_GOSInfo gos;
> + u32 vmxnet3RevSpt;
> + u32 uptVerSpt;
> +};
> +
> +
> +#define VMXNET3_REV1_MAGIC 0xbabefee1
> +
> +/*
> + * QueueDescPA must be 128-byte aligned. It points to an array of
> + * Vmxnet3_TxQueueDesc followed by an array of Vmxnet3_RxQueueDesc.
> + * The numbers of Vmxnet3_TxQueueDesc/Vmxnet3_RxQueueDesc are specified by
> + * Vmxnet3_MiscConf.numTxQueues/numRxQueues, respectively.
> + */
> +#define VMXNET3_QUEUE_DESC_ALIGN 128
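
A minimal sketch of sizing that region, assuming the struct names
defined later in this header (illustrative only, not part of the patch):

        /* one tx-queue-desc array followed by one rx-queue-desc array,
         * in a single 128-byte aligned allocation */
        size_t queue_desc_len =
                num_tx_queues * sizeof(struct Vmxnet3_TxQueueDesc) +
                num_rx_queues * sizeof(struct Vmxnet3_RxQueueDesc);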
> +
> +
> +struct Vmxnet3_MiscConf {
> + struct Vmxnet3_DriverInfo driverInfo;
> + u64 uptFeatures;
> + u64 ddPA; /* driver data PA */
> + u64 queueDescPA; /* queue descriptor table PA */
> + u32 ddLen; /* driver data len */
> + u32 queueDescLen; /* queue desc. table len in bytes */
> + u32 mtu;
> + u16 maxNumRxSG;
> + u8 numTxQueues;
> + u8 numRxQueues;
> + u32 reserved[4];
> +};
> +
> +
> +struct Vmxnet3_TxQueueConf {
> + u64 txRingBasePA;
> + u64 dataRingBasePA;
> + u64 compRingBasePA;
> + u64 ddPA; /* driver data */
> + u64 reserved;
> + u32 txRingSize; /* # of tx desc */
> + u32 dataRingSize; /* # of data desc */
> + u32 compRingSize; /* # of comp desc */
> + u32 ddLen; /* size of driver data */
> + u8 intrIdx;
> + u8 _pad[7];
> +};
> +
> +
> +struct Vmxnet3_RxQueueConf {
> + u64 rxRingBasePA[2];
> + u64 compRingBasePA;
> + u64 ddPA; /* driver data */
> + u64 reserved;
> + u32 rxRingSize[2]; /* # of rx desc */
> + u32 compRingSize; /* # of rx comp desc */
> + u32 ddLen; /* size of driver data */
> + u8 intrIdx;
> + u8 _pad[7];
> +};
> +
> +
> +enum vmxnet3_intr_mask_mode {
> + VMXNET3_IMM_AUTO = 0,
> + VMXNET3_IMM_ACTIVE = 1,
> + VMXNET3_IMM_LAZY = 2
> +};
> +
> +enum vmxnet3_intr_type {
> + VMXNET3_IT_AUTO = 0,
> + VMXNET3_IT_INTX = 1,
> + VMXNET3_IT_MSI = 2,
> + VMXNET3_IT_MSIX = 3
> +};
> +
> +#define VMXNET3_MAX_TX_QUEUES 8
> +#define VMXNET3_MAX_RX_QUEUES 16
> +/* plus 1 for events: 8 tx + 16 rx + 1 event = 25 */
> +#define VMXNET3_MAX_INTRS 25
> +
> +
> +struct Vmxnet3_IntrConf {
> + bool autoMask;
> + u8 numIntrs; /* # of interrupts */
> + u8 eventIntrIdx;
> + u8 modLevels[VMXNET3_MAX_INTRS]; /* moderation level for
> + * each intr */
> + u32 reserved[3];
> +};
> +
> +/* one bit per VLAN ID, the size is in the units of u32 */
> +#define VMXNET3_VFT_SIZE (4096 / (sizeof(u32) * 8))
> +
> +
> +struct Vmxnet3_QueueStatus {
> + bool stopped;
> + u8 _pad[3];
> + u32 error;
> +};
> +
> +
> +struct Vmxnet3_TxQueueCtrl {
> + u32 txNumDeferred;
> + u32 txThreshold;
> + u64 reserved;
> +};
> +
> +
> +struct Vmxnet3_RxQueueCtrl {
> + bool updateRxProd;
> + u8 _pad[7];
> + u64 reserved;
> +};
> +
> +enum {
> + VMXNET3_RXM_UCAST = 0x01, /* unicast only */
> + VMXNET3_RXM_MCAST = 0x02, /* multicast passing the filters */
> + VMXNET3_RXM_BCAST = 0x04, /* broadcast only */
> + VMXNET3_RXM_ALL_MULTI = 0x08, /* all multicast */
> + VMXNET3_RXM_PROMISC = 0x10 /* promiscuous */
> +};
> +
> +struct Vmxnet3_RxFilterConf {
> + u32 rxMode; /* VMXNET3_RXM_xxx */
> + u16 mfTableLen; /* size of the multicast filter table */
> + u16 _pad1;
> + u64 mfTablePA; /* PA of the multicast filters table */
> + u32 vfTable[VMXNET3_VFT_SIZE]; /* vlan filter */
> +};
> +
> +
> +#define VMXNET3_PM_MAX_FILTERS 6
> +#define VMXNET3_PM_MAX_PATTERN_SIZE 128
> +#define VMXNET3_PM_MAX_MASK_SIZE (VMXNET3_PM_MAX_PATTERN_SIZE / 8)
> +
> +#define VMXNET3_PM_WAKEUP_MAGIC 0x01 /* wake up on magic pkts */
> +#define VMXNET3_PM_WAKEUP_FILTER 0x02 /* wake up on pkts matching
> + * filters */
> +
> +
> +struct Vmxnet3_PM_PktFilter {
> + u8 maskSize;
> + u8 patternSize;
> + u8 mask[VMXNET3_PM_MAX_MASK_SIZE];
> + u8 pattern[VMXNET3_PM_MAX_PATTERN_SIZE];
> + u8 pad[6];
> +};
> +
> +
> +struct Vmxnet3_PMConf {
> + u16 wakeUpEvents; /* VMXNET3_PM_WAKEUP_xxx */
> + u8 numFilters;
> + u8 pad[5];
> + struct Vmxnet3_PM_PktFilter filters[VMXNET3_PM_MAX_FILTERS];
> +};
> +
> +
> +struct Vmxnet3_VariableLenConfDesc {
> + u32 confVer;
> + u32 confLen;
> + u64 confPA;
> +};
> +
> +
> +struct Vmxnet3_TxQueueDesc {
> + struct Vmxnet3_TxQueueCtrl ctrl;
> + struct Vmxnet3_TxQueueConf conf;
> +
> + /* Driver read after a GET command */
> + struct Vmxnet3_QueueStatus status;
> + struct UPT1_TxStats stats;
> + u8 _pad[88]; /* 128 aligned */
> +};
> +
> +
> +struct Vmxnet3_RxQueueDesc {
> + struct Vmxnet3_RxQueueCtrl ctrl;
> + struct Vmxnet3_RxQueueConf conf;
> + /* Driver read after a GET command */
> + struct Vmxnet3_QueueStatus status;
> + struct UPT1_RxStats stats;
> + u8 __pad[88]; /* 128 aligned */
> +};
> +
> +
> +struct Vmxnet3_DSDevRead {
> + /* read-only region for device, read by dev in response to a SET cmd */
> + struct Vmxnet3_MiscConf misc;
> + struct Vmxnet3_IntrConf intrConf;
> + struct Vmxnet3_RxFilterConf rxFilterConf;
> + struct Vmxnet3_VariableLenConfDesc rssConfDesc;
> + struct Vmxnet3_VariableLenConfDesc pmConfDesc;
> + struct Vmxnet3_VariableLenConfDesc pluginConfDesc;
> +};
> +
> +/* All structures in DriverShared are padded to multiples of 8 bytes */
> +struct Vmxnet3_DriverShared {
> + u32 magic;
> + /* make devRead start at 64bit boundaries */
> + u32 pad;
> + struct Vmxnet3_DSDevRead devRead;
> + u32 ecr;
> + u32 reserved[5];
> +};
> +
> +
> +#define VMXNET3_ECR_RQERR (1 << 0)
> +#define VMXNET3_ECR_TQERR (1 << 1)
> +#define VMXNET3_ECR_LINK (1 << 2)
> +#define VMXNET3_ECR_DIC (1 << 3)
> +#define VMXNET3_ECR_DEBUG (1 << 4)
> +
> +/* flip the gen bit of a ring */
> +#define VMXNET3_FLIP_RING_GEN(gen) ((gen) = (gen) ^ 0x1)
> +
> +/* only use this if moving the idx won't affect the gen bit */
> +#define VMXNET3_INC_RING_IDX_ONLY(idx, ring_size) \
> + do {\
> + (idx)++;\
> + if (unlikely((idx) == (ring_size))) {\
> + (idx) = 0;\
> + } \
> + } while (0)
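
A sketch of how the driver's ring bookkeeping combines these macros
(helper name hypothetical; the real inline helpers live in
vmxnet3_int.h):

        static inline void
        example_ring_advance(u32 *idx, u8 *gen, u32 ring_size)
        {
                /* wrap the index and flip the generation bit so a full
                 * ring can be told apart from an empty one */
                if (++(*idx) == ring_size) {
                        *idx = 0;
                        VMXNET3_FLIP_RING_GEN(*gen);
                }
        }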
> +
> +#define VMXNET3_SET_VFTABLE_ENTRY(vfTable, vid) \
> + (vfTable[vid >> 5] |= (1 << (vid & 31)))
> +#define VMXNET3_CLEAR_VFTABLE_ENTRY(vfTable, vid) \
> + (vfTable[vid >> 5] &= ~(1 << (vid & 31)))
> +
> +#define VMXNET3_VFTABLE_ENTRY_IS_SET(vfTable, vid) \
> + ((vfTable[vid >> 5] & (1 << (vid & 31))) != 0)
> +
> +#define VMXNET3_MAX_MTU 9000
> +#define VMXNET3_MIN_MTU 60
> +
> +#define VMXNET3_LINK_UP (10000 << 16 | 1) /* 10 Gbps, up */
> +#define VMXNET3_LINK_DOWN 0
> +
> +#endif /* _VMXNET3_DEFS_H_ */
> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
> new file mode 100644
> index 0000000..a886f24
> --- /dev/null
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -0,0 +1,2553 @@
> +/*
> + * Linux driver for VMware's vmxnet3 ethernet NIC.
> + *
> + * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; version 2 of the License and no later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT. See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
> + *
> + * The full GNU General Public License is included in this distribution in
> + * the file called "COPYING".
> + *
> + * Maintained by: Shreyas Bhatewara <pv-drivers(a)vmware.com>
> + *
> + */
> +
> +#include "vmxnet3_int.h"
> +
> +char vmxnet3_driver_name[] = "vmxnet3";
> +#define VMXNET3_DRIVER_DESC "VMware vmxnet3 virtual NIC driver"
> +
> +
> +/*
> + * PCI Device ID Table
> + * Last entry must be all 0s
> + */
> +static const struct pci_device_id vmxnet3_pciid_table[] = {
> + {PCI_VDEVICE(VMWARE, PCI_DEVICE_ID_VMWARE_VMXNET3)},
> + {0}
> +};
> +
> +MODULE_DEVICE_TABLE(pci, vmxnet3_pciid_table);
> +
> +static atomic_t devices_found;
> +
> +
> +/*
> + * Enable/Disable the given intr
> + */
> +static void
> +vmxnet3_enable_intr(struct vmxnet3_adapter *adapter, unsigned intr_idx)
> +{
> + VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_IMR + intr_idx * 8, 0);
> +}
> +
> +
> +static void
> +vmxnet3_disable_intr(struct vmxnet3_adapter *adapter, unsigned intr_idx)
> +{
> + VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_IMR + intr_idx * 8, 1);
> +}
> +
> +
> +/*
> + * Enable/Disable all intrs used by the device
> + */
> +static void
> +vmxnet3_enable_all_intrs(struct vmxnet3_adapter *adapter)
> +{
> + int i;
> +
> + for (i = 0; i < adapter->intr.num_intrs; i++)
> + vmxnet3_enable_intr(adapter, i);
> +}
> +
> +
> +static void
> +vmxnet3_disable_all_intrs(struct vmxnet3_adapter *adapter)
> +{
> + int i;
> +
> + for (i = 0; i < adapter->intr.num_intrs; i++)
> + vmxnet3_disable_intr(adapter, i);
> +}
> +
> +
> +static void
> +vmxnet3_ack_events(struct vmxnet3_adapter *adapter, u32 events)
> +{
> + VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_ECR, events);
> +}
> +
> +
> +static bool
> +vmxnet3_tq_stopped(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
> +{
> + return netif_queue_stopped(adapter->netdev);
> +}
> +
> +
> +static void
> +vmxnet3_tq_start(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
> +{
> + tq->stopped = false;
> + netif_start_queue(adapter->netdev);
> +}
> +
> +
> +static void
> +vmxnet3_tq_wake(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
> +{
> + tq->stopped = false;
> + netif_wake_queue(adapter->netdev);
> +}
> +
> +
> +static void
> +vmxnet3_tq_stop(struct vmxnet3_tx_queue *tq, struct vmxnet3_adapter *adapter)
> +{
> + tq->stopped = true;
> + tq->num_stop++;
> + netif_stop_queue(adapter->netdev);
> +}
> +
> +
> +/*
> + * Check the link state. This may start or stop the tx queue.
> + */
> +static void
> +vmxnet3_check_link(struct vmxnet3_adapter *adapter)
> +{
> + u32 ret;
> +
> + VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD, VMXNET3_CMD_GET_LINK);
> + ret = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_CMD);
> + adapter->link_speed = ret >> 16;
> + if (ret & 1) { /* Link is up. */
> + printk(KERN_INFO "%s: NIC Link is Up %d Mbps\n",
> + adapter->netdev->name, adapter->link_speed);
> + if (!netif_carrier_ok(adapter->netdev))
> + netif_carrier_on(adapter->netdev);
> +
> + vmxnet3_tq_start(&adapter->tx_queue, adapter);
> + } else {
> + printk(KERN_INFO "%s: NIC Link is Down\n",
> + adapter->netdev->name);
> + if (netif_carrier_ok(adapter->netdev))
> + netif_carrier_off(adapter->netdev);
> +
> + vmxnet3_tq_stop(&adapter->tx_queue, adapter);
> + }
> +}
> +
> +
> +static void
> +vmxnet3_process_events(struct vmxnet3_adapter *adapter)
> +{
> + u32 events = adapter->shared->ecr;
> + if (!events)
> + return;
> +
> + vmxnet3_ack_events(adapter, events);
> +
> + /* Check if link state has changed */
> + if (events & VMXNET3_ECR_LINK)
> + vmxnet3_check_link(adapter);
> +
> + /* Check if there is an error on xmit/recv queues */
> + if (events & (VMXNET3_ECR_TQERR | VMXNET3_ECR_RQERR)) {
> + VMXNET3_WRITE_BAR1_REG(adapter, VMXNET3_REG_CMD,
> + VMXNET3_CMD_GET_QUEUE_STATUS);
> +
> + if (adapter->tqd_start->status.stopped) {
> + printk(KERN_ERR "%s: tq error 0x%x\n",
> + adapter->netdev->name,
> + adapter->tqd_start->status.error);
> + }
> + if (adapter->rqd_start->status.stopped) {
> + printk(KERN_ERR "%s: rq error 0x%x\n",
> + adapter->netdev->name,
> + adapter->rqd_start->status.error);
> + }
> +
> + schedule_work(&adapter->work);
> + }
> +}
> +
> +
> +static void
> +vmxnet3_unmap_tx_buf(struct vmxnet3_tx_buf_info *tbi,
> + struct pci_dev *pdev)
> +{
> + if (tbi->map_type == VMXNET3_MAP_SINGLE)
> + pci_unmap_single(pdev, tbi->dma_addr, tbi->len,
> + PCI_DMA_TODEVICE);
> + else if (tbi->map_type == VMXNET3_MAP_PAGE)
> + pci_unmap_page(pdev, tbi->dma_addr, tbi->len,
> + PCI_DMA_TODEVICE);
> + else
> + BUG_ON(tbi->map_type != VMXNET3_MAP_NONE);
> +
> + tbi->map_type = VMXNET3_MAP_NONE; /* to help debugging */
> +}
> +
> +
> +static int
> +vmxnet3_unmap_pkt(u32 eop_idx, struct vmxnet3_tx_queue *tq,
> + struct pci_dev *pdev, struct vmxnet3_adapter *adapter)
> +{
> + struct sk_buff *skb;
> + int entries = 0;
> +
> + /* no out of order completion */
> + BUG_ON(tq->buf_info[eop_idx].sop_idx != tq->tx_ring.next2comp);
> + BUG_ON(tq->tx_ring.base[eop_idx].txd.eop != 1);
> +
> + skb = tq->buf_info[eop_idx].skb;
> + BUG_ON(skb == NULL);
> + tq->buf_info[eop_idx].skb = NULL;
> +
> + VMXNET3_INC_RING_IDX_ONLY(eop_idx, tq->tx_ring.size);
> +
> + while (tq->tx_ring.next2comp != eop_idx) {
> + vmxnet3_unmap_tx_buf(tq->buf_info + tq->tx_ring.next2comp,
> + pdev);
> +
> + /* update next2comp w/o tx_lock. Since we are marking more,
> + * instead of less, tx ring entries avail, the worst case is
> + * that the tx routine incorrectly re-queues a pkt due to
> + * insufficient tx ring entries.
> + */
> + vmxnet3_cmd_ring_adv_next2comp(&tq->tx_ring);
> + entries++;
> + }
> +
> + dev_kfree_skb_any(skb);
> + return entries;
> +}
> +
> +
> +static int
> +vmxnet3_tq_tx_complete(struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_adapter *adapter)
> +{
> + int completed = 0;
> + union Vmxnet3_GenericDesc *gdesc;
> +
> + gdesc = tq->comp_ring.base + tq->comp_ring.next2proc;
> + while (gdesc->tcd.gen == tq->comp_ring.gen) {
> + completed += vmxnet3_unmap_pkt(gdesc->tcd.txdIdx, tq,
> + adapter->pdev, adapter);
> +
> + vmxnet3_comp_ring_adv_next2proc(&tq->comp_ring);
> + gdesc = tq->comp_ring.base + tq->comp_ring.next2proc;
> + }
> +
> + if (completed) {
> + spin_lock(&tq->tx_lock);
> + if (unlikely(vmxnet3_tq_stopped(tq, adapter) &&
> + vmxnet3_cmd_ring_desc_avail(&tq->tx_ring) >
> + VMXNET3_WAKE_QUEUE_THRESHOLD(tq) &&
> + netif_carrier_ok(adapter->netdev))) {
> + vmxnet3_tq_wake(tq, adapter);
> + }
> + spin_unlock(&tq->tx_lock);
> + }
> + return completed;
> +}
> +
> +
> +static void
> +vmxnet3_tq_cleanup(struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_adapter *adapter)
> +{
> + int i;
> +
> + while (tq->tx_ring.next2comp != tq->tx_ring.next2fill) {
> + struct vmxnet3_tx_buf_info *tbi;
> + union Vmxnet3_GenericDesc *gdesc;
> +
> + tbi = tq->buf_info + tq->tx_ring.next2comp;
> + gdesc = tq->tx_ring.base + tq->tx_ring.next2comp;
> +
> + vmxnet3_unmap_tx_buf(tbi, adapter->pdev);
> + if (tbi->skb) {
> + dev_kfree_skb_any(tbi->skb);
> + tbi->skb = NULL;
> + }
> + vmxnet3_cmd_ring_adv_next2comp(&tq->tx_ring);
> + }
> +
> + /* sanity check, verify all buffers are indeed unmapped and freed */
> + for (i = 0; i < tq->tx_ring.size; i++) {
> + BUG_ON(tq->buf_info[i].skb != NULL ||
> + tq->buf_info[i].map_type != VMXNET3_MAP_NONE);
> + }
> +
> + tq->tx_ring.gen = VMXNET3_INIT_GEN;
> + tq->tx_ring.next2fill = tq->tx_ring.next2comp = 0;
> +
> + tq->comp_ring.gen = VMXNET3_INIT_GEN;
> + tq->comp_ring.next2proc = 0;
> +}
> +
> +
> +void
> +vmxnet3_tq_destroy(struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_adapter *adapter)
> +{
> + if (tq->tx_ring.base) {
> + pci_free_consistent(adapter->pdev, tq->tx_ring.size *
> + sizeof(struct Vmxnet3_TxDesc),
> + tq->tx_ring.base, tq->tx_ring.basePA);
> + tq->tx_ring.base = NULL;
> + }
> + if (tq->data_ring.base) {
> + pci_free_consistent(adapter->pdev, tq->data_ring.size *
> + sizeof(struct Vmxnet3_TxDataDesc),
> + tq->data_ring.base, tq->data_ring.basePA);
> + tq->data_ring.base = NULL;
> + }
> + if (tq->comp_ring.base) {
> + pci_free_consistent(adapter->pdev, tq->comp_ring.size *
> + sizeof(struct Vmxnet3_TxCompDesc),
> + tq->comp_ring.base, tq->comp_ring.basePA);
> + tq->comp_ring.base = NULL;
> + }
> + kfree(tq->buf_info);
> + tq->buf_info = NULL;
> +}
> +
> +
> +static void
> +vmxnet3_tq_init(struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_adapter *adapter)
> +{
> + int i;
> +
> + /* reset the tx ring contents to 0 and reset the tx ring states */
> + memset(tq->tx_ring.base, 0, tq->tx_ring.size *
> + sizeof(struct Vmxnet3_TxDesc));
> + tq->tx_ring.next2fill = tq->tx_ring.next2comp = 0;
> + tq->tx_ring.gen = VMXNET3_INIT_GEN;
> +
> + memset(tq->data_ring.base, 0, tq->data_ring.size *
> + sizeof(struct Vmxnet3_TxDataDesc));
> +
> + /* reset the tx comp ring contents to 0 and reset comp ring states */
> + memset(tq->comp_ring.base, 0, tq->comp_ring.size *
> + sizeof(struct Vmxnet3_TxCompDesc));
> + tq->comp_ring.next2proc = 0;
> + tq->comp_ring.gen = VMXNET3_INIT_GEN;
> +
> + /* reset the bookkeeping data */
> + memset(tq->buf_info, 0, sizeof(tq->buf_info[0]) * tq->tx_ring.size);
> + for (i = 0; i < tq->tx_ring.size; i++)
> + tq->buf_info[i].map_type = VMXNET3_MAP_NONE;
> +
> + /* stats are not reset */
> +}
> +
> +
> +static int
> +vmxnet3_tq_create(struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_adapter *adapter)
> +{
> + BUG_ON(tq->tx_ring.base || tq->data_ring.base ||
> + tq->comp_ring.base || tq->buf_info);
> +
> + tq->tx_ring.base = pci_alloc_consistent(adapter->pdev, tq->tx_ring.size
> + * sizeof(struct Vmxnet3_TxDesc),
> + &tq->tx_ring.basePA);
> + if (!tq->tx_ring.base) {
> + printk(KERN_ERR "%s: failed to allocate tx ring\n",
> + adapter->netdev->name);
> + goto err;
> + }
> +
> + tq->data_ring.base = pci_alloc_consistent(adapter->pdev,
> + tq->data_ring.size *
> + sizeof(struct Vmxnet3_TxDataDesc),
> + &tq->data_ring.basePA);
> + if (!tq->data_ring.base) {
> + printk(KERN_ERR "%s: failed to allocate data ring\n",
> + adapter->netdev->name);
> + goto err;
> + }
> +
> + tq->comp_ring.base = pci_alloc_consistent(adapter->pdev,
> + tq->comp_ring.size *
> + sizeof(struct Vmxnet3_TxCompDesc),
> + &tq->comp_ring.basePA);
> + if (!tq->comp_ring.base) {
> + printk(KERN_ERR "%s: failed to allocate tx comp ring\n",
> + adapter->netdev->name);
> + goto err;
> + }
> +
> + tq->buf_info = kcalloc(tq->tx_ring.size, sizeof(tq->buf_info[0]),
> + GFP_KERNEL);
> + if (!tq->buf_info) {
> + printk(KERN_ERR "%s: failed to allocate tx bufinfo\n",
> + adapter->netdev->name);
> + goto err;
> + }
> +
> + return 0;
> +
> +err:
> + vmxnet3_tq_destroy(tq, adapter);
> + return -ENOMEM;
> +}
> +
> +
> +/*
> + * starting from ring->next2fill, allocate rx buffers for the given ring
> + * of the rx queue and update the rx desc. stop after @num_to_alloc buffers
> + * are allocated or allocation fails
> + */
> +
> +static int
> +vmxnet3_rq_alloc_rx_buf(struct vmxnet3_rx_queue *rq, u32 ring_idx,
> + int num_to_alloc, struct vmxnet3_adapter *adapter)
> +{
> + int num_allocated = 0;
> + struct vmxnet3_rx_buf_info *rbi_base = rq->buf_info[ring_idx];
> + struct vmxnet3_cmd_ring *ring = &rq->rx_ring[ring_idx];
> + u32 val;
> +
> + while (num_allocated < num_to_alloc) {
> + struct vmxnet3_rx_buf_info *rbi;
> + union Vmxnet3_GenericDesc *gd;
> +
> + rbi = rbi_base + ring->next2fill;
> + gd = ring->base + ring->next2fill;
> +
> + if (rbi->buf_type == VMXNET3_RX_BUF_SKB) {
> + if (rbi->skb == NULL) {
> + rbi->skb = dev_alloc_skb(rbi->len +
> + NET_IP_ALIGN);
> + if (unlikely(rbi->skb == NULL)) {
> + rq->stats.rx_buf_alloc_failure++;
> + break;
> + }
> + rbi->skb->dev = adapter->netdev;
> +
> + skb_reserve(rbi->skb, NET_IP_ALIGN);
> + rbi->dma_addr = pci_map_single(adapter->pdev,
> + rbi->skb->data, rbi->len,
> + PCI_DMA_FROMDEVICE);
> + } else {
> + /* rx buffer skipped by the device */
> + }
> + val = VMXNET3_RXD_BTYPE_HEAD << VMXNET3_RXD_BTYPE_SHIFT;
> + } else {
> + BUG_ON(rbi->buf_type != VMXNET3_RX_BUF_PAGE ||
> + rbi->len != PAGE_SIZE);
> +
> + if (rbi->page == NULL) {
> + rbi->page = alloc_page(GFP_ATOMIC);
> + if (unlikely(rbi->page == NULL)) {
> + rq->stats.rx_buf_alloc_failure++;
> + break;
> + }
> + rbi->dma_addr = pci_map_page(adapter->pdev,
> + rbi->page, 0, PAGE_SIZE,
> + PCI_DMA_FROMDEVICE);
> + } else {
> + /* rx buffers skipped by the device */
> + }
> + val = VMXNET3_RXD_BTYPE_BODY << VMXNET3_RXD_BTYPE_SHIFT;
> + }
> +
> + BUG_ON(rbi->dma_addr == 0);
> + gd->rxd.addr = rbi->dma_addr;
> + gd->dword[2] = (ring->gen << VMXNET3_RXD_GEN_SHIFT) | val |
> + rbi->len;
> +
> + num_allocated++;
> + vmxnet3_cmd_ring_adv_next2fill(ring);
> + }
> + rq->uncommitted[ring_idx] += num_allocated;
> +
> + dprintk(KERN_ERR "alloc_rx_buf: %d allocated, next2fill %u, next2comp "
> + "%u, uncommited %u\n", num_allocated, ring->next2fill,
> + ring->next2comp, rq->uncommitted[ring_idx]);
> +
> + /* so that the device can distinguish a full ring and an empty ring */
> + BUG_ON(num_allocated != 0 && ring->next2fill == ring->next2comp);
> +
> + return num_allocated;
> +}
> +
> +
> +static void
> +vmxnet3_append_frag(struct sk_buff *skb, struct Vmxnet3_RxCompDesc *rcd,
> + struct vmxnet3_rx_buf_info *rbi)
> +{
> + struct skb_frag_struct *frag = skb_shinfo(skb)->frags +
> + skb_shinfo(skb)->nr_frags;
> +
> + BUG_ON(skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS);
> +
> + frag->page = rbi->page;
> + frag->page_offset = 0;
> + frag->size = rcd->len;
> + skb->data_len += frag->size;
> + skb_shinfo(skb)->nr_frags++;
> +}
> +
> +
> +static void
> +vmxnet3_map_pkt(struct sk_buff *skb, struct vmxnet3_tx_ctx *ctx,
> + struct vmxnet3_tx_queue *tq, struct pci_dev *pdev,
> + struct vmxnet3_adapter *adapter)
> +{
> + u32 dw2, len;
> + unsigned long buf_offset;
> + int i;
> + union Vmxnet3_GenericDesc *gdesc;
> + struct vmxnet3_tx_buf_info *tbi = NULL;
> +
> + BUG_ON(ctx->copy_size > skb_headlen(skb));
> +
> + /* use the previous gen bit for the SOP desc */
> + dw2 = (tq->tx_ring.gen ^ 0x1) << VMXNET3_TXD_GEN_SHIFT;
> +
> + ctx->sop_txd = tq->tx_ring.base + tq->tx_ring.next2fill;
> + gdesc = ctx->sop_txd; /* both loops below can be skipped */
> +
> + /* no need to map the buffer if headers are copied */
> + if (ctx->copy_size) {
> + ctx->sop_txd->txd.addr = tq->data_ring.basePA +
> + tq->tx_ring.next2fill *
> + sizeof(struct Vmxnet3_TxDataDesc);
> + ctx->sop_txd->dword[2] = dw2 | ctx->copy_size;
> + ctx->sop_txd->dword[3] = 0;
> +
> + tbi = tq->buf_info + tq->tx_ring.next2fill;
> + tbi->map_type = VMXNET3_MAP_NONE;
> +
> + dprintk(KERN_ERR "txd[%u]: 0x%Lx 0x%x 0x%x\n",
> + tq->tx_ring.next2fill, ctx->sop_txd->txd.addr,
> + ctx->sop_txd->dword[2], ctx->sop_txd->dword[3]);
> + vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring);
> +
> + /* use the right gen for non-SOP desc */
> + dw2 = tq->tx_ring.gen << VMXNET3_TXD_GEN_SHIFT;
> + }
> +
> + /* linear part can use multiple tx desc if it's big */
> + len = skb_headlen(skb) - ctx->copy_size;
> + buf_offset = ctx->copy_size;
> + while (len) {
> + u32 buf_size;
> +
> + buf_size = len > VMXNET3_MAX_TX_BUF_SIZE ?
> + VMXNET3_MAX_TX_BUF_SIZE : len;
> +
> + tbi = tq->buf_info + tq->tx_ring.next2fill;
> + tbi->map_type = VMXNET3_MAP_SINGLE;
> + tbi->dma_addr = pci_map_single(adapter->pdev,
> + skb->data + buf_offset, buf_size,
> + PCI_DMA_TODEVICE);
> +
> + tbi->len = buf_size; /* this automatically converts 2^14 to 0 */
> +
> + gdesc = tq->tx_ring.base + tq->tx_ring.next2fill;
> + BUG_ON(gdesc->txd.gen == tq->tx_ring.gen);
> +
> + gdesc->txd.addr = tbi->dma_addr;
> + gdesc->dword[2] = dw2 | buf_size;
> + gdesc->dword[3] = 0;
> +
> + dprintk(KERN_ERR "txd[%u]: 0x%Lx 0x%x 0x%x\n",
> + tq->tx_ring.next2fill, gdesc->txd.addr,
> + gdesc->dword[2], gdesc->dword[3]);
> + vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring);
> + dw2 = tq->tx_ring.gen << VMXNET3_TXD_GEN_SHIFT;
> +
> + len -= buf_size;
> + buf_offset += buf_size;
> + }
> +
> + for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> + struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[i];
> +
> + tbi = tq->buf_info + tq->tx_ring.next2fill;
> + tbi->map_type = VMXNET3_MAP_PAGE;
> + tbi->dma_addr = pci_map_page(adapter->pdev, frag->page,
> + frag->page_offset, frag->size,
> + PCI_DMA_TODEVICE);
> +
> + tbi->len = frag->size;
> +
> + gdesc = tq->tx_ring.base + tq->tx_ring.next2fill;
> + BUG_ON(gdesc->txd.gen == tq->tx_ring.gen);
> +
> + gdesc->txd.addr = tbi->dma_addr;
> + gdesc->dword[2] = dw2 | frag->size;
> + gdesc->dword[3] = 0;
> +
> + dprintk(KERN_ERR "txd[%u]: 0x%llu %u %u\n",
> + tq->tx_ring.next2fill, gdesc->txd.addr,
> + gdesc->dword[2], gdesc->dword[3]);
> + vmxnet3_cmd_ring_adv_next2fill(&tq->tx_ring);
> + dw2 = tq->tx_ring.gen << VMXNET3_TXD_GEN_SHIFT;
> + }
> +
> + ctx->eop_txd = gdesc;
> +
> + /* set the last buf_info for the pkt */
> + tbi->skb = skb;
> + tbi->sop_idx = ctx->sop_txd - tq->tx_ring.base;
> +}
> +
> +
> +/*
> + * parse and copy relevant protocol headers:
> + * For a tso pkt, relevant headers are L2/3/4 including options
> + * For a pkt requesting csum offloading, they are L2/3 and may include L4
> + * if it's a TCP/UDP pkt
> + *
> + * Returns:
> + * -1: error happens during parsing
> + * 0: protocol headers parsed, but too big to be copied
> + * 1: protocol headers parsed and copied
> + *
> + * Other effects:
> + * 1. related *ctx fields are updated.
> + * 2. ctx->copy_size is # of bytes copied
> + * 3. the portion copied is guaranteed to be in the linear part
> + *
> + */
> +static int
> +vmxnet3_parse_and_copy_hdr(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_tx_ctx *ctx,
> + struct vmxnet3_adapter *adapter)
> +{
> + struct Vmxnet3_TxDataDesc *tdd;
> +
> + if (ctx->mss) {
> + ctx->eth_ip_hdr_size = skb_transport_offset(skb);
> + ctx->l4_hdr_size = ((struct tcphdr *)
> + skb_transport_header(skb))->doff * 4;
> + ctx->copy_size = ctx->eth_ip_hdr_size + ctx->l4_hdr_size;
> + } else {
> + unsigned int pull_size;
> +
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + ctx->eth_ip_hdr_size = skb_transport_offset(skb);
> +
> + if (ctx->ipv4) {
> + struct iphdr *iph = (struct iphdr *)
> + skb_network_header(skb);
> + if (iph->protocol == IPPROTO_TCP) {
> + pull_size = ctx->eth_ip_hdr_size +
> + sizeof(struct tcphdr);
> +
> + if (unlikely(!pskb_may_pull(skb,
> + pull_size))) {
> + goto err;
> + }
> + ctx->l4_hdr_size = ((struct tcphdr *)
> + skb_transport_header(skb))->doff * 4;
> + } else if (iph->protocol == IPPROTO_UDP) {
> + ctx->l4_hdr_size =
> + sizeof(struct udphdr);
> + } else {
> + ctx->l4_hdr_size = 0;
> + }
> + } else {
> + /* for simplicity, don't copy L4 headers */
> + ctx->l4_hdr_size = 0;
> + }
> + ctx->copy_size = ctx->eth_ip_hdr_size +
> + ctx->l4_hdr_size;
> + } else {
> + ctx->eth_ip_hdr_size = 0;
> + ctx->l4_hdr_size = 0;
> + /* copy as much as allowed */
> + ctx->copy_size = min((unsigned int)VMXNET3_HDR_COPY_SIZE
> + , skb_headlen(skb));
> + }
> +
> + /* make sure headers are accessible directly */
> + if (unlikely(!pskb_may_pull(skb, ctx->copy_size)))
> + goto err;
> + }
> +
> + if (unlikely(ctx->copy_size > VMXNET3_HDR_COPY_SIZE)) {
> + tq->stats.oversized_hdr++;
> + ctx->copy_size = 0;
> + return 0;
> + }
> +
> + tdd = tq->data_ring.base + tq->tx_ring.next2fill;
> +
> + memcpy(tdd->data, skb->data, ctx->copy_size);
> + dprintk(KERN_ERR "copy %u bytes to dataRing[%u]\n",
> + ctx->copy_size, tq->tx_ring.next2fill);
> + return 1;
> +
> +err:
> + return -1;
> +}
> +
> +
> +static void
> +vmxnet3_prepare_tso(struct sk_buff *skb,
> + struct vmxnet3_tx_ctx *ctx)
> +{
> + struct tcphdr *tcph = (struct tcphdr *)skb_transport_header(skb);
> + if (ctx->ipv4) {
> + struct iphdr *iph = (struct iphdr *)skb_network_header(skb);
> + iph->check = 0;
> + tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr, 0,
> + IPPROTO_TCP, 0);
> + } else {
> + struct ipv6hdr *iph = (struct ipv6hdr *)skb_network_header(skb);
> + tcph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, 0,
> + IPPROTO_TCP, 0);
> + }
> +}
> +
> +
> +/*
> + * Transmits a pkt thru a given tq
> + * Returns:
> + * NETDEV_TX_OK: descriptors are setup successfully
> + * NETDEV_TX_OK: error occurred, the pkt is dropped
> + * NETDEV_TX_BUSY: tx ring is full, queue is stopped
> + *
> + * Side-effects:
> + * 1. tx ring may be changed
> + * 2. tq stats may be updated accordingly
> + * 3. shared->txNumDeferred may be updated
> + */
> +
> +static int
> +vmxnet3_tq_xmit(struct sk_buff *skb, struct vmxnet3_tx_queue *tq,
> + struct vmxnet3_adapter *adapter, struct net_device *netdev)
> +{
> + int ret;
> + u32 count;
> + unsigned long flags;
> + struct vmxnet3_tx_ctx ctx;
> + union Vmxnet3_GenericDesc *gdesc;
> +
> + /* conservatively estimate # of descriptors to use */
> + count = VMXNET3_TXD_NEEDED(skb_headlen(skb)) +
> + skb_shinfo(skb)->nr_frags + 1;
> +
> + ctx.ipv4 = (skb->protocol == __constant_ntohs(ETH_P_IP));
> +
> + ctx.mss = skb_shinfo(skb)->gso_size;
> + if (ctx.mss) {
> + if (skb_header_cloned(skb)) {
> + if (unlikely(pskb_expand_head(skb, 0, 0,
> + GFP_ATOMIC) != 0)) {
> + tq->stats.drop_tso++;
> + goto drop_pkt;
> + }
> + tq->stats.copy_skb_header++;
> + }
> + vmxnet3_prepare_tso(skb, &ctx);
> + } else {
> + if (unlikely(count > VMXNET3_MAX_TXD_PER_PKT)) {
> +
> + /* non-tso pkts must not use more than
> + * VMXNET3_MAX_TXD_PER_PKT entries
> + */
> + if (skb_linearize(skb) != 0) {
> + tq->stats.drop_too_many_frags++;
> + goto drop_pkt;
> + }
> + tq->stats.linearized++;
> +
> + /* recalculate the # of descriptors to use */
> + count = VMXNET3_TXD_NEEDED(skb_headlen(skb)) + 1;
> + }
> + }
> +
> + ret = vmxnet3_parse_and_copy_hdr(skb, tq, &ctx, adapter);
> + if (ret >= 0) {
> + BUG_ON(ret <= 0 && ctx.copy_size != 0);
> + /* hdrs parsed, check against other limits */
> + if (ctx.mss) {
> + if (unlikely(ctx.eth_ip_hdr_size + ctx.l4_hdr_size >
> + VMXNET3_MAX_TX_BUF_SIZE)) {
> + goto hdr_too_big;
> + }
> + } else {
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + if (unlikely(ctx.eth_ip_hdr_size +
> + skb->csum_offset >
> + VMXNET3_MAX_CSUM_OFFSET)) {
> + goto hdr_too_big;
> + }
> + }
> + }
> + } else {
> + tq->stats.drop_hdr_inspect_err++;
> + goto drop_pkt;
> + }
> +
> + spin_lock_irqsave(&tq->tx_lock, flags);
> +
> + if (count > vmxnet3_cmd_ring_desc_avail(&tq->tx_ring)) {
> + tq->stats.tx_ring_full++;
> + dprintk(KERN_ERR "tx queue stopped on %s, next2comp %u"
> + " next2fill %u\n", adapter->netdev->name,
> + tq->tx_ring.next2comp, tq->tx_ring.next2fill);
> +
> + vmxnet3_tq_stop(tq, adapter);
> + spin_unlock_irqrestore(&tq->tx_lock, flags);
> + return NETDEV_TX_BUSY;
> + }
> +
> + /* fill tx descs related to addr & len */
> + vmxnet3_map_pkt(skb, &ctx, tq, adapter->pdev, adapter);
> +
> + /* setup the EOP desc */
> + ctx.eop_txd->dword[3] = VMXNET3_TXD_CQ | VMXNET3_TXD_EOP;
> +
> + /* setup the SOP desc */
> + gdesc = ctx.sop_txd;
> + if (ctx.mss) {
> + gdesc->txd.hlen = ctx.eth_ip_hdr_size + ctx.l4_hdr_size;
> + gdesc->txd.om = VMXNET3_OM_TSO;
> + gdesc->txd.msscof = ctx.mss;
> + tq->shared->txNumDeferred += (skb->len - gdesc->txd.hlen +
> + ctx.mss - 1) / ctx.mss;
> + } else {
> + if (skb->ip_summed == CHECKSUM_PARTIAL) {
> + gdesc->txd.hlen = ctx.eth_ip_hdr_size;
> + gdesc->txd.om = VMXNET3_OM_CSUM;
> + gdesc->txd.msscof = ctx.eth_ip_hdr_size +
> + skb->csum_offset;
> + } else {
> + gdesc->txd.om = 0;
> + gdesc->txd.msscof = 0;
> + }
> + tq->shared->txNumDeferred++;
> + }
> +
> + if (vlan_tx_tag_present(skb)) {
> + gdesc->txd.ti = 1;
> + gdesc->txd.tci = vlan_tx_tag_get(skb);
> + }
> +
> + wmb();
> +
> + /* finally flips the GEN bit of the SOP desc */
> + gdesc->dword[2] ^= VMXNET3_TXD_GEN;
> + dprintk(KERN_ERR "txd[%u]: SOP 0x%Lx 0x%x 0x%x\n",
> + (u32)((union Vmxnet3_GenericDesc *)ctx.sop_txd -
> + tq->tx_ring.base), gdesc->txd.addr, gdesc->dword[2],
> + gdesc->dword[3]);
> +
> + spin_unlock_irqrestore(&tq->tx_lock, flags);
> +
> + if (tq->shared->txNumDeferred >= tq->shared->txThreshold) {
> + tq->shared->txNumDeferred = 0;
> + VMXNET3_WRITE_BAR0_REG(adapter, VMXNET3_REG_TXPROD,
> + tq->tx_ring.next2fill);
> + }
> + netdev->trans_start = jiffies;
> +
> + return NETDEV_TX_OK;
> +
> +hdr_too_big:
> + tq->stats.drop_oversized_hdr++;
> +drop_pkt:
> + tq->stats.drop_total++;
> + dev_kfree_skb(skb);
> + return NETDEV_TX_OK;
> +}
> +
> +
> +static netdev_tx_t
> +vmxnet3_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
> +{
> + struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> + struct vmxnet3_tx_queue *tq = &adapter->tx_queue;
> +
> + return vmxnet3_tq_xmit(skb, tq, adapter, netdev);
> +}
> +
> +
> +static void
> +vmxnet3_rx_csum(struct vmxnet3_adapter *adapter,
> + struct sk_buff *skb,
> + union Vmxnet3_GenericDesc *gdesc)
> +{
> + if (!gdesc->rcd.cnc && adapter->rxcsum) {
> + /* typical case: TCP/UDP over IP and both csums are correct */
> + if ((gdesc->dword[3] & VMXNET3_RCD_CSUM_OK) ==
> + VMXNET3_RCD_CSUM_OK) {
> + skb->ip_summed = CHECKSUM_UNNECESSARY;
> + BUG_ON(!(gdesc->rcd.tcp || gdesc->rcd.udp));
> + BUG_ON(!(gdesc->rcd.v4 || gdesc->rcd.v6));
> + BUG_ON(gdesc->rcd.frg);
> + } else {
> + if (gdesc->rcd.csum) {
> + skb->csum = htons(gdesc->rcd.csum);
> + skb->ip_summed = CHECKSUM_PARTIAL;
> + } else {
> + skb->ip_summed = CHECKSUM_NONE;
> + }
> + }
> + } else {
> + skb->ip_summed = CHECKSUM_NONE;
> + }
> +}
> +
> +
> +static void
> +vmxnet3_rx_error(struct vmxnet3_rx_queue *rq, struct Vmxnet3_RxCompDesc *rcd,
> + struct vmxnet3_rx_ctx *ctx, struct vmxnet3_adapter *adapter)
> +{
> + rq->stats.drop_err++;
> + if (!rcd->fcs)
> + rq->stats.drop_fcs++;
> +
> + rq->stats.drop_total++;
> +
> + /*
> + * We do not unmap and chain the rx buffer to the skb.
> + * We basically pretend this buffer is not used and will be recycled
> + * by vmxnet3_rq_alloc_rx_buf()
> + */
> +
> + /*
> + * ctx->skb may be NULL if this is the first and only
> + * desc for the pkt
> + */
> + if (ctx->skb)
> + dev_kfree_skb_irq(ctx->skb);
> +
> + ctx->skb = NULL;
> +}
> +
> +
> +static int
> +vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
> + struct vmxnet3_adapter *adapter, int quota)
> +{
> + static u32 rxprod_reg[2] = {VMXNET3_REG_RXPROD, VMXNET3_REG_RXPROD2};
> + u32 num_rxd = 0;
> + struct Vmxnet3_RxCompDesc *rcd;
> + struct vmxnet3_rx_ctx *ctx = &rq->rx_ctx;
> +
> + rcd = &rq->comp_ring.base[rq->comp_ring.next2proc].rcd;
> + while (rcd->gen == rq->comp_ring.gen) {
> + struct vmxnet3_rx_buf_info *rbi;
> + struct sk_buff *skb;
> + int num_to_alloc;
> + struct Vmxnet3_RxDesc *rxd;
> + u32 idx, ring_idx;
> +
> + if (num_rxd >= quota) {
> + /* we may stop even before we see the EOP desc of
> + * the current pkt
> + */
> + break;
> + }
> + num_rxd++;
> +
> + idx = rcd->rxdIdx;
> + ring_idx = rcd->rqID == rq->qid ? 0 : 1;
> +
> + rxd = &rq->rx_ring[ring_idx].base[idx].rxd;
> + rbi = rq->buf_info[ring_idx] + idx;
> +
> + BUG_ON(rxd->addr != rbi->dma_addr || rxd->len != rbi->len);
> +
> + if (unlikely(rcd->eop && rcd->err)) {
> + vmxnet3_rx_error(rq, rcd, ctx, adapter);
> + goto rcd_done;
> + }
> +
> + if (rcd->sop) { /* first buf of the pkt */
> + BUG_ON(rxd->btype != VMXNET3_RXD_BTYPE_HEAD ||
> + rcd->rqID != rq->qid);
> +
> + BUG_ON(rbi->buf_type != VMXNET3_RX_BUF_SKB);
> + BUG_ON(ctx->skb != NULL || rbi->skb == NULL);
> +
> + if (unlikely(rcd->len == 0)) {
> + /* Pretend the rx buffer is skipped. */
> + BUG_ON(!(rcd->sop && rcd->eop));
> + dprintk(KERN_ERR "rxRing[%u][%u] 0 length\n",
> + ring_idx, idx);
> + goto rcd_done;
> + }
> +
> + ctx->skb = rbi->skb;
> + rbi->skb = NULL;
> +
> + pci_unmap_single(adapter->pdev, rbi->dma_addr, rbi->len,
> + PCI_DMA_FROMDEVICE);
> +
> + skb_put(ctx->skb, rcd->len);
> + } else {
> + BUG_ON(ctx->skb == NULL);
> + /* non SOP buffer must be type 1 in most cases */
> + if (rbi->buf_type == VMXNET3_RX_BUF_PAGE) {
> + BUG_ON(rxd->btype != VMXNET3_RXD_BTYPE_BODY);
> +
> + if (rcd->len) {
> + pci_unmap_page(adapter->pdev,
> + rbi->dma_addr, rbi->len,
> + PCI_DMA_FROMDEVICE);
> +
> + vmxnet3_append_frag(ctx->skb, rcd, rbi);
> + rbi->page = NULL;
> + }
> + } else {
> + /*
> + * The only time a non-SOP buffer is type 0 is
> + * when it's EOP and error flag is raised, which
> + * has already been handled.
> + */
> + BUG_ON(true);
> + }
> + }
> +
> + skb = ctx->skb;
> + if (rcd->eop) {
> + skb->len += skb->data_len;
> + skb->truesize += skb->data_len;
> +
> + vmxnet3_rx_csum(adapter, skb,
> + (union Vmxnet3_GenericDesc *)rcd);
> + skb->protocol = eth_type_trans(skb, adapter->netdev);
> +
> + if (unlikely(adapter->vlan_grp && rcd->ts)) {
> + vlan_hwaccel_receive_skb(skb,
> + adapter->vlan_grp, rcd->tci);
> + } else {
> + netif_receive_skb(skb);
> + }
> +
> + adapter->netdev->last_rx = jiffies;
> + ctx->skb = NULL;
> + }
> +
> +rcd_done:
> + /* device may skip some rx descs */
> + rq->rx_ring[ring_idx].next2comp = idx;
> + VMXNET3_INC_RING_IDX_ONLY(rq->rx_ring[ring_idx].next2comp,
> + rq->rx_ring[ring_idx].size);
> +
> + /* refill rx buffers frequently to avoid starving the h/w */
> + num_to_alloc = vmxnet3_cmd_ring_desc_avail(rq->rx_ring +
> + ring_idx);
> + if (unlikely(num_to_alloc > VMXNET3_RX_ALLOC_THRESHOLD(rq,
> + ring_idx, adapter))) {
> + vmxnet3_rq_alloc_rx_buf(rq, ring_idx, num_to_alloc,
> + adapter);
> +
> + /* if needed, update the register */
> + if (unlikely(rq->shared->updateRxProd)) {
> + VMXNET3_WRITE_BAR0_REG(adapter,
> + rxprod_reg[ring_idx] + rq->qid * 8,
> + rq->rx_ring[ring_idx].next2fill);
> + rq->uncommitted[ring_idx] = 0;
> + }
> + }
> +
> + vmxnet3_comp_ring_adv_next2proc(&rq->comp_ring);
> + rcd = &rq->comp_ring.base[rq->comp_ring.next2proc].rcd;
> + }
> +
> + return num_rxd;
> +}
> +
> +
> +static void
> +vmxnet3_rq_cleanup(struct vmxnet3_rx_queue *rq,
> + struct vmxnet3_adapter *adapter)
> +{
> + u32 i, ring_idx;
> + struct Vmxnet3_RxDesc *rxd;
> +
> + for (ring_idx = 0; ring_idx < 2; ring_idx++) {
> + for (i = 0; i < rq->rx_ring[ring_idx].size; i++) {
> + rxd = &rq->rx_ring[ring_idx].base[i].rxd;
> +
> + if (rxd->btype == VMXNET3_RXD_BTYPE_HEAD &&
> + rq->buf_info[ring_idx][i].skb) {
> + pci_unmap_single(adapter->pdev, rxd->addr,
> + rxd->len, PCI_DMA_FROMDEVICE);
> + dev_kfree_skb(rq->buf_info[ring_idx][i].skb);
> + rq->buf_info[ring_idx][i].skb = NULL;
> + } else if (rxd->btype == VMXNET3_RXD_BTYPE_BODY &&
> + rq->buf_info[ring_idx][i].page) {
> + pci_unmap_page(adapter->pdev, rxd->addr,
> + rxd->len, PCI_DMA_FROMDEVICE);
> + put_page(rq->buf_info[ring_idx][i].page);
> + rq->buf_info[ring_idx][i].page = NULL;
> + }
> + }
> +
> + rq->rx_ring[ring_idx].gen = VMXNET3_INIT_GEN;
> + rq->rx_ring[ring_idx].next2fill =
> + rq->rx_ring[ring_idx].next2comp = 0;
> + rq->uncommitted[ring_idx] = 0;
> + }
> +
> + rq->comp_ring.gen = VMXNET3_INIT_GEN;
> + rq->comp_ring.next2proc = 0;
> +}
> +
> +
> +void vmxnet3_rq_destroy(struct vmxnet3_rx_queue *rq,
> + struct vmxnet3_adapter *adapter)
> +{
> + int i;
> + int j;
> +
> + /* all rx buffers must have already been freed */
> + for (i = 0; i < 2; i++) {
> + if (rq->buf_info[i]) {
> + for (j = 0; j < rq->rx_ring[i].size; j++)
> + BUG_ON(rq->buf_info[i][j].page != NULL);
> + }
> + }
> +
> + kfree(rq->buf_info[0]);
> +
> + for (i = 0; i < 2; i++) {
> + if (rq->rx_ring[i].base) {
> + pci_free_consistent(adapter->pdev, rq->rx_ring[i].size
> + * sizeof(struct Vmxnet3_RxDesc),
> + rq->rx_ring[i].base,
> + rq->rx_ring[i].basePA);
> + rq->rx_ring[i].base = NULL;
> + }
> + rq->buf_info[i] = NULL;
> + }
> +
> + if (rq->comp_ring.base) {
> + pci_free_consistent(adapter->pdev, rq->comp_ring.size *
> + sizeof(struct Vmxnet3_RxCompDesc),
> + rq->comp_ring.base, rq->comp_ring.basePA);
> + rq->comp_ring.base = NULL;
> + }
> +}
> +
> +
> +static int
> +vmxnet3_rq_init(struct vmxnet3_rx_queue *rq,
> + struct vmxnet3_adapter *adapter)
> +{
> + int i;
> +
> + /* initialize buf_info */
> + for (i = 0; i < rq->rx_ring[0].size; i++) {
> +
> + /* 1st buf for a pkt is skbuff */
> + if (i % adapter->rx_buf_per_pkt == 0) {
> + rq->buf_info[0][i].buf_type = VMXNET3_RX_BUF_SKB;
> + rq->buf_info[0][i].len = adapter->skb_buf_size;
> +		} else { /* subsequent bufs for a pkt are frags */
> + rq->buf_info[0][i].buf_type = VMXNET3_RX_BUF_PAGE;
> + rq->buf_info[0][i].len = PAGE_SIZE;
> + }
> + }
> + for (i = 0; i < rq->rx_ring[1].size; i++) {
> + rq->buf_info[1][i].buf_type = VMXNET3_RX_BUF_PAGE;
> + rq->buf_info[1][i].len = PAGE_SIZE;
> + }
> +
> + /* reset internal state and allocate buffers for both rings */
> + for (i = 0; i < 2; i++) {
> + rq->rx_ring[i].next2fill = rq->rx_ring[i].next2comp = 0;
> + rq->uncommitted[i] = 0;
> +
> + memset(rq->rx_ring[i].base, 0, rq->rx_ring[i].size *
> + sizeof(struct Vmxnet3_RxDesc));
> + rq->rx_ring[i].gen = VMXNET3_INIT_GEN;
> + }
> + if (vmxnet3_rq_alloc_rx_buf(rq, 0, rq->rx_ring[0].size - 1,
> + adapter) == 0) {
> +		/* the 1st ring must have at least one rx buffer */
> + return -ENOMEM;
> + }
> + vmxnet3_rq_alloc_rx_buf(rq, 1, rq->rx_ring[1].size - 1, adapter);
> +
> + /* reset the comp ring */
> + rq->comp_ring.next2proc = 0;
> + memset(rq->comp_ring.base, 0, rq->comp_ring.size *
> + sizeof(struct Vmxnet3_RxCompDesc));
> + rq->comp_ring.gen = VMXNET3_INIT_GEN;
> +
> + /* reset rxctx */
> + rq->rx_ctx.skb = NULL;
> +
> + /* stats are not reset */
> + return 0;
> +}
> +
> +
> +static int
> +vmxnet3_rq_create(struct vmxnet3_rx_queue *rq, struct vmxnet3_adapter *adapter)
> +{
> + int i;
> + size_t sz;
> + struct vmxnet3_rx_buf_info *bi;
> +
> + for (i = 0; i < 2; i++) {
> +
> + sz = rq->rx_ring[i].size * sizeof(struct Vmxnet3_RxDesc);
> + rq->rx_ring[i].base = pci_alloc_consistent(adapter->pdev, sz,
> + &rq->rx_ring[i].basePA);
> + if (!rq->rx_ring[i].base) {
> + printk(KERN_ERR "%s: failed to allocate rx ring %d\n",
> + adapter->netdev->name, i);
> + goto err;
> + }
> + }
> +
> + sz = rq->comp_ring.size * sizeof(struct Vmxnet3_RxCompDesc);
> + rq->comp_ring.base = pci_alloc_consistent(adapter->pdev, sz,
> + &rq->comp_ring.basePA);
> + if (!rq->comp_ring.base) {
> + printk(KERN_ERR "%s: failed to allocate rx comp ring\n",
> + adapter->netdev->name);
> + goto err;
> + }
> +
> + sz = sizeof(struct vmxnet3_rx_buf_info) * (rq->rx_ring[0].size +
> + rq->rx_ring[1].size);
> + bi = kmalloc(sz, GFP_KERNEL);
> + if (!bi) {
> + printk(KERN_ERR "%s: failed to allocate rx bufinfo\n",
> + adapter->netdev->name);
> + goto err;
> + }
> + memset(bi, 0, sz);
> + rq->buf_info[0] = bi;
> + rq->buf_info[1] = bi + rq->rx_ring[0].size;
> +
> + return 0;
> +
> +err:
> + vmxnet3_rq_destroy(rq, adapter);
> + return -ENOMEM;
> +}
> +
> +
> +static void
> +vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget, int *txd_done,
> + int *rxd_done)
> +{
> + if (unlikely(adapter->shared->ecr))
> + vmxnet3_process_events(adapter);
> +
> + *txd_done = vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
> + *rxd_done = vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter, budget);
> +}
> +
> +
> +static int
> +vmxnet3_poll(struct napi_struct *napi, int budget)
> +{
> + struct vmxnet3_adapter *adapter = container_of(napi,
> + struct vmxnet3_adapter, napi);
> + int rxd_done, txd_done;
> +
> + vmxnet3_do_poll(adapter, budget, &txd_done, &rxd_done);
> +
> + if (rxd_done < budget) {
> + napi_complete(napi);
> + vmxnet3_enable_intr(adapter, 0);
> + }
> + return rxd_done;
> +}
> +
> +
> +/* Interrupt handler for vmxnet3 */
> +static irqreturn_t
> +vmxnet3_intr(int irq, void *dev_id)
> +{
> + struct net_device *dev = dev_id;
> + struct vmxnet3_adapter *adapter = netdev_priv(dev);
> +
> + if (unlikely(adapter->intr.type == VMXNET3_IT_INTX)) {
> + u32 icr = VMXNET3_READ_BAR1_REG(adapter, VMXNET3_REG_ICR);
> + if (unlikely(icr == 0))
> + /* not ours */
> + return IRQ_NONE;
> + }
> +
> + /* disable intr if needed */
> + if (adapter->intr.mask_mode == VMXNET3_IMM_ACTIVE)
> + vmxnet3_disable_intr(adapter, 0);
> +
> + napi_schedule(&adapter->napi);
> +
> + return IRQ_HANDLED;
> +}
> +
> +#ifdef CONFIG_N
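The receive-completion loop quoted above assembles each packet from
several ring buffers: the SOP completion supplies the skb head from a
VMXNET3_RX_BUF_SKB buffer, and subsequent body completions supply
page-backed fragments that vmxnet3_append_frag() attaches before
skb->len and skb->truesize are fixed up at EOP. vmxnet3_append_frag()
itself is not part of this excerpt, so the following is only a sketch
of what such a helper typically does, assuming the standard skb
fragment accounting:

#include <linux/skbuff.h>

/* Sketch only -- not the driver's vmxnet3_append_frag(). Attach one
 * page-backed rx buffer as the next fragment of the skb under
 * construction. */
static void
append_frag_sketch(struct sk_buff *skb, struct page *page, unsigned int len)
{
	skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags, page, 0, len);
	/* only data_len is tracked per fragment; skb->len and
	 * skb->truesize are updated once at EOP, as the loop above does */
	skb->data_len += len;
}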
From: Stephen Hemminger on
On Thu, 8 Oct 2009 10:59:26 -0700 (PDT)
Shreyas Bhatewara <sbhatewara(a)vmware.com> wrote:

> Hello all,
>
> I do not mean to be bothersome but this thread has been unusually silent.
> Could you please review the patch for me and reply with your comments /
> acks ?
>
> Thanks.
> ->Shreyas


Looks fine, but just a minor style nit (can be changed after insertion in mainline).

The code:

static void
vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget, int *txd_done,
		int *rxd_done)
{
	if (unlikely(adapter->shared->ecr))
		vmxnet3_process_events(adapter);

	*txd_done = vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
	*rxd_done = vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter, budget);
}


static int
vmxnet3_poll(struct napi_struct *napi, int budget)
{
	struct vmxnet3_adapter *adapter = container_of(napi,
					struct vmxnet3_adapter, napi);
	int rxd_done, txd_done;

	vmxnet3_do_poll(adapter, budget, &txd_done, &rxd_done);

	if (rxd_done < budget) {
		napi_complete(napi);
		vmxnet3_enable_intr(adapter, 0);
	}
	return rxd_done;
}


It is simpler to just have do_poll return the rx done value. gcc probably
inlines it all anyway.

static int
vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget)
{
	if (unlikely(adapter->shared->ecr))
		vmxnet3_process_events(adapter);

	vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
	return vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter, budget);
}


static int
vmxnet3_poll(struct napi_struct *napi, int budget)
{
	struct vmxnet3_adapter *adapter = container_of(napi,
					struct vmxnet3_adapter, napi);
	int rxd_done;

	rxd_done = vmxnet3_do_poll(adapter, budget);
	if (rxd_done < budget) {
		napi_complete(napi);
		vmxnet3_enable_intr(adapter, 0);
	}
	return rxd_done;
}
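
Either way the NAPI contract is unchanged: only rx work counts against
the budget, and a return value below budget is what makes the
napi_complete() plus interrupt re-enable legal; returning exactly
budget keeps the device on the poll list with interrupts still masked.
Reduced to a sketch with hypothetical helpers (these are not vmxnet3
symbols):

static int
napi_poll_shape(struct napi_struct *napi, int budget)
{
	int done = consume_rx_ring(napi, budget);	/* hypothetical */

	if (done < budget) {
		/* rx ring drained: leave polling mode */
		napi_complete(napi);
		reenable_rx_irq(napi);			/* hypothetical */
	}
	/* done == budget: stay on the poll list, no irq re-enable */
	return done;
}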
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Shreyas Bhatewara on
> -----Original Message-----
> From: Stephen Hemminger [mailto:shemminger(a)linux-foundation.org]
> Sent: Friday, October 09, 2009 2:36 PM
> To: Shreyas Bhatewara
> Cc: linux-kernel; netdev; David S. Miller; Jeff Garzik; Anthony
> Liguori; Chris Wright; Greg Kroah-Hartman; Andrew Morton;
> virtualization; pv-drivers
> Subject: Re: [PATCH 2.6.32-rc3] net: VMware virtual Ethernet NIC
> driver: vmxnet3
>
> On Thu, 8 Oct 2009 10:59:26 -0700 (PDT)
> Shreyas Bhatewara <sbhatewara(a)vmware.com> wrote:
>
> > Hello all,
> >
> > I do not mean to be bothersome but this thread has been unusually
> silent.
> > Could you please review the patch for me and reply with your comments
> /
> > acks ?
> >
> > Thanks.
> > ->Shreyas
>
>
> Looks fine, but just a minor style nit (can be changed after insertion
> in mainline).
>
> The code:
>
> static void
> vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget, int *txd_done,
> 		int *rxd_done)
> {
> 	if (unlikely(adapter->shared->ecr))
> 		vmxnet3_process_events(adapter);
>
> 	*txd_done = vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
> 	*rxd_done = vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter, budget);
> }
>
>
> static int
> vmxnet3_poll(struct napi_struct *napi, int budget)
> {
> 	struct vmxnet3_adapter *adapter = container_of(napi,
> 					struct vmxnet3_adapter, napi);
> 	int rxd_done, txd_done;
>
> 	vmxnet3_do_poll(adapter, budget, &txd_done, &rxd_done);
>
> 	if (rxd_done < budget) {
> 		napi_complete(napi);
> 		vmxnet3_enable_intr(adapter, 0);
> 	}
> 	return rxd_done;
> }
>
>
> It is simpler to just have do_poll return the rx done value. gcc probably
> inlines it all anyway.
>
> static int
> vmxnet3_do_poll(struct vmxnet3_adapter *adapter, int budget)
> {
> 	if (unlikely(adapter->shared->ecr))
> 		vmxnet3_process_events(adapter);
>
> 	vmxnet3_tq_tx_complete(&adapter->tx_queue, adapter);
> 	return vmxnet3_rq_rx_complete(&adapter->rx_queue, adapter, budget);
> }
>
>
> static int
> vmxnet3_poll(struct napi_struct *napi, int budget)
> {
> 	struct vmxnet3_adapter *adapter = container_of(napi,
> 					struct vmxnet3_adapter, napi);
> 	int rxd_done;
>
> 	rxd_done = vmxnet3_do_poll(adapter, budget);
> 	if (rxd_done < budget) {
> 		napi_complete(napi);
> 		vmxnet3_enable_intr(adapter, 0);
> 	}
> 	return rxd_done;
> }



Thanks Stephen.

Yes, vmxnet3_do_poll() was an inline function in the very first patch. We decided it was better to let gcc handle the inlining instead.
I will piggyback this nit on a forthcoming change.

->Shreyas


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
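On the inlining point: a small static helper with a single call site is
exactly what gcc inlines on its own at -O2, so dropping the explicit
keyword changes nothing in practice. A standalone illustration with
hypothetical names (not driver code):

#include <stdio.h>

struct fake_adapter { int pending; };

/* single-caller static helper: gcc -O2 will normally inline this by
 * itself, with or without an 'inline' annotation */
static int do_poll_demo(struct fake_adapter *ap, int budget)
{
	int done = ap->pending < budget ? ap->pending : budget;

	ap->pending -= done;
	return done;
}

int main(void)
{
	struct fake_adapter ap = { .pending = 5 };

	printf("polled %d of budget 16\n", do_poll_demo(&ap, 16));
	return 0;
}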