From: Andrew Morton on
On Sat, 06 Mar 2010 17:50:58 +0100
Stefani Seibold <stefani(a)seibold.net> wrote:

> This patch fix the PHY poller, which can block the whole system. On a
> Freescale PPC 834x this result in a delay of 450 us due the slow
> communication with the PHY chip.
>
> For PHY chips without interrupts, the status of the ethernet will be
> polled every 2 sec. The poll function will read some register of the MII
> PHY. The time between the sending the MII_READ_COMMAND and receiving the
> result is more the 100 us on a PPC 834x.
>
> The patch modifies the poller a lit bit. Only a link status state change
> will result in a successive detection of the connection type. The poll
> cycle on the other hand will be increased to every seconds.

You didn't tell us how many seconds. That would be important?

> Together this patch will prevent a blocking of nearly 400 us every two
> seconds of the whole system on a PPC 834x.
>

I can't say that I really understand what you did - what functionality
did we lose?

Would it not be better to extend the phy state machine a bit? When we
detect the link status change, issue the MII command then arm the
delayed-work timer to expire in a millisecond, then go in next time and
read the result?

That might require fancy locking to prevent other threads of control
from going in and mucking up the MII state. Possibly leave phydev_lock
held across that millisecond to keep other people away?

>
> ...
>
> diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy.c linux-2.6.33/drivers/net//phy/phy.c
> --- linux-2.6.33.orig/drivers/net/phy/phy.c 2010-02-24 19:52:17.000000000 +0100
> +++ linux-2.6.33/drivers/net//phy/phy.c 2010-02-28 22:53:14.725464101 +0100

erp, your weird patch headers ("//") broke my seven-year-old script.
But I fixed it!

> @@ -871,9 +871,8 @@ void phy_state_machine(struct work_struc
> case PHY_RUNNING:
> /* Only register a CHANGE if we are
> * polling */
> - if (PHY_POLL == phydev->irq)
> - phydev->state = PHY_CHANGELINK;
> - break;
> + if (PHY_POLL != phydev->irq)
> + break;
> case PHY_CHANGELINK:
> err = phy_read_status(phydev);
>
> diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy_device.c linux-2.6.33/drivers/net//phy/phy_device.c
> --- linux-2.6.33.orig/drivers/net/phy/phy_device.c 2010-02-24 19:52:17.000000000 +0100
> +++ linux-2.6.33/drivers/net//phy/phy_device.c 2010-02-28 22:53:14.726464145 +0100
> @@ -161,7 +161,7 @@ struct phy_device* phy_device_create(str
> dev->speed = 0;
> dev->duplex = -1;
> dev->pause = dev->asym_pause = 0;
> - dev->link = 1;
> + dev->link = 0;
> dev->interface = PHY_INTERFACE_MODE_GMII;
>
> dev->autoneg = AUTONEG_ENABLE;
> @@ -694,10 +694,16 @@ int genphy_update_link(struct phy_device
> if (status < 0)
> return status;
>
> - if ((status & BMSR_LSTATUS) == 0)
> + if ((status & BMSR_LSTATUS) == 0) {
> + if (phydev->link==0)
> + return 1;
> phydev->link = 0;
> - else
> + }
> + else {
> + if (phydev->link==1)
> + return 1;
> phydev->link = 1;
> + }

Please don't invent new coding styles. scripts/checkpatch.pl is here
to help.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stefani Seibold on
Am Freitag, den 12.03.2010, 14:42 -0800 schrieb Andrew Morton:
> On Sat, 06 Mar 2010 17:50:58 +0100
> Stefani Seibold <stefani(a)seibold.net> wrote:
>
> > This patch fix the PHY poller, which can block the whole system. On a
> > Freescale PPC 834x this result in a delay of 450 us due the slow
> > communication with the PHY chip.
> >
> > For PHY chips without interrupts, the status of the ethernet will be
> > polled every 2 sec. The poll function will read some register of the MII
> > PHY. The time between the sending the MII_READ_COMMAND and receiving the
> > result is more the 100 us on a PPC 834x.
> >
> > The patch modifies the poller a lit bit. Only a link status state change
> > will result in a successive detection of the connection type. The poll
> > cycle on the other hand will be increased to every seconds.
>
> You didn't tell us how many seconds. That would be important?
>

The old implementation would poll the PHC every 2 seconds, the new one
once per second.

> > Together this patch will prevent a blocking of nearly 400 us every two
> > seconds of the whole system on a PPC 834x.
> >
>
> I can't say that I really understand what you did - what functionality
> did we lose?
>

There is not real drawback, only the detection of a connection type
change without unplugging the cable will not work. But this is more an
esoteric use case.


> Would it not be better to extend the phy state machine a bit? When we
> detect the link status change, issue the MII command then arm the
> delayed-work timer to expire in a millisecond, then go in next time and
> read the result?
>
> That might require fancy locking to prevent other threads of control
> from going in and mucking up the MII state. Possibly leave phydev_lock
> held across that millisecond to keep other people away?
>

You are right, that would be the best solution. But i am currently busy,
so this a fast interim solution ;-) It was also my approach, because it
the PHY communication in the most cases very slow. Maybe i will do this
in the near future.

> >
> > ...
> >
> > diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy.c linux-2.6.33/drivers/net//phy/phy.c
> > --- linux-2.6.33.orig/drivers/net/phy/phy.c 2010-02-24 19:52:17.000000000 +0100
> > +++ linux-2.6.33/drivers/net//phy/phy.c 2010-02-28 22:53:14.725464101 +0100
>
> erp, your weird patch headers ("//") broke my seven-year-old script.
> But I fixed it!
>

Sorry, my fault. But now you have a better script.

> > @@ -871,9 +871,8 @@ void phy_state_machine(struct work_struc
> > case PHY_RUNNING:
> > /* Only register a CHANGE if we are
> > * polling */
> > - if (PHY_POLL == phydev->irq)
> > - phydev->state = PHY_CHANGELINK;
> > - break;
> > + if (PHY_POLL != phydev->irq)
> > + break;
> > case PHY_CHANGELINK:
> > err = phy_read_status(phydev);
> >
> > diff -u -N -r -p linux-2.6.33.orig/drivers/net/phy/phy_device.c linux-2.6.33/drivers/net//phy/phy_device.c
> > --- linux-2.6.33.orig/drivers/net/phy/phy_device.c 2010-02-24 19:52:17.000000000 +0100
> > +++ linux-2.6.33/drivers/net//phy/phy_device.c 2010-02-28 22:53:14.726464145 +0100
> > @@ -161,7 +161,7 @@ struct phy_device* phy_device_create(str
> > dev->speed = 0;
> > dev->duplex = -1;
> > dev->pause = dev->asym_pause = 0;
> > - dev->link = 1;
> > + dev->link = 0;
> > dev->interface = PHY_INTERFACE_MODE_GMII;
> >
> > dev->autoneg = AUTONEG_ENABLE;
> > @@ -694,10 +694,16 @@ int genphy_update_link(struct phy_device
> > if (status < 0)
> > return status;
> >
> > - if ((status & BMSR_LSTATUS) == 0)
> > + if ((status & BMSR_LSTATUS) == 0) {
> > + if (phydev->link==0)
> > + return 1;
> > phydev->link = 0;
> > - else
> > + }
> > + else {
> > + if (phydev->link==1)
> > + return 1;
> > phydev->link = 1;
> > + }
>
> Please don't invent new coding styles. scripts/checkpatch.pl is here
> to help.
>
>

Will be fixed!


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: Stefani Seibold <stefani(a)seibold.net>
Date: Sat, 13 Mar 2010 13:53:10 +0100

> For PHY chips without interrupts, the status of the ethernet will be
> polled every 2 sec. The poll function will read some register of the MII
> PHY. The time between the sending the MII_READ_COMMAND and receiving the
> result could be very long (>100us).

I'm not apply this, as I described in my previous email.

If it's expensive to detect link configuration changes that
doesn't mean you just turn it off.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Stefani Seibold on
I had now analyzed the PHY handling in most of the network drivers. Most
of the PHY communication will be handled in a polling/blocking way,
write a command word and then wait for the results. Due the nature of
the PHY attachment, this will take some time.

Some of the network drivers do this polling/blocking also in atomic code
paths, like interrupts or timer. So activities on the PHY can cause huge
latency jitters.

On the other side, most of the network driver handle the PHY without
using or only partially using the phylib.

The phylib has also a drawback, because it polls the PHY despite if it
has interrupt support for it or not. I can't see a reason for this
behavior.

So the problem of huge latencies by polling the PHY occurs in most of
the network drivers. For example have a look at the e100 network driver
in the file drivers/net/e100.c, function mdio_ctrl_hw(): This function
will poll for max. of 4000 us or 4 ms.

To fix this latency jitter problem with the PHY polling there are the
following steps to do:

- disable polling in driver/net/phy.c if an interrupt for the PHY is
available
- create an own single or per cpu workqueue for the phylib, so that the
PHY specific code can temporary schedule or block
- prevent all current user of the phylib to access the PHY in a atomic
code path
- modify all current users of the phylib from using cpu_relax() to
cond_resched() and replace the counters against inquiring a timeout
- modify all other network drivers to use the phylib

What do you think?

Stefani


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: David Miller on
From: Stefani Seibold <stefani(a)seibold.net>
Date: Sun, 21 Mar 2010 22:54:50 +0100

> The phylib has also a drawback, because it polls the PHY despite if it
> has interrupt support for it or not. I can't see a reason for this
> behavior.

Careful, in my experience many PHYs that do have interrupt
support have buggy implementations to the point where the
interrupt support cannot be used at all.

Typically the problem is that events aren't reported reliably.

So I just wanted you to keep in mind that a chip having
interrupt support doesn't automatically mean it can be used.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/