From: Ira W. Snyder on
On Mon, Mar 22, 2010 at 08:17:10PM +0100, Wolfgang Grandegger wrote:
> Ira W. Snyder wrote:
> > On Sat, Mar 20, 2010 at 08:55:16AM +0100, Wolfgang Grandegger wrote:
> >> Ira W. Snyder wrote:
> [snip]
> >>> Does this seem right? It seems pretty good to me.
> >> Yes, I'm just missing an error-passive message. What state does "ip -d
> >> link show can0" report?
> >>
> >
> > Ok, here is what I did:
> >
> > $ ip link set can0 up type can bitrate 1000000
> > $ ip link set can1 up type can bitrate 1000000 berr-reporting on
> > $ ip -d -s link
> > 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> > link/can
> > can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> > bitrate 1000000 sample-point 0.750
> > tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> > janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> > clock 8000000
> > re-started bus-errors arbit-lost error-warn error-pass bus-off
> > 0 0 0 0 0 0
> > RX: bytes packets errors dropped overrun mcast
> > 0 0 0 0 0 0
> > TX: bytes packets errors dropped carrier collsns
> > 0 0 0 0 0 0
> > 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> > link/can
> > can <BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> > bitrate 1000000 sample-point 0.750
> > tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> > janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> > clock 8000000
> > re-started bus-errors arbit-lost error-warn error-pass bus-off
> > 0 0 0 0 0 0
> > RX: bytes packets errors dropped overrun mcast
> > 0 0 0 0 0 0
> > TX: bytes packets errors dropped carrier collsns
> > 0 0 0 0 0 0
> >
> > Now, in separate windows, I ran cansequence and candump. I stopped
> > cansequence when it could not send any more packets (due to the cable
> > being unplugged).
> >
> > $ cansequence -v -e -p can0
> > $ cansequence -v -e -p can1
> > $ candump any,0~0,#FFFFFFFF
> > can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> > can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >
> > This last message is repeated lots more times. That's the flooding we're
> > avoiding with berr-reporting off.
> >
> > I see two types of messages here:
> > 1) bus error (only on can1)
> > 2) controller problems -- tx warning limit reached (both)
> >
> > Am I missing some message? My error frame generation was mostly copied
> > from the sja1000 driver.
>
> It seems that you are not getting the error passive interrupt even...
>
> > $ ip -d -s link
> > 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> > link/can
> > can state ERROR-WARNING (berr-counter tx 128 rx 0) restart-ms 0
>
> if the hardware already reports >= 128 errors --^.
>

Re-reading the documentation, it appears that the firmware uses the
error interrupt for two different indications. In the SJA1000 driver,
they map to IRQ_EI and IRQ_EPI.

The documentation says that you can tell when you get an error-passive
only by checking the rxerr + txerr registers in the message. You'll note
I omitted the IRQ_EPI-equivalent code from my driver when I copied the
sja1000.c implementation.

I've added an if-statement to the CEVTIND_EI path so that it handles
both cases. It now looks like this:

/* error warning interrupt */
if (isrc == CEVTIND_EI) {
	u8 rxerr = msg->data[4];
	u8 txerr = msg->data[5];

	dev_dbg(mod->dev, "error warning interrupt\n");
	if (status & SR_BS) {
		state = CAN_STATE_BUS_OFF;
		cf->can_id |= CAN_ERR_BUSOFF;
		can_bus_off(dev);
	} else if (status & SR_ES) {
		if (rxerr >= 127 || txerr >= 127)
			state = CAN_STATE_ERROR_PASSIVE;
		else
			state = CAN_STATE_ERROR_WARNING;
	} else {
		state = CAN_STATE_ERROR_ACTIVE;
	}
}

The only change is in the "else if (status & SR_ES)" path. I had to add
the if-statement that checks the rxerr and txerr registers. Does that
seem ok? I got the 127 values from this webpage (provided to me on this
mailing list).

http://www.softing.com/home/en/industrial-automation/products/can-bus/more-can-bus/error-handling/error-states.php?navanchor=3010510

> > bitrate 1000000 sample-point 0.750
> > tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> > janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> > clock 8000000
> > re-started bus-errors arbit-lost error-warn error-pass bus-off
> > 0 0 0 1 0 0
> > RX: bytes packets errors dropped overrun mcast
> > 16 0 2 0 0 0
> > TX: bytes packets errors dropped carrier collsns
> > 513 513 0 0 0 0
> > 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> > link/can
> > can <BERR-REPORTING> state ERROR-WARNING (berr-counter tx 128 rx 0) restart-ms 0
> > bitrate 1000000 sample-point 0.750
> > tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> > janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> > clock 8000000
> > re-started bus-errors arbit-lost error-warn error-pass bus-off
> > 0 126 0 1 0 0
>
> But that's maybe because you stopped the test too early (just 126 bus errors).
>

This is the best I could do. Without the cable connected, that's where
the controller stops sending messages (cansequence just hangs waiting
for buffer space to become available).

> > RX: bytes packets errors dropped overrun mcast
> > 1024 0 254 0 0 0
> > TX: bytes packets errors dropped carrier collsns
> > 513 513 0 0 0 0
>
> When I send out messages without cable connected I get:
>
> -bash-3.2# ./ip -d -s link show can0
> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> link/can
> can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0
> bitrate 500000 sample-point 0.875
> tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
> sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 8000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 54101 0 1 1 0
> RX: bytes packets errors dropped overrun mcast
> 432808 54101 54101 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 0 0 0 0 0 0
>
> The following output is without BERR-REPORTING:
>
> -bash-3.2# ./candump -t d any,0:0,#FFFFFFFF
> (0.000000) can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> (0.000474) can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
>                                                ^  ^
>                                                TX RX error counter

With my newest changes, I get:

8: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
    link/can
    can state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0
    bitrate 1000000 sample-point 0.750
    tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
    janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 8000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          3          3          0
    RX: bytes  packets  errors  dropped overrun mcast
    236045     235949   12      0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    235938     235938   0       0       0       0

can1 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
can1 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME

So it looks like both drivers agree (finally!). :)

With berr-reporting on, I get the same flood of bus-error messages, with
these two messages as well.

>
> The patch I mentioned also copies the rx and tx error counter values to
> data fields 6 and 7.
>

I missed this. It has been added. Thanks for pointing it out.
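
For reference, a minimal sketch of what copying the counters into a
state-change error frame could look like. This is not the janz-ican3 code:
the struct and the function name are hypothetical, and the constants are
local copies of the values from linux/can.h and linux/can/error.h so the
snippet builds outside the kernel. Picking TX versus RX by comparing the two
counters is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Local copies of values from linux/can.h and linux/can/error.h. */
#define ERR_FLAG		0x20000000U	/* CAN_ERR_FLAG */
#define ERR_CRTL		0x00000004U	/* CAN_ERR_CRTL */
#define ERR_CRTL_RX_WARNING	0x04
#define ERR_CRTL_TX_WARNING	0x08
#define ERR_CRTL_RX_PASSIVE	0x10
#define ERR_CRTL_TX_PASSIVE	0x20

/* Hypothetical stand-in for struct can_frame. */
struct err_frame {
	uint32_t can_id;
	uint8_t dlc;
	uint8_t data[8];
};

/*
 * Build a "controller problem" error frame for a warning or passive
 * state change and copy the error counters into data[6] (TX) and
 * data[7] (RX).
 */
void fill_state_change(struct err_frame *cf, int passive,
		       uint8_t txerr, uint8_t rxerr)
{
	memset(cf, 0, sizeof(*cf));
	cf->can_id = ERR_FLAG | ERR_CRTL;
	cf->dlc = 8;

	/* Assumption: blame whichever counter is larger. */
	if (passive)
		cf->data[1] = (txerr > rxerr) ? ERR_CRTL_TX_PASSIVE
					      : ERR_CRTL_RX_PASSIVE;
	else
		cf->data[1] = (txerr > rxerr) ? ERR_CRTL_TX_WARNING
					      : ERR_CRTL_RX_WARNING;

	cf->data[6] = txerr;
	cf->data[7] = rxerr;
}
```

With txerr = 0x60 and passive = 0 this yields ID 20000004 and payload
00 08 00 00 00 00 60 00, the same layout as the first candump line above.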

I haven't heard back from Samuel Ortiz yet about the changes for the mfd
layer. Would you like me to send out my latest CAN driver changes, or
should I just wait until I hear back?

Ira
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Ira W. Snyder on
On Mon, Mar 22, 2010 at 08:23:42PM +0100, Wolfgang Grandegger wrote:
> Wolfgang Grandegger wrote:
> > Ira W. Snyder wrote:
> >> On Sat, Mar 20, 2010 at 08:55:16AM +0100, Wolfgang Grandegger wrote:
> >>> Ira W. Snyder wrote:
> > [snip]
> >>>> Does this seem right? It seems pretty good to me.
> >>> Yes, I'm just missing an error-passive message. What state does "ip -d
> >>> link show can0" report?
> >>>
> >> Ok, here is what I did:
> >>
> >> $ ip link set can0 up type can bitrate 1000000
> >> $ ip link set can1 up type can bitrate 1000000 berr-reporting on
> >> $ ip -d -s link
> >> 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> >> link/can
> >> can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> >> bitrate 1000000 sample-point 0.750
> >> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> >> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> >> clock 8000000
> >> re-started bus-errors arbit-lost error-warn error-pass bus-off
> >> 0 0 0 0 0 0
> >> RX: bytes packets errors dropped overrun mcast
> >> 0 0 0 0 0 0
> >> TX: bytes packets errors dropped carrier collsns
> >> 0 0 0 0 0 0
> >> 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> >> link/can
> >> can <BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> >> bitrate 1000000 sample-point 0.750
> >> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> >> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> >> clock 8000000
> >> re-started bus-errors arbit-lost error-warn error-pass bus-off
> >> 0 0 0 0 0 0
> >> RX: bytes packets errors dropped overrun mcast
> >> 0 0 0 0 0 0
> >> TX: bytes packets errors dropped carrier collsns
> >> 0 0 0 0 0 0
> >>
> >> Now, in separate windows, I ran cansequence and candump. I stopped
> >> cansequence when it could not send any more packets (due to the cable
> >> being unplugged).
> >>
> >> $ cansequence -v -e -p can0
> >> $ cansequence -v -e -p can1
> >> $ candump any,0~0,#FFFFFFFF
> >> can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
> >>
> >> This last message is repeated lots more times. That's the flooding we're
> >> avoiding with berr-reporting off.
> >>
> >> I see two types of messages here:
> >> 1) bus error (only on can1)
> >> 2) controller problems -- tx warning limit reached (both)
> >>
> >> Am I missing some message? My error frame generation was mostly copied
> >> from the sja1000 driver.
> >
> > It seems that you are not getting the error passive interrupt even...
>
> Because you do not enable/handle it. CEVTIND_EPI seems to be missing:
>
> http://lxr.linux.no/#linux+v2.6.33/drivers/net/can/sja1000/sja1000.c#L403
>

See the message I just sent. In short, the firmware coalesces the IRQ_EI
and IRQ_EPI messages into CEVTIND_EI. You can only tell them apart via
the rxerr and txerr registers.

Ira
From: Wolfgang Grandegger on
Ira W. Snyder wrote:
> On Mon, Mar 22, 2010 at 08:17:10PM +0100, Wolfgang Grandegger wrote:
>> Ira W. Snyder wrote:
>>> On Sat, Mar 20, 2010 at 08:55:16AM +0100, Wolfgang Grandegger wrote:
>>>> Ira W. Snyder wrote:
>> [snip]
>>>>> Does this seem right? It seems pretty good to me.
>>>> Yes, I'm just missing an error-passive message. What state does "ip -d
>>>> link show can0" report?
>>>>
>>> Ok, here is what I did:
>>>
>>> $ ip link set can0 up type can bitrate 1000000
>>> $ ip link set can1 up type can bitrate 1000000 berr-reporting on
>>> $ ip -d -s link
>>> 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>> link/can
>>> can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>> bitrate 1000000 sample-point 0.750
>>> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
>>> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>> clock 8000000
>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>> 0 0 0 0 0 0
>>> RX: bytes packets errors dropped overrun mcast
>>> 0 0 0 0 0 0
>>> TX: bytes packets errors dropped carrier collsns
>>> 0 0 0 0 0 0
>>> 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>> link/can
>>> can <BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>> bitrate 1000000 sample-point 0.750
>>> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
>>> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>> clock 8000000
>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>> 0 0 0 0 0 0
>>> RX: bytes packets errors dropped overrun mcast
>>> 0 0 0 0 0 0
>>> TX: bytes packets errors dropped carrier collsns
>>> 0 0 0 0 0 0
>>>
>>> Now, in separate windows, I ran cansequence and candump. I stopped
>>> cansequence when it could not send any more packets (due to the cable
>>> being unplugged).
>>>
>>> $ cansequence -v -e -p can0
>>> $ cansequence -v -e -p can1
>>> $ candump any,0~0,#FFFFFFFF
>>> can0 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000004 [8] 00 08 00 00 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>> can1 20000088 [8] 00 00 80 19 00 00 00 00 ERRORFRAME
>>>
>>> This last message is repeated lots more times. That's the flooding we're
>>> avoiding with berr-reporting off.
>>>
>>> I see two types of messages here:
>>> 1) bus error (only on can1)
>>> 2) controller problems -- tx warning limit reached (both)
>>>
>>> Am I missing some message? My error frame generation was mostly copied
>>> from the sja1000 driver.
>> It seems that you are not getting the error passive interrupt even...
>>
>>> $ ip -d -s link
>>> 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>> link/can
>>> can state ERROR-WARNING (berr-counter tx 128 rx 0) restart-ms 0
>> if the hardware already reports >= 128 errors --^.
>>
>
> Re-reading the documentation, it appears that the firmware uses the
> error interrupt for two different indications. In the SJA1000 driver,
> they map to IRQ_EI and IRQ_EPI.
>
> The documentation says that you can tell when you get an error-passive
> only by checking the rxerr + txerr registers in the message. You'll note
> I omitted the IRQ_EPI-equivalent code from my driver when I copied the
> sja1000.c implementation.
>
> I've added an if-statement in the CEVTIND_EI path, which now looks like
> this. It handles both cases now.
>
> /* error warning interrupt */
> if (isrc == CEVTIND_EI) {
> 	u8 rxerr = msg->data[4];
> 	u8 txerr = msg->data[5];
>
> 	dev_dbg(mod->dev, "error warning interrupt\n");
> 	if (status & SR_BS) {
> 		state = CAN_STATE_BUS_OFF;
> 		cf->can_id |= CAN_ERR_BUSOFF;
> 		can_bus_off(dev);
> 	} else if (status & SR_ES) {
> 		if (rxerr >= 127 || txerr >= 127)
> 			state = CAN_STATE_ERROR_PASSIVE;
> 		else
> 			state = CAN_STATE_ERROR_WARNING;
> 	} else {
> 		state = CAN_STATE_ERROR_ACTIVE;
> 	}
> }
>
> The only change is in the "else if (status & SR_ES)" path. I had to add
> the if-statement that checks the rxerr and txerr registers. Does that
> seem ok? I got the 127 values from this webpage (provided to me on this
> mailing list).

It should be >= 128.
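
To spell the thresholds out, here is a small standalone sketch, not driver
code. The enum and function names are made up for illustration. The limits
are the usual CAN ones: error-warning at a counter of 96, error-passive at
128. Bus-off is left out on purpose because it cannot be derived from 8-bit
counters and has to come from a status bit such as SR_BS.

```c
#include <stdint.h>

/* Illustrative stand-in for the kernel's enum can_state. */
enum err_state {
	ERR_STATE_ACTIVE,
	ERR_STATE_WARNING,
	ERR_STATE_PASSIVE,
};

/*
 * Classify the controller state from the RX and TX error counters.
 * A node is error-passive as soon as either counter reaches 128 and
 * in the warning state as soon as either counter reaches 96.
 */
enum err_state classify_err_state(uint8_t rxerr, uint8_t txerr)
{
	if (rxerr >= 128 || txerr >= 128)
		return ERR_STATE_PASSIVE;
	if (rxerr >= 96 || txerr >= 96)
		return ERR_STATE_WARNING;
	return ERR_STATE_ACTIVE;
}
```

A counter of 127 is therefore still error-warning, which is why the
comparison against 127 reports error-passive one step too early.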

> http://www.softing.com/home/en/industrial-automation/products/can-bus/more-can-bus/error-handling/error-states.php?navanchor=3010510
>
>>> bitrate 1000000 sample-point 0.750
>>> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
>>> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>> clock 8000000
>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>> 0 0 0 1 0 0
>>> RX: bytes packets errors dropped overrun mcast
>>> 16 0 2 0 0 0
>>> TX: bytes packets errors dropped carrier collsns
>>> 513 513 0 0 0 0
>>> 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>>> link/can
>>> can <BERR-REPORTING> state ERROR-WARNING (berr-counter tx 128 rx 0) restart-ms 0
>>> bitrate 1000000 sample-point 0.750
>>> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
>>> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>>> clock 8000000
>>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>>> 0 126 0 1 0 0
>> But that's maybe because you stopped the test too early (just 126 bus errors).
>>
>
> This is the best I could do. Without the cable connected, that's where
> the controller stops sending messages (cansequence just hangs waiting
> for buffer space to become available).
>
>>> RX: bytes packets errors dropped overrun mcast
>>> 1024 0 254 0 0 0
>>> TX: bytes packets errors dropped carrier collsns
>>> 513 513 0 0 0 0
>> When I send out messages without cable connected I get:
>>
>> -bash-3.2# ./ip -d -s link show can0
>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
>> link/can
>> can <BERR-REPORTING> state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0
>> bitrate 500000 sample-point 0.875
>> tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>> sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>> clock 8000000
>> re-started bus-errors arbit-lost error-warn error-pass bus-off
>> 0 54101 0 1 1 0
>> RX: bytes packets errors dropped overrun mcast
>> 432808 54101 54101 0 0 0
>> TX: bytes packets errors dropped carrier collsns
>> 0 0 0 0 0 0
>>
>> The following output is without BERR-REPORTING:
>>
>> -bash-3.2# ./candump -t d any,0:0,#FFFFFFFF
>> (0.000000) can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
>> (0.000474) can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
>>                                                ^  ^
>>                                                TX RX error counter
>
> With my newest changes, I get:
>
> 8: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN qlen 10
> link/can
> can state ERROR-PASSIVE (berr-counter tx 128 rx 0) restart-ms 0
> bitrate 1000000 sample-point 0.750
> tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
> janz-ican3: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> clock 8000000
> re-started bus-errors arbit-lost error-warn error-pass bus-off
> 0 0 0 3 3 0
> RX: bytes packets errors dropped overrun mcast
> 236045 235949 12 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 235938 235938 0 0 0 0
>
> can1 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> can1 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
>
> So it looks like both drivers agree (finally!). :)
>
> With berr-reporting on, I get the same flood of bus-error messages, with
> these two messages as well.

Looks good now.

>> The patch I mentioned also copies the rx and tx error counter values to
>> data fields 6 and 7.
>>
>
> I missed this. It has been added. Thanks for pointing it out.

You could even add the tx/rx values to every error message (both state
changes and bus errors).

> I haven't heard back from Samuel Ortiz yet about the changes for the mfd
> layer. Would you like me to send out my latest CAN driver changes, or
> should I just wait until I hear back?

As you need patch 1/3 anyway, just wait some more time. From my point of
view the next version of the patch will be OK.

Wolfgang.
From: Ira W. Snyder on
On Mon, Mar 22, 2010 at 09:28:25PM +0100, Wolfgang Grandegger wrote:

[ big snip ]

>
> You could even add the tx/rx values for each error message (for both,
> state changes and bus-errors).
>

Ok, with that change, I get the following:

berr-reporting on:

can0 20000088 [8] 00 00 80 19 00 00 08 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 10 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 18 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 20 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 28 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 30 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 38 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 40 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 48 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 50 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 58 00 ERRORFRAME
can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 60 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 68 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 70 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 78 00 ERRORFRAME
can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 80 00 ERRORFRAME
can0 20000088 [8] 00 00 80 19 00 00 80 00 ERRORFRAME

And now lots more of this last frame repeated, until the controller
decides to stop. Seems fine. It has always done this.

berr-reporting off:

can1 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
can1 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME


Same as before. Excellent.
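
For anyone reading the dumps above by hand, a rough decoder sketch follows.
It only covers the two frame types seen in this thread. The type and
function names are hypothetical, and the constants are local copies of the
values in linux/can.h and linux/can/error.h so that it compiles standalone.

```c
#include <stdint.h>

/* Local copies of values from linux/can.h and linux/can/error.h. */
#define ERR_FLAG		0x20000000U	/* CAN_ERR_FLAG */
#define ERR_CRTL		0x00000004U	/* controller problems, data[1] */
#define ERR_BUSERROR		0x00000080U	/* bus error */
#define ERR_CRTL_TX_WARNING	0x08
#define ERR_CRTL_TX_PASSIVE	0x20
#define ERR_PROT_TX		0x80		/* data[2]: error on transmission */
#define ERR_PROT_LOC_ACK	0x19		/* data[3]: location is the ACK slot */

enum err_kind {
	ERR_KIND_NOT_AN_ERROR_FRAME,
	ERR_KIND_BUS_ERROR_TX_ACK,	/* transmit error at the ACK slot */
	ERR_KIND_BUS_ERROR_OTHER,
	ERR_KIND_TX_WARNING,
	ERR_KIND_TX_PASSIVE,
	ERR_KIND_OTHER,
};

struct err_info {
	enum err_kind kind;
	uint8_t txerr;	/* data[6] */
	uint8_t rxerr;	/* data[7] */
};

/* Decode the subset of error frames that appears in the dumps above. */
struct err_info decode_err_frame(uint32_t can_id, const uint8_t data[8])
{
	struct err_info info = { ERR_KIND_NOT_AN_ERROR_FRAME, 0, 0 };

	if (!(can_id & ERR_FLAG))
		return info;

	info.txerr = data[6];
	info.rxerr = data[7];

	if (can_id & ERR_BUSERROR) {
		if ((data[2] & ERR_PROT_TX) && data[3] == ERR_PROT_LOC_ACK)
			info.kind = ERR_KIND_BUS_ERROR_TX_ACK;
		else
			info.kind = ERR_KIND_BUS_ERROR_OTHER;
	} else if ((can_id & ERR_CRTL) && (data[1] & ERR_CRTL_TX_PASSIVE)) {
		info.kind = ERR_KIND_TX_PASSIVE;
	} else if ((can_id & ERR_CRTL) && (data[1] & ERR_CRTL_TX_WARNING)) {
		info.kind = ERR_KIND_TX_WARNING;
	} else {
		info.kind = ERR_KIND_OTHER;
	}
	return info;
}
```

Fed with the frames above, 20000088 with 00 00 80 19 decodes as a transmit
bus error at the ACK slot, which fits an unplugged cable, and the two
20000004 frames decode as TX warning (tx 96) and TX passive (tx 128).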

> > I haven't heard back from Samuel Ortiz yet about the changes for the mfd
> > layer. Would you like me to send out my latest CAN driver changes, or
> > should I just wait until I hear back?
>
> As you need patch 1/3 anyway, just wait some more time. From my point of
> view the next version of the patch will be OK.
>

Ok, I'll wait a few more days before pinging him again. He's CC'd on all
of these emails anyway. :)

Thanks for all the help,
Ira
From: Wolfgang Grandegger on
Ira W. Snyder wrote:
> On Mon, Mar 22, 2010 at 09:28:25PM +0100, Wolfgang Grandegger wrote:
>
> [ big snip ]
>
>> You could even add the tx/rx values for each error message (for both,
>> state changes and bus-errors).
>>
>
> Ok, with that change, I get the following:
>
> berr-reporting on:
>
> can0 20000088 [8] 00 00 80 19 00 00 08 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 10 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 18 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 20 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 28 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 30 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 38 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 40 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 48 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 50 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 58 00 ERRORFRAME
> can0 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 60 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 68 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 70 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 78 00 ERRORFRAME
> can0 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 80 00 ERRORFRAME
> can0 20000088 [8] 00 00 80 19 00 00 80 00 ERRORFRAME
>
> And now lots more of this last frame repeated, until the controller
> decides to stop. Seems fine. It has always done this.
>
> berr-reporting off:
>
> can1 20000004 [8] 00 08 00 00 00 00 60 00 ERRORFRAME
> can1 20000004 [8] 00 20 00 00 00 00 80 00 ERRORFRAME
>
>
> Same as before. Excellent.

Yes, below is some more theory from the AT91 CAN manual, in case you are
interested in technical details.

Wolfgang.

-----------------------------------------------------------------------
o REC: Receive Error Counter
When a receiver detects an error, REC will be increased by one, except
when the detected error is a BIT ERROR while sending an ACTIVE ERROR
FLAG or an OVERLOAD FLAG. When a receiver detects a dominant bit as
the first bit after sending an ERROR FLAG, REC is increased by 8.
When a receiver detects a BIT ERROR while sending an ACTIVE ERROR
FLAG, REC is increased by 8. Any node tolerates up to 7 consecutive
dominant bits after sending an ACTIVE ERROR FLAG, PASSIVE ERROR FLAG
or OVERLOAD FLAG. After detecting the 14th consecutive dominant bit
(in case of an ACTIVE ERROR FLAG or an OVERLOAD FLAG) or after
detecting the 8th consecutive dominant bit following a PASSIVE ERROR
FLAG, and after each sequence of additional eight consecutive dominant
bits, each receiver increases its REC by 8. After successful reception
of a message, REC is decreased by 1 if it was between 1 and 127. If
REC was 0, it stays 0, and if it was greater than 127, then it is set
to a value between 119 and 127.

o TEC: Transmit Error Counter
When a transmitter sends an ERROR FLAG, TEC is increased by 8 except
when:
- the transmitter is error passive and detects an ACKNOWLEDGMENT ERROR
because of not detecting a dominant ACK and does not detect a
dominant bit while sending its PASSIVE ERROR FLAG.
- the transmitter sends an ERROR FLAG because a STUFF ERROR occurred
during arbitration, where the stuff bit should have been recessive
and was sent as recessive but monitored as dominant.
When a transmitter detects a BIT ERROR while sending an ACTIVE ERROR
FLAG or an OVERLOAD FLAG, the TEC will be increased by 8.
Any node tolerates up to 7 consecutive dominant bits after sending an
ACTIVE ERROR FLAG, PASSIVE ERROR FLAG or OVERLOAD FLAG. After
detecting the 14th consecutive dominant bit (in case of an ACTIVE
ERROR FLAG or an OVERLOAD FLAG) or after detecting the 8th consecutive
dominant bit following a PASSIVE ERROR FLAG, and after each
sequence of additional eight consecutive dominant bits every
transmitter increases its TEC by 8. After a successful transmission
the TEC is decreased by 1 unless it was already 0.
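
The counter rules above can be sketched as a toy model of the
unplugged-cable test. This is a simplification under stated assumptions, not
controller code: the function names are made up, every failed transmission
is treated as a single ACK error, a transmitter is taken to be error-passive
once its TEC is 128 or more, and for a REC above 127 the value 127 is picked
arbitrarily from the allowed range 119..127.

```c
/* REC after a successfully received message. */
unsigned rec_after_rx_ok(unsigned rec)
{
	if (rec == 0)
		return 0;
	if (rec <= 127)
		return rec - 1;
	/* The text allows any value in 119..127; 127 is an arbitrary pick. */
	return 127;
}

/*
 * TEC after one transmission that nobody acknowledges.
 * saw_dominant is nonzero if a dominant bit was detected while the
 * passive error flag was being sent.
 */
unsigned tec_after_ack_error(unsigned tec, int saw_dominant)
{
	/* First exception in the TEC rules: passive node, ACK error only. */
	if (tec >= 128 && !saw_dominant)
		return tec;
	return tec + 8;
}

/* TEC after a number of unacknowledged attempts on an otherwise idle bus. */
unsigned tec_after_attempts(unsigned attempts)
{
	unsigned tec = 0;

	while (attempts--)
		tec = tec_after_ack_error(tec, 0);
	return tec;
}
```

In this model the TEC climbs in steps of 8, reaches 96 (0x60) on the 12th
attempt and 128 (0x80) on the 16th, and then stays at 128. That matches the
berr-reporting dump earlier in the thread, where the counter stops at 0x80
and the controller never goes bus-off.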
