Description
Describe the bug
If I boot an up-to-date Raspbian directly, the network starts up in some kind of non-functional state. This means dhcpcd does not manage to get an IP, and manually running dhcpcd on the interface hangs forever.
The problem clears if I unplug and replug the network cable. This triggers fetching a valid IP and properly enables the network interface.
It is also possible to recover from the non-functional state by running
sudo mii-tool -r eth0
This also "unblocks" the network card and makes dhcpcd get a new IP.
To reproduce
It seems not everyone is able to reproduce this bug. Maybe it's even some kind of hardware problem.
But on an affected Raspberry Pi 4 board, all you have to do is reboot. The result will be a non-functional network.
Expected behaviour
Network should come up without problems every time.
Actual behaviour
Network hangs until mii-tool -r is called or the network cable is unplugged and replugged.
Logs
I already published some logs here:
https://www.raspberrypi.org/forums/viewtopic.php?f=66&t=244061#p1488426
I can provide more if needed.
It doesn't seem to be only a DHCP issue. I configured my RPi 4 for a static IP and rebooted several times. The journal always says that the IP, route and DNS are set properly, but it is impossible to reach the RPi.
Then I tried swapping the switch. I still have an old 100 Mbit switch and connected it in place of my 1 Gbit one (D-Link DGS-108 https://www.amazon.de/dp/B000BCC0LO/). With this switch in place I was able to reboot 5 times and the network was always available.
So yes, the behaviour changes when the switch is changed. But of course I would prefer to run the 1 Gbit card on a 1 Gbit switch 😛
So if I buy a second one, will it show exactly the same problem on this switch?
Activity
M-Reimer commented on Jul 28, 2019
It would be nice to get feedback on what to do with my "problematic" board.
If it helps in any way to debug this issue, I would send it in.
But maybe it would be even more helpful for debugging if I keep the unit, as I have the test setup required to trigger the issue. If an update is published, I could check whether it also fixes the issue in my "test environment".
Anyway, it would be nice to get a comment about this soon, as the board is currently just collecting dust. I have no use for it if the network fails regularly.
pelwell commented on Jul 28, 2019
This sounds like an auto-negotiation failure. It isn't hard to imagine how some switches might trigger the problem while others don't, but it's harder to explain differences between multiple Pi 4s running the same image unless there is a marginal timing somewhere, e.g. (and this is just a hypothetical example) the first round of auto-negotiation takes too long and one side either gives up completely or, when trying again, falls foul of a driver bug in an error path.
M-Reimer commented on Jul 28, 2019
I have a combination (RPi 4 and switch) that makes it possible to reproduce the issue every time.
So if there is a way to find out what is causing the problem, I could try it. So far, the logs I got don't provide anything useful.
pelwell commented on Jul 28, 2019
Can you post the output of
mii-tool -vv eth0
before and after running mii-tool -r eth0?
M-Reimer commented on Jul 28, 2019
Before:
After:
pelwell commented on Jul 28, 2019
Only one register is different, the Basic Mode Control Register at offset 0: 0x1140 becomes 0x1000. However, the bits that are different are either not relevant when auto-negotiation is enabled (bit 8 - Duplex Mode) or reserved (bit 6). It could be that resetting the PHY is (just) a way of shaking the Ethernet driver out of its broken state.
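As a cross-check of the register comparison above, here is a minimal Python sketch that decodes the two BMCR values. The bit names follow IEEE 802.3 Clause 22 (register 0). Note that in that standard bit 6 is the most significant speed-select bit (used for 1000 Mb/s), while older 10/100 PHY documentation lists it as reserved. Either way, bit 12 (Auto-Negotiation Enable) is set in both values, and while it is set the speed and duplex bits have no effect. The helper names `bit_names` and `compare_bmcr` are illustrative only.

```python
"""Decode MII Basic Mode Control Register (BMCR, register 0) values.

Bit names follow IEEE 802.3 Clause 22; bits 5:0 are reserved.
"""

BMCR_BITS = {
    15: "Reset",
    14: "Loopback",
    13: "Speed Select (LSB)",
    12: "Auto-Negotiation Enable",
    11: "Power Down",
    10: "Isolate",
    9: "Restart Auto-Negotiation",
    8: "Duplex Mode",
    7: "Collision Test",
    6: "Speed Select (MSB)",
}

AUTONEG_ENABLE = 1 << 12


def bit_names(mask: int) -> list[str]:
    """Name every bit set in a 16-bit mask, from bit 15 down to bit 0."""
    return [
        BMCR_BITS.get(bit, f"Reserved (bit {bit})")
        for bit in range(15, -1, -1)
        if mask & (1 << bit)
    ]


def compare_bmcr(before: int, after: int) -> None:
    """Print the set bits of both values and the bits that differ."""
    print(f"before 0x{before:04x}: {bit_names(before)}")
    print(f"after  0x{after:04x}: {bit_names(after)}")
    print(f"changed: {bit_names(before ^ after)}")
    # With auto-negotiation enabled, the speed and duplex bits are ignored.
    both_autoneg = bool(before & after & AUTONEG_ENABLE)
    print(f"auto-negotiation enabled in both: {both_autoneg}")


if __name__ == "__main__":
    compare_bmcr(0x1140, 0x1000)
```

For 0x1140 and 0x1000 it reports that only Duplex Mode and Speed Select (MSB) changed, with auto-negotiation enabled in both values.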
M-Reimer commented on Jul 29, 2019
The problem doesn't occur every time. In some rare cases the network works right after booting the "problematic" Pi. So I tried to catch such a case and captured this directly after booting:
So I think it's safe to say that the register values don't matter in this case.
M-Reimer commented on Jul 29, 2019
I used "diff" to compare the dmesg output of a "good" and a "bad" start. There are no relevant differences.
Is it possible to get this network driver to log additional information, so that some difference might show up there?
I have no knowledge of kernel debugging, but recompiling a kernel would be no problem for me if needed. I also built my first Pi 4 compatible Arch Linux ARM kernel myself.
M-Reimer commented on Jul 29, 2019
I think that's the driver:
https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/broadcom/genet/bcmgenet.c
It seems to have almost no debug output messages. Adding some without knowing which functions are of interest doesn't make sense.
If someone here can provide a patch that makes this driver a bit more verbose at the interesting places, I could apply it, compile the kernel and check whether the output differs between the "good" and "bad" states.
pelwell commented on Jul 29, 2019
I've got one of the D-Link switches on order, so I hope we can find a switch+Pi 4 combination that exhibits the problem.
iammer commented on Jul 30, 2019
I am also having problems with a Pi 4 and a DGS-108. In my case the Ethernet disconnects/reconnects during heavy traffic. See: https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=247257
M-Reimer commented on Jul 30, 2019
Interesting. So this switch model seems to be problematic in general; it's not just my unit.
But there is still the problem that some RPi 4 boards work well on this switch. I hope @pelwell finds one that reproduces the issue.
Restarting the switch does not help in my case. The switch is restarted daily, but the problem persists.
M-Reimer commented on Jul 30, 2019
Today I received two more boards with 2GB RAM.
I want to use them to do some tests with in-home network services to find out a bit more about server performance.
Of course, network reliability is important there.
So I rebooted each of the boards 10 times via SSH to see whether the network works on every boot. It did. No problems at all with the two new boards.
So I guess the problem may be fairly rare.
It requires the right switch and the right RPi 4 board.
If you can't find a board that reproduces the issue, my offer to send mine still stands. I'd suggest you tell me your full name, so I can put a note in the package saying it must be forwarded directly to you. That way it isn't sent back to me after being tested in an environment where the problem is not triggered.
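The reboot test described here (reboot over SSH, then check that the network comes back) can be scripted from a second machine. Probing from outside matters because, as noted earlier in the thread, the Pi's own journal can report a correctly configured IP while the board is unreachable. Below is a minimal Python sketch, assuming SSH listens on TCP port 22; the function names, timeouts and the host name in the usage note are hypothetical choices, not part of any tool mentioned in the thread.

```python
"""Check whether a rebooted host becomes reachable again on its SSH port."""

import socket
import time
from typing import Optional


def wait_until_down(host: str, port: int = 22, timeout: float = 60.0,
                    interval: float = 1.0) -> bool:
    """Return True once a TCP connect to host:port fails (host went down)."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=interval):
                pass  # still accepting connections, the reboot has not begun
        except OSError:
            return True
        time.sleep(interval)
    return False


def wait_for_port(host: str, port: int = 22, timeout: float = 120.0,
                  interval: float = 2.0) -> Optional[float]:
    """Return seconds until a TCP connect to host:port succeeds, or None on timeout."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            time.sleep(interval)  # refused, unreachable or timed out: retry
    return None
```

For each iteration, trigger the reboot over SSH, call `wait_until_down("raspberrypi.local")` (hypothetical host name) so the old session is not mistaken for a successful boot, then call `wait_for_port("raspberrypi.local")`; a `None` result marks a boot where the network did not come up.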
pelwell commented on Jul 30, 2019
Drop me an email - phil@raspberrypi.org - and we can exchange details.
net: bcmgenet: Workaround #2 for Pi4 Ethernet fail