Closed
Description
My ESP8266 regularly stops responding to ARP requests. I raised this in #6873 but was asked to troubleshoot with the Tasmota devs first (GitHub issue) and to provide an MCVE. Tasmota ruled out an issue with their code, but did identify that this may be an interoperability issue between the ESP8266 and Mikrotik access points.
Basic Infos
- This issue complies with the issue POLICY doc.
- I have read the documentation at readthedocs and the issue is not addressed there.
- I have tested that the issue is present in current master branch (aka latest git).
- I have searched the issue tracker for a similar issue.
- If there is a stack dump, I have decoded it.
- I have filled out all fields below.
Platform
- Hardware: ESP8266
- Core Version: 2.6.2
- Development Env: Arduino IDE
- Operating System: MacOS
Problem Description
I have discovered that if wifi_set_sleep_type is set to light or modem, ARP requests will consistently fail. When it is set to none, I reliably get responses to ARP requests. This appears to affect ESP8266 devices connected to Mikrotik Access Points.
ARP responses when sleep type set to none: (1 lost is acceptable IMO)
sudo nping --arp-type ARP 10.0.130.109 -c 20
...
SENT (19.0551s) ARP who has 10.0.130.109? Tell 10.0.130.102
RCVD (19.0663s) ARP reply 10.0.130.109 is at 5C:CF:7F:88:E3:2F
...
Raw packets sent: 20 (840B) | Rcvd: 19 (532B) | Lost: 1 (5.00%)
ARP responses when sleep type set to light:
sudo nping --arp-type ARP 10.0.130.109 -c 20
...
SENT (1.0130s) ARP who has 10.0.130.109? Tell 10.0.130.102
RCVD (1.0276s) ARP reply 10.0.130.109 is at 5C:CF:7F:88:E3:2F
...
Raw packets sent: 20 (840B) | Rcvd: 5 (140B) | Lost: 15 (75.00%)
ARP responses when sleep type set to modem:
sudo nping --arp-type ARP 10.0.130.109 -c 20
...
SENT (1.0130s) ARP who has 10.0.130.109? Tell 10.0.130.102
RCVD (1.0276s) ARP reply 10.0.130.109 is at 5C:CF:7F:88:E3:2F
...
Raw packets sent: 20 (840B) | Rcvd: 3 (84B) | Lost: 17 (85.00%)
MCVE Sketch
#include <ESP8266WiFi.h>

#ifndef STASSID
#define STASSID "YOUR SSID"      // replace with your network name
#define STAPSK  "YOUR PASSWORD"  // replace with your network password
#endif

const char* ssid = STASSID;
const char* password = STAPSK;

void setup() {
  Serial.begin(115200);
  Serial.println();
  Serial.println();
  Serial.print("Connecting to ");
  Serial.println(ssid);

  // Enable exactly one of these to select the sleep type under test.
  wifi_set_sleep_type(NONE_SLEEP_T);
  //wifi_set_sleep_type(LIGHT_SLEEP_T);
  //wifi_set_sleep_type(MODEM_SLEEP_T);

  WiFi.mode(WIFI_STA);
  WiFi.begin(ssid, password);

  // Block until the station is associated and has an IP address.
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }

  Serial.println("");
  Serial.println("WiFi connected");
  Serial.println("IP address: ");
  Serial.println(WiFi.localIP());
}

void loop() {
  Serial.println("I'm alive");
  delay(30000);
}
Activity
TD-er commented on Dec 7, 2019
In ESPEasy I added a "send Gratuitous ARP" option to overcome this issue (or rather, to hide the symptoms).
It is not limited to MikroTik APs; I also noticed these issues with Fritzbox.
kugelkopf123 commented on Dec 7, 2019
I can confirm that! (Fritzbox)
ascillato commented on Dec 7, 2019
@TD-er cool, can you share the snippet of code for that? Maybe it is worth adding directly to the core, because it is needed when using sleep. Thanks
TD-er commented on Dec 7, 2019
@ascillato Well it is already shared as you may know ;)
See: https://github.com/letscontrolit/ESPEasy/search?q=gratuitous&unscoped_q=gratuitous
I think this is where most of the magic happens:
https://github.com/letscontrolit/ESPEasy/blob/0c6e6ca915e5440de54213246dc8c1a8e6ca9cce/src/Networking.ino#L15-L39
https://github.com/letscontrolit/ESPEasy/blob/0c6e6ca915e5440de54213246dc8c1a8e6ca9cce/src/Networking.ino#L938-L962
Some of the magic lies in how often I send it.
There is a dynamic interval to send these packets.
This interval does start low and gradually increases over time.
It is sent immediately when some kind of connect attempt fails.
This can be a DNS lookup, a NTP request, or any other wifi client connect attempt.
Such an immediate send also resets the interval timer to the lowest value.
The Gratuitous ARP packet is also sent right after the connection to WiFi is fully active (got IP event + a few 100 msec).
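The interval scheme described above could be sketched roughly as follows. This is a minimal, hardware-independent sketch and not the ESPEasy code: the class name `GratuitousArpTimer`, its methods, and the choice to double the interval on every send are assumptions made for illustration. The caller passes in the current time (for example from `millis()`) and sends the packet itself whenever `shouldSend()` returns true.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, names and doubling policy are illustrative only.
// It decides WHEN a gratuitous ARP is due; sending is left to the caller.
class GratuitousArpTimer {
public:
  GratuitousArpTimer(uint32_t minIntervalMs, uint32_t maxIntervalMs)
      : minMs_(minIntervalMs), maxMs_(maxIntervalMs), intervalMs_(minIntervalMs) {
    assert(minIntervalMs > 0 && minIntervalMs <= maxIntervalMs);
  }

  // Call right after WiFi is fully up, or after a failed connect attempt
  // (DNS, NTP, any client connect): send now and restart at the lowest interval.
  void triggerNow(uint32_t nowMs) {
    intervalMs_ = minMs_;
    nextDueMs_ = nowMs;
  }

  // Poll from the main loop. Returns true when a packet should be sent.
  // Every send doubles the interval until the maximum is reached.
  bool shouldSend(uint32_t nowMs) {
    // Signed difference keeps the comparison valid across timer rollover.
    if (static_cast<int32_t>(nowMs - nextDueMs_) < 0) {
      return false;
    }
    nextDueMs_ = nowMs + intervalMs_;
    intervalMs_ = (intervalMs_ > maxMs_ / 2) ? maxMs_ : intervalMs_ * 2;
    return true;
  }

  uint32_t currentIntervalMs() const { return intervalMs_; }

private:
  uint32_t minMs_;
  uint32_t maxMs_;
  uint32_t intervalMs_;
  uint32_t nextDueMs_ = 0;  // due immediately on first poll
};
```

In a sketch this would be polled from `loop()` with `millis()` as the time source, and `triggerNow()` would be called from whatever code path detects the failed connect attempt or the got-IP event.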
This is still not a fix for the problem, as you sometimes notice when the first attempt to connect to a node after some time takes a few seconds.
Especially with my "eco" mode enabled (calling delay() when my scheduler has nothing to do) this problem is more apparent.
ascillato commented on Dec 7, 2019
Great, thanks. That is simple to add, but IMHO this should be managed inside the core. What do you think? @d-a-v @devyte @earlephilhower
Is there a chance this could be added to the WiFi client library? Or must it be managed by the sketches/projects?
TD-er commented on Dec 7, 2019
Well, Gratuitous ARP is not a fix for the problem, it is to hide the symptoms.
It is an answer to a question nobody asked, just to make sure all hosts, switches and AP's in the network keep it in their MAC tables.
The real problem is that when the question is asked, it may not be answered and that's core stuff, not part of this repo.
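For readers unfamiliar with the term, a gratuitous ARP is an ordinary ARP packet in which the sender IP and the target IP are both the sender's own address, broadcast to everyone. The sketch below only illustrates that wire layout in plain C++; `buildGratuitousArp` and the type aliases are hypothetical names, and this is not how ESPEasy or the core actually transmit the packet. It uses the request form (opcode 1); a reply form (opcode 2) also exists.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Mac = std::array<uint8_t, 6>;
using Ipv4 = std::array<uint8_t, 4>;
// 14-byte Ethernet header + 28-byte ARP payload.
using ArpFrame = std::array<uint8_t, 42>;

// Hypothetical helper (not part of any library): builds a gratuitous ARP
// request, i.e. an ARP request asking for our own IP address.
ArpFrame buildGratuitousArp(const Mac& mac, const Ipv4& ip) {
  ArpFrame f{};
  std::size_t i = 0;

  // Ethernet header
  for (int k = 0; k < 6; ++k) f[i++] = 0xFF;  // destination: broadcast
  for (uint8_t b : mac) f[i++] = b;           // source: our MAC
  f[i++] = 0x08; f[i++] = 0x06;               // EtherType 0x0806 = ARP

  // ARP payload
  f[i++] = 0x00; f[i++] = 0x01;               // hardware type: Ethernet
  f[i++] = 0x08; f[i++] = 0x00;               // protocol type: IPv4
  f[i++] = 6;                                 // hardware address length
  f[i++] = 4;                                 // protocol address length
  f[i++] = 0x00; f[i++] = 0x01;               // opcode 1 = request
  for (uint8_t b : mac) f[i++] = b;           // sender MAC
  for (uint8_t b : ip) f[i++] = b;            // sender IP
  for (int k = 0; k < 6; ++k) f[i++] = 0x00;  // target MAC: left empty
  for (uint8_t b : ip) f[i++] = b;            // target IP = sender IP

  assert(i == f.size());  // sanity check: every byte was written
  return f;
}
```

The 42-byte size is consistent with the nping output in the issue description (840B for 20 packets sent).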
ascillato commented on Dec 7, 2019
Searching a bit more, this issue has been discussed before but not added to the core. There are workarounds inside the projects.
#5998
xoseperez/espurna#1877
ascillato commented on Dec 7, 2019
Exactly. But this only happens when using sleep. So, does the SDK not answer because the ARP request arrives while the device is sleeping (request outside the DTIM time)?
Or because the SDK ignores ARP requests when it wakes up? I don't know this part of the WiFi protocol.
And the weird thing is that this issue is not happening with other routers. Just with MikroTik and Fritzbox AFAIK.
d-a-v commented on Dec 7, 2019
#6484 made the FW much more stable, but this issue is still beyond our control.
We can add a gratuitous ARP trigger API. I think only STA interface is relevant.
We can make it recurrent and transparently called based on some time interval.
That would help projects other than Tasmota/ESPEasy-mega/ESPurna, which already implement gratuitous ARP packets.
TD-er commented on Dec 7, 2019
That's right. With AP mode active the WiFi is not put to sleep.
However, it would be a welcome addition to send out the Gratuitous ARP as soon as a connection is made.
I use wifi events, but as far as I know Tasmota doesn't.
So for those that rely on the reconnect of the core, it is unknown when a reconnect takes place, and this may be hard to implement.
marrold commented on Dec 7, 2019
Have any of you been able to identify the low level cause of the issue? My guess is some interoperability issue with the power saving negotiation on these vendors.
I plan on sniffing the WiFi later and seeing if I can spot anything of interest.
TD-er commented on Dec 7, 2019
I don't think it is limited to these brands (Fritzbox/MikroTik).
But what I do find odd and still have no explanation for is this.
ARP packets (and also UDP packets) may get lost when the ESP is in some kind of sleep mode.
However, a ping will always be answered. It may take up to 900 msec for the first answer, but it will be answered.
After the first ping, the ESP will up its current consumption and the replies come in much faster.
This looks like the AP tries to send the ping packet more often (or the ESP does receive it while dormant?), while packets that don't expect a reply, like ARP or UDP, are not attempted again.
If it is indeed the AP that re-transmits the packet, then vendors may differ in which packets are re-transmitted and which are not.
Maybe also the AP does know it has some MAC/IP combination connected, so it may answer if the connected device doesn't?
d-a-v commented on Dec 7, 2019
Not to my knowledge
Please have a look at #2330 (I hope you'll not see me as evil for suggesting it)
#6889 is aimed at anyone concerned with this issue.
Please review