Description
Hi guys,
I had this bug, then tried to migrate to 2.4.0rc but there the behavior is more unpredictible (details here). So, I came back to 2.3.0 and tried to debug.
the infinite loop happens when you are connected with a secured TCP socket in a long duration connection, is connected to a wifi router but the wifi router lost internet access.
ClientContext write
tries to send the package and after a few tries logs the error :wr !ERR_OK
and returns 0
err_t err = tcp_write(_pcb, data, will_send, 0);
if(err != ERR_OK) {
DEBUGV(":wr !ERR_OK\r\n");
return 0;
}
but the tls1.c send_raw_packet
function (from igrr/axtls-8266 library - used this tree version to build and debug) has this while
code that just gives error with a ret < 0
while (sent < pkt_size)
{
ret = SOCKET_WRITE(ssl->client_fd,
&ssl->bm_all_data[sent], pkt_size-sent);
if (ret >= 0)
sent += ret;
else
{
#ifdef WIN32
if (GetLastError() != WSAEWOULDBLOCK)
#else
if (errno != EAGAIN && errno != EWOULDBLOCK)
#endif
return SSL_ERROR_CONN_LOST;
}
#ifndef ESP8266
/* keep going until the write buffer has some space */
if (sent != pkt_size)
{
fd_set wfds;
FD_ZERO(&wfds);
FD_SET(ssl->client_fd, &wfds);
/* block and wait for it */
if (select(ssl->client_fd + 1, NULL, &wfds, NULL, NULL) < 0)
return SSL_ERROR_CONN_LOST;
}
#endif
}
so it never detects the error and stay in loop.
Is this analysis right?
I was thinking to change the ClientContext write to return err
on error. But I cannot predict all the impacts of this change.
Solution (so far so good)
ClientContext write
err_t err = tcp_write(_pcb, data, will_send, 0);
if(err != ERR_OK) {
DEBUGV(":wr !ERR_OK\r\n");
return err; //<- CHANGED HERE
}
tsl1.c send_raw_packet
while (sent < pkt_size)
{
ret = SOCKET_WRITE(ssl->client_fd,
&ssl->bm_all_data[sent], pkt_size-sent);
if (ret >= 0)
sent += ret;
else
{
errno = ret; //? can I do this? //<- CHANGED HERE
#ifdef WIN32
if (GetLastError() != WSAEWOULDBLOCK)
#else
if (errno != EAGAIN && errno != EWOULDBLOCK)
#endif
return SSL_ERROR_CONN_LOST;
}
#ifndef ESP8266
/* keep going until the write buffer has some space */
if (sent != pkt_size)
{
fd_set wfds;
FD_ZERO(&wfds);
FD_SET(ssl->client_fd, &wfds);
/* block and wait for it */
if (select(ssl->client_fd + 1, NULL, &wfds, NULL, NULL) < 0)
return SSL_ERROR_CONN_LOST;
}
#endif
}
Activity
cristovao-trevisan commentedon Aug 21, 2017
Hello @odelot, where is the tsl1.c filr? I am having what seems to be same problem, but cannot find the file to test your solution.
odelot commentedon Aug 21, 2017
@cristovao-trevisan this file is from igrr/axtls-8266 library - they include it compiled
as it does not have tag versions (and I am not very good using git) I took the date of 2.3.0 release to get a version of the source code. I've used this snapshot ->
https://github.com/igrr/axtls-8266/tree/ab516f799dd43faccaf01ce5eacd5f7a1b9b109a
I´ve had to configure a linux environment to compile it. I can send you the binary, but I am not sure if this is the right fix. Need to test more and have the opinion from those who are used to these libs and files.
cristovao-trevisan commentedon Aug 22, 2017
Since errno is global there is no need to change the axtls library. My solution was to create a retry count and close the connection after 25 tries (got the number somewhere in this repo). Here it is (in file libraries/ESP8266WiFi/src/WiFiClientSecure.cpp):
Not sure about the global variable, but
ax_port_write
is already global (and theClientContext
is static), so I don't think there is a problem. Also didn't check for memory leaks, but it does prevent the issue for now.Can someone confirm this solution so I can make a PR out of it?
Fixes esp8266#3537
odelot commentedon Aug 22, 2017
Wow. nice. Way better than my fix (Y) I didn't realize that errno was global.
Have you tried this fix on 2.4.0rc? In this release the WifiClientSecure.write instead of went to a infinite loop was returning that it has sent data even without internet. I didn't debug this code to find why.
cristovao-trevisan commentedon Aug 23, 2017
I will try to dig more on this (maybe on sunday), I didn't check for the result
martinren87 commentedon Sep 27, 2017
Hello guys, are there any news regarding to this issue? Will it be fixed soon? Or can anybody share with me a fixed unofficial version of the library? Thank you!
cristovao-trevisan commentedon Sep 28, 2017
If you're using v2.3 follow my instructions above and replace the code. For v2.4 take a look at my PR (fork referenced here), though I didn't test it (2.4, cause 2.3 fix works for sure).
devyte commentedon May 29, 2018
The referenced code in ClientContext has since been rewritten, and seems to have a different behavior now.
Also, BearSSL is merged in #4273 , with alternate BearSSL::WiFi* classes. Although axtls-based classes are still available and even the default, they are planned for deprecation and then retirement, hence won't be fixed. Any issues with BearSSL-based classes should be reported in new issues.
Closing this. If the issue is still relevant, please open a new issue.