Description
I encountered a situation where once a DNS request fails, the DNS socket gets continuously polled, causing the network stack to consume 100% CPU.
This happens when the ARP request for the gateway address (the first hop to which the DNS packet is sent) keeps failing.
When the DNS query times out, the dns::Socket::get_query_result method frees the slot related to the query. From that moment on, the dns::Socket::poll_at method correctly returns PollAt::Ingress, since there are no more pending DNS queries.
The Interface::poll_at method calls the Meta::poll_at method, which returns PollAt::Time(silent_until) because neighbor_state is still Waiting: the ARP request has not been resolved yet.
However, silent_until is no longer updated in Interface::socket_egress, because the Meta::neighbor_missing method is no longer invoked. Meta::neighbor_missing is only called when Socket::dispatch returns EgressError::Dispatch, and with no pending DNS queries dns::Socket::dispatch simply returns Ok.
Therefore, Interface::poll_at keeps returning a "stale" timestamp that is already in the past, so the socket is polled continuously and the CPU usage reaches 100%.
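The interaction described above can be reproduced in a small, self-contained model. This is a hypothetical simplification, not smoltcp code: the names mirror smoltcp's (`PollAt`, `Meta::poll_at`, `Meta::neighbor_missing`, `socket_egress`), but instants are plain milliseconds, `SILENT_TIME_MS` and `poll_delay` are illustrative assumptions, and the neighbor never resolves.

```rust
// Hypothetical model of the failure; NOT smoltcp's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollAt {
    Time(u64), // milliseconds
    Ingress,
}

#[derive(Debug, PartialEq, Eq)]
enum EgressError {
    Dispatch, // e.g. the next-hop neighbor is not in the ARP cache
}

/// Assumed silence period after a failed neighbor lookup (illustrative value).
const SILENT_TIME_MS: u64 = 1_000;

struct Meta {
    /// `Some(silent_until)` models `NeighborState::Waiting`; it is never cleared here.
    waiting_until: Option<u64>,
}

impl Meta {
    fn neighbor_missing(&mut self, now: u64) {
        self.waiting_until = Some(now + SILENT_TIME_MS);
    }

    /// While waiting for a neighbor, the socket's own answer is overridden.
    fn poll_at(&self, socket_poll_at: PollAt) -> PollAt {
        match self.waiting_until {
            None => socket_poll_at,
            Some(silent_until) => PollAt::Time(silent_until),
        }
    }
}

/// Stand-in for a DNS socket: it only has something to send while a query is pending.
struct DnsSocket {
    query_pending: bool,
}

impl DnsSocket {
    fn poll_at(&self) -> PollAt {
        // Simplified: a pending query is "due now"; otherwise wait for ingress.
        if self.query_pending { PollAt::Time(0) } else { PollAt::Ingress }
    }

    fn dispatch(&self) -> Result<(), EgressError> {
        // Pending query: the packet cannot leave (ARP unresolved).
        // No pending query: nothing to send, so dispatch succeeds.
        if self.query_pending { Err(EgressError::Dispatch) } else { Ok(()) }
    }
}

/// Models `socket_egress`: `neighbor_missing` runs only on a dispatch error.
fn socket_egress(socket: &DnsSocket, meta: &mut Meta, now: u64) {
    if socket.dispatch() == Err(EgressError::Dispatch) {
        meta.neighbor_missing(now);
    }
}

/// Delay until the next poll, as an event loop would compute it.
fn poll_delay(socket: &DnsSocket, meta: &Meta, now: u64) -> Option<u64> {
    match meta.poll_at(socket.poll_at()) {
        PollAt::Time(t) => Some(t.saturating_sub(now)),
        PollAt::Ingress => None,
    }
}
```

In this model, once the query slot is freed, `socket_egress` no longer refreshes `silent_until`, so `poll_delay` returns `Some(0)` on every iteration, which is exactly the busy loop observed.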
A possible solution, as implemented in ONE-S-r-l@5ed7cf6, is for Interface::poll_at to skip the Meta::poll_at call entirely and yield no instant (i.e., None) for a socket whose Socket::poll_at returns PollAt::Ingress.
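The proposed change can be sketched in the same kind of simplified model. This is an illustration of the idea only, not the contents of ONE-S-r-l@5ed7cf6: `interface_poll_at` is a hypothetical stand-in for `Interface::poll_at`, sockets are represented as `(PollAt, Meta)` pairs, and instants are plain milliseconds.

```rust
// Hypothetical sketch of the proposed fix; NOT smoltcp's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollAt {
    Now,
    Time(u64), // milliseconds
    Ingress,
}

struct Meta {
    /// `Some(silent_until)` models `NeighborState::Waiting` with an unresolved neighbor.
    waiting_until: Option<u64>,
}

impl Meta {
    fn poll_at(&self, socket_poll_at: PollAt) -> PollAt {
        match self.waiting_until {
            None => socket_poll_at,
            Some(silent_until) => PollAt::Time(silent_until),
        }
    }
}

/// Earliest instant at which any socket needs polling, or `None` if every
/// socket is only waiting for incoming packets.
fn interface_poll_at(sockets: &[(PollAt, Meta)]) -> Option<u64> {
    sockets
        .iter()
        .filter_map(|(socket_poll_at, meta)| {
            // Proposed change: a socket with nothing to send contributes no
            // deadline, so the possibly stale `silent_until` is never consulted.
            if *socket_poll_at == PollAt::Ingress {
                return None;
            }
            match meta.poll_at(*socket_poll_at) {
                PollAt::Ingress => None,
                PollAt::Time(t) => Some(t),
                PollAt::Now => Some(0),
            }
        })
        .min()
}
```

A socket that does have something to send still honours the neighbor silence period, so the rate limiting of failed ARP lookups is preserved for active sockets.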
Is this solution acceptable?
Thanks in advance