
DNS Socket Poll Returns Stale Timestamp Causing 100% CPU #1096

@nicola-orlandi-one

I encountered a situation in which, once a DNS request fails, the DNS socket is polled continuously, causing the network stack to consume 100% CPU.
This happens when the ARP request for the gateway address (the first hop to which the DNS packet is sent) keeps failing.
When the DNS query times out, the dns::Socket::get_query_result method frees the slot associated with the query. From then on, dns::Socket::poll_at correctly returns PollAt::Ingress, since there are no more pending DNS queries.
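To make this step concrete, here is a minimal Rust sketch of a DNS socket deriving its poll deadline from its query slots. DnsSocketModel, its slots field, free_slot, and the millisecond timestamps are hypothetical simplifications for illustration, not smoltcp's actual types; only the PollAt variant names follow the issue.

```rust
// Hypothetical, simplified model of a DNS socket's poll deadline.
// Not smoltcp code: types and fields are invented for illustration.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollAt {
    Now,
    Time(u64), // milliseconds
    Ingress,
}

struct DnsSocketModel {
    // Some(deadline_ms) = pending query; None = slot freed after a timeout.
    slots: Vec<Option<u64>>,
}

impl DnsSocketModel {
    fn poll_at(&self) -> PollAt {
        // Earliest deadline among pending queries, if any.
        match self.slots.iter().flatten().min() {
            Some(&deadline) => PollAt::Time(deadline),
            // No pending queries: nothing to send, only wait for packets.
            None => PollAt::Ingress,
        }
    }

    // Stands in for get_query_result freeing the slot of a timed-out query.
    fn free_slot(&mut self, idx: usize) {
        self.slots[idx] = None;
    }
}

fn main() {
    let mut sock = DnsSocketModel { slots: vec![Some(1_000), None] };
    assert_eq!(sock.poll_at(), PollAt::Time(1_000));
    sock.free_slot(0);
    assert_eq!(sock.poll_at(), PollAt::Ingress);
    println!("after timeout: {:?}", sock.poll_at());
}
```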
Interface::poll_at calls Meta::poll_at, which returns PollAt::Time(silent_until) because neighbor_state is Waiting: the ARP request has not been resolved yet.
However, silent_until is no longer updated in Interface::socket_egress, because Meta::neighbor_missing is no longer invoked. Meta::neighbor_missing is only called when Socket::dispatch returns EgressError::Dispatch, and with no pending DNS queries left, dns::Socket::dispatch simply returns Ok.
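The interaction between Meta::poll_at, Meta::neighbor_missing, and socket_egress can be modelled roughly as follows. This is a simplified sketch, assuming a fixed back-off interval (SILENT_MS) and a two-variant Dispatch outcome that stands in for Ok and EgressError::Dispatch; none of these are the crate's real definitions.

```rust
// Simplified, hypothetical model of the neighbor back-off described above.
// Names echo smoltcp's (Meta, neighbor_missing, silent_until), but this is
// not the crate's code; timestamps are plain milliseconds.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollAt {
    Now,
    Time(u64),
    Ingress,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum NeighborState {
    Active,
    Waiting { silent_until: u64 },
}

struct Meta {
    neighbor_state: NeighborState,
}

// Assumed back-off interval; the real value may differ.
const SILENT_MS: u64 = 1_000;

impl Meta {
    fn poll_at(&self, socket_poll_at: PollAt) -> PollAt {
        match self.neighbor_state {
            NeighborState::Active => socket_poll_at,
            // While waiting for ARP, the socket's own answer is ignored.
            NeighborState::Waiting { silent_until } => PollAt::Time(silent_until),
        }
    }

    fn neighbor_missing(&mut self, now: u64) {
        self.neighbor_state = NeighborState::Waiting { silent_until: now + SILENT_MS };
    }
}

// Outcome of one dispatch attempt in this model.
enum Dispatch {
    Idle,            // nothing to send (dispatch returns Ok)
    NeighborMissing, // stands in for EgressError::Dispatch
}

// Model of socket_egress: silent_until is refreshed only on a dispatch error.
fn socket_egress(meta: &mut Meta, result: Dispatch, now: u64) {
    if let Dispatch::NeighborMissing = result {
        meta.neighbor_missing(now);
    }
}

fn main() {
    let mut meta = Meta { neighbor_state: NeighborState::Active };
    // t = 0 ms: a query is pending and ARP for the gateway fails.
    socket_egress(&mut meta, Dispatch::NeighborMissing, 0);
    // t = 5000 ms: the query has timed out, so there is nothing to send.
    socket_egress(&mut meta, Dispatch::Idle, 5_000);

    // The socket reports Ingress, but Meta still reports the old deadline.
    let p = meta.poll_at(PollAt::Ingress);
    assert_eq!(p, PollAt::Time(1_000));
    println!("poll_at at t=5000: {:?} (4000 ms in the past)", p);
}
```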
As a result, poll_at keeps returning a stale timestamp that lies in the past, so the socket is polled continuously and CPU usage reaches 100%.
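Why a past timestamp means 100% CPU: in a typical loop that sleeps until the next poll deadline, a deadline already in the past yields a zero delay on every iteration. The sketch below is a hypothetical model (delay_until_next_poll, count_wakeups, and the 1 ms cost per iteration are assumptions, not smoltcp APIs) that counts wake-ups over one simulated second.

```rust
// Hypothetical model of an application's poll loop; not smoltcp code.
// It shows why a deadline stuck in the past turns into busy polling.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollAt {
    Now,
    Time(u64),
    Ingress,
}

// How long the loop may sleep before polling again (None = until a packet arrives).
fn delay_until_next_poll(poll_at: PollAt, now: u64) -> Option<u64> {
    match poll_at {
        PollAt::Now => Some(0),
        // A deadline at or before `now` means "poll again immediately".
        PollAt::Time(t) => Some(t.saturating_sub(now)),
        PollAt::Ingress => None,
    }
}

// Counts wake-ups between `start` and `end`, assuming each poll iteration costs
// `work_ms` (at least 1 ms, so the simulation always advances) and the deadline
// never changes, as in the issue.
fn count_wakeups(poll_at: PollAt, start: u64, end: u64, work_ms: u64) -> u32 {
    let step = work_ms.max(1);
    let mut now = start;
    let mut wakeups = 0;
    while now < end {
        match delay_until_next_poll(poll_at, now) {
            Some(delay) => {
                now += delay + step; // sleep, then run one poll iteration
                wakeups += 1;
            }
            None => break, // blocked until ingress; no packets arrive here
        }
    }
    wakeups
}

fn main() {
    // Stale deadline (t = 1000) observed from t = 5000 to t = 6000.
    let stale = count_wakeups(PollAt::Time(1_000), 5_000, 6_000, 1);
    // What the DNS socket itself asks for once it has no queries.
    let idle = count_wakeups(PollAt::Ingress, 5_000, 6_000, 1);
    assert_eq!(stale, 1_000);
    assert_eq!(idle, 0);
    println!("stale deadline: {stale} wake-ups, ingress-only: {idle}");
}
```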
A possible solution, as implemented in ONE-S-r-l@5ed7cf6, is for Interface::poll_at to return no instant (i.e., None) when Socket::poll_at returns PollAt::Ingress, without invoking Meta::poll_at.
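The proposed change can be expressed as a minimal model of Interface::poll_at taking the earliest deadline across sockets. The Meta struct with an optional silent_until, to_instant, and the two poll_at_* functions are hypothetical simplifications that mirror the description above, not the code in ONE-S-r-l@5ed7cf6.

```rust
// Hypothetical model of Interface::poll_at before and after the proposed change.
// Not smoltcp code; timestamps are plain milliseconds.

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PollAt {
    Now,
    Time(u64),
    Ingress,
}

// Some(silent_until) while the first-hop neighbor is unresolved, None once known.
struct Meta {
    silent_until: Option<u64>,
}

impl Meta {
    fn poll_at(&self, socket_poll_at: PollAt) -> PollAt {
        match self.silent_until {
            Some(t) => PollAt::Time(t),
            None => socket_poll_at,
        }
    }
}

fn to_instant(p: PollAt) -> Option<u64> {
    match p {
        PollAt::Now => Some(0), // any past instant means "poll immediately"
        PollAt::Time(t) => Some(t),
        PollAt::Ingress => None,
    }
}

// Current behaviour as described: Meta::poll_at is always consulted.
fn poll_at_current(sockets: &[(PollAt, Meta)]) -> Option<u64> {
    sockets.iter().filter_map(|(p, meta)| to_instant(meta.poll_at(*p))).min()
}

// Proposed behaviour: an ingress-only socket contributes no instant.
fn poll_at_proposed(sockets: &[(PollAt, Meta)]) -> Option<u64> {
    sockets
        .iter()
        .filter_map(|(p, meta)| match p {
            PollAt::Ingress => None,
            _ => to_instant(meta.poll_at(*p)),
        })
        .min()
}

fn main() {
    // DNS socket: no pending queries, stale back-off from the failed ARP.
    let dns_only = [(PollAt::Ingress, Meta { silent_until: Some(1_000) })];
    assert_eq!(poll_at_current(&dns_only), Some(1_000)); // past deadline: busy loop
    assert_eq!(poll_at_proposed(&dns_only), None); // sleep until ingress

    let mixed = [
        (PollAt::Ingress, Meta { silent_until: Some(1_000) }),
        // Another socket with its own future deadline and a resolved neighbor.
        (PollAt::Time(7_000), Meta { silent_until: None }),
    ];
    assert_eq!(poll_at_proposed(&mixed), Some(7_000));
    println!("proposed: {:?}", poll_at_proposed(&mixed));
}
```

In this model, a socket that only waits for ingress has nothing to transmit, so skipping the neighbor back-off for it cannot delay any outgoing packet, while sockets with real deadlines still contribute them.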
Is this solution acceptable?

Thanks in advance
