
[perf] MCAD constantly throttled #434

Open

Description

@kpouget

When looking at the MCAD logs, I see that it is constantly being throttled, and it seems to be requesting all the CRDs available in the cluster:

I0626 13:23:58.178716       1 request.go:591] Throttling request took 537.935392ms, request: GET:https://172.30.0.1:443/apis/monitoring.coreos.com/v1alpha1?timeout=32s
I0626 13:23:58.189164       1 request.go:591] Throttling request took 548.378517ms, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1alpha1?timeout=32s
I0626 13:23:58.198620       1 request.go:591] Throttling request took 557.834296ms, request: GET:https://172.30.0.1:443/apis/scheduling.k8s.io/v1?timeout=32s
I0626 13:23:58.209032       1 request.go:591] Throttling request took 568.24236ms, request: GET:https://172.30.0.1:443/apis/imageregistry.operator.openshift.io/v1?timeout=32s
I0626 13:23:58.218479       1 request.go:591] Throttling request took 577.692789ms, request: GET:https://172.30.0.1:443/apis/serving.kserve.io/v1alpha1?timeout=32s
I0626 13:23:58.229013       1 request.go:591] Throttling request took 588.223945ms, request: GET:https://172.30.0.1:443/apis/network.openshift.io/v1?timeout=32s
I0626 13:23:58.238502       1 request.go:591] Throttling request took 597.708975ms, request: GET:https://172.30.0.1:443/apis/autoscaling.openshift.io/v1beta1?timeout=32s
I0626 13:23:58.249066       1 request.go:591] Throttling request took 608.272682ms, request: GET:https://172.30.0.1:443/apis/coordination.k8s.io/v1?timeout=32s
I0626 13:23:58.258451       1 request.go:591] Throttling request took 617.672599ms, request: GET:https://172.30.0.1:443/apis/helm.openshift.io/v1beta1?timeout=32s
I0626 13:23:58.268227       1 request.go:591] Throttling request took 627.428484ms, request: GET:https://172.30.0.1:443/apis/cloud.network.openshift.io/v1?timeout=32s
I0626 13:23:58.278857       1 request.go:591] Throttling request took 638.056671ms, request: GET:https://172.30.0.1:443/apis/node.k8s.io/v1?timeout=32s
I0626 13:23:58.288982       1 request.go:591] Throttling request took 648.181891ms, request: GET:https://172.30.0.1:443/apis/kubeflow.org/v1beta1?timeout=32s
I0626 13:23:58.298386       1 request.go:591] Throttling request took 657.590819ms, request: GET:https://172.30.0.1:443/apis/network.operator.openshift.io/v1?timeout=32s
I0626 13:23:58.308934       1 request.go:591] Throttling request took 668.127925ms, request: GET:https://172.30.0.1:443/apis/discovery.k8s.io/v1?timeout=32s
I0626 13:23:58.318269       1 request.go:591] Throttling request took 677.466823ms, request: GET:https://172.30.0.1:443/apis/cloudcredential.openshift.io/v1?timeout=32s
I0626 13:23:58.328651       1 request.go:591] Throttling request took 687.847126ms, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v2?timeout=32s
I0626 13:23:58.338012       1 request.go:591] Throttling request took 697.203084ms, request: GET:https://172.30.0.1:443/apis/flowcontrol.apiserver.k8s.io/v1beta2?timeout=32s
I0626 13:23:58.348316       1 request.go:591] Throttling request took 707.516076ms, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
I0626 13:23:58.358629       1 request.go:591] Throttling request took 717.817759ms, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1?timeout=32s
I0626 13:23:58.368962       1 request.go:591] Throttling request took 728.145512ms, request: GET:https://172.30.0.1:443/apis/flowcontrol.apiserver.k8s.io/v1beta1?timeout=32s
I0626 13:23:58.378359       1 request.go:591] Throttling request took 737.54763ms, request: GET:https://172.30.0.1:443/apis/migration.k8s.io/v1alpha1?timeout=32s
I0626 13:23:58.388688       1 request.go:591] Throttling request took 747.866993ms, request: GET:https://172.30.0.1:443/apis/config.openshift.io/v1?timeout=32s
I0626 13:23:58.398111       1 request.go:591] Throttling request took 757.287052ms, request: GET:https://172.30.0.1:443/apis/kfdef.apps.kubeflow.org/v1?timeout=32s
I0626 13:23:58.408403       1 request.go:591] Throttling request took 767.582294ms, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1?timeout=32s
I0626 13:23:58.418845       1 request.go:591] Throttling request took 778.011698ms, request: GET:https://172.30.0.1:443/apis/apps.openshift.io/v1?timeout=32s
I0626 13:23:58.428329       1 request.go:591] Throttling request took 787.499978ms, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
I0626 13:23:58.438809       1 request.go:591] Throttling request took 797.974212ms, request: GET:https://172.30.0.1:443/apis/authorization.openshift.io/v1?timeout=32s
I0626 13:23:58.448058       1 request.go:591] Throttling request took 807.243089ms, request: GET:https://172.30.0.1:443/apis/kubeflow.org/v1alpha1?timeout=32s
I0626 13:23:58.458594       1 request.go:591] Throttling request took 817.752805ms, request: GET:https://172.30.0.1:443/apis/build.openshift.io/v1?timeout=32s
I0626 13:23:58.467979       1 request.go:591] Throttling request took 827.137372ms, request: GET:https://172.30.0.1:443/apis/oauth.openshift.io/v1?timeout=32s
I0626 13:23:58.478054       1 request.go:591] Throttling request took 837.214141ms, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v1?timeout=32s
I0626 13:23:58.488482       1 request.go:591] Throttling request took 847.641595ms, request: GET:https://172.30.0.1:443/apis/project.openshift.io/v1?timeout=32s
I0626 13:23:58.498475       1 request.go:591] Throttling request took 857.702755ms, request: GET:https://172.30.0.1:443/apis/codeflare.codeflare.dev/v1alpha1?timeout=32s
I0626 13:23:58.507913       1 request.go:591] Throttling request took 867.086812ms, request: GET:https://172.30.0.1:443/apis/kubeflow.org/v1?timeout=32s

This cannot be good for performance.

Activity

asm582 (Member) commented on Jun 26, 2023

Agreed. We need to remove unused controllers from MCAD; here is a PR we can revive: #277

Changed the title from "[perf] MCAD constantly thottled" to "[perf] MCAD constantly throttled" on Aug 2, 2023
asm582 (Member) commented on Aug 24, 2023

@astefanutti MCAD on the main branch has some remediations for this problem; can you recommend what more could be improved?

astefanutti (Contributor) commented on Aug 25, 2023

@asm582 These messages seem to be caused by excessive use of the discovery API, whose requests are being throttled client-side, even though the QPS and maximum burst limits have already been increased.
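For context, client-go's defaults are QPS 5 and Burst 10; any request beyond that budget is delayed by the client-side token bucket, which is exactly what the `Throttling request took ...` log lines report. A minimal configuration sketch of raising those limits on a `rest.Config` (the `QPS`/`Burst` fields are client-go's; the 50/100 values are illustrative, not MCAD's actual settings):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
)

// newThrottleTunedConfig builds an in-cluster rest.Config with raised
// client-side rate limits. client-go defaults to QPS 5 and Burst 10;
// requests beyond that budget are delayed by the client's token bucket,
// producing the "Throttling request took ..." log lines above.
// The 50/100 values are illustrative only.
func newThrottleTunedConfig() (*rest.Config, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	cfg.QPS = 50
	cfg.Burst = 100
	return cfg, nil
}

func main() {
	cfg, err := newThrottleTunedConfig()
	if err != nil {
		fmt.Println("not running in a cluster:", err)
		return
	}
	fmt.Printf("QPS=%v Burst=%v\n", cfg.QPS, cfg.Burst)
}
```

Note that raising the limits only makes throttling less visible; it does not reduce the number of discovery calls being made.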

We could speculatively close this, given the large refactoring that has happened lately, but a quick search in the code points to the genericresource.go file, which calls the discovery API to map the generic resources' GVK to the corresponding API resource.

My suggestion would be to look at it more closely and consider putting a caching or rate-limiting mechanism in place for consuming the discovery API.
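As a sketch of the caching idea (plain Go, independent of client-go; the names here are invented for illustration), a small TTL cache in front of the discovery lookups would let repeated GVK-to-resource mappings be served locally instead of hitting the API server each time. client-go also ships a memoizing discovery client (`memory.NewMemCacheClient`) that serves the same purpose:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// ttlCache memoizes expensive lookups (e.g. discovery API calls) for a
// fixed duration, so repeated lookups of the same key are served from
// memory instead of re-querying the API server.
type ttlCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
}

type entry struct {
	value   string
	expires time.Time
}

func newTTLCache(ttl time.Duration) *ttlCache {
	return &ttlCache{ttl: ttl, entries: map[string]entry{}}
}

// Get returns the cached value for key, calling fetch only when the
// entry is missing or expired.
func (c *ttlCache) Get(key string, fetch func() string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
		return e.value
	}
	v := fetch()
	c.entries[key] = entry{value: v, expires: time.Now().Add(c.ttl)}
	return v
}

// cachedLookups returns how many real fetches happen for three lookups
// of the same key within the TTL window.
func cachedLookups() int {
	calls := 0
	cache := newTTLCache(time.Minute)
	fetch := func() string { calls++; return "pods" }
	cache.Get("v1/Pod", fetch)
	cache.Get("v1/Pod", fetch)
	cache.Get("v1/Pod", fetch)
	return calls
}

func main() {
	fmt.Println(cachedLookups()) // prints 1: one fetch serves all three lookups
}
```

The trade-off is staleness: a newly installed CRD would not be visible until the entry expires, so the TTL should be chosen accordingly.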



Participants: @astefanutti, @asm582, @kpouget

          [perf] MCAD constantly throttled · Issue #434 · project-codeflare/multi-cluster-app-dispatcher