Skip to content
Merged
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
4e5794f
feat: bounded cache for informers (#1718)
csviri Feb 21, 2023
6c5fafa
fix: typo caffein -> caffeine (#1795)
metacosm Mar 3, 2023
0ae01d1
feat: now possible to only output non-resource related metrics
metacosm Mar 15, 2023
f761dbc
refactor: extract abstract test fixture to add tests with variations
metacosm Mar 16, 2023
0d75b87
fix: add missing annotation
metacosm Mar 16, 2023
805fbe0
tests: add more test variations
metacosm Mar 16, 2023
76d25f9
fix: make operator non-static so it's registered once per test subclass
metacosm Mar 16, 2023
dd2d360
feat: introduce builder for MicrometerMetrics, fix test
metacosm Mar 16, 2023
57cf246
fix: exclude more tags when not collecting per resource
metacosm Mar 16, 2023
b51ea84
fix: registry should be per-instance to ensure test independence
metacosm Mar 17, 2023
9b14151
fix: make sure we wait a little to ensure event is properly processed
metacosm Mar 17, 2023
54ddeec
fix: make things work on Java 11, format
metacosm Mar 17, 2023
feb8b06
fix: also clean metrics on finalizer removal
metacosm Mar 17, 2023
edc9530
fix: format
metacosm Mar 17, 2023
6d14663
refactor: extract common tags
metacosm Mar 20, 2023
916d849
feat: make per-resource collecting finer-grained
metacosm Mar 20, 2023
bc5b5f4
fix: do not create tag for group if not present
metacosm Mar 20, 2023
445c891
fix: remove unreliable no-delay implementation, defaulting to 1s delay
metacosm Mar 22, 2023
7565e1b
refactor: renamed & documented factory methods to make things clearer
metacosm Mar 23, 2023
e22dc75
docs: updated metrics section for code changes
metacosm Mar 23, 2023
9c8d77e
feat: avoid emitting tag on empty value
metacosm Mar 23, 2023
2624f7b
docs: update
metacosm Mar 23, 2023
3fd613e
fix: format
metacosm Mar 23, 2023
ee88028
refactor: use Tag more directly, avoid unneeded work, use constants
metacosm Mar 24, 2023
40441f0
fix: change will happen instead of might
metacosm Mar 24, 2023
3bf2045
docs: add missing timer
metacosm Mar 24, 2023
70462a8
docs: fix wrong & missing information
metacosm Mar 24, 2023
4505761
refactor: add constants
metacosm Mar 24, 2023
31d1326
fix: wording
metacosm Mar 27, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 53 additions & 24 deletions docs/documentation/features.md
Original file line number Diff line number Diff line change
@@ -774,33 +774,62 @@ ConfigurationServiceProvider.overrideCurrent(overrider->overrider.withMetrics(me

### Micrometer implementation

The micrometer implementation records a lot of metrics associated to each resource handled by the operator by default.
In order to be efficient, the implementation removes meters associated with resources when they are deleted. Since it
might be useful to keep these metrics around for a bit before they are deleted, it is possible to configure a delay
before their removal. As this is done asynchronously, it is also possible to configure how many threads you want to
devote to these operations. Both aspects are controlled by the `MicrometerMetrics` constructor so changing the defaults
is a matter of instantiating `MicrometerMetrics` with the desired values and tell `ConfigurationServiceProvider` about
it as shown above.
The micrometer implementation is typically created using one of the provided factory methods which, depending on which
is used, will return either a ready to use instance or a builder allowing users to customized how the implementation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't builder pattern be consistent so we have only one entry point to the configuration ? MicrometerMetrics.builder() and then be self documented with javadoc. By default, the builder is not configuring per resource and you can switch to the per resource builder if needed.

MicrometerMetrics.newMicrometerMetricsBuilder(new LoggingMeterRegistry())
    .collectingMetricsPerResource(perResourceBuilder ->
        perResourceBuilder.withCleanUpDelayInSeconds(60))
    .build();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having several entry points allows to provide often used configurations without users having to create them with the builder. It also allows to have stable API calls since these factory method implementations can be changed if we change the default behavior while the semantics of the method won't so I'd be in favor of keeping things like they are for now.

behaves, in particular when it comes to the granularity of collected metrics. It is, for example, possible to collect
metrics on a per-resource basis via tags that are associated with meters. This is the default, historical behavior but
this will change in a future version of JOSDK because this dramatically increases the cardinality of metrics, which
could lead to performance issues.

The micrometer implementation records the following metrics:
To create a `MicrometerMetrics` implementation that behaves how it has historically behaved, you can just create an
instance via:

```java
MeterRegistry registry= …;
Metrics metrics=new MicrometerMetrics(registry)
```

Note, however, that this constructor is deprecated and we encourage you to use the factory methods instead, which either
return a fully pre-configured instance or a builder object that will allow you to configure more easily how the instance
will behave. You can, for example, configure whether or not the implementation should collect metrics on a per-resource
basis, whether or not associated meters should be removed when a resource is deleted and how the clean-up is performed.
See the relevant classes documentation for more details.

| Meter name | Type | Tags | Description |
|-----------------------------------------------------------|----------------|------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| operator.sdk.reconciliations.executions.<reconciler name> | gauge | group, version, kind | Number of executions of the named reconciler |
| operator.sdk.reconciliations.queue.size.<reconciler name> | gauge | group, version, kind | How many resources are queued to get reconciled by named reconciler |
| operator.sdk.<map name>.size | gauge map size | | Gauge tracking the size of a specified map (currently unused but could be used to monitor caches size) |
| operator.sdk.events.received | counter | group, version, kind, name, namespace, scope, event, action | Number of received Kubernetes events |
| operator.sdk.events.delete | counter | group, version, kind, name, namespace, scope | Number of received Kubernetes delete events |
| operator.sdk.reconciliations.started | counter | group, version, kind, name, namespace, scope, reconciliations.retries.last, reconciliations.retries.number | Number of started reconciliations per resource type |
| operator.sdk.reconciliations.failed | counter | group, version, kind, name, namespace, scope, exception | Number of failed reconciliations per resource type |
| operator.sdk.reconciliations.success | counter | group, version, kind, name, namespace, scope | Number of successful reconciliations per resource type |
| operator.sdk.controllers.execution.reconcile.success | counter | controller, type | Number of successful reconciliations per controller |
| operator.sdk.controllers.execution.reconcile.failure | counter | controller, exception | Number of failed reconciliations per controller |
| operator.sdk.controllers.execution.cleanup.success | counter | controller, type | Number of successful cleanups per controller |
| operator.sdk.controllers.execution.cleanup.failure | counter | controller, exception | Number of failed cleanups per controller |

As you can see all the recorded metrics start with the `operator.sdk` prefix.
For example, the following will create a `MicrometerMetrics` instance configured to collect metrics on a per-resource
basis, deleting the associated meters after 5 seconds when a resource is deleted, using up to 2 threads to do so.

```java
MicrometerMetrics.newPerResourceCollectingMicrometerMetricsBuilder(registry)
.withCleanUpDelayInSeconds(5)
.withCleaningThreadNumber(2)
.build()
```

The micrometer implementation records the following metrics:

| Meter name | Type | Tag names | Description |
|-----------------------------------------------------------|----------------|-----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|
| operator.sdk.reconciliations.executions.<reconciler name> | gauge | group, version, kind | Number of executions of the named reconciler |
| operator.sdk.reconciliations.queue.size.<reconciler name> | gauge | group, version, kind | How many resources are queued to get reconciled by named reconciler |
| operator.sdk.<map name>.size | gauge map size | | Gauge tracking the size of a specified map (currently unused but could be used to monitor caches size) |
| operator.sdk.events.received | counter | <resource metadata>, event, action | Number of received Kubernetes events |
| operator.sdk.events.delete | counter | <resource metadata> | Number of received Kubernetes delete events |
| operator.sdk.reconciliations.started | counter | <resource metadata>, reconciliations.retries.last, reconciliations.retries.number | Number of started reconciliations per resource type |
| operator.sdk.reconciliations.failed | counter | <resource metadata>, exception | Number of failed reconciliations per resource type |
| operator.sdk.reconciliations.success | counter | <resource metadata> | Number of successful reconciliations per resource type |
| operator.sdk.controllers.execution.reconcile | timer | <resource metadata>, controller | Time taken for reconciliations per controller |
| operator.sdk.controllers.execution.cleanup | timer | <resource metadata>, controller | Time taken for cleanups per controller |
| operator.sdk.controllers.execution.reconcile.success | counter | controller, type | Number of successful reconciliations per controller |
| operator.sdk.controllers.execution.reconcile.failure | counter | controller, exception | Number of failed reconciliations per controller |
| operator.sdk.controllers.execution.cleanup.success | counter | controller, type | Number of successful cleanups per controller |
| operator.sdk.controllers.execution.cleanup.failure | counter | controller, exception | Number of failed cleanups per controller |

As you can see all the recorded metrics start with the `operator.sdk` prefix. `<resource metadata>`, in the table above,
refers to resource-specific metadata and depends on the considered metric and how the implementation is configured and
could be summed up as follows: `group?, version, kind, [name, namespace?], scope` where the tags in square
brackets (`[]`) won't be present when per-resource collection is disabled and tags followed by a question mark are
omitted if the associated value is empty. Of note, when in the context of controllers' execution metrics, these tag
names are prefixed with `resource.`. This prefix might be removed in a future version for greater consistency.

## Optimizing Caches

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
package io.javaoperatorsdk.operator.monitoring.micrometer;

import java.time.Duration;
import java.util.HashSet;
import java.util.Set;

import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.junit.jupiter.api.extension.RegisterExtension;

import io.fabric8.kubernetes.api.model.ConfigMap;
@@ -21,29 +21,31 @@
import static org.assertj.core.api.Assertions.assertThat;
import static org.awaitility.Awaitility.await;

public class MetricsCleaningOnDeleteIT {
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
public abstract class AbstractMicrometerMetricsTestFixture {
@RegisterExtension
static LocallyRunOperatorExtension operator =
LocallyRunOperatorExtension operator =
LocallyRunOperatorExtension.builder().withReconciler(new MetricsCleaningTestReconciler())
.build();

private static final TestSimpleMeterRegistry registry = new TestSimpleMeterRegistry();
private static final int testDelay = 1;
private static final MicrometerMetrics metrics = new MicrometerMetrics(registry, testDelay, 2);
private static final String testResourceName = "cleaning-metrics-cr";
protected final TestSimpleMeterRegistry registry = new TestSimpleMeterRegistry();
protected final MicrometerMetrics metrics = getMetrics();
protected static final String testResourceName = "micrometer-metrics-cr";

protected abstract MicrometerMetrics getMetrics();

@BeforeAll
static void setup() {
void setup() {
ConfigurationServiceProvider.overrideCurrent(overrider -> overrider.withMetrics(metrics));
}

@AfterAll
static void reset() {
void reset() {
ConfigurationServiceProvider.reset();
}

@Test
void removesMetersAssociatedWithResourceAfterItsDeletion() throws InterruptedException {
void properlyHandlesResourceDeletion() throws Exception {
var testResource = new ConfigMapBuilder()
.withNewMetadata()
.withName(testResourceName)
@@ -55,21 +57,29 @@ void removesMetersAssociatedWithResourceAfterItsDeletion() throws InterruptedExc
await().until(() -> !operator.get(ConfigMap.class, testResourceName)
.getMetadata().getFinalizers().isEmpty());

// check that we properly recorded meters associated with the resource
final var meters = metrics.recordedMeterIdsFor(ResourceID.fromResource(created));
assertThat(meters).isNotNull();
assertThat(meters).isNotEmpty();
final var resourceID = ResourceID.fromResource(created);
final var meters = preDeleteChecks(resourceID);

// delete the resource and wait for it to be deleted
operator.delete(testResource);
await().until(() -> operator.get(ConfigMap.class, testResourceName) == null);

// check that the meters are properly removed after the specified delay
Thread.sleep(Duration.ofSeconds(testDelay).toMillis());
assertThat(registry.removed).isEqualTo(meters);
assertThat(metrics.recordedMeterIdsFor(ResourceID.fromResource(created))).isNull();
postDeleteChecks(resourceID, meters);
}

protected Set<Meter.Id> preDeleteChecks(ResourceID resourceID) {
// check that we properly recorded meters associated with the resource
final var meters = metrics.recordedMeterIdsFor(resourceID);
// metrics are collected per resource
assertThat(registry.getMetersAsString()).contains(resourceID.getName());
assertThat(meters).isNotNull();
assertThat(meters).isNotEmpty();
return meters;
}

protected void postDeleteChecks(ResourceID resourceID, Set<Meter.Id> recordedMeters)
throws Exception {}

@ControllerConfiguration
private static class MetricsCleaningTestReconciler
implements Reconciler<ConfigMap>, Cleaner<ConfigMap> {
@@ -84,7 +94,7 @@ public DeleteControl cleanup(ConfigMap resource, Context<ConfigMap> context) {
}
}

private static class TestSimpleMeterRegistry extends SimpleMeterRegistry {
static class TestSimpleMeterRegistry extends SimpleMeterRegistry {
private final Set<Meter.Id> removed = new HashSet<>();

@Override
@@ -93,5 +103,9 @@ public Meter remove(Meter.Id mappedId) {
this.removed.add(removed.getId());
return removed;
}

public Set<Meter.Id> getRemoved() {
return removed;
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
package io.javaoperatorsdk.operator.monitoring.micrometer;

import java.util.Collections;
import java.util.Set;

import io.javaoperatorsdk.operator.processing.event.ResourceID;
import io.micrometer.core.instrument.Meter;

import static org.assertj.core.api.Assertions.assertThat;

public class DefaultBehaviorIT extends AbstractMicrometerMetricsTestFixture {
@Override
protected MicrometerMetrics getMetrics() {
return MicrometerMetrics.newMicrometerMetricsBuilder(registry).build();
}

@Override
protected Set<Meter.Id> preDeleteChecks(ResourceID resourceID) {
// no meter should be recorded because we're not tracking anything to be deleted later
assertThat(metrics.recordedMeterIdsFor(resourceID)).isEmpty();
// metrics are collected per resource by default for now, this will change in a future release
assertThat(registry.getMetersAsString()).contains(resourceID.getName());
return Collections.emptySet();
}

@Override
protected void postDeleteChecks(ResourceID resourceID, Set<Meter.Id> recordedMeters) {
// meters should be neither recorded, nor removed by default
assertThat(registry.getRemoved()).isEmpty();
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
package io.javaoperatorsdk.operator.monitoring.micrometer;

import java.time.Duration;
import java.util.Set;

import io.javaoperatorsdk.operator.processing.event.ResourceID;
import io.micrometer.core.instrument.Meter;

import static org.assertj.core.api.Assertions.assertThat;

public class DelayedMetricsCleaningOnDeleteIT extends AbstractMicrometerMetricsTestFixture {

private static final int testDelay = 1;

@Override
protected MicrometerMetrics getMetrics() {
return MicrometerMetrics.newPerResourceCollectingMicrometerMetricsBuilder(registry)
.withCleanUpDelayInSeconds(testDelay).withCleaningThreadNumber(2).build();
}

@Override
protected void postDeleteChecks(ResourceID resourceID, Set<Meter.Id> recordedMeters)
throws Exception {
// check that the meters are properly removed after the specified delay
Thread.sleep(Duration.ofSeconds(testDelay).toMillis());
assertThat(registry.getRemoved()).isEqualTo(recordedMeters);
assertThat(metrics.recordedMeterIdsFor(resourceID)).isNull();
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
package io.javaoperatorsdk.operator.monitoring.micrometer;

import java.util.Collections;
import java.util.Set;

import io.javaoperatorsdk.operator.processing.event.ResourceID;
import io.micrometer.core.instrument.Meter;

import static org.assertj.core.api.Assertions.assertThat;

public class NoPerResourceCollectionIT extends AbstractMicrometerMetricsTestFixture {
@Override
protected MicrometerMetrics getMetrics() {
return MicrometerMetrics.withoutPerResourceMetrics(registry);
}

@Override
protected Set<Meter.Id> preDeleteChecks(ResourceID resourceID) {
assertThat(metrics.recordedMeterIdsFor(resourceID)).isEmpty();
assertThat(registry.getMetersAsString()).doesNotContain(resourceID.getName());
return Collections.emptySet();
}
}
Original file line number Diff line number Diff line change
@@ -24,7 +24,7 @@ public interface Metrics {
/**
* Do initialization if necessary;
*/
default void controllerRegistered(Controller<?> controller) {}
default void controllerRegistered(Controller<? extends HasMetadata> controller) {}

/**
* Called when an event has been accepted by the SDK from an event source, which would result in
@@ -39,6 +39,7 @@ default void receivedEvent(Event event, Map<String, Object> metadata) {}
* @deprecated Use {@link #reconcileCustomResource(HasMetadata, RetryInfo, Map)} instead
*/
@Deprecated(forRemoval = true)
@SuppressWarnings("unused")
default void reconcileCustomResource(ResourceID resourceID, RetryInfo retryInfo,
Map<String, Object> metadata) {}

@@ -58,6 +59,7 @@ default void reconcileCustomResource(HasMetadata resource, RetryInfo retryInfo,
* @deprecated Use {@link #failedReconciliation(HasMetadata, Exception, Map)} instead
*/
@Deprecated(forRemoval = true)
@SuppressWarnings("unused")
default void failedReconciliation(ResourceID resourceID, Exception exception,
Map<String, Object> metadata) {}

@@ -112,6 +114,7 @@ default void finishedReconciliation(ResourceID resourceID) {
* @deprecated Use {@link #finishedReconciliation(HasMetadata, Map)} instead
*/
@Deprecated(forRemoval = true)
@SuppressWarnings("unused")
default void finishedReconciliation(ResourceID resourceID, Map<String, Object> metadata) {}

/**
Original file line number Diff line number Diff line change
@@ -1,11 +1,6 @@
package io.javaoperatorsdk.operator.processing;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@@ -25,16 +20,7 @@
import io.javaoperatorsdk.operator.api.config.ControllerConfiguration;
import io.javaoperatorsdk.operator.api.monitoring.Metrics;
import io.javaoperatorsdk.operator.api.monitoring.Metrics.ControllerExecution;
import io.javaoperatorsdk.operator.api.reconciler.Cleaner;
import io.javaoperatorsdk.operator.api.reconciler.Constants;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.api.reconciler.ContextInitializer;
import io.javaoperatorsdk.operator.api.reconciler.DeleteControl;
import io.javaoperatorsdk.operator.api.reconciler.EventSourceContext;
import io.javaoperatorsdk.operator.api.reconciler.EventSourceInitializer;
import io.javaoperatorsdk.operator.api.reconciler.Ignore;
import io.javaoperatorsdk.operator.api.reconciler.Reconciler;
import io.javaoperatorsdk.operator.api.reconciler.UpdateControl;
import io.javaoperatorsdk.operator.api.reconciler.*;
import io.javaoperatorsdk.operator.api.reconciler.dependent.EventSourceNotFoundException;
import io.javaoperatorsdk.operator.api.reconciler.dependent.EventSourceProvider;
import io.javaoperatorsdk.operator.api.reconciler.dependent.EventSourceReferencer;
@@ -56,6 +42,13 @@ public class Controller<P extends HasMetadata>
RegisteredController<P> {

private static final Logger log = LoggerFactory.getLogger(Controller.class);
private static final String CLEANUP = "cleanup";
private static final String DELETE = "delete";
private static final String FINALIZER_NOT_REMOVED = "finalizerNotRemoved";
private static final String RECONCILE = "reconcile";
private static final String RESOURCE = "resource";
private static final String STATUS = "status";
private static final String BOTH = "both";

private final Reconciler<P> reconciler;
private final ControllerConfiguration<P> configuration;
@@ -103,7 +96,7 @@ public UpdateControl<P> reconcile(P resource, Context<P> context) throws Excepti
new ControllerExecution<>() {
@Override
public String name() {
return "reconcile";
return RECONCILE;
}

@Override
@@ -113,12 +106,12 @@ public String controllerName() {

@Override
public String successTypeName(UpdateControl<P> result) {
String successType = "resource";
String successType = RESOURCE;
if (result.isUpdateStatus()) {
successType = "status";
successType = STATUS;
}
if (result.isUpdateResourceAndStatus()) {
successType = "both";
successType = BOTH;
}
return successType;
}
@@ -154,7 +147,7 @@ public DeleteControl cleanup(P resource, Context<P> context) {
new ControllerExecution<>() {
@Override
public String name() {
return "cleanup";
return CLEANUP;
}

@Override
@@ -164,7 +157,7 @@ public String controllerName() {

@Override
public String successTypeName(DeleteControl deleteControl) {
return deleteControl.isRemoveFinalizer() ? "delete" : "finalizerNotRemoved";
return deleteControl.isRemoveFinalizer() ? DELETE : FINALIZER_NOT_REMOVED;
}

@Override
Original file line number Diff line number Diff line change
@@ -238,6 +238,7 @@ synchronized void eventProcessingFinished(
cleanupForDeletedEvent(executionScope.getResourceID());
} else if (postExecutionControl.isFinalizerRemoved()) {
state.markProcessedMarkForDeletion();
metrics.cleanupDoneFor(resourceID, metricsMetadata);
} else {
postExecutionControl
.getUpdatedCustomResource()
Original file line number Diff line number Diff line change
@@ -9,7 +9,8 @@
import org.takes.http.Exit;
import org.takes.http.FtBasic;

import io.fabric8.kubernetes.client.*;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.javaoperatorsdk.operator.Operator;
import io.javaoperatorsdk.operator.monitoring.micrometer.MicrometerMetrics;
import io.javaoperatorsdk.operator.sample.dependent.ResourcePollerConfig;
@@ -25,7 +26,8 @@ public static void main(String[] args) throws IOException {

KubernetesClient client = new KubernetesClientBuilder().build();
Operator operator = new Operator(client,
overrider -> overrider.withMetrics(new MicrometerMetrics(new LoggingMeterRegistry())));
overrider -> overrider
.withMetrics(MicrometerMetrics.withoutPerResourceMetrics(new LoggingMeterRegistry())));

MySQLSchemaReconciler schemaReconciler = new MySQLSchemaReconciler();