Feature request description
cockpit-podman is plagued with lots of race conditions and flaky tests. I have investigated many of them, but the remaining ones are due to a fundamental issue with the monitoring API.
The UI uses the `libpod/events` API, which notifies about high-level actions such as `start` or `died`, for example:
{"status":"start","id":"39c0313e0e35c49f56fa3b8a0c228cc6a58455846d5271c05365fa0df56876a2","from":"localhost/test-busybox:latest","Type":"container","Action":"start","Actor":{"ID":"39c0313e0e35c49f56fa3b8a0c228cc6a58455846d5271c05365fa0df56876a2","Attributes":{"containerExitCode":"0","image":"localhost/test-busybox:latest","name":"swamped-crate","podId":""}},"scope":"local","time":1688533611,"timeNano":1688533611933738591}
However, this event does not contain most (if any) of the properties that the UI needs to show, so in reaction to these events the UI does a `containers/json` query for that container:
{"method":"GET","path":"/v1.12/libpod/containers/json","body":"","params":{"all":true,"filters":"{\"id\":[\"39c0313e0e35c49f56fa3b8a0c228cc6a58455846d5271c05365fa0df56876a2\"]}"}}
which then responds with all the info that the UI needs:
[{"AutoRemove":false,"Command":["sh","-c","echo 123; sleep infinity"],"Created":"2023-07-05T05:06:41.355776664Z","CreatedAt":"","Exited":false,"ExitedAt":1688533611,"ExitCode":0,"Id":"39c0313e0e35c49f56fa3b8a0c228cc6a58455846d5271c05365fa0df56876a2","Image":"localhost/test-busybox:latest","ImageID":"24ac8b76cfb0440579ade1908a8a765d3c8a62bd366058cf84e1a7d6754ee585","IsInfra":false,"Labels":null,"Mounts":[],"Names":["swamped-crate"],"Namespaces":{},"Networks":[],"Pid":19176,"Pod":"","PodName":"","Ports":null,"Size":null,"StartedAt":1688533611,"State":"running","Status":""}]
The problem is that this is racy: the `/containers/json` call is necessarily asynchronous, and when events come in bursts, the queries overlap, and podman's replies do not come back in the same order as the requests. Below is a log capture from a part of a test that performs a few container operations, such as stopping and restarting a container. I stripped out all the JSON data for clarity; the important bit is the ordering:
> debug: podman user call 44:
> debug: podman user call 45:
> debug: podman user call 45 result:
> debug: podman user call 46:
> debug: podman user call 44 result:
> debug: podman user call 47:
> debug: podman user call 46 result:
> debug: podman user call 47 result:
> debug: podman user call 48:
> debug: podman user call 43 result:
> debug: podman user call 49:
> debug: podman user call 49 result:
> debug: podman user call 50:
> debug: podman user call 48 result:
> debug: podman user call 51:
> debug: podman user call 50 result:
> debug: podman user call 52:
> debug: podman user call 52 result:
> debug: podman user call 51 result:
So if the container moves through "Running" → "Exited" → "Stopped" → "Restarting" → "Running", the jumbled reply order can swap intermediate states, and the final state reported in the UI ends up as e.g. "Restarting" or "Exited". The latter happened in this run: the screenshot says "Exited", but `podman ps` says "Up" (i.e. "Running"), as can be seen in the "----- user containers -----" dump in the log.
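To make the failure mode concrete, here is a minimal sketch of the racy pattern in TypeScript. The names (`queryContainer`, `onEvent`) and the trimmed-down `Container` type are illustrative, not the actual cockpit-podman code; the point is that nothing orders one query's reply relative to the replies for earlier events:

```typescript
// Minimal sketch of the racy pattern (illustrative names, not the real
// cockpit-podman code). Each event triggers an independent async query;
// whichever reply resolves *last* wins, regardless of event order.

type Container = { Id: string; State: string /* ...other fields... */ };

const containers = new Map<string, Container>();

// Hypothetical wrapper around GET /libpod/containers/json?filters={"id":[...]}
declare function queryContainer(id: string): Promise<Container>;

function onEvent(event: { Type: string; Actor: { ID: string } }) {
    if (event.Type !== "container")
        return;
    // Fire-and-forget: nothing orders this reply relative to replies
    // for earlier events concerning the same container.
    queryContainer(event.Actor.ID).then(info => {
        // If the reply for an *older* event arrives after the reply for a
        // newer one, this overwrites fresh state with stale state.
        containers.set(info.Id, info);
    });
}
```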
Suggest potential solution
My preferred solution would be to avoid having to call `/containers/json` after a "start" or "rename" event in the first place. That call only generates additional API traffic, and thus more computational overhead on both the podman and the UI side, and is prone to these kinds of race conditions. D-Bus services like systemd or udisks generally solve this with the PropertiesChanged signal: a notification carrying the set of changed properties is emitted whenever something changes. These notifications are naturally ordered correctly, and the watcher can tally them up to always have an accurate model of the state without having to do extra "get" calls.
For the podman API, this cannot just be squeezed into the existing `start` (or `remove`, etc.) events, as container properties can change more often than, and independently of, the coarse-grained lifecycle events.
Perhaps podman could introduce a new event type `changed` that fires whenever any property changes, and deliver the `/containers/json` info for the container(s) which changed (see the sketch below). Both "your" (podman) and "my" (cockpit-podman) sides already have the code to generate/parse this information, so it would just mean some minor plumbing changes.
If this is expensive, it may also be adequate to require listeners to explicitly opt into these notifications, although connecting to `/events` generally already means that the listener wants this kind of information.
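To make the proposal concrete, here is a sketch of how the UI side could consume such an event. The `changed` action and the inline `Container` payload field are assumptions for illustration, not existing podman API; the payload would simply reuse the `/containers/json` schema that both sides already generate and parse:

```typescript
// Hypothetical `changed` event: the "changed" action and the `Container`
// payload field are assumptions for illustration, not existing podman API.

type Container = { Id: string; State: string /* ...other fields... */ };

interface ChangedEvent {
    Type: "container";
    Action: "changed";      // assumed new event type
    Actor: { ID: string };
    Container: Container;   // assumed inline /containers/json entry
}

const containers = new Map<string, Container>();

function onChanged(event: ChangedEvent) {
    // No follow-up query: events arrive in order on the single /events
    // stream, so applying them directly keeps the model consistent.
    containers.set(event.Container.Id, event.Container);
}
```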
Have you considered any alternatives?
It may also be possible to change podman to never reply to requests out of order. I don't know how easy or hard that is with Go and goroutines; I do know that reordering the replies on the JavaScript client side is very hard.
It might be easier on our side to completely serialize all API calls (see the sketch below), but that would make the UI very slow, especially with many containers. The calls are independent of each other, so serializing them is not conceptually necessary.
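For illustration, full serialization on the client side could be as simple as a promise chain; this is a sketch, with `queryContainer` again being a hypothetical wrapper around `/containers/json`. It restores ordering, but throughput drops to one round trip per update:

```typescript
// Sketch of full client-side serialization: a promise chain that issues
// at most one request at a time. Ordering is correct, but with many
// containers every update waits for all previously queued round trips.

declare function queryContainer(id: string): Promise<unknown>; // hypothetical

let queue: Promise<unknown> = Promise.resolve();

function queryContainerSerialized(id: string): Promise<unknown> {
    const next = queue.then(() => queryContainer(id));
    // Keep the chain alive even if one request fails.
    queue = next.catch(() => undefined);
    return next;
}
```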
Additional context
No response