Postgres cluster failing over almost daily #1163

Hi there,

We've been using this project in production for almost 6 months now, and we've noticed that the database cluster fails over almost daily, sometimes multiple times per day.

Our hardware nodes are not failing, nor were they restarted. We only noticed this recently because some of our UPDATE queries failed: the node the client was connected to had switched to read-only mode.

Below is the output of `patronictl history`:

```
+----+--------------+------------------------------+---------------------------+
| TL |          LSN |            Reason            |         Timestamp         |
+----+--------------+------------------------------+---------------------------+
| 40 | 138415350016 | no recovery target specified | 2020-09-19T10:31:39+00:00 |
| 41 | 147132529184 | no recovery target specified | 2020-09-29T13:21:39+00:00 |
| 42 | 147174790504 | no recovery target specified | 2020-09-29T14:56:34+00:00 |
| 43 | 147180413280 | no recovery target specified | 2020-09-29T15:12:55+00:00 |
| 44 | 148553492152 | no recovery target specified | 2020-10-01T06:46:56+00:00 |
| 45 | 148556602160 | no recovery target specified | 2020-10-01T06:54:06+00:00 |
| 46 | 148560542520 | no recovery target specified | 2020-10-01T07:15:46+00:00 |
| 47 | 148563980512 | no recovery target specified | 2020-10-01T07:31:26+00:00 |
| 48 | 148567915040 | no recovery target specified | 2020-10-01T07:40:18+00:00 |
| 49 | 148570174560 | no recovery target specified | 2020-10-01T07:40:50+00:00 |
| 50 | 148574039736 | no recovery target specified | 2020-10-01T07:48:40+00:00 |
| 51 | 148577000976 | no recovery target specified | 2020-10-01T07:58:09+00:00 |
| 52 | 148579532488 | no recovery target specified | 2020-10-01T07:59:21+00:00 |
| 53 | 148581642320 | no recovery target specified | 2020-10-01T08:00:33+00:00 |
| 54 | 149048863192 | no recovery target specified | 2020-10-01T22:15:07+00:00 |
| 55 | 151318348672 | no recovery target specified | 2020-10-04T17:59:06+00:00 |
| 56 | 151333603376 | no recovery target specified | 2020-10-04T18:58:56+00:00 |
| 57 | 151336409800 | no recovery target specified | 2020-10-04T19:04:45+00:00 |
| 58 | 151339036912 | no recovery target specified | 2020-10-04T19:07:05+00:00 |
| 59 | 151340848080 | no recovery target specified | 2020-10-04T19:07:37+00:00 |
| 60 | 152692595768 | no recovery target specified | 2020-10-06T10:46:46+00:00 |
| 61 | 153344788488 | no recovery target specified | 2020-10-07T06:12:16+00:00 |
| 62 | 154048413760 | no recovery target specified | 2020-10-08T03:40:06+00:00 |
| 63 | 154385293904 | no recovery target specified | 2020-10-08T13:08:26+00:00 |
| 64 | 154387331144 | no recovery target specified | 2020-10-08T13:10:40+00:00 |
+----+--------------+------------------------------+---------------------------+
```
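To put a number on "almost daily", here is a small sketch (Python 3.7+, standard library only) that parses the table above and counts timeline switches per calendar day, plus the shortest interval between two consecutive switches. It assumes that each row of the `patronictl history` output marks one timeline switch, i.e. one promotion. `parse_switch_times`, `switches_per_day` and `shortest_gap_seconds` are hypothetical helpers for this illustration, not part of Patroni.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List

# `patronictl history` output from above, pasted verbatim.
HISTORY = """\
+----+--------------+------------------------------+---------------------------+
| TL |          LSN |            Reason            |         Timestamp         |
+----+--------------+------------------------------+---------------------------+
| 40 | 138415350016 | no recovery target specified | 2020-09-19T10:31:39+00:00 |
| 41 | 147132529184 | no recovery target specified | 2020-09-29T13:21:39+00:00 |
| 42 | 147174790504 | no recovery target specified | 2020-09-29T14:56:34+00:00 |
| 43 | 147180413280 | no recovery target specified | 2020-09-29T15:12:55+00:00 |
| 44 | 148553492152 | no recovery target specified | 2020-10-01T06:46:56+00:00 |
| 45 | 148556602160 | no recovery target specified | 2020-10-01T06:54:06+00:00 |
| 46 | 148560542520 | no recovery target specified | 2020-10-01T07:15:46+00:00 |
| 47 | 148563980512 | no recovery target specified | 2020-10-01T07:31:26+00:00 |
| 48 | 148567915040 | no recovery target specified | 2020-10-01T07:40:18+00:00 |
| 49 | 148570174560 | no recovery target specified | 2020-10-01T07:40:50+00:00 |
| 50 | 148574039736 | no recovery target specified | 2020-10-01T07:48:40+00:00 |
| 51 | 148577000976 | no recovery target specified | 2020-10-01T07:58:09+00:00 |
| 52 | 148579532488 | no recovery target specified | 2020-10-01T07:59:21+00:00 |
| 53 | 148581642320 | no recovery target specified | 2020-10-01T08:00:33+00:00 |
| 54 | 149048863192 | no recovery target specified | 2020-10-01T22:15:07+00:00 |
| 55 | 151318348672 | no recovery target specified | 2020-10-04T17:59:06+00:00 |
| 56 | 151333603376 | no recovery target specified | 2020-10-04T18:58:56+00:00 |
| 57 | 151336409800 | no recovery target specified | 2020-10-04T19:04:45+00:00 |
| 58 | 151339036912 | no recovery target specified | 2020-10-04T19:07:05+00:00 |
| 59 | 151340848080 | no recovery target specified | 2020-10-04T19:07:37+00:00 |
| 60 | 152692595768 | no recovery target specified | 2020-10-06T10:46:46+00:00 |
| 61 | 153344788488 | no recovery target specified | 2020-10-07T06:12:16+00:00 |
| 62 | 154048413760 | no recovery target specified | 2020-10-08T03:40:06+00:00 |
| 63 | 154385293904 | no recovery target specified | 2020-10-08T13:08:26+00:00 |
| 64 | 154387331144 | no recovery target specified | 2020-10-08T13:10:40+00:00 |
+----+--------------+------------------------------+---------------------------+
"""


def parse_switch_times(table: str) -> List[datetime]:
    """Return the Timestamp of every data row in a patronictl-style ASCII table."""
    times = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Data rows have four cells and a numeric timeline (TL) in the first one;
        # the +----+ border lines and the header row are skipped.
        if len(cells) == 4 and cells[0].isdigit():
            times.append(datetime.fromisoformat(cells[3]))
    return times


def switches_per_day(times: List[datetime]) -> Dict[str, int]:
    """Count timeline switches per calendar day (in the +00:00 offset shown)."""
    return dict(Counter(t.date().isoformat() for t in times))


def shortest_gap_seconds(times: List[datetime]) -> float:
    """Smallest interval, in seconds, between two consecutive switches."""
    ordered = sorted(times)
    return min((b - a).total_seconds() for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    times = parse_switch_times(HISTORY)
    for day, count in sorted(switches_per_day(times).items()):
        print(day, count)
    print("shortest gap between switches (s):", shortest_gap_seconds(times))
```

For the table above this reports 25 switches in total, 11 of them on 2020-10-01 alone, and a shortest gap of 32 seconds (between the TL 48 and TL 49 rows, and again between TL 58 and TL 59).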

Our cluster configuration YAML:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: cluster-db
  namespace: database
spec:
  dockerImage: registry.opensource.zalan.do/acid/spilo-12:1.6-p2
  teamId: "cluster"
  volume:
    size: 50Gi
  numberOfInstances: 3
  postgresql:
    version: "12"
    parameters:
      max_connections: "250"
  tolerations:
  - key: postgres
    operator: Exists
    effect: NoSchedule
```

Is there a way to fix this? 
