specs: VEP-2272 Complete Data Job Configuration Persistence Part 2 (#2302)
This change introduces VEP-2272, which proposes an improvement to Versatile Data Kit: switching the source of truth for data job deployment configurations from Kubernetes to a database.

Signed-off-by: Miroslav Ivanov miroslavi@vmware.com

By using [Scheduler Lock](https://github.com/lukas-krecan/ShedLock), we can achieve an active-passive pattern across the Control Service instances. In simpler terms, only one instance of the Control Service will be able to synchronize Cron Jobs at any given time. The process will be executed with a fixed period between invocations; the period will be configurable, and an appropriate default will be determined during implementation.

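For illustration, the active-passive scheduling could be wired with Spring's `@Scheduled` and ShedLock's `@SchedulerLock`. This is a minimal sketch, not the actual implementation: the class name, method body, and the interval property name are hypothetical.

```java
import net.javacrumbs.shedlock.spring.annotation.SchedulerLock;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical sketch: only the Control Service instance that acquires the
// lock runs the synchronization pass; the others skip it (active-passive).
@Component
public class CronJobSynchronizer {

   // Fixed period between invocations; the property name and default are
   // illustrative, and the actual value will be chosen during implementation.
   @Scheduled(fixedDelayString = "${datajobs.deployment.synchronization.interval:60000}")
   @SchedulerLock(name = "cron_job_synchronizer", lockAtMostFor = "10m", lockAtLeastFor = "1m")
   public void synchronizeCronJobs() {
      // Iterate over the data job deployment configurations stored in the
      // database and reconcile the corresponding Kubernetes Cron Jobs.
   }
}
```

`lockAtMostFor` bounds how long a crashed instance can hold the lock, while `lockAtLeastFor` prevents rapid re-execution when a pass finishes quickly.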
The overall purpose of this process is to iterate through the data job deployment configurations stored in the database and determine whether the corresponding Cron Job in Kubernetes is synchronized and up to date. If the Cron Job is not synchronized with the database, the process will initiate a new data job deployment. This deployment may involve tasks such as building an image or updating the Cron Job template.

To enhance the process further, the proposed asynchronous process will incorporate a mechanism to determine whether a Cron Job needs to be updated. This will be achieved by utilizing the hash code (`deployment_version_sha`) of the Cron Job YAML configuration, which will be stored in the database after each deployment.

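The hash check could be sketched as follows. The helper name and the choice of SHA-1 over the rendered YAML are assumptions for illustration; the actual hashing scheme is an implementation detail.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper: derive a version sha from the rendered Cron Job YAML
// and compare it with the deployment_version_sha stored in the database.
public class DeploymentVersionSha {

   public static String of(String cronJobYaml) throws Exception {
      byte[] digest = MessageDigest.getInstance("SHA-1")
            .digest(cronJobYaml.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : digest) {
         hex.append(String.format("%02x", b));
      }
      return hex.toString();
   }

   // A new deployment is initiated only when the stored sha no longer
   // matches the sha of the currently rendered Cron Job configuration.
   public static boolean needsRedeployment(String storedSha, String currentYaml) throws Exception {
      return !of(currentYaml).equals(storedSha);
   }
}
```

Because the sha is recomputed from the rendered template, any change to the template or its inputs makes the stored value stale and triggers a redeployment on the next pass.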
By introducing this asynchronous process, we can ensure that the database and Kubernetes remain in sync, maintaining data integrity and enabling smooth operation of the system.

### Database data model changes
To ensure comprehensive synchronization between Kubernetes and the database, all relevant data job configuration properties must be replicated from Kubernetes to the database. The properties that should be replicated are:

* `git_commit_sha`
* `vdk_version`
* `python_version`
* `cpu_request`
* `cpu_limit`
* `memory_request`
* `memory_limit`
* `deployed_by`
* `deployed_date`
* `deployment_version_sha`

The columns above were identified by comparing the properties of the current Cron Job with the columns in the database.

To accommodate the replication of these additional data job configuration properties from Kubernetes to the database, a new table called `data_job_deployment` will be added to the existing database model. This table will contain the columns listed above, and the relationship between the `data_job` and `data_job_deployment` tables will be one-to-one. This approach normalizes the database model and makes it suitable for accommodating multiple deployments per data job in the future.

By making these database model changes, the system can effectively store and synchronize all the required data job configuration details between Kubernetes and the database.

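For illustration only, the proposed table could be mapped with a JPA entity along these lines. The class, field types, and the shared-key mapping to the existing `DataJob` entity are assumptions for the sketch, not the final schema.

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.MapsId;
import javax.persistence.OneToOne;
import javax.persistence.Table;
import java.time.OffsetDateTime;

// Hypothetical JPA mapping of the proposed data_job_deployment table with a
// one-to-one relationship to the existing data_job table.
@Entity
@Table(name = "data_job_deployment")
public class DataJobDeployment {

   // Shares its primary key with the owning data job (one deployment row
   // per job for now; the model leaves room for multiple deployments later).
   @Id
   @Column(name = "data_job_name")
   private String dataJobName;

   @OneToOne
   @MapsId
   @JoinColumn(name = "data_job_name")
   private DataJob dataJob;

   @Column(name = "git_commit_sha")
   private String gitCommitSha;

   @Column(name = "vdk_version")
   private String vdkVersion;

   @Column(name = "python_version")
   private String pythonVersion;

   @Column(name = "cpu_request")
   private Float cpuRequest;

   @Column(name = "cpu_limit")
   private Float cpuLimit;

   @Column(name = "memory_request")
   private Integer memoryRequest;

   @Column(name = "memory_limit")
   private Integer memoryLimit;

   @Column(name = "deployed_by")
   private String deployedBy;

   @Column(name = "deployed_date")
   private OffsetDateTime deployedDate;

   @Column(name = "deployment_version_sha")
   private String deploymentVersionSha;
}
```

The `@MapsId` variant keeps `data_job_deployment` keyed by the job name, which enforces the one-to-one relationship at the schema level without an extra surrogate key.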