How to handle state in the workflow #80
josecelano
started this conversation in
General
We have a workflow that checks whether any Gold images have been updated. New Gold images may have been added, and existing ones may have been deleted. For every Gold image there is a corresponding Base image, which should be kept in sync. The Base image is a copy of the Gold image with a different size and ICC profile. The workflow uses a set of actions, each of which performs one small task.
This is the current workflow state:
For the time being, we see two kinds of actions:
The first kind are actions like `dvc-diff`, `image-resize`, etc. The second kind are the ones that are specific to the media library: `validate-filename`, `validate-folder`.
Every action has its inputs and outputs and, in most cases, inputs are obtained from the previous step's outputs. This is a sample output for the `dvc-diff` action:
We use that output as an input for the `validate-filename` action.
We have been discussing how we should connect inputs/outputs and what format we should use.
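To make the hand-off between the two actions concrete, here is a minimal Python sketch of how a consumer could pull the relevant paths out of a `dvc-diff` style output. The payload, its key names (modeled loosely on `dvc diff --json`), the file paths, and the `changed_paths` helper are all assumptions for illustration; the real action output may look different.

```python
import json

# Hypothetical payload: the key names are assumptions modeled on `dvc diff --json`,
# not the real output of the dvc-diff action.
SAMPLE_DVC_DIFF_OUTPUT = json.dumps(
    {
        "added": [{"path": "gold/a.tif"}],
        "deleted": [{"path": "gold/b.tif"}],
        "modified": [{"path": "gold/c.tif"}],
        "renamed": [{"path": {"old": "gold/d.tif", "new": "gold/e.tif"}}],
        "not in cache": [{"path": "gold/f.tif"}],
    }
)


def changed_paths(dvc_diff_json: str) -> list[str]:
    """Return the added, modified and renamed paths, in that order."""
    diff = json.loads(dvc_diff_json)
    paths = [
        entry["path"]
        for key in ("added", "modified")
        for entry in diff.get(key, [])
    ]
    # Renamed entries are assumed to carry both names; keep only the new one.
    paths += [entry["path"]["new"] for entry in diff.get("renamed", [])]
    return paths
```

Calling `changed_paths(SAMPLE_DVC_DIFF_OUTPUT)` ignores the deleted and not-in-cache entries, which is the kind of filtering the consuming action currently has to do itself.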
Regarding the format, we have been considering JSON.
Pros
Cons
Regarding how to make the data flow in the pipeline we were talking about different approaches:
1.- Totally independent actions.
2.- Long-term process manager.
1.- Totally independent actions
In the `dvc-diff` output example, we do not need the list of images that are not in the dvc cache on the next step.
The `validate-filename` action is taking the full version of the `dvc-diff` output and extracting the list of added/modified/renamed files. If we want this action to be totally independent, we should remove that code from the action, and maybe add a new action in the middle to transform the output into the next input.
2.- Long-term process manager
@da2ce7 proposed an alternative approach. We can consider the whole workflow as a long-term process (a kind of saga). We have to do a lot of small steps/tasks in order to complete the workflow. Every action is going to get data from the current state and modify the current state. For example:
The `dvc-diff` output is going to be the next state. The new state contains the list of changed files (in dvc).
The next action (`validate-filename`) is going to use the current state as its input. It is going to get data from it and add more data to the state. For example:
That's only an example. The idea is that every action is going to get this data and add to it or modify it. This pattern could be similar to the Flux pattern. Instead of passing only the data every action needs, the action can use the full state and generate a new version.
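The state-passing idea can be sketched in Python as a chain of functions that each take the current job state and return a new version of it. This is only an illustration under stated assumptions: the state keys (`dvc_diff`, `filename_validation`), the stubbed diff result, the `.tif` naming rule, and the `run_job` helper are all hypothetical and not part of the actual actions.

```python
import copy
from typing import Callable

State = dict
Action = Callable[[State], State]


def dvc_diff_action(state: State) -> State:
    """Add the list of changed files to the state (stubbed for the sketch)."""
    new_state = copy.deepcopy(state)
    # Hypothetical result; a real action would compute this from dvc.
    new_state["dvc_diff"] = {"added": ["gold/a.tif"], "modified": ["gold/b.png"]}
    return new_state


def validate_filename_action(state: State) -> State:
    """Read the changed files from the state and add validation results."""
    new_state = copy.deepcopy(state)
    diff = state["dvc_diff"]
    # Hypothetical rule: a valid Gold image name ends in ".tif".
    new_state["filename_validation"] = {
        path: path.lower().endswith(".tif")
        for path in diff["added"] + diff["modified"]
    }
    return new_state


def run_job(initial_state: State, actions: list[Action]) -> State:
    """Thread the state through every action, in order."""
    state = initial_state
    for action in actions:
        state = action(state)
    return state
```

Running `run_job({}, [dvc_diff_action, validate_filename_action])` yields a state that holds both the diff and the validation results. Because each action works on a copy, earlier versions of the state are left untouched, which keeps every step easy to inspect or replay.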
Pros
Cons
Conclusions
`job` (not workflow) state. This JSON is the database for the job.
We could define an initial JSON structure for the POC. We also have different approaches:
@yeraydavidrodriguez is going to rewrite the image actions with a first implementation of this approach for the POC.