
Detect that Redis Server has Gone Down and Exit Gracefully #80

Open
@melrom


From Jack Smith:

BTW, is there a way for a BigJob (BJ) script to detect that the Redis server has shut down while the script is running (i.e., after the script started successfully while Redis was up)? I submitted a BJ script yesterday, and this morning I found it still running but making no progress. I then noticed that no batch pilot job (PJ) was running or waiting, and when I peeked at the PJ’s stderr file I saw the “Please start Redis server!” message. This apparently happened about 23 hours after the script was launched.
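One possible way to detect this, sketched here under assumptions that do not come from BigJob itself: the script can periodically send a Redis PING and treat a refused or timed-out connection as "server down". The helper name redis_is_alive, the defaults (localhost on Redis's standard port 6379, a 2-second timeout), and the raw-socket approach are illustrative choices. It also assumes the server has no password set, since with requirepass enabled an unauthenticated PING may be answered with an error instead of +PONG.

```python
import socket


def redis_is_alive(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG.

    Hypothetical helper: speaks the Redis inline-command protocol directly over
    TCP so it needs no Redis client library. Assumes no password is configured.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            reply = b""
            # Read until the reply line is complete or the server closes.
            while not reply.endswith(b"\r\n"):
                chunk = sock.recv(64)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False
    return reply.startswith(b"+PONG")
```

Because the check is a short-lived TCP connection with a timeout, a dead server shows up as a quick False rather than a hung script.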

The .sge prolog/epilog file timestamps confirm that this was when the SGE batch PJ finally started running, and that it terminated almost immediately. So the Redis server had apparently shut down sometime in the interim. However, the BJ script seemed oblivious both to the Redis server being down and to the batch PJ terminating.

I'm not entirely sure whether this is possible in BigJob, but if PJ.get_state reports Fail or Cancel, or can no longer access the batch PJ, we could do some kind of error handling or at least print a warning such as "Cannot reach redis server".
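The error handling suggested above could look roughly like the following watchdog. This is a minimal sketch under assumptions: get_pilot_state is a hypothetical stand-in for the pilot's get_state call, redis_alive is any liveness check (for example a Redis PING), and the names in TERMINAL_STATES are guesses that should be checked against what the installed BigJob version actually reports. Redis is checked before the pilot state because the state query itself may depend on Redis and could stall if the server is gone.

```python
import time
from typing import Callable

# Assumed terminal state names; verify against your BigJob version's get_state().
TERMINAL_STATES = {"Failed", "Canceled", "Done"}


def watch_pilot(
    get_pilot_state: Callable[[], str],
    redis_alive: Callable[[], bool],
    poll_interval: float = 60.0,
    max_redis_failures: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the pilot ends or Redis stays unreachable; return the reason."""
    redis_failures = 0
    while True:
        # Tolerate brief blips: give up only after several consecutive failures.
        if not redis_alive():
            redis_failures += 1
            if redis_failures >= max_redis_failures:
                return "Cannot reach Redis server"
        else:
            redis_failures = 0

        try:
            state = get_pilot_state()
        except Exception as exc:  # the pilot's state can no longer be queried
            return f"Cannot query pilot state: {exc}"
        if state in TERMINAL_STATES:
            return f"Pilot job ended in state {state!r}"

        sleep(poll_interval)
```

Once watch_pilot returns, the script can print the reason to stderr and exit with a nonzero status instead of waiting indefinitely. The injectable sleep parameter exists only so the loop can be exercised without real delays.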
