Healthcheck

Overview

Healthcheck is a Docker Swarm feature that automates the detection and recovery of unhealthy containers. It periodically assesses the health of running containers against predefined criteria. These criteria are typically defined in the service configuration and can include custom scripts, HTTP endpoint checks, or other mechanisms that provide insight into the container’s health.

In OnSphere, the healthcheck implementation is specific to each module. We aim to use the most accurate healthcheck available for each context. When no module-specific method exists to check the health of a module, a default one is used.

Resources

  • See the healthcheck documentation.

  • See module capabilities for the table.

Default healthcheck

Swarm uses the healthcheck to monitor the state of services and to restart them when they fail.

By default, the healthcheck has the following properties:

  • Runs at a 5-second interval (the wait time before launching the next check)

  • Has a start period of 300 seconds

  • Has a 15-second timeout
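When a deployed service is inspected (for example with `docker service inspect`), Docker reports healthcheck durations as integer nanoseconds in its Engine API `HealthConfig` object. The sketch below converts the three defaults listed above into that form so they can be compared with inspect output. The function names are illustrative only, and the test command and retry count are left out because this page does not document them.

```python
from datetime import timedelta

# OnSphere default healthcheck durations, as listed above.
DEFAULT_INTERVAL = timedelta(seconds=5)
DEFAULT_START_PERIOD = timedelta(seconds=300)
DEFAULT_TIMEOUT = timedelta(seconds=15)


def to_nanoseconds(duration: timedelta) -> int:
    """Convert a timedelta to integer nanoseconds, the unit used by HealthConfig."""
    # timedelta has microsecond resolution, so this conversion is exact.
    return (duration // timedelta(microseconds=1)) * 1_000


def default_health_config() -> dict:
    """Return the duration fields of a Docker Engine API HealthConfig.

    "Test" and "Retries" are omitted on purpose: this page does not document them.
    """
    return {
        "Interval": to_nanoseconds(DEFAULT_INTERVAL),
        "Timeout": to_nanoseconds(DEFAULT_TIMEOUT),
        "StartPeriod": to_nanoseconds(DEFAULT_START_PERIOD),
    }
```

For instance, `default_health_config()["Interval"]` is `5000000000`, which is the value expected in the `Interval` field of the inspect output for a module using the default.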

Validation process

Each module periodically writes the current time to a file. At each healthcheck interval this file is checked, and the check fails if the recorded time is more than 20 seconds old.
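The heartbeat-file check above can be sketched in Python as follows. This is a minimal illustration, not the OnSphere implementation: it assumes the file holds a Unix timestamp in seconds and uses a hypothetical path, `/tmp/osp-heartbeat`. The actual file name and time format used by the modules are not specified here.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical location: the real file path used by OnSphere modules is not documented here.
HEARTBEAT_FILE = Path("/tmp/osp-heartbeat")
MAX_AGE_SECONDS = 20.0  # documented staleness threshold


def write_heartbeat(path: Path = HEARTBEAT_FILE, now: Optional[float] = None) -> None:
    """Module side: record the current time as Unix epoch seconds (assumed format)."""
    stamp = time.time() if now is None else now
    # Write to a temporary file, then rename it, so the checker never reads a half-written value.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(repr(stamp))
    tmp.replace(path)


def is_healthy(path: Path = HEARTBEAT_FILE, now: Optional[float] = None,
               max_age: float = MAX_AGE_SECONDS) -> bool:
    """Check side: healthy only if the recorded time is at most max_age seconds old."""
    try:
        recorded = float(path.read_text().strip())
    except (OSError, ValueError):
        # A missing, unreadable or malformed file counts as unhealthy.
        return False
    current = time.time() if now is None else now
    return current - recorded <= max_age


def main() -> int:
    """Map the result to the exit codes Docker expects: 0 healthy, 1 unhealthy."""
    return 0 if is_healthy() else 1
```

In this sketch, a module would call `write_heartbeat()` well within every 20 seconds, and the container's healthcheck would run a script ending in `sys.exit(main())`. Keeping the 20-second threshold several times larger than the 5-second check interval means a single late write does not immediately mark the container as unhealthy.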

Overriding healthcheck

The default healthcheck configuration can always be overridden in the module.service file:

services:
    ${{service-name}}:
        image: ${{image-repository}}osp-<module-name>${{image-version}}
        networks:
            - "back"
            - "outside_access"
        ${{ports}}:
            - ${{debug-port:5005}}
        secrets:
            - ${{auto-generated-secret-access}}
        healthcheck:
            # CMD-SHELL runs the string through the container's shell, so "|| exit 1" works
            test: ["CMD-SHELL", "curl -f http://localhost:2368 || exit 1"]
            timeout: 30s
            interval: 1m
            retries: 3

See the official Docker documentation.

Nginx healthcheck

The nginx healthcheck tries to fetch a file with wget over HTTPS or HTTP, using the configured default host.
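The nginx check can be approximated in Python using the standard library instead of wget. This is a hypothetical equivalent, not the actual OnSphere script. The host, path, ports, per-attempt timeout, and the order of trying HTTPS before HTTP are all assumptions.

```python
import http.client
import ssl
import urllib.error
import urllib.request
from typing import Optional


def fetch_ok(url: str, timeout: float, context: Optional[ssl.SSLContext] = None) -> bool:
    """Return True if the URL answers with a success status, like a successful wget."""
    # Bypass any proxy settings: the check targets the container itself.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),
        urllib.request.HTTPSHandler(context=context),
    )
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # 4xx/5xx replies, refused connections, TLS errors and timeouts all fail the check.
        return False


def nginx_healthy(host: str = "localhost", path: str = "/",
                  https_port: int = 443, http_port: int = 80,
                  timeout: float = 5.0,
                  context: Optional[ssl.SSLContext] = None) -> bool:
    """Try HTTPS first, then HTTP; the check passes if either request succeeds."""
    # Hypothetical per-attempt timeout: two 5 s attempts stay under the 15 s default timeout.
    candidates = (
        f"https://{host}:{https_port}{path}",
        f"http://{host}:{http_port}{path}",
    )
    # any() stops at the first successful attempt.
    return any(fetch_ok(url, timeout, context) for url in candidates)
```

Used as a container healthcheck, a script would finish with `sys.exit(0 if nginx_healthy() else 1)`. If nginx serves a self-signed certificate, the caller would need to pass a suitably configured `ssl.SSLContext`.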