Docker Container Monitoring inetd-style with Chaperone


Recently, Chaperone gained an exciting new feature that makes it possible to launch scripts on demand when a connection arrives on a port.  This is an old idea; in fact, it was the purpose of the original inetd “super server” that shipped with 4.3BSD in 1986!

This feature turns out to be ideal for lean Docker microservice containers, because no process is started unless a connection actually arrives.  It makes it possible to publish one or more ports that perform auxiliary functions without dedicating a long-running daemon process to each.

One extremely useful application is putting monitoring scripts inside the container, where they can use knowledge of the application itself to perform better service monitoring. The monitoring scripts are also part of the container codebase rather than external to it, so as the software evolves, the scripts can be kept up-to-date by the application developer.

One of our clients shared some code that does just that with a container implementing a single-process Python directory microservice.  It relies upon a RethinkDB database that lives in another container, and latency issues have caused the service to fail unexpectedly at times.

I’ll start with a simple example that’s similar to theirs, then show you the script they ended up with.

The Simple and Easy Solution

With just a few lines of Python, here is an internal service monitor which returns a consistent JSON response to describe the container’s health:

#!/usr/bin/python3

import json
import urllib.request

status = {'status': 'OK'}     # status reported if no problems are found

try:
    # Ask the directory service for its version; parsing the reply
    # confirms we received valid JSON.
    with urllib.request.urlopen("http://localhost:5001/version") as response:
        jresp = json.loads(response.read().decode())
except Exception as err:
    status = {'status': 'ERROR',
              'message': "Service does not respond: " + str(err)}

print(json.dumps(status))    # send the result over the socket port

The script above checks to see if the directory service (running on port 5001) responds normally with its version number (a JSON response).
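The exact payload doesn’t matter for this simple check; any valid JSON reply counts as healthy.  For illustration, the response might look something like this (the dirserv-version field is the one the client’s final script, shown later, actually inspects):

{"dirserv-version": 1.0}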

Then, here is the Chaperone configuration which “connects” the port to the above script (which was /serv_app/monitor inside their container):

# Define monitor service on port 7101
monitor-port.service: {
  type: inetd, port: 7101,
  command: "/serv_app/monitor",
}

The above configuration tells Chaperone to listen on TCP port 7101 and, when a connection is received, launch the /serv_app/monitor script with the socket connected to stdin and stdout.  That makes it easy for simple scripts to act as genuine TCP services.
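Because stdin is wired to the socket as well, a script can just as easily read a request before replying.  Here is a minimal sketch (hypothetical, not part of the client’s container) of a request/response style script that Chaperone could launch the same way:

#!/usr/bin/python3

# Hypothetical inetd-style script: Chaperone connects the accepted socket
# to stdin/stdout, so reading a line reads from the TCP client and
# printing writes back over the connection.

import sys
import json

line = sys.stdin.readline().strip()    # one request line from the client

if line == "ping":
    print(json.dumps({'status': 'OK'}))
else:
    print(json.dumps({'status': 'ERROR', 'message': 'unknown command: ' + line}))

sys.stdout.flush()    # push the reply out before the script exits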

Once the container is running, any TCP client can check its status.  For example, assume that the container exposes port 7101 at service.example.com.  Accessing the port yields a simple JSON result if the service is OK:

$ nc service.example.com 7101
{"status": "OK"}

and a structured error response if there is a problem:

{"status": "ERROR", "message": "Service does not respond: <urlopen error [Errno 111] Connection refused>"}

That’s all there is to it!   Now, port 7101 can be used by a wide variety of service monitoring tools, making it possible to support consistent monitoring across all containers.
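For example, a polling tool or cron job could check the port with a few lines of Python.  This sketch assumes the same hostname and port as the nc example above:

#!/usr/bin/python3

# Hypothetical polling client: connects to the monitor port, reads the
# one-line JSON reply, and exits non-zero if the container reports a problem.

import sys
import json
import socket

HOST, PORT = "service.example.com", 7101

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    reply = sock.makefile().readline()

status = json.loads(reply)
if status['status'] != 'OK':
    print(status.get('message', 'no details given'), file=sys.stderr)
    sys.exit(1)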

The Final Monitoring Script

The above is quite simplified, and our client wanted to do a bit more.  Primarily, they wanted not only to check that the service was running in the container, but also to verify that the RethinkDB instance was reachable, and to provide diagnostic information to the monitoring tool.

The final script ended up looking like this:

#!/usr/bin/python3

import os
import sys
import json
import rethinkdb
import urllib.request
import urllib.error

def result(err=None):    # prints an OK-result or error as JSON, then exits
    r = {'monitor-version': 1.0,
         'status': 'OK'}
    if err:
        r['status'] = 'ERROR'
        r['message'] = err
    print(json.dumps(r))
    sys.exit(0)

def check_services():
    # Try the service INSIDE the container first
    try:
        with urllib.request.urlopen("http://localhost:5000/version") as response:
            jresp = json.loads(response.read().decode())
            version = jresp['dirserv-version']
            if version != 1.0:
                result("dirserv-version has unexpected value: " + str(version))
    except urllib.error.URLError as err:
        result("Service does not respond: " + str(err))

    HOST = os.environ["HOST_IP"]
    PORT = 28015

    # The local service is OK, make sure we are also seeing the
    # RethinkDB instance.
    try:
        db = rethinkdb.connect(HOST, PORT)
    except rethinkdb.errors.ReqlDriverError as derr:
        result("RethinkDB endpoint error: " + str(derr))

    result()

try:
    check_services()
except Exception as ex:
    result("Unexpected error: " + str(ex))

result()
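Assuming everything is healthy, the monitor port now reports the monitor version alongside the status, something like:

$ nc service.example.com 7101
{"monitor-version": 1.0, "status": "OK"}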

Here is their entire Chaperone configuration, covering both the app and the monitoring service:

settings: {
  env_set: {
    # Derive the IP of our docker host from the default route
    HOST_IP: "`ip route | awk '/default/ {print $3}'`",
  }
}

# Define cluster directory service
clusterdir.service: {
  type: simple,
  command: "python3 /serv_app/cluster-directory.py",
  service_groups: "IDLE",
}

# Define monitor service on port 7101
monitor-port.service: {
  type: inetd, port: 7101,
  command: "/serv_app/monitor",
}

Note above how HOST_IP is derived: it points to the network interface of the Docker host.  In their actual deployment, HOST_IP is provided on the docker run command line instead, since the RethinkDB instance is not always running on the same host.
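In that case, the invocation looks something like this (the image name and address are made up for illustration):

$ docker run -d -p 7101:7101 \
    -e HOST_IP=10.0.1.20 \
    example/dirserv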
