Docker deployments with zero downtime
Swarm mode with the docker service command, introduced in version 1.12.0, aims to be a good tool for scaling your application, and one of the nice features promised is zero-downtime deployments, which I’m going to try in this post.
UPD: Unfortunately, the latest Docker versions keep forwarding new connections to removed service tasks even after their containers have received SIGTERM and started a graceful shutdown, which makes no-downtime rolling updates impossible, at least without an external load balancer.
There are a few related issues which should fix this (still open); you can subscribe to their notifications to get updates. Meanwhile, let’s get familiar with the deployment process …
Application
First, we need our mission-critical application packed into a Docker image, and it should respond to the SIGTERM signal by shutting down gracefully in a timely manner.
Bootstrap
I’ll use a basic Express.js app generated with express-generator, which will just respond with a plain-text response on its root path (the source code can be found on GitHub).
$ npm install -g express-generator
$ express --view=pug --git -f docker-deploy-test
create : docker-deploy-test
create : docker-deploy-test/package.json
create : docker-deploy-test/app.js
create : docker-deploy-test/.gitignore
create : docker-deploy-test/public
create : docker-deploy-test/public/javascripts
create : docker-deploy-test/public/images
create : docker-deploy-test/public/stylesheets
create : docker-deploy-test/public/stylesheets/style.css
create : docker-deploy-test/routes
create : docker-deploy-test/routes/index.js
create : docker-deploy-test/routes/users.js
create : docker-deploy-test/views
create : docker-deploy-test/views/index.pug
create : docker-deploy-test/views/layout.pug
create : docker-deploy-test/views/error.pug
create : docker-deploy-test/bin
create : docker-deploy-test/bin/www
install dependencies:
$ cd docker-deploy-test && npm install
run the app:
$ DEBUG=docker-deploy-test:* npm start
Install dependencies
$ cd docker-deploy-test && npm install
<long output of installed dependencies tree should be displayed here>
And run the app:
$ DEBUG=docker-deploy-test:* npm start
> [email protected] start /Users/vadim/projects/lostintime/docker-deploy-test
> node ./bin/www
docker-deploy-test:server Listening on port 3000 +0ms
Now on http://localhost:3000/ you can see this nice looking page.
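For reference, the root route serving this page is roughly what express-generator scaffolds in routes/index.js (a sketch shown here for context; the actual handler is in the linked repository):
var express = require('express');
var router = express.Router();

/* GET home page - render the default scaffolded view. */
router.get('/', function (req, res, next) {
  res.render('index', { title: 'Express' });
});

module.exports = router;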
Let’s run some benchmarks on this to get a baseline to compare later results against.
$ ab -c 10 -n 1000 "http://localhost:3000/"
This is ApacheBench, Version 2.3 <$Revision: 1748469 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Server Software:
Server Hostname: localhost
Server Port: 3000
Document Path: /
Document Length: 170 bytes
Concurrency Level: 10
Time taken for tests: 4.438 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 366000 bytes
HTML transferred: 170000 bytes
Requests per second: 225.31 [#/sec] (mean)
Time per request: 44.382 [ms] (mean)
Time per request: 4.438 [ms] (mean, across all concurrent requests)
Transfer rate: 80.53 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 7 44 6.0 42 68
Waiting: 7 44 6.0 42 68
Total: 8 44 6.0 42 68
Percentage of the requests served within a certain time (ms)
50% 42
66% 43
75% 44
80% 47
90% 54
95% 57
98% 61
99% 63
100% 68 (longest request)
To summarize the previous output:
- Longest request: 68ms
- Complete requests: 1000
- Failed requests: 0
Docker image
Building a Docker image for a Node.js application is damn simple: just use the official onbuild
node image, and the Dockerfile
will look like this:
FROM node:6.9-onbuild
CMD ["node", "./bin/www"]
One thing to notice here is the custom CMD
; it is needed because npm doesn’t handle SIGTERM
properly: https://github.com/npm/npm/issues/4603, https://github.com/dickeyxxx/npm-register/issues/43.
Build the image and push it to Docker Hub (or your private registry):
$ docker build -t lostintime/docker-deploy-test:v1 .
< long build process output here>
$ docker images|grep docker-deploy-test
lostintime/docker-deploy-test:v1 latest 88e65257a7ca 10 seconds ago 671 MB
$ docker push lostintime/docker-deploy-test:v1
...
To be sure it works, run the app with Docker (don’t forget to stop the previously running node application; ctrl+c
may help with this), and then stop the container with docker stop
:
$ docker run --rm -p 127.0.0.1:3000:3000 -e "DEBUG=docker-deploy-test:*" lostintime/docker-deploy-test:v1
Thu, 09 Feb 2017 19:03:46 GMT docker-deploy-test:server Listening on port 3000
Thu, 09 Feb 2017 19:03:57 GMT docker-deploy-test:server Got SIGTERM
Thu, 09 Feb 2017 19:03:57 GMT docker-deploy-test:server Server bind closed
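The Got SIGTERM / Server bind closed lines come from a SIGTERM handler in bin/www, roughly along these lines (a minimal sketch assuming the standard express-generator layout; the exact code is in the linked repository):
#!/usr/bin/env node
// Minimal sketch of bin/www with graceful shutdown (not the exact repository code).
var app = require('../app');
var debug = require('debug')('docker-deploy-test:server');
var http = require('http');

var port = process.env.PORT || '3000';
var server = http.createServer(app);

server.listen(port, function () {
  debug('Listening on port ' + port);
});

// On SIGTERM: stop accepting new connections, let in-flight requests finish, then exit.
process.on('SIGTERM', function () {
  debug('Got SIGTERM');
  server.close(function () {
    debug('Server bind closed');
    process.exit(0);
  });
});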
Our benchmark looks pretty similar at this step too:
ab -c 10 -n 1000 "http://localhost:3000/"
...
Time taken for tests: 5.256 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 366000 bytes
HTML transferred: 170000 bytes
Requests per second: 190.26 [#/sec] (mean)
Time per request: 52.561 [ms] (mean)
Time per request: 5.256 [ms] (mean, across all concurrent requests)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 11 52 6.9 50 88
Waiting: 11 52 6.8 50 87
Total: 12 52 6.9 50 88
....
Docker Service
The next thing to do is to prepare our cluster; if you haven’t done it yet, init the swarm cluster:
$ docker swarm init
Swarm initialized: current node (zzzzzzzzzzzzzzz) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-some-long-token-here \
192.168.65.2:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
Create network:
$ docker network create deployme --driver overlay
...
Create service and scale to 6 instances:
$ docker service create \
--env "DEBUG=docker-deploy-test:*" \
--name "deployme" \
--endpoint-mode "vip" \
--mode "replicated" \
--replicas 1 \
--update-parallelism 1 \
--update-delay 10s \
--stop-grace-period 5s \
--restart-condition "any" \
--restart-max-attempts 10 \
--publish "3000:3000" \
--network "deployme" \
lostintime/docker-deploy-test:v1
$ docker service ls
ID NAME MODE REPLICAS IMAGE
mdd4g9fnzhxe deployme replicated 1/1 lostintime/docker-deploy-test:v1
$ docker service scale deployme=6
$ docker service ls
ID NAME MODE REPLICAS IMAGE
mdd4g9fnzhxe deployme replicated 6/6 lostintime/docker-deploy-test:v1
Now, let’s try to scale the service down to 2 replicas while putting it under load (run the commands at the same time):
$ ab -c 40 -n 5000 "http://localhost:3000/"
...
Concurrency Level: 40
Time taken for tests: 16.180 seconds
Complete requests: 5000
Failed requests: 0
Total transferred: 1830000 bytes
HTML transferred: 850000 bytes
Requests per second: 309.01 [#/sec] (mean)
Time per request: 129.444 [ms] (mean)
Time per request: 3.236 [ms] (mean, across all concurrent requests)
Transfer rate: 110.45 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 2.6 0 39
Processing: 5 128 114.8 111 792
Waiting: 4 127 114.7 110 791
Total: 5 128 114.8 113 792
...
$ docker service scale deployme=2
Pretty good, all requests succeeded; now let’s scale the service back up:
$ docker service scale deployme=6
$ ab -c 40 -n 5000 "http://localhost:3000/"
This is ApacheBench, Version 2.3 <$Revision: 1706008 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking deployme (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
apr_socket_recv: Connection refused (111)
Total of 2771 requests completed
Oops, it looks like containers are added to the service before the node socket binding is ready, which can probably be fixed with another great feature released with docker 1.12
- HEALTHCHECK
(it can also be added at container build time).
To check that the service is up, we will use a simple curl command: curl --fail http://localhost:3000
(curl is available by default in the node
docker images, at least in the one used for our app; I didn’t check the slim versions).
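For reference, roughly the same check baked in at build time would look like this in the Dockerfile (a sketch; the interval, timeout and retries values mirror the service flags used below):
FROM node:6.9-onbuild
# Mark the container unhealthy when the root path stops responding.
HEALTHCHECK --interval=3s --timeout=2s --retries=5 \
  CMD curl --fail http://localhost:3000 || exit 1
CMD ["node", "./bin/www"]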
Re-create our service with healthcheck instructions:
$ docker service rm deployme
$ docker service create \
--env "DEBUG=docker-deploy-test:*" \
--name "deployme" \
--endpoint-mode "vip" \
--mode "replicated" \
--replicas 1 \
--update-parallelism 1 \
--update-delay 10s \
--stop-grace-period 5s \
--restart-condition "any" \
--restart-max-attempts 10 \
--network "deployme" \
--publish "3000:3000" \
--health-cmd "curl --fail http://localhost:3000" \
--health-interval 3s \
--health-retries 5 \
--health-timeout 2s \
lostintime/docker-deploy-test:v1
And benchmark again:
$ ab -c 40 -n 20000 -l -k "http://localhost:3000/"
...
While scaling it up and down:
$ docker service scale deployme=6
$ docker service scale deployme=2
...
Concurrency Level: 40
Time taken for tests: 53.098 seconds
Complete requests: 20000
Failed requests: 0
Keep-Alive requests: 19973
Total transferred: 7409983 bytes
HTML transferred: 3395410 bytes
Requests per second: 376.66 [#/sec] (mean)
Time per request: 106.196 [ms] (mean)
Time per request: 2.655 [ms] (mean, across all concurrent requests)
Transfer rate: 136.28 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 8
Processing: 8 106 189.0 94 5136
Waiting: 0 99 53.6 93 942
Total: 8 106 189.0 94 5136
Tadaaa! All requests succeeded.
For the deploy process we will use docker service update, which technically does the same thing: it scales the service down and back up with new options, e.g. --image
.
Create a new version of our app:
$ docker tag lostintime/docker-deploy-test:v1 lostintime/docker-deploy-test:v2
And finally deploy:
$ docker service update --image "lostintime/docker-deploy-test:v2" deployme
While benchmarking:
$ ab -c 40 -n 20000 -l -k "http://deployme:3000/"
...
Concurrency Level: 40
Time taken for tests: 46.566 seconds
Complete requests: 20000
Failed requests: 0
Keep-Alive requests: 19985
Total transferred: 7414435 bytes
HTML transferred: 3397450 bytes
Requests per second: 429.50 [#/sec] (mean)
Time per request: 93.132 [ms] (mean)
Time per request: 2.328 [ms] (mean, across all concurrent requests)
Transfer rate: 155.49 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.5 0 15
Processing: 6 93 157.7 76 5927
Waiting: 0 89 47.3 75 414
Total: 6 93 157.7 76 5927
Conclusion
The health check is a very important part of the deploy process and lets us control exactly when a container is ready to be added to the swarm. Of course you should tune the --health-*
parameters for your requirements, and ideally create a separate endpoint which encapsulates all your healthcheck logic. During a deploy you’ll need at least one container (or more) running at any time that can handle the load, so please also look at the --update-parallelism
, --update-delay
and --stop-grace-period
params in more depth.
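As an illustration, such a dedicated endpoint could look roughly like this in an Express app (a hypothetical /health route, not part of the demo app); the service would then be created with --health-cmd "curl --fail http://localhost:3000/health":
var express = require('express');
var router = express.Router();

// Hypothetical health endpoint: reply 200 only when the app considers itself ready.
router.get('/health', function (req, res) {
  // Put real checks here (database connection, upstream services, etc.).
  res.status(200).send('OK');
});

module.exports = router;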
Useful links
Here are some useful links which I discovered while solving this problem.
- Graceful shutdown in nodejs: http://joseoncode.com/2014/07/21/graceful-shutdown-in-node-dot-js/
- Reducing Deploy Risk With Docker’s New Health Check Instruction: https://blog.newrelic.com/2016/08/24/docker-health-check-instruction/
- Docker healthcheck documentation: https://docs.docker.com/engine/reference/builder/#/healthcheck
PS: sorry for the writing style and the predominance of shell output; this post is just a dirty proof-of-concept set of instructions :).