Architecture, Digital Security, Machine Learning, Big Data
Post date June 23, 2025
Author: gopalsharma2001
In today's fast-paced development environment, deploying new features and updates to production can feel like walking a tightrope. How do you ensure that changes don't disrupt the user experience? One wrong step and your users could face outages, performance issues, or critical bugs. This fear often leads to slow, cautious release cycles that hinder innovation. This is where canary and shadow deployments come into play: they are your safety nets for low-risk releases, and Nginx, the versatile reverse proxy and load balancer, is an excellent tool to make them happen.
The Deployment Dilemma
Imagine launching a new feature that crashes your production environment, or rolling out an update that introduces subtle data corruption. Traditional "big bang" deployments, where a new version replaces the old one entirely, are inherently risky. If something goes wrong, the impact is immediate and widespread, leading to frantic rollbacks, unhappy customers, and potential revenue loss. Even well-tested software can behave unexpectedly in a live production environment with its unique traffic patterns and data.
This is where progressive delivery strategies like Canary and Shadow deployments shine, allowing you to test in production with controlled exposure.
What are Canary Deployments?
Inspired by the canaries once used in coal mines to detect toxic gases, a canary deployment releases a new version of your application to a small subset of users (the "canary group") before making it available to everyone. If the new version performs well and no issues are detected, you gradually increase the traffic routed to it until it serves 100% of users. If problems arise, you can quickly revert the small canary group to the old version, minimizing the impact.
Here are key characteristics of canary deployments:
- Incremental rollout: the new version starts with a small slice of traffic and is ramped up gradually.
- Real user validation: the canary group exercises the new version with genuine production traffic and data.
- Fast rollback: if metrics degrade, only the small canary group is affected and can be reverted quickly.
- Reduced blast radius: failures impact a fraction of users instead of everyone.
What are Shadow Deployments?
Shadow deployments, also known as traffic mirroring, take a different approach: a copy of live production traffic is sent to a new version of your application running in parallel, without affecting the user experience. The key difference from canary is that the responses from the shadowed (new) version never reach the client. This allows you to test the new application's behavior (performance, errors, resource consumption) under realistic load without impacting your live users.
Key features of shadow deployments include:
- Zero user impact: responses from the shadow version are discarded, so clients only ever see the stable version's output.
- Realistic load: the new version receives a copy of genuine production traffic, not synthetic test traffic.
- Side-by-side comparison: performance, error rates, and resource consumption can be measured against the live version.
- Extra capacity required: the shadow version needs its own infrastructure to absorb the duplicated traffic.
Why Nginx?
Nginx is a lightweight, scalable reverse proxy and load balancer that can handle 100K+ requests per second, and its configuration can be hot-reloaded without downtime. Moreover, its built-in modules make the different deployment strategies straightforward to implement: split_clients for canary and mirror for shadow deployments.
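For example, with open-source Nginx, a changed split percentage takes effect through a graceful reload; a quick validate-then-reload sketch:
nginx -t && nginx -s reload   # validate the new config, then reload workers without dropping connections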
The commercial version, NGINX Plus, adds a key-value store for HTTP traffic. It provides an API for dynamically maintaining values that can be used in the NGINX Plus configuration, without requiring a configuration reload. This lets us update the split percentage at runtime, without reinstalling or restarting the Nginx server.
Using Nginx for Canary Deployments
Nginx, acting as a reverse proxy and load balancer, can intelligently route traffic based on various criteria, making it well suited to managing canary releases. Nginx facilitates canary releases by:
- Splitting traffic by percentage with the split_clients module, hashed on a client identifier.
- Routing selected users to the new version by header, cookie, or source IP.
- Shifting traffic gradually via configuration reloads, or at runtime with the NGINX Plus key-value store.
- Passing failed requests to healthy servers with proxy_next_upstream.
Nginx Configuration Example (Percentage-based):
A common approach for percentage-based routing in Nginx without complex modules is to use a hash based on some client identifier (like IP or a cookie).
Nginx.conf
http {
    # Traffic split logic: hash on the client IP for consistent percentage
    # routing. The key can also combine $uri, $http_user_agent, etc.
    split_clients "$remote_addr" $canary_version {
        10% "v2"; # 10% of clients to the new version
        *   "v1"; # Everyone else to the current version
    }

    upstream backend_v1 {
        server 10.0.0.1:80;
    }
    upstream backend_v2 {
        server 10.0.0.2:80;
    }

    server {
        listen 80;

        location / {
            # Route based on the split decision
            proxy_pass http://backend_$canary_version;

            # Pass failed requests to the next server in the group
            proxy_next_upstream error timeout;
        }
    }
}
In this example:
- split_clients hashes the client IP ($remote_addr), so each client is consistently routed to the same version across requests.
- 10% of clients land on backend_v2 (the new version); the remaining 90% stay on backend_v1.
- proxy_pass builds the upstream name from the split result, and proxy_next_upstream passes a failed request to the next server in the group.
To widen the rollout, raise the percentage in the split_clients block and reload Nginx.
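Hashing on the client IP makes the assignment random from the user's point of view. For deterministic opt-in, say for internal testers, a map on a cookie can replace the split_clients block; a minimal sketch, where the cookie name canary is an assumption:
# Clients presenting a "canary=always" cookie are pinned to v2;
# everyone else keeps the current version.
map $cookie_canary $canary_version {
    "always" "v2";
    default  "v1";
}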
Implementing Shadow Deployments with Nginx
As described above, shadow deployment mirrors a copy of live production traffic to a new version running in parallel, and the mirror's responses are discarded so live users are never affected. With Nginx, this is achieved using the ngx_http_mirror_module. Its mirror directive is designed for exactly this purpose, sending a copy of each request to a mirrored server or location.
Nginx.conf
http {
    upstream primary {
        server 10.0.0.1:80; # Current production (v1)
    }
    upstream shadow {
        server 10.0.0.2:80; # New version (v2)
    }

    server {
        listen 80;

        location / {
            # Send primary traffic to v1
            proxy_pass http://primary;

            # Duplicate each request to v2; the mirror's responses are discarded
            mirror /mirror;
            mirror_request_body on;
        }

        location = /mirror {
            internal; # Hide from external access
            proxy_pass http://shadow$request_uri;

            # Forward the full request, headers and body included
            proxy_pass_request_headers on;
            proxy_pass_request_body on;
        }
    }
}
This setup sends traffic to the stable server while also mirroring it to the shadow server for evaluation.
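One practical gap: Nginx does not log mirror subrequests by default, so it is hard to confirm the shadow is receiving traffic. A minimal sketch for logging them separately; the log format and file path are assumptions:
# http context: a compact format for shadow traffic
log_format shadowlog '$status "$request" rt=$upstream_response_time';

# drop-in replacement for the /mirror location above
location = /mirror {
    internal;
    log_subrequest on;                               # subrequests are not logged by default
    access_log /var/log/nginx/shadow.log shadowlog;  # keep shadow logs separate
    proxy_pass http://shadow$request_uri;
}
Comparing shadow.log with the primary access log gives a quick read on the new version's error rate and latency under identical traffic.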
Zero-Downtime Releases with Built-in Safety Nets
Here's a complete Nginx configuration implementing canary deployments, shadow deployments, and fallback mechanisms that handle failures automatically. The fallback location below instantly reroutes standard error responses from the new system to the stable one, and can be configured for other error codes as we see fit. It serves as a built-in safety net for zero-downtime releases.
# ===== ERROR HANDLING =====
error_page 500 502 503 504 /fallback;
location = /fallback {
    # Final fallback to stable
    proxy_pass http://stable_backend;
    access_log /var/log/nginx/fallback.log;
}
But there can also be functional or other issues where no standard error code is thrown. In that scenario, we need to be able to increase or decrease the split percentage for incoming traffic on the fly.
Nginx.conf
http {
    # Upstream definitions
    upstream stable_backend {
        zone stable_backend 64k;   # shared memory, required for active health checks
        server 10.0.0.1:80;        # Primary stable instance
        server 10.0.0.2:80;        # Secondary stable instance
        keepalive 32;
    }
    upstream canary_backend {
        zone canary_backend 64k;
        server 10.0.0.3:80;        # Canary version
        server 10.0.0.4:80 backup; # Fallback server, used when the canary instance fails
    }
    upstream shadow_backend {
        zone shadow_backend 64k;
        server 10.0.0.5:80;        # Shadow version
    }

    # ===== CANARY DEPLOYMENT =====
    # Traffic split: 10% to canary, 90% to stable. Note that split_clients
    # and map are only valid at http level, not inside a server block.
    split_clients "${remote_addr}${http_user_agent}" $canary_version {
        10% "canary";
        *   "stable";
    }
    map $canary_version $backend {
        "canary" "canary_backend";
        default  "stable_backend";
    }

    # ===== SHADOW DEPLOYMENT =====
    # Mirroring switch: change the value to 0 and reload to stop shadow traffic
    map $host $shadow_active {
        default 1;
    }

    # Health check endpoint
    server {
        listen 127.0.0.1:9000;
        location /health {
            access_log off;
            add_header Content-Type text/plain;
            return 200 "OK";
        }
    }

    # Main server configuration
    server {
        listen 80;
        server_name app.example.com;

        # ===== ADMIN OVERRIDES =====
        # Force canary or stable routing via request header (for testing).
        # 'rewrite' cannot jump to a named location, so we override the
        # $backend variable chosen by the map above instead.
        if ($http_x_canary = "true") {
            set $backend "canary_backend";
        }
        if ($http_x_force_stable = "true") {
            set $backend "stable_backend";
        }

        # ===== MAIN LOCATION BLOCK =====
        location / {
            # Primary request routing
            proxy_pass http://$backend;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

            # Needed for the upstream keepalive connections
            proxy_http_version 1.1;
            proxy_set_header Connection "";

            # Retry failed requests on the next server in the group
            proxy_next_upstream error timeout http_500 http_502 http_503 http_504;

            # Active health check (NGINX Plus only); open-source Nginx falls
            # back to passive checks via max_fails/fail_timeout
            health_check uri=/health interval=5s fails=2 passes=2;

            # Shadow traffic (mirroring)
            mirror /mirror;
            mirror_request_body on;
        }

        # ===== SHADOW MIRROR LOCATION =====
        location = /mirror {
            internal;
            # Conditional mirroring
            if ($shadow_active = 0) { return 204; } # Skip if shadow is switched off
            proxy_pass http://shadow_backend$request_uri;
            proxy_set_header X-Shadow-Request "true";
            proxy_set_header Host $host;
            # Shadow responses are discarded by Nginx and never reach the client
            proxy_ignore_client_abort on;
            proxy_pass_request_body on;
            proxy_pass_request_headers on;
            # Fast-fail settings so a slow shadow cannot tie up resources
            proxy_connect_timeout 1s;
            proxy_read_timeout 2s;
            proxy_send_timeout 2s;
        }

        # ===== ERROR HANDLING =====
        error_page 500 502 503 504 /fallback;
        location = /fallback {
            # Final fallback to stable
            proxy_pass http://stable_backend;
            access_log /var/log/nginx/fallback.log;
        }
    }
}
The above Nginx setup has a three-layer fallback system:
1. proxy_next_upstream retries a failed request on the next server in the group, including the backup canary instance.
2. The active health check (NGINX Plus) removes unhealthy canary servers from rotation before they receive live traffic.
3. The error_page directive reroutes any remaining 5xx response to the stable backend as a final fallback.
Shadow traffic has its own safety valves: the $shadow_active switch can disable mirroring entirely, and the tight timeouts prevent a slow shadow from affecting the primary request.
Administrators can also override the routing and force canary or stable handling during testing:
curl -H "X-Canary: true" http://app.example.com/
curl -H "X-Force-Stable: true" http://app.example.com/
Operation Playbook for Canary Deployment
During the transition phase, it is important to constantly monitor the new system for errors, failures, and performance regressions. If an issue appears, we should be able to take remedial or fallback action swiftly. If the new system performs well, we can gradually increase its share of the traffic while continuing to monitor it.
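A minimal sketch of log-based canary monitoring, assuming the canary instance from the configuration above (10.0.0.3:80); the log format and file paths are placeholders:
# nginx.conf (http context): record which upstream served each request
log_format upstreamlog '$status $upstream_addr "$request"';
access_log /var/log/nginx/upstream.log upstreamlog;
The canary's 5xx rate over the last 1,000 requests can then be computed with a one-liner:
tail -n 1000 /var/log/nginx/upstream.log | \
  awk '$2 == "10.0.0.3:80" { total++; if ($1 ~ /^5/) err++ }
       END { if (total) printf "canary 5xx rate: %.1f%%\n", 100 * err / total }'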
As mentioned above, NGINX Plus offers a major advantage here: traffic to the new system can be ramped up or down seamlessly. By leveraging the key-value API, the split percentage can be updated at runtime. Let's see how.
First, define the mapping of the different split percentages in the Nginx configuration file.
# Set up a key-value store to specify the percentage of traffic to send to each
# upstream group, keyed on the 'Host' header.
keyval_zone zone=split:64k state=/etc/nginx/state_files/split.json;
keyval $host $split_level zone=split;

split_clients $split_param $split0 {
    *    old;
}
split_clients $split_param $split5 {
    5%   new;
    *    old;
}
split_clients $split_param $split10 {
    10%  new;
    *    old;
}
split_clients $split_param $split25 {
    25%  new;
    *    old;
}
split_clients $split_param $split50 {
    50%  new;
    *    old;
}
split_clients $split_param $split100 {
    *    new;
}

map $split_level $migration {
    0       $split0;
    5       $split5;
    10      $split10;
    25      $split25;
    50      $split50;
    100     $split100;
    default $split0;
}
# In each 'split_clients' block above, '$split_param' controls which application receives each request. For a production application, we set it to '$remote_addr' (the client IP address).
server {
    ...
    set $split_param $remote_addr;
    ...
}
We then call the NGINX Plus API, first to create the split key-value entry, and later to change the split percentage. For example:
curl -iX POST -d '{"<hostname>":"50"}' http://localhost:8008/api/9/http/keyvals/split/
curl -iX PATCH -d '{"<hostname>":"0"}' http://localhost:8008/api/9/http/keyvals/split/
The first call creates the split key for the given host with value 50, i.e. a 50% split. The second call updates the split to 0%, i.e. no requests go to the new system. We can choose any value from the mapping to increase or decrease the split.
To view the current split:
curl -X GET http://localhost:8008/api/9/http/keyvals/split/
Note that the NGINX Plus API server in these examples listens on port 8008; an appropriate port can be chosen in the API configuration. If the port is exposed, the API can be called from external systems as well, but it is recommended to keep it locked down.
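A minimal sketch of such an API server block; the trusted network range is an assumption to adapt:
server {
    listen 8008;

    location /api/ {
        api write=on;        # enable read-write access to the NGINX Plus API
        allow 127.0.0.1;     # local administration
        allow 10.0.0.0/24;   # trusted ops network (assumption)
        deny all;            # block everyone else
    }
}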
Steps
1. Start with all traffic on the old system (0% split):
curl -iX PATCH -d '{"<hostname>":"0"}' http://localhost:8008/api/9/http/keyvals/split/
In case of issues, verify that the old system URL is correctly set and that the network is configured to accept requests from the NGINX Plus server.
2. Move 5% of the traffic to the new system:
curl -iX PATCH -d '{"<hostname>":"5"}' http://localhost:8008/api/9/http/keyvals/split/
Verify that the system is working as expected and the new system is handling its traffic successfully.
3. Increase the split to 10%:
curl -iX PATCH -d '{"<hostname>":"10"}' http://localhost:8008/api/9/http/keyvals/split/
If error rates increase, decrease the traffic to the new system back to 5% and repeat step 2.
4. Continue ramping up to 25% (and 50%, following the same pattern):
curl -iX PATCH -d '{"<hostname>":"25"}' http://localhost:8008/api/9/http/keyvals/split/
5. Once the new system is stable, move all traffic to it:
curl -iX PATCH -d '{"<hostname>":"100"}' http://localhost:8008/api/9/http/keyvals/split/
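The same ramp can be scripted. A minimal sketch, assuming the host key app.example.com, a /health endpoint on the new system, and a five-minute observation window; all three are placeholders to adapt:
#!/bin/sh
API=http://localhost:8008/api/9/http/keyvals/split/
NEW_HEALTH=http://10.0.0.2/health   # health URL of the new system (assumption)

for level in 5 10 25 50 100; do
    # Raise the split to the next level
    curl -s -X PATCH -d "{\"app.example.com\":\"$level\"}" "$API"
    sleep 300   # observe the new system before the next step

    # Roll everything back to the old system if the new one stops responding
    if ! curl -sf "$NEW_HEALTH" > /dev/null; then
        curl -s -X PATCH -d '{"app.example.com":"0"}' "$API"
        exit 1
    fi
done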
Conclusion
Canary and Shadow deployments, powerfully enabled by Nginx, transform the intimidating process of deploying new software into a controlled, low-risk, and highly informative exercise. By intelligently directing or mirroring traffic, you gain the confidence to innovate rapidly, test rigorously in real environments, and ensure a seamless experience for your users.