Project Background

Business Pain Points

With business growth, there are currently over a dozen independent private network environments (ZStack/TStack) overseas, for example:

  • Overseas Private Network Region A
  • Overseas Private Network Region B
  • Overseas Private Network Region C

Each network is isolated and requires its own VPN client to dial in before resources can be accessed, which creates significant challenges for daily O&M:

  • O&M personnel need to frequently switch between multiple VPN clients.
  • Lack of unified permission management and access control.
  • Complex network configuration and difficult troubleshooting.
  • Inability to achieve cross-network automated O&M.

Solution

We adopt the Headscale (server) + Tailscale (client) architecture to build an enterprise-grade Zero Trust VPN network:

  • Deploy the Headscale control server on the Alibaba Cloud Singapore node.
  • Install Tailscale clients on private network nodes and register them with Headscale.
  • Achieve one-time connection to access all private network resources.
  • Support ACL access control and SSO.
  • Tailscale has excellent multi-platform support: iOS, macOS, Android, Windows, Linux, and supports IPv6.

Headscale Architecture Overview

Core Components

Headscale is an open-source implementation of the Tailscale control server, fully compatible with the Tailscale protocol:

  • Control Server (Headscale): Manages node registration, key distribution, address allocation, and ACL policies.
  • DERP Server: Relay server for traffic forwarding when NAT traversal fails.
  • Client (Tailscale): Deployed on nodes, establishes encrypted tunnels based on WireGuard.

Technical Advantages

Compared to traditional VPN solutions:

  • Zero Trust Architecture: Deny all access by default, explicitly authorize via ACL.
  • NAT Traversal: Supports STUN protocol for automatic hole punching, no public IP required.
  • Mesh Network: Direct communication between nodes without passing through a central server.
  • Auto Reconnection: Automatically re-establishes connections upon network changes.
  • Cross-platform Support: Linux/Windows/macOS/iOS/Android/FreeBSD.

Headscale Server Deployment

Environment Preparation

# System Requirements
# Ubuntu 20.04+ / CentOS 8+ / Debian 11+
# Public IP Server (Alibaba Cloud Singapore recommended)
# Open TCP 443 (HTTPS control plane and DERP relay) and UDP 3478 (STUN)

# Install Headscale
wget https://github.com/juanfont/headscale/releases/download/v0.23.0/headscale_0.23.0_linux_amd64.deb
sudo dpkg -i headscale_0.23.0_linux_amd64.deb

# Or install using yum (RHEL/CentOS); first confirm on the release page that an .rpm asset exists for this version
sudo yum install -y https://github.com/juanfont/headscale/releases/download/v0.23.0/headscale_0.23.0_linux_amd64.rpm

Configuration File

Reference: https://github.com/juanfont/headscale/blob/main/config-example.yaml

Create configuration file /etc/headscale/config.yaml:

# Server URL
server_url: https://headscale.wnote.com:443

# Listen address
listen_addr: 0.0.0.0:8080

# Noise private key (auto-generated on first start if missing)
noise:
  private_key_path: /var/lib/headscale/noise_private.key

# DERP relay server configuration 
derp:
  # Embedded DERP server (disabled here; set enabled: true to use it)
  server:
    enabled: false
    region_id: 999                     # Use an unused ID to avoid conflicts
    region_code: "self-derp"           # Short identifier
    region_name: "Self-hosted Co-located DERP"        # Descriptive name
    stun_listen_addr: "0.0.0.0:3478"   # STUN service listening, must open UDP 3478
    # [Important] Enter your server's actual public IPv4 address
    ipv4: "YOUR_SERVER_PUBLIC_IP"
    # If you have public IPv6, you can also fill it in
    # ipv6: "YOUR_SERVER_PUBLIC_IPV6"
    # Automatically add this region to DERP map
    automatically_add_embedded_derp_region: true
    # [Security] Strongly recommended to enable, only allow your Tailnet clients to use this relay
    verify_clients: true

  # You can choose to keep the official DERP as a backup, or disable it completely (clear urls)
  urls:
    - https://controlplane.tailscale.com/derpmap/default   # Keep backup (recommended)
  # If you have additional self-hosted DERP nodes (e.g., Singapore node), keep paths
  paths:
    - /etc/headscale/derp.yaml

  auto_update_enabled: true
  update_frequency: 24h

# Database (Default SQLite, PostgreSQL recommended for production)
database:
  type: sqlite
  sqlite:
    path: /var/lib/headscale/db.sqlite

# ACL policy file
policy:
  mode: file
  path: /etc/headscale/acl.yaml

# DNS configuration
dns:
  magic_dns: true  # Enable MagicDNS (node names resolve under the configured base domain)
  nameservers:
    global:
      - 1.1.1.1
      - 8.8.8.8

# OIDC configuration (optional)
# oidc:
#   issuer: "https://your-oidc-provider.com"
#   client_id: "your-client-id"
#   client_secret: "your-client-secret"

# Log configuration
log:
  format: text
  level: info

Configuring the DERP Relay Server

First, let’s understand the DERP concept:

DERP is a reliable relay fallback: it keeps devices reachable when network conditions (e.g., symmetric NAT, strict firewalls) prevent direct connections.

  • Direct Connection (P2P): Tailscale clients first try to establish peer-to-peer connections through STUN-assisted “hole punching”, which gives the lowest latency.
  • Relay (DERP): When hole punching fails (e.g., one party is behind strict symmetric NAT), traffic is forwarded via a DERP server. Latency is higher than with a direct connection, but connectivity is preserved.

To self-host DERP and control multiple DERP nodes individually, create a configuration file that defines your private DERP regions and their nodes.

Self-hosting DERP avoids the latency of distant official DERP servers and keeps relay traffic on your own servers, which improves security and stability.

I don’t need this for now, so the following is only an example. Create /etc/headscale/derp.yaml:

# /etc/headscale/derp.yaml
regions:
  # First self-hosted region, e.g., East China
  901:
    regionid: 901
    regioncode: "cn-east"   # Region code, short
    regionname: "East China Self-hosted Node" # Region name
    nodes:
      - name: 901a
        regionid: 901
        # Enter your independent DERP server domain name here
        hostname: derp-shanghai.yourdomain.com
        # Your DERP server port, usually 443 or custom
        derpport: 12345
        # STUN port, usually 3478
        stunport: 3478
        stunonly: false

  # Second self-hosted region, e.g., North China
  902:
    regionid: 902
    regioncode: "cn-north"
    regionname: "North China Self-hosted Node"
    nodes:
      - name: 902a
        regionid: 902
        hostname: derp-beijing.yourdomain.com
        derpport: 443  # Assume this node uses standard port 443
        stunport: 3478
        stunonly: false
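
When region blocks are copied and edited by hand, it is easy to leave a regionid that no longer matches its region key. The following is a minimal Python sketch, not part of Headscale, that checks this kind of consistency on a dictionary mirroring the file above. The function name check_derp_map and the rules it applies are assumptions chosen for illustration.

```python
from typing import Any

# Dictionary mirroring the structure of /etc/headscale/derp.yaml above.
DERP_MAP: dict[str, Any] = {
    "regions": {
        901: {
            "regionid": 901,
            "regioncode": "cn-east",
            "nodes": [{"name": "901a", "regionid": 901, "derpport": 12345}],
        },
        902: {
            "regionid": 902,
            "regioncode": "cn-north",
            "nodes": [{"name": "902a", "regionid": 902, "derpport": 443}],
        },
    }
}


def check_derp_map(derp_map: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a DERP map (empty if none)."""
    problems: list[str] = []
    seen_codes: set[str] = set()
    for key, region in derp_map.get("regions", {}).items():
        # The region key and its regionid field should agree.
        if region.get("regionid") != key:
            problems.append(f"region {key}: regionid is {region.get('regionid')!r}")
        # Region codes are treated as unique identifiers in this sketch.
        code = region.get("regioncode")
        if code in seen_codes:
            problems.append(f"region {key}: duplicate regioncode {code!r}")
        seen_codes.add(code)
        for node in region.get("nodes", []):
            name = node.get("name", "?")
            if node.get("regionid") != key:
                problems.append(f"node {name}: regionid does not match region {key}")
            port = node.get("derpport")
            if not isinstance(port, int) or not 1 <= port <= 65535:
                problems.append(f"node {name}: invalid derpport {port!r}")
    return problems
```

Calling check_derp_map(DERP_MAP) on the sample above returns an empty list; any returned string points at the region or node that needs fixing.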

After creating the configuration, you can view the DERP map and test connectivity from any Tailscale client:

tailscale debug derp-map
tailscale debug derp <RegionCode> # Test connectivity with a specific DERP region, e.g. self-derp
tailscale netcheck # Show latency to each DERP region

To verify whether a client is actually using the self-hosted relay, check the connection status via tailscale status. If a peer's line shows relay "self-derp" (or your custom region code), traffic to that peer is going through your self-hosted relay.
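
For automated checks, the same information can be read from the output of tailscale status --json. The sketch below is a hedged illustration: it assumes each entry under "Peer" carries "HostName", "CurAddr" (non-empty when a direct path is in use), and "Relay" (the peer's DERP region code), which matches my understanding of the JSON output but should be confirmed against your client version. The sample data and the function name summarize_paths are hypothetical.

```python
import json


def summarize_paths(status_json: str) -> dict[str, str]:
    """Map each peer hostname to 'direct', 'relay:<code>', or 'unknown'."""
    status = json.loads(status_json)
    paths: dict[str, str] = {}
    for peer in (status.get("Peer") or {}).values():
        name = peer.get("HostName", "unknown")
        if peer.get("CurAddr"):
            # A current UDP endpoint means a direct path is in use.
            paths[name] = "direct"
        elif peer.get("Relay"):
            # No direct endpoint: traffic, if any, goes via this DERP region.
            paths[name] = f"relay:{peer['Relay']}"
        else:
            paths[name] = "unknown"
    return paths


# Hypothetical sample shaped like `tailscale status --json` output.
SAMPLE_STATUS = json.dumps({
    "Peer": {
        "nodekey:aaa": {"HostName": "gw-a", "CurAddr": "203.0.113.10:41641", "Relay": "self-derp"},
        "nodekey:bbb": {"HostName": "gw-b", "CurAddr": "", "Relay": "self-derp"},
    }
})
```

In practice the input would come from running the CLI, for example through subprocess.run(["tailscale", "status", "--json"], capture_output=True, text=True).stdout.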

ACL Access Control Policy

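Create /etc/headscale/acl.yaml (some Headscale releases only accept HuJSON policy files; if yours rejects YAML, convert the same structure to JSON):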

# User/Group definitions
groups:
  group:admin:
    - admin@example.com
  group:devops:
    - devops1@example.com
    - devops2@example.com
  group:developer:
    - dev1@example.com
    - dev2@example.com

# Host alias definitions
hosts:
  sg-cloud: "10.100.0.1/32"
  network-a: "10.100.1.0/24"
  network-b: "10.100.2.0/24"
  network-c: "10.100.3.0/24"

# Access control rules
acls:
  # Admins can access all resources
  - action: accept
    src:
      - group:admin
    dst:
      - "*:*"

  # Ops team can access all servers via SSH
  - action: accept
    src:
      - group:devops
    dst:
      - "sg-cloud:22"
      - "network-a:22"
      - "network-b:22"
      - "network-c:22"

  # Dev team can access application ports
  - action: accept
    src:
      - group:developer
    dst:
      - "sg-cloud:80,443,8080"
      - "network-a:3306,6379"
      - "network-b:9200,9300"

  # No explicit deny rule is needed: anything not matched by an accept rule
  # above is denied by default, and "accept" is the only supported action.

# Tag owners (optional): who may assign each tag to a node
tagOwners:
  tag:prod:
    - group:admin
    - group:devops
  tag:database:
    - group:admin

Starting the Service

# Create necessary directories
sudo mkdir -p /var/lib/headscale

# Start Headscale
sudo systemctl enable headscale
sudo systemctl start headscale
sudo systemctl status headscale

# View logs
sudo journalctl -u headscale -f

Configuring Nginx Reverse Proxy

Since we previously obtained a wildcard certificate via ACME, we can point Nginx directly at the certificate files. Adjust the configuration to your enterprise's actual setup:

server {
    listen 443 ssl http2;
    server_name headscale.wnote.com;

    ssl_certificate /etc/ssl/certs/headscale.crt;
    ssl_certificate_key /etc/ssl/private/headscale.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_buffering off;
        # Long connection timeout settings
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
        proxy_connect_timeout 300s;
    }
    # DERP relay path also needs proxying (embedded DERP serves via this path)
    location /derp {
        proxy_pass http://127.0.0.1:8080/derp;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Opening Firewall Ports

Ensure the server firewall allows the following traffic:

Port   Protocol   Purpose                              Description
443    TCP        HTTPS (Control Plane + DERP Relay)   Opened via reverse proxy
3478   UDP        STUN Service                         Must open UDP 3478 for NAT traversal detection

If using cloud service providers (like AWS, Alibaba Cloud), you also need to add inbound rules in the security group.
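
After changing firewall or security group rules, a quick reachability probe from an outside machine can confirm that the TCP side is open. The following is a minimal sketch using only the Python standard library; the function name tcp_port_open is hypothetical. It covers TCP 443 only: UDP 3478 is connectionless and cannot be verified with a plain connect, so tailscale netcheck on a client remains the practical test for STUN.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A typical call would be tcp_port_open("headscale.wnote.com", 443), run from a network outside the server's own security group.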

Tailscale Client Deployment

Installing Tailscale Client

Install on each private network node:

# Linux (Debian/Ubuntu, RHEL/CentOS): the official script detects the distribution
curl -fsSL https://tailscale.com/install.sh | sh

# Windows
# Download installer: https://tailscale.com/download/windows

# macOS
brew install --cask tailscale

# Android/iOS
# Search "Tailscale" in Google Play or the App Store to install

Registering Nodes to Headscale

# Method 1: Use a pre-auth key (Recommended)

# Create the user and a pre-auth key on the server
# (API keys from `headscale apikeys` are for the REST API and cannot register nodes)
sudo headscale users create user@example.com
sudo headscale preauthkeys create --user user@example.com --reusable --expiration 24h

# Client registration (replace YOUR_AUTH_KEY)
# --advertise-routes and --advertise-exit-node are only needed on gateway nodes
sudo tailscale up --login-server=https://headscale.wnote.com \
                  --auth-key=YOUR_AUTH_KEY \
                  --advertise-routes=10.0.0.0/24,192.168.1.0/24 \
                  --advertise-exit-node

# Method 2: Register via command line
# 1. Create user on server
sudo headscale users create user@example.com

# 2. Run on the client; it prints a registration URL containing a key
sudo tailscale up --login-server=https://headscale.wnote.com

# 3. Approve registration on server using the key from that URL
sudo headscale nodes register --user user@example.com --key <RegistrationKey>
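
When many nodes are enrolled by automation, it helps to build the registration command programmatically so that routes are validated before anything runs. The sketch below is one possible helper, written under the assumption that only the flags shown above are needed; build_up_command is a hypothetical name, and the function returns the argument list rather than executing it.

```python
import ipaddress
from typing import Sequence


def build_up_command(
    login_server: str,
    auth_key: str,
    routes: Sequence[str] = (),
    exit_node: bool = False,
) -> list[str]:
    """Build the `tailscale up` argument list for registering a node."""
    for route in routes:
        # Raises ValueError for malformed CIDRs or CIDRs with host bits set.
        ipaddress.ip_network(route)
    cmd = [
        "tailscale", "up",
        f"--login-server={login_server}",
        f"--auth-key={auth_key}",
    ]
    if routes:
        cmd.append("--advertise-routes=" + ",".join(routes))
    if exit_node:
        cmd.append("--advertise-exit-node")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) on the node, which avoids shell quoting problems with the key.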

Subnet Routing Configuration

Configure subnet routing on the gateway nodes of each private network:

# Enable IP forwarding
echo "net.ipv4.ip_forward=1" | sudo tee -a /etc/sysctl.conf
echo "net.ipv6.conf.all.forwarding=1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Advertise subnet routes (execute on a gateway node that is already registered)
sudo tailscale set --advertise-routes=10.100.1.0/24,10.100.2.0/24

# Approve routes on server
sudo headscale routes list
sudo headscale routes enable -r <RouteID>

# Verify routing
sudo tailscale status
ping 10.100.1.10  # Test cross-network access
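
If the ping fails, a first question is whether the target address is covered by any approved route at all. The following minimal sketch answers that with the standard ipaddress module; covering_route is a hypothetical helper, and the route list would be whatever headscale routes list reports as enabled.

```python
import ipaddress
from typing import Optional, Sequence


def covering_route(target: str, routes: Sequence[str]) -> Optional[str]:
    """Return the most specific route containing target, or None."""
    ip = ipaddress.ip_address(target)
    matches = [
        net
        for net in (ipaddress.ip_network(r) for r in routes)
        if ip in net  # False when address families differ
    ]
    if not matches:
        return None
    # Longest prefix wins, as in ordinary IP routing.
    return str(max(matches, key=lambda net: net.prefixlen))
```

With the routes advertised above, 10.100.1.10 is covered by 10.100.1.0/24, whereas an address in 10.100.3.0/24 returns None, which indicates that a gateway in that network has not advertised its subnet.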

Exit Node Configuration (Optional)

Configure an exit node to access the internet via a specific node (designated internet gateway):

# Enable on exit node
sudo tailscale up --advertise-exit-node

# Approve on server: the exit node appears as the routes 0.0.0.0/0 and ::/0
sudo headscale routes list
sudo headscale routes enable -r <RouteID>

# Client uses exit node
sudo tailscale set --exit-node=<ExitNodeIP> --exit-node-allow-lan-access

The last command forwards internet traffic via the exit node while still allowing direct access to the local LAN. Without --exit-node-allow-lan-access, local LAN traffic (such as printers and router admin pages) is also sent through the exit node and may become unreachable.

Management and Maintenance

Common Management Commands

# List all nodes
sudo headscale nodes list

# View node details (JSON output; filter by ID as needed)
sudo headscale nodes list --output json

# Delete node
sudo headscale nodes delete -i <NodeID>

# Rename node (the new name is a positional argument)
sudo headscale nodes rename -i <NodeID> <NewName>

# Set node tags (each tag needs the tag: prefix)
sudo headscale nodes tag -i <NodeID> -t tag:prod -t tag:database

# View routes
sudo headscale routes list

# Enable/Disable routes
sudo headscale routes enable -r <RouteID>
sudo headscale routes disable -r <RouteID>

# List users (formerly called namespaces)
sudo headscale users list

# Create user
sudo headscale users create user@example.com

# List API keys
sudo headscale apikeys list

Monitoring and Alerting

# Enable Prometheus metrics
# Set in config.yaml (top-level key):
metrics_listen_addr: 127.0.0.1:9090

# Access metrics
curl http://localhost:9090/metrics

# Grafana Dashboard
# Import a Headscale dashboard, for example: https://grafana.com/grafana/dashboards/18820
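
For a lightweight alert that does not depend on Grafana, the metrics endpoint can also be read directly. The sketch below parses the Prometheus text exposition format with the standard library only. It is simplified (it ignores optional timestamps), parse_metrics is a hypothetical name, and the sample metric names are illustrative rather than a list of what Headscale exports.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text format into {sample_name_with_labels: value}."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Illustrative sample; real output comes from http://localhost:9090/metrics.
SAMPLE_METRICS = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
process_resident_memory_bytes 5.5e+07
http_requests_total{code="200",method="get"} 1027
"""
```

The text can be fetched with urllib.request.urlopen("http://localhost:9090/metrics").read().decode() and the resulting dictionary compared against thresholds in a cron job.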

Backup and Recovery

# Backup database (stop the service first so the copy is consistent and includes pending WAL data)
sudo systemctl stop headscale
sudo cp /var/lib/headscale/db.sqlite /backup/headscale-$(date +%Y%m%d).sqlite
sudo systemctl start headscale

# Backup configuration files
sudo tar -czf /backup/headscale-config-$(date +%Y%m%d).tar.gz /etc/headscale/

# Recovery
sudo systemctl stop headscale
sudo cp /backup/headscale-xxx.sqlite /var/lib/headscale/db.sqlite
sudo systemctl start headscale
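
Copying a SQLite file while the service is writing to it can produce an inconsistent snapshot. As an alternative, SQLite's online backup API takes a consistent copy without stopping the writer. The following is a minimal sketch using Python's standard sqlite3 module; backup_sqlite is a hypothetical helper and the paths in the usage note are the ones used above.

```python
import sqlite3


def backup_sqlite(src_path: str, dst_path: str) -> None:
    """Copy a SQLite database using the online backup API."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # Connection.backup copies all pages into the destination database.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

For example, backup_sqlite("/var/lib/headscale/db.sqlite", "/backup/headscale-20250101.sqlite") could be run from a scheduled job with read access to the database; the date in the file name is only a placeholder.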

Production Environment Best Practices

High Availability Deployment

# Use PostgreSQL as database backend
database:
  type: postgres
  postgres:
    host: pg-cluster.example.com
    port: 5432
    user: headscale
    pass: "your-password"
    name: headscale
    ssl: true

Security Hardening

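  1. Enable TLS (Built-in Let's Encrypt)

If Headscale terminates TLS itself instead of sitting behind Nginx, its built-in ACME client can obtain the certificate. Note that this is server-side TLS, not mTLS:

acme_email: admin@example.com
tls_letsencrypt_hostname: headscale.wnote.com
tls_letsencrypt_challenge_type: HTTP-01
tls_letsencrypt_listen: ":http"
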
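  2. Restrict the Node Address Pool

The prefixes setting controls which addresses Headscale assigns to nodes, not which source IPs may connect; access restrictions belong in the ACL policy and the firewall. The prefixes must stay within the ranges that Tailscale clients accept:

prefixes:
  v4: 100.64.0.0/10
  v6: fd7a:115c:a1e0::/48
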
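  3. Regularly Rotate Keys

# Deleting the server's private key changes the server identity and can disrupt existing clients;
# rotate node keys and API keys instead.
# Expire a node key to force the node to re-authenticate
sudo headscale nodes expire -i <NodeID>
# Expire an API key that is no longer needed
sudo headscale apikeys expire --prefix <KeyPrefix>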

Performance Optimization

  1. Deploy Multiple DERP Servers

    • Deploy a DERP node in each major region.
    • Use CDN to accelerate DERP access.
  2. Optimize Keep-alive

# Tailscale manages keep-alives automatically; tailscale up has no heartbeat-interval flag.
# To check whether a peer is reached directly or via DERP:
tailscale ping <PeerName>

  3. Single Point of Failure Risk: If official DERP is completely disabled (urls: []) and there is only one self-hosted DERP server, all clients requiring relay will be unable to communicate if that server goes down or has network issues. For production, keep at least the official DERP as a backup or deploy multiple self-hosted DERP nodes.

  4. Certificate Requirements: Embedded DERP relies on Headscale’s HTTPS service, so your domain headscale.wnote.com must have a valid SSL certificate.

Troubleshooting

Common Issues

1. Node Cannot Register

# Check server logs
sudo journalctl -u headscale -n 50

# Check that the --login-server URL used on the client is correct
# Ensure the firewall allows TCP 443 (and 8080 only if clients reach Headscale without the reverse proxy)

2. Cannot Access Subnet Routes

# Check if IP forwarding is enabled
sysctl net.ipv4.ip_forward

# Check if routes are approved
sudo headscale routes list

# Packet capture analysis
sudo tcpdump -i tailscale0 -n host 10.100.x.x

3. DERP Connection Failed

# Test DERP connectivity and latency from a client
# (tailscale ping targets peers, not DERP servers)
tailscale netcheck

# Check DERP server status
curl https://your-derp-server.com/derp/latency-check

4. ACL Policy Not Taking Effect

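# Check the server logs for policy parsing errors after a restart or reload
sudo journalctl -u headscale -n 50 | grep -i -E "acl|policy"

# View the policy held by the server (availability depends on the Headscale version and policy mode)
sudo headscale policy get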

Summary

The Headscale + Tailscale solution provides an out-of-the-box Zero Trust VPN solution for enterprises:

  • Simple Deployment: One-click server installation, automatic client configuration.
  • Powerful Features: Supports ACL, SSO, routing, and exit nodes.
  • Excellent Performance: Based on WireGuard, low latency, high throughput.
  • Secure and Reliable: End-to-end encryption, fine-grained permission control.

By deploying the Headscale control server on Alibaba Cloud Singapore and unifying access to various private networks, we achieved:

  • O&M personnel access all networks with a single login.
  • ACL-based fine-grained access control.
  • Automated network topology management.
  • Auditable operation logs.

This solution has been running stably in production with 10+ nodes, 100+ daily active connections, and a failure rate of <0.1%.