Monitoring Production Servers: Catching Config Corruption and Security Incidents Before They Cost You Hours

LittleBig.Co

When hosting provider automation corrupts your nginx config, you need visibility and fast recovery. This field note documents a three-layer monitoring stack (Wazuh, auditd, etckeeper) that detects config changes in real-time and restores working configurations in 30 seconds instead of hours of blind debugging.

Context: Running 20 WordPress sites on a single production server with 8 cores/16GB RAM. Hosting provider has a management tool with root access that “helps” manage the server. Until it doesn’t.

The Incident: Hosting provider’s automation corrupted the nginx configuration. Server down for hours. Root cause: couldn’t see what changed or when. Just knew something broke, somewhere.

The Solution: A three-layer monitoring and recovery stack that catches incidents in real-time and restores configurations in seconds, not hours.

The Problem

When you hand root access to a hosting provider’s management tool, you’re trusting black-box automation. Most of the time it works. When it doesn’t, you’re debugging blind:

  • What changed?
  • When did it change?
  • Who (or what process) changed it?
  • How do I roll back?

For WordPress hosting, add another layer: compromises through theme and plugin vulnerabilities. Vulnerabilities disclosed early are often exploited before a patch exists. You need to know immediately:

  • Did an attacker inject code?
  • Did they establish persistence?
  • Did they escalate beyond WordPress to system files?

Traditional monitoring tells you something broke. You need to know what broke, how it broke, and have a path to immediate restoration.

The Stack

Three components working together:

  1. Wazuh – File integrity monitoring and threat detection
  2. auditd – System call auditing (who did what, when)
  3. etckeeper + GitLab – Version control for /etc with offsite history

Cost: $0 in licensing (all open source), ~200MB RAM, <3% CPU on the production server.

Component 1: Wazuh File Integrity Monitoring

Wazuh runs as an agent on your production server, reporting to a separate Wazuh manager (recommended: 4 cores/8GB RAM).

Install Wazuh Agent

curl -s https://packages.wazuh.com/key/GPG-KEY-WAZUH | gpg --no-default-keyring --keyring gnupg-ring:/usr/share/keyrings/wazuh.gpg --import && chmod 644 /usr/share/keyrings/wazuh.gpg

echo "deb [signed-by=/usr/share/keyrings/wazuh.gpg] https://packages.wazuh.com/4.x/apt/ stable main" | tee -a /etc/apt/sources.list.d/wazuh.list

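apt update

# WAZUH_MANAGER is read by the package install scripts, which write it into the
# <client><server><address> block of ossec.conf. Don't append it to ossec.conf
# yourself: that file is XML, and a shell assignment breaks it.
WAZUH_MANAGER="your.manager.ip" apt install wazuh-agent

systemctl daemon-reload
systemctl enable wazuh-agent
systemctl start wazuh-agent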
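A quick way to confirm the agent knows where its manager is: the address ends up in the <address> element inside the <client> block of /var/ossec/etc/ossec.conf. The helper below is a hypothetical sketch (the function name is ours, not a Wazuh tool) that extracts the first <address> value with sed, assuming the element sits on a single line as in the stock config.

```bash
#!/bin/bash
# Hypothetical helper (not a Wazuh tool): print the manager address the agent
# is configured to report to. Assumes <address>...</address> is on one line.
wazuh_manager_address() {
    local conf="${1:-/var/ossec/etc/ossec.conf}"
    local addr
    addr=$(sed -n 's:.*<address>\(.*\)</address>.*:\1:p' "$conf" | head -n 1)
    # Fail if no address was found, e.g. after a broken manual edit
    [ -n "$addr" ] || return 1
    printf '%s\n' "$addr"
}
```

On the agent host: wazuh_manager_address && systemctl is-active wazuh-agent. If it prints your.manager.ip, the placeholder was never replaced.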

Configure File Integrity Monitoring

Edit /var/ossec/etc/ossec.conf:

<syscheck>
  <!-- Scan frequency: every 6 hours -->
  <frequency>21600</frequency>
  
  <!-- Run heavy scans during low-traffic hours -->
  <scan_time>03:00</scan_time>
  
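  <!-- nginx config: few files, high impact - realtime so corruption alerts immediately -->
  <directories check_all="yes" report_changes="yes" realtime="yes">/etc/nginx</directories>
  
  <!-- Other System Configurations - Scheduled Scans -->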
  <directories check_all="yes" report_changes="yes">/etc/systemd</directories>
  <directories check_all="yes" report_changes="yes">/etc/ssh</directories>
  <directories check_all="yes" report_changes="yes">/etc/cron.d</directories>
  <directories check_all="yes" report_changes="yes">/etc/cron.daily</directories>
  
  <!-- WordPress - Scheduled Scans for Themes/Plugins -->
  <directories check_all="yes" report_changes="yes">/var/www/*/wp-content/themes</directories>
  <directories check_all="yes" report_changes="yes">/var/www/*/wp-content/plugins</directories>
  
  <!-- Critical Files - Realtime Monitoring -->
  <directories check_all="yes" report_changes="yes" realtime="yes">/var/www/*/wp-config.php</directories>
  <directories check_all="yes" report_changes="yes" realtime="yes" restrict=".php$">/var/www/*/wp-content/uploads</directories>
  
  <!-- Ignore noise -->
  <ignore>/var/www/*/wp-content/cache</ignore>
  <ignore type="sregex">.log$|.swp$</ignore>
  <!-- sregex has no .* wildcard, so match these as plain substrings -->
  <ignore type="sregex">/node_modules/|/vendor/</ignore>
</syscheck>

Key decisions:

  • Scheduled scans for directories with many files (themes, plugins) – reduces CPU impact
  • Realtime monitoring for critical files only (wp-config.php, PHP in uploads)
  • report_changes="yes" stores file diffs – you see exactly what changed
  • Pattern matching (/var/www/*/...) monitors all sites from one config
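Realtime monitoring on uploads only reports files that appear after the agent starts: as we understand Wazuh's FIM, the first scan records a baseline rather than alerting on every file already present. A one-off sweep for PHP already sitting in uploads closes that gap. This is a minimal sketch assuming the /var/www/<site>/wp-content layout used in the config above; the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list PHP-like files already present in any
# WordPress uploads directory under a web root (default /var/www).
find_php_in_uploads() {
    local root="${1:-/var/www}"
    # -path matches the whole path, so one call covers every site under $root
    find "$root" -type f -path '*/wp-content/uploads/*' \
        \( -iname '*.php' -o -iname '*.phtml' -o -iname '*.php[0-9]' \) \
        -print | sort
}
```

Run it once after installing the agent (find_php_in_uploads /var/www); any output deserves a look before you trust the baseline.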

Increase inotify limits

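Realtime monitoring uses inotify, which needs roughly one kernel watch per monitored directory. Raise the limits so large uploads trees don't exhaust them: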

echo "fs.inotify.max_user_watches=524288" >> /etc/sysctl.conf
echo "fs.inotify.max_queued_events=32768" >> /etc/sysctl.conf
sysctl -p
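To sanity-check whether the new limit is enough, counting directories under your realtime paths gives a rough estimate of the watches needed. Keep headroom: the limit applies per user across all processes, not just Wazuh. A sketch with hypothetical helper names, not a Wazuh tool:

```bash
#!/bin/bash
# Hypothetical helpers (not part of Wazuh): estimate how many inotify watches
# realtime FIM needs, at roughly one watch per directory under each path.
count_watch_dirs() {
    # No paths given: report 0 instead of letting find default to "."
    [ "$#" -gt 0 ] || { echo 0; return 0; }
    # Missing paths (e.g. an unmatched glob) are skipped silently
    find "$@" -type d 2>/dev/null | wc -l | tr -d ' '
}

check_inotify_headroom() {
    local needed limit
    needed=$(count_watch_dirs "$@")
    limit=$(cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null || echo 0)
    echo "directories to watch: $needed, max_user_watches: $limit"
    # Succeeds only if the estimate fits under the current limit
    [ "$needed" -lt "$limit" ]
}
```

Usage: check_inotify_headroom /var/www/*/wp-content/uploads /etc/nginx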

Restart Agent

systemctl restart wazuh-agent

Component 2: auditd for Command Tracking

Wazuh tells you what changed in files. auditd tells you who and how.

Install auditd

apt update
apt install auditd audispd-plugins

systemctl start auditd
systemctl enable auditd

Configure Audit Rules

Create /etc/audit/rules.d/wazuh-audit.rules:

# Clear existing rules
-D

# Increase buffer for busy servers
-b 8192

# Fail mode: log errors, don't panic
-f 1

## Track Root Activity ##

# All commands executed by root
-a exit,always -F arch=b64 -F euid=0 -S execve -k root_commands
-a exit,always -F arch=b32 -F euid=0 -S execve -k root_commands

# Privilege escalation
-w /usr/bin/sudo -p x -k privilege_escalation
-w /usr/bin/su -p x -k privilege_escalation

## System Configuration Changes ##

# Web server configs
-w /etc/nginx/ -p wa -k nginx_changes
-w /etc/nginx/nginx.conf -p wa -k nginx_config

# System services
-w /etc/systemd/system/ -p wa -k systemd_changes
-w /lib/systemd/system/ -p wa -k systemd_changes

# SSH configuration
-w /etc/ssh/sshd_config -p wa -k ssh_config

# Cron jobs (backdoor persistence detection)
-w /etc/cron.d/ -p wa -k cron_changes
-w /etc/cron.daily/ -p wa -k cron_changes
-w /var/spool/cron/ -p wa -k cron_changes

# User management
-w /etc/passwd -p wa -k user_changes
-w /etc/group -p wa -k user_changes
-w /etc/shadow -p wa -k user_changes
-w /etc/sudoers -p wa -k sudoers_changes

## WordPress Compromise Detection ##

# Track web server user commands (www-data typically UID 33)
-a exit,always -F arch=b64 -F uid=33 -S execve -k www_data_commands
-a exit,always -F arch=b32 -F uid=33 -S execve -k www_data_commands

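# Suspicious binaries (should never be called by www-data)
# Watch rules (-w) can't take -F filters, so use the syscall form with path= and perm=
-a always,exit -F path=/usr/bin/wget -F perm=x -F uid=33 -k suspicious_downloads
-a always,exit -F path=/usr/bin/curl -F perm=x -F uid=33 -k suspicious_downloads
-a always,exit -F path=/bin/nc -F perm=x -F uid=33 -k suspicious_network
-a always,exit -F path=/bin/bash -F perm=x -F uid=33 -k www_data_shell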

# Make rules immutable (comment out during testing)
# -e 2

Load Rules

augenrules --load

# Verify
auditctl -l
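auditctl -l prints every loaded rule, which gets long. To check specifically that the keys this setup relies on were loaded, a small filter over that output is enough. In our experience auditctl prints keys either as "-k name" (watch rules) or "-F key=name" (syscall rules); the helper name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read `auditctl -l` output on stdin and report any
# expected key that isn't loaded. Keys appear as "-k name" or "-F key=name".
missing_audit_keys() {
    local rules key missing=0
    rules=$(cat)
    for key in "$@"; do
        if ! printf '%s\n' "$rules" | grep -Eq -- "(-k |key=)${key}( |\$)"; then
            echo "missing: $key"
            missing=1
        fi
    done
    # Non-zero exit if any key was missing
    return "$missing"
}
```

Usage: auditctl -l | missing_audit_keys root_commands nginx_changes cron_changes www_data_commands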

Configure Wazuh to Read Audit Logs

Add to /var/ossec/etc/ossec.conf:

<localfile>
  <log_format>audit</log_format>
  <location>/var/log/audit/audit.log</location>
</localfile>

Restart agent:

systemctl restart wazuh-agent

Configure Log Rotation

Edit /etc/audit/auditd.conf:

num_logs = 10
max_log_file = 50
max_log_file_action = ROTATE

max_log_file is in megabytes, so this caps audit history at roughly 500MB (10 files × 50MB).

Component 3: etckeeper + GitLab for Config Version Control

Wazuh and auditd tell you what happened. etckeeper lets you restore instantly.

Install etckeeper

apt install etckeeper git

Configure etckeeper

Edit /etc/etckeeper/etckeeper.conf:

VCS="git"
AVOID_DAILY_AUTOCOMMITS=1
PUSH_REMOTE="origin"

Initialize Repository

cd /etc
etckeeper init

git config user.email "admin@yourdomain.com"
git config user.name "Server Admin"

etckeeper commit "Initial commit - baseline configuration"

Exclude Sensitive Files

Critical: Don’t push passwords, keys, or certificates to GitLab.

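Append to /etc/.gitignore (etckeeper already created this file; keep its managed section at the top and add these lines below it):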

# Password hashes
shadow
shadow-
gshadow
gshadow-

# SSH keys (keep config, not keys)
ssh/ssh_host_*_key
ssh/ssh_host_*_key.pub
ssh/*_rsa
ssh/*_ed25519

# SSL/TLS private keys
ssl/private/*
letsencrypt/accounts/**/private_key.json
letsencrypt/archive/*/privkey*.pem
letsencrypt/live/*/privkey.pem

# Database credentials
mysql/debian.cnf

# Generic patterns
*.key
*.pem
*_rsa
*_ed25519
*secret*
*password*
*credential*

# Temporary files
*.swp
*~
*.tmp
*.log

Verify Nothing Sensitive Is Tracked

cd /etc
git ls-files | grep -E "(shadow|\.key|\.pem|password|secret)"
# Should return nothing
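The grep above only inspects files that are already tracked. To test the patterns themselves, including for files that don't exist yet (a future Let's Encrypt account key, say), git check-ignore can evaluate a path against .gitignore directly. A sketch; the helper name and example paths are ours.

```bash
#!/bin/bash
# Hypothetical helper: print every given path that .gitignore in <repo>
# would NOT exclude. --no-index makes git evaluate the patterns even for
# files that are already tracked. Paths are relative to the repo root.
unignored_paths() {
    local repo="$1"; shift
    local p status=0
    for p in "$@"; do
        if ! git -C "$repo" check-ignore -q --no-index -- "$p"; then
            echo "NOT ignored: $p"
            status=1
        fi
    done
    return "$status"
}
```

Usage: unignored_paths /etc shadow gshadow mysql/debian.cnf ssl/private/server.key letsencrypt/live/example.com/privkey.pem prints nothing when every path is covered.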

Set Up GitLab Repository

  1. Create a private repository in GitLab
  2. Generate a deploy key on the server:
ssh-keygen -t ed25519 -C "etckeeper-production" -f ~/.ssh/etckeeper_deploy -N ""
cat ~/.ssh/etckeeper_deploy.pub
  3. Add the deploy key in GitLab: Project → Settings → Repository → Deploy keys
  4. Check “Grant write permissions”

Configure SSH for GitLab

cat >> ~/.ssh/config << 'EOF'
Host gitlab.com
    HostName gitlab.com
    User git
    IdentityFile ~/.ssh/etckeeper_deploy
    IdentitiesOnly yes
EOF

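chmod 600 ~/.ssh/etckeeper_deploy ~/.ssh/config

# Connect once interactively to accept GitLab's host key;
# unattended pushes (e.g. from cron) fail on an unknown host key
ssh -T git@gitlab.com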
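Running the cat >> step twice appends a second Host gitlab.com block. ssh uses the first value it finds for each option, so duplicates are harmless but confusing when you later rotate the key. An idempotent variant, as a sketch with a hypothetical function name:

```bash
#!/bin/bash
# Hypothetical helper: add the GitLab host block only if it isn't there yet.
# Args: [ssh config path] [deploy key path]
ensure_gitlab_ssh_block() {
    local cfg="${1:-$HOME/.ssh/config}"
    local key="${2:-$HOME/.ssh/etckeeper_deploy}"
    mkdir -p "$(dirname "$cfg")"
    touch "$cfg"
    # ssh refuses configs that others can write to
    chmod 600 "$cfg"
    if ! grep -qx 'Host gitlab.com' "$cfg"; then
        cat >> "$cfg" <<EOF
Host gitlab.com
    HostName gitlab.com
    User git
    IdentityFile $key
    IdentitiesOnly yes
EOF
    fi
}
```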

Add Remote and Push

cd /etc
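git remote add origin git@gitlab.com:youruser/server-etc-backup.git
# etckeeper's git init may have created "master"; name the branch main to match the commands below
git branch -M main
git push -u origin main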

Remove Already-Tracked Sensitive Files

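If you initialized etckeeper before creating .gitignore (on Debian and Ubuntu, installing the package already makes an initial commit of /etc), these files are still tracked. Do this before your first push: git rm --cached stops tracking them from here on, but earlier commits still contain them, and pushing uploads those commits too.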

cd /etc
git rm --cached shadow shadow- gshadow gshadow-
git rm --cached -r ssh/ssh_host_*key*
etckeeper commit "Remove sensitive files from tracking"
git push
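Rather than listing files by hand, git can report every tracked file that the current .gitignore would exclude, via ls-files --cached --ignored --exclude-standard. A sketch with a hypothetical wrapper name:

```bash
#!/bin/bash
# Hypothetical helper: list files that are still tracked even though the
# current .gitignore would exclude them (candidates for git rm --cached).
tracked_but_ignored() {
    local repo="${1:-/etc}"
    git -C "$repo" ls-files --cached --ignored --exclude-standard
}
```

To untrack everything it reports in one go (GNU xargs): git -C /etc ls-files -z --cached --ignored --exclude-standard | xargs -0 -r git -C /etc rm --cached --, then etckeeper commit as above.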

How It Works in Practice

Scenario 1: Hosting Provider Corrupts Config

Before:

  • nginx fails to reload
  • Hours troubleshooting to find the corrupted config
  • Manual reconstruction or full server restore
  • Total downtime: 2-4 hours

After:

# Wazuh alerts: /etc/nginx/nginx.conf modified at 14:32:15
cd /etc

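# See what broke (uncommitted changes vs. the last commit)
git diff nginx/nginx.conf
# In this incident: a missing semicolon on line 45 and a broken server_name directive

# Restore the last committed (working) version
# (use HEAD~1 instead if the bad change was already committed, e.g. by etckeeper's apt hook)
git checkout HEAD -- nginx/nginx.conf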

# Test and reload
nginx -t && systemctl reload nginx

# Commit the fix
etckeeper commit "Restored nginx.conf after hosting provider incident"

# Auto-pushes to GitLab

Total downtime: 30 seconds
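The diff, restore, test, reload sequence can be wrapped so the broken version's diff is kept for the post-mortem and nginx is only reloaded if the restored config validates. A minimal sketch: restore_config, VALIDATE_CMD and RELOAD_CMD are hypothetical names, defaulting to nginx -t and systemctl reload nginx.

```bash
#!/bin/bash
# Hypothetical helper: restore_config <repo> <path-in-repo> [ref]
# 1. saves the current diff as evidence, 2. restores the file from git,
# 3. reloads with $RELOAD_CMD only if $VALIDATE_CMD passes.
restore_config() {
    local repo="$1" path="$2" ref="${3:-HEAD}"
    local validate="${VALIDATE_CMD:-nginx -t}"
    local reload="${RELOAD_CMD:-systemctl reload nginx}"
    local evidence

    # Keep the broken version's diff outside /etc for the post-mortem
    evidence=$(mktemp "${TMPDIR:-/tmp}/etc-diff.XXXXXX") || return 1
    git -C "$repo" diff "$ref" -- "$path" > "$evidence"
    echo "Saved diff to $evidence"

    git -C "$repo" checkout "$ref" -- "$path" || return 1

    # Unquoted on purpose: the variables hold simple commands with arguments
    if $validate; then
        $reload
    else
        echo "Validation failed after restore; not reloading" >&2
        return 1
    fi
}
```

Usage: restore_config /etc nginx/nginx.conf (pass HEAD~1 as a third argument if the bad change was already committed), then etckeeper commit.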

Scenario 2: WordPress Theme Zero-Day

Detection timeline:

12:34:01 - Wazuh web log: POST to /wp-content/themes/vulnerable-theme/ajax.php
12:34:02 - Wazuh FIM: New file /wp-content/uploads/cache.php
12:34:05 - auditd: www-data executed /bin/bash
12:34:10 - Wazuh: Webshell pattern detected in cache.php

Investigation:

# Did attacker modify system files? Check uncommitted changes as well as commits
cd /etc
git status --short
git log --since="12:30" --oneline
# No changes in either - compromise limited to WordPress

# Check for persistence
git log cron.d/ systemd/system/
# No backdoors in system cron or services

# View file diff in Wazuh dashboard
# Shows: eval(base64_decode(...)) injected into theme file
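For the auditd side of the investigation, ausearch -k www_data_commands -i is the built-in way to read these events. For a quick tally of which binaries www-data ran, the raw SYSCALL records in /var/log/audit/audit.log carry both exe="..." and key="..." fields, so a grep pipeline is enough. A sketch; the function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: count which executables appear in raw audit SYSCALL
# records tagged with a given key. Reads audit.log lines on stdin.
# Hex-encoded exe values (used when the path contains spaces etc.) are skipped.
summarize_exe_by_key() {
    local key="$1"
    grep -F "key=\"$key\"" \
        | grep -o 'exe="[^"]*"' \
        | sort | uniq -c | sort -rn
}
```

Usage: summarize_exe_by_key www_data_commands < /var/log/audit/audit.log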

Response:

  • Restore WordPress site from backup
  • Patch vulnerable theme
  • Confirmed: No system-level compromise

etckeeper shows within seconds whether an attacker touched system configuration under /etc, the usual route for escalating beyond the web application or setting up persistence.

Scenario 3: “What Changed Last Week?”

cd /etc

# See all changes
git log --since="7 days ago" --oneline

# View specific commit
git show abc123

# Compare two points in time
git diff abc123 def456 -- nginx/

# Find when a specific value was added
git log -S "listen 443 ssl" --all

Resource Impact

On production server (8 cores/16GB RAM):

  • Wazuh agent: ~100MB RAM, <1% CPU baseline
  • auditd: ~30MB RAM, 1-2% CPU
  • etckeeper: Negligible (only active during commits)

Total steady-state: ~200MB RAM (1.25%), 2-3% CPU

During scheduled FIM scans:

  • One core busy for 2-5 minutes every 6 hours
  • Brief disk I/O spike
  • Negligible impact on web traffic

Network usage:

  • Agent to manager: ~4MB/day baseline
  • Spikes to 50-100MB during WordPress updates (file diffs)
  • GitLab pushes: Minimal, only on config changes

Custom Wazuh Rules

For more specific alerting, add custom rules to your Wazuh manager.

Create /var/ossec/etc/rules/local_rules.xml:

<group name="local,wordpress,">
  <!-- PHP file added (554) or modified (550) in an uploads directory -->
  <rule id="100010" level="12">
    <if_sid>550,554</if_sid>
    <match>/wp-content/uploads/</match>
    <regex>.php</regex>
    <description>PHP webshell detected in WordPress uploads directory</description>
  </rule>
  
  <!-- WordPress config modified -->
  <rule id="100011" level="10">
    <if_sid>550</if_sid>
    <match>wp-config.php</match>
    <description>WordPress configuration file modified</description>
  </rule>
  
  <!-- nginx config modified -->
  <rule id="100020" level="8">
    <if_sid>550</if_sid>
    <match>/etc/nginx/</match>
    <description>nginx configuration modified</description>
  </rule>
  
  <!-- Webshell execution: www-data spawned a shell (auditd key www_data_shell) -->
  <rule id="100030" level="15">
    <if_group>audit</if_group>
    <field name="audit.key">www_data_shell</field>
    <description>Web server user spawned a shell - active compromise</description>
  </rule>
</group>

Common etckeeper Commands

# Daily workflow
cd /etc && git status              # Check uncommitted changes
git diff                            # See what changed
etckeeper commit "Description"     # Commit and auto-push

# Troubleshooting
git log --oneline -10               # Recent changes
git log -p nginx/nginx.conf         # History of specific file
git show HEAD~1:nginx/nginx.conf    # View previous version

# Restoration
git checkout HEAD~1 -- nginx/nginx.conf    # Restore single file
git checkout HEAD~1 -- nginx/              # Restore entire directory
git revert HEAD                            # Undo last commit

# Forensics
git log --since="2024-12-25" --until="2024-12-26"  # Changes in timeframe
git log -S "search_term" --all                      # Find when text was added
git log --all --full-history -- path/to/file        # Complete file history

Testing the Setup

Test 1: FIM Detection

# Make a test change
echo "# test comment" >> /etc/nginx/nginx.conf

# Check Wazuh dashboard
# Should see: FIM alert with diff showing added comment

# Restore
cd /etc && git checkout HEAD -- nginx/nginx.conf

Test 2: auditd Tracking

# Execute command as root
sudo systemctl status nginx

# Check audit log
ausearch -k root_commands --start recent

# Should see the systemctl command logged

Test 3: etckeeper + GitLab

cd /etc
echo "# test" >> test.conf
git add test.conf
etckeeper commit "Test commit"

# Check GitLab web UI - commit should appear
# Browse: https://gitlab.com/youruser/server-etc-backup/-/commits/main

# Clean up
git rm test.conf
etckeeper commit "Remove test"

Maintenance

Monthly:

  • Review Wazuh alerts for patterns
  • Check etckeeper repo size: du -sh /etc/.git
  • Verify GitLab deploy key still works

Quarterly:

  • Tune FIM scan frequency based on actual load
  • Review auditd rules for noise
  • Test restoration procedure

Annually:

  • Rotate GitLab deploy key
  • Review and update .gitignore for new sensitive files
  • Audit which files are being tracked

What This Doesn’t Do

This stack provides detection and recovery, not prevention:

  • Won’t stop the initial exploitation
  • Won’t block attackers at the network layer
  • Won’t prevent hosting provider automation from running

For prevention, you still need:

  • Web Application Firewall
  • Network firewall rules
  • Regular patching
  • Proper WordPress hardening

This stack flags changes to critical, realtime-monitored paths within seconds (everything else at the next scheduled scan) and gives you immediate recovery capability. Combined with proper backups, it turns potential disasters into minor inconveniences.

Cost-Benefit Analysis

Investment:

  • Setup time: 2-4 hours
  • Ongoing maintenance: ~30 minutes/month
  • Hardware: Separate Wazuh manager (4 cores/8GB, ~$20-40/month VPS)

Return:

  • First incident avoided pays for years of monitoring
  • Reduced debugging time: hours → minutes
  • Complete audit trail for compliance
  • Confidence to delegate infrastructure management

For anyone running production WordPress at scale, this is table stakes.


Optional Enhancements

Git-Crypt Encryption

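If your threat model requires encrypting sensitive files in GitLab (e.g., regulatory compliance), git-crypt can encrypt selected files. Those files must first be removed from .gitignore so they are committed at all, and git-crypt only encrypts content committed after .gitattributes is in place; earlier commits stay plaintext: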

apt install git-crypt
cd /etc
git-crypt init

# Create .gitattributes for selective encryption
cat >> .gitattributes << 'EOF'
shadow filter=git-crypt diff=git-crypt
shadow- filter=git-crypt diff=git-crypt
gshadow filter=git-crypt diff=git-crypt
mysql/debian.cnf filter=git-crypt diff=git-crypt
EOF

# Export encryption key (store OFFLINE)
git-crypt export-key /root/etckeeper-encryption-key

# Files are now encrypted in GitLab, decrypted locally

Tradeoff: You lose the ability to browse encrypted files in GitLab’s web UI. They appear as binary blobs. For most use cases, aggressive .gitignore filtering is sufficient.

Wazuh Active Response

Automatically respond to threats (use with extreme caution on production):

<active-response>
  <command>firewall-drop</command>
  <location>local</location>
  <rules_id>100030</rules_id>
  <timeout>3600</timeout>
</active-response>

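This auto-blocks the source IP of alerts matching rule 100030. firewall-drop can only act when the triggering alert carries a source IP, so confirm yours does, and test thoroughly before enabling: false positives can block legitimate users.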

Advanced auditd Rules

More verbose tracking (generates significant log volume):

# Track all network connections
-a exit,always -F arch=b64 -S socket -S connect -k network_connections

# Track all file deletions
-a exit,always -F arch=b64 -S unlink -S unlinkat -k file_deletion

# Track all file renames
-a exit,always -F arch=b64 -S rename -S renameat -k file_rename

Only enable these if you have specific compliance requirements or are investigating an ongoing incident.

Automated Daily Verification

Script to check for uncommitted changes:

#!/bin/bash
# /usr/local/bin/etckeeper-check.sh

cd /etc || exit 1

# git status --porcelain also catches new untracked files, which git diff-index misses
if [ -n "$(git status --porcelain)" ]; then
    echo "Uncommitted changes detected:"
    git status --short
    etckeeper commit "Auto-commit: Daily check"
fi

# Verify sync with GitLab (origin/main reflects the last successful push or fetch)
LOCAL=$(git rev-parse HEAD)
REMOTE=$(git rev-parse origin/main 2>/dev/null)

if [ "$LOCAL" != "$REMOTE" ]; then
    echo "Pushing to GitLab..."
    git push origin main
fi

Add to cron:

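chmod +x /usr/local/bin/etckeeper-check.sh
# Append to root's crontab; piping a single line into "crontab -" would replace all existing entries
(crontab -l 2>/dev/null; echo "0 2 * * * /usr/local/bin/etckeeper-check.sh") | crontab -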
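One caveat with the script above: origin/main is a local remote-tracking ref, updated only when this machine pushes or fetches, so it can't prove GitLab actually has your latest commit. git ls-remote asks the remote directly. A sketch with a hypothetical helper name, assuming the origin remote and main branch configured earlier:

```bash
#!/bin/bash
# Hypothetical helper: succeed only if <remote>/<branch> on the server
# points at the same commit as the local branch. Contacts the remote.
remote_in_sync() {
    local repo="$1" remote="${2:-origin}" branch="${3:-main}"
    local local_sha remote_sha
    local_sha=$(git -C "$repo" rev-parse --verify --quiet "refs/heads/$branch") || return 2
    # ls-remote prints "<sha><TAB>refs/heads/<branch>"; empty if the branch is absent
    remote_sha=$(git -C "$repo" ls-remote "$remote" "refs/heads/$branch" | cut -f1)
    [ -n "$remote_sha" ] && [ "$local_sha" = "$remote_sha" ]
}
```

Usage: remote_in_sync /etc origin main || echo "GitLab is behind /etc"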

Multiple Server Management

If expanding to multiple servers:

  1. Per-server repos: Most isolation, more overhead
  2. Shared deploy key: Less management, can’t distinguish which server pushed
  3. Bot user account: Works across repos, uses GitLab seat

For small deployments (2-5 servers), per-server repos with individual deploy keys provide the best balance.

Centralized Log Analysis

For compliance or advanced threat hunting, ship logs to a SIEM:

<!-- Forward to external syslog -->
<syslog_output>
  <server>siem.yourdomain.com</server>
  <port>514</port>
  <format>json</format>
</syslog_output>

Most environments don’t need this – Wazuh’s built-in dashboard handles 90% of use cases.


Conclusion

This stack turns invisible infrastructure changes into immediately visible, reversible events. When a hosting provider’s automation corrupts configs, you restore in 30 seconds. When a WordPress theme vulnerability drops a webshell, you know within seconds and can trace the full attack path.

The alternative – debugging blind for hours – is unacceptable for production systems.

Total cost: ~$20-40/month for the Wazuh manager, a few hours of setup time, and negligible ongoing maintenance. First incident avoided pays for itself many times over.

If you’re running production WordPress or any infrastructure where config integrity matters, implement this or something equivalent. Don’t wait until a hosting provider’s “helpful” automation takes your site offline to wish you had visibility into what changed.


This field note documents a production implementation running across multiple WordPress installations. If you’d rather have someone else implement this for your infrastructure, contact me. I’ve done this enough times to have the sharp edges mapped out.