Fix Disk & RAID Failures — NAS Troubleshooting Guide

The basics

What is RAID and why does it matter?

RAID (Redundant Array of Independent Disks) uses multiple hard drives together so if one drive dies, your data survives. Think of it as a safety net for your files.

⚠️ RAID is not a backup. If you accidentally delete a file, the deletion is applied to every drive at once. If your house floods, all drives go with it. You still need a separate offsite backup.

✓ What RAID protects against

✓ A single hard drive dying

✓ Keeping your NAS running while you replace a drive

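✓ Some gradual bit-rot, but only on checksumming setups (ZFS, Btrfs) that are scrubbed regularly; plain RAID does not detect silent corruption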

✗ What RAID does NOT protect against

✗ Accidental file deletion

✗ Ransomware or viruses

✗ Fire, flood, or theft

✗ Two drives failing at once (unless you run RAID 6 / RAIDZ2)

RAID types

Which RAID level should you use?

Do not worry about the numbers. Here is what each one means in practice:

RAID levels in plain English

RAID 1 — Mirror (simplest, most beginner-friendly)

2 drives, half capacity (2x4TB = 4TB usable)

✓ One drive dies, the other has a perfect copy. Dead simple.

RAID 5 — Balance of space and safety (needs 3+ drives)

3x4TB = 8TB usable (you lose 1 drive's worth to parity)

✓ Survives 1 drive failure. Good for most home setups.

RAID 6 / RAIDZ2 — Extra safe (needs 4+ drives)

4x4TB = 8TB usable (you lose 2 drives' worth to parity)

✓ Survives 2 simultaneous drive failures. Best for large builds.

Synology SHR — Smart mix (great if drives are different sizes)

✓ Automatically optimises capacity across mismatched drive sizes
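
The capacity figures above follow a simple rule: a mirror gives you one drive's capacity, RAID 5 gives up one drive to parity, and RAID 6 / RAIDZ2 gives up two. Here is a minimal sketch of that arithmetic, assuming equal-size drives and whole-terabyte sizes. The function name usable_tb is made up for this example, and SHR is left out because its result depends on the mix of drive sizes.

```bash
# usable_tb LEVEL DRIVES SIZE_TB
# Hypothetical helper: prints usable capacity in TB for equal-size drives.
usable_tb() {
  local level=$1 drives=$2 size_tb=$3
  case "$level" in
    1)  # Mirror: every drive holds the same data, so you get one drive's worth
        [ "$drives" -ge 2 ] || { echo "RAID 1 needs 2+ drives" >&2; return 1; }
        echo "$size_tb" ;;
    5)  # One drive's worth of space goes to parity
        [ "$drives" -ge 3 ] || { echo "RAID 5 needs 3+ drives" >&2; return 1; }
        echo $(( (drives - 1) * size_tb )) ;;
    6)  # Two drives' worth of space goes to parity
        [ "$drives" -ge 4 ] || { echo "RAID 6 needs 4+ drives" >&2; return 1; }
        echo $(( (drives - 2) * size_tb )) ;;
    *)  echo "unsupported level: $level" >&2; return 1 ;;
  esac
}
```

For example, usable_tb 5 3 4 prints 8 and usable_tb 6 4 4 prints 8, matching the figures listed above.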

Check before it breaks

How to check if your drives are healthy

Every hard drive has a built-in health report called SMART. Check it regularly: drives often show warning signs before they die, though not every failure is predicted.

Step 1: Install smartmontools

Pre-installed on most NAS systems. If not (Debian/Ubuntu shown here):

bash

sudo apt install smartmontools

Step 2: Run a health check

bash — replace /dev/sda with your drive

# List your drives first

lsblk

# Check the three numbers that matter most

sudo smartctl -A /dev/sda | grep -E 'Reallocated|Pending|Uncorrect'

# Example values from a healthy drive

Reallocated_Sector_Ct 0 ← 0 is healthy

Current_Pending_Sector 0 ← 0 is healthy

# Example values from a failing drive

Reallocated_Sector_Ct 3 ← drive is struggling, watch it

Uncorrectable_Sector_Ct 1 ← replace this drive NOW

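💡 If Reallocated_Sector_Ct is above 0 and climbing, back up immediately and order a replacement. If the uncorrectable count is above 0, replace the drive today. The attribute name varies by drive: many report it as Offline_Uncorrectable or Reported_Uncorrect rather than Uncorrectable_Sector_Ct. In the full smartctl -A table, the count is the last column (RAW_VALUE).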
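
The rule of thumb above can be turned into a small filter that reads smartctl -A output and prints a one-word verdict. This is a sketch, not a definitive health check. It assumes the standard ten-column attribute table with the raw count in the last column. The function name smart_verdict is invented for this example, and treating a non-zero pending count as "watch" is an assumption that goes slightly beyond the tip.

```bash
# smart_verdict: reads `smartctl -A` output on stdin, prints healthy | watch | replace.
# Hypothetical helper; assumes the raw count is field 10 of each attribute row.
smart_verdict() {
  awk '
    $2 == "Reallocated_Sector_Ct"  { realloc = $10 + 0 }
    $2 == "Current_Pending_Sector" { pending = $10 + 0 }
    # The name varies by drive (Offline_Uncorrectable, Reported_Uncorrect, ...),
    # so match any attribute containing "Uncorrect" and keep the largest count.
    $2 ~ /Uncorrect/               { if ($10 + 0 > uncorr) uncorr = $10 + 0 }
    END {
      if (uncorr > 0)                      print "replace"
      else if (realloc > 0 || pending > 0) print "watch"
      else                                 print "healthy"
    }'
}

# Usage (needs real hardware, so it is left commented out here):
#   sudo smartctl -A /dev/sda | smart_verdict
```

A "watch" verdict only says a counter is non-zero. Whether it is still climbing is something you find out by comparing runs over a few days.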

Step 3: Schedule automatic checks

bash — add to cron

sudo crontab -e

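# cron's PATH may not include /usr/sbin, so use the full path (confirm with: command -v smartctl)

# Short test every Sunday at 1am (~2 minutes)

0 1 * * 0 /usr/sbin/smartctl -t short /dev/sda

# Long test first of every month at 2am (often several hours; scales with drive size)

0 2 1 * * /usr/sbin/smartctl -t long /dev/sda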

# Synology: Storage Manager → HDD/SSD → Schedule

# TrueNAS: Storage → Disks → S.M.A.R.T. Tests
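
A NAS usually holds more than one drive, and each needs its own pair of cron entries. The sketch below prints them for a list of devices using the same schedule as above. The function name smart_cron_lines and the /usr/sbin/smartctl path are assumptions for illustration, so check the path on your own system.

```bash
# smart_cron_lines DEVICE...
# Hypothetical helper: prints one short-test and one long-test cron line per drive.
smart_cron_lines() {
  local smartctl_bin="/usr/sbin/smartctl"  # assumed location; verify with: command -v smartctl
  local dev
  for dev in "$@"; do
    # Short test every Sunday at 1am
    printf '0 1 * * 0 %s -t short %s\n' "$smartctl_bin" "$dev"
    # Long test on the first of every month at 2am
    printf '0 2 1 * * %s -t long %s\n' "$smartctl_bin" "$dev"
  done
}
```

Run smart_cron_lines /dev/sda /dev/sdb, review the output, and paste it into sudo crontab -e. It only prints text and does not touch the crontab itself.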

Emergency

My array shows DEGRADED — what do I do?

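Do not panic. A degraded array is still working: it has lost a drive but is running on its remaining redundancy (the mirror copy or parity). Until the rebuild finishes, another failure can take the array down, so act quickly but calmly.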

⚠️ Do NOT power off the NAS while it is in a degraded state unless absolutely necessary. Keep it running and follow the steps below.

Step 1: Identify the failed drive

bash — Linux software RAID

cat /proc/mdstat

md0 : active raid5 sda[0] sdc[2]

sdb[1](F) ← (F) = Failed, this is the bad drive

# Synology: Storage Manager → HDD/SSD → look for red X

# TrueNAS: Storage → Pools → click the pool → DEGRADED

# Unraid: Main tab → red drive icon
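
On Linux software RAID, spotting the (F) flag by eye gets harder with several arrays. A rough sketch of a filter that lists only the failed members is shown here. It assumes the usual /proc/mdstat layout, with the array name before a colon and members written as name[index]. The function name failed_members is made up for this example.

```bash
# failed_members: reads /proc/mdstat-style text on stdin and prints
# "<array> /dev/<member>" for every member flagged (F).
failed_members() {
  awk '
    $2 == ":" { array = $1 }            # e.g. "md0 : active raid5 ..."
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /\(F\)$/) {            # e.g. "sdb[1](F)"
          dev = $i
          sub(/\[[0-9]+\]\(F\)$/, "", dev)
          print array, "/dev/" dev
        }
      }
    }'
}

# Usage:
#   failed_members < /proc/mdstat
```

With the sample output above this prints md0 /dev/sdb. No output means no member is currently flagged as failed.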

Step 2: Replace the drive and rebuild

bash — Linux mdadm

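# Mark the drive failed if mdstat does not already show (F), then remove it from the array

sudo mdadm /dev/md0 --fail /dev/sdb

sudo mdadm /dev/md0 --remove /dev/sdb

# Physically swap the drive, then add the new one (confirm its device name with lsblk first)

sudo mdadm /dev/md0 --add /dev/sdb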

# Watch the rebuild (can take hours for large drives)

watch -n 10 cat /proc/mdstat

[==>..................] recovery = 14.2% finish=180min
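
If you would rather log rebuild progress than keep a terminal open, the percentage can be pulled out of the same mdstat text. This is a small sketch that assumes the progress line contains "recovery =" or "resync =" followed by a percentage, as in the sample above. The function name rebuild_progress is hypothetical.

```bash
# rebuild_progress: reads /proc/mdstat-style text on stdin and prints the
# first rebuild percentage it finds, or "idle" when no rebuild is running.
rebuild_progress() {
  awk '
    /recovery *=/ || /resync *=/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /%$/) { print $i; found = 1; exit }
      }
    }
    END { if (!found) print "idle" }'
}

# Usage:
#   rebuild_progress < /proc/mdstat
```

Calling it from a loop or a cron job and appending the result to a log file gives a simple record of how the rebuild went.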

💡 On Synology and TrueNAS you can do all of this through the web UI: follow the repair wizard. The commands above are for bare-metal Linux only.
