ESXi Disk Health Check
1 List all storage devices
esxcli storage core device list
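The device list is long, and the later steps only need the identifiers passed to -d. A minimal sketch for pulling them out, assuming identifiers are the only lines that start in column 0 and every property line under them is indented. The helper name list_device_ids is hypothetical, not part of esxcli.

```shell
#!/bin/sh
# list_device_ids: read `esxcli storage core device list` output on stdin
# and print one device identifier per line.
# Assumption: identifier lines are not indented; property lines are.
list_device_ids() {
    awk '/^[^ \t]/ { print $1 }'
}

# On an ESXi host (not executed here):
#   esxcli storage core device list | list_device_ids
```

Each printed identifier can then be passed to esxcli storage core device smart get -d.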
2 Check HDDs
2.1 Query SMART data for a specific HDD
esxcli storage core device smart get -d t10.ATA_____HGST_HDN726040ALE614____________________K3H1RKTD____________
esxcli storage core device smart get -d t10.ATA_____WDC_WD80EFAX2D68KNBN0____________________VAGJSVBL____________
2.2 Interpreting SMART attributes
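SMART attribute | Meaning | Severity | Healthy value |
---|---|---|---|
5 - Reallocated Sectors Count | Number of bad sectors that have been remapped to spare sectors | 🔴 Critical | = 0 (any increase means the disk is deteriorating) |
187 - Reported Uncorrectable Errors | Errors that could not be corrected (failed data reads) | 🔴 Critical | = 0 (a value above 0 may indicate a problem) |
188 - Command Timeout | Number of command timeouts (possibly a disk fault or a power problem) | 🟠 Warning | = 0 (a value above 0 may indicate a problem) |
197 - Current Pending Sector Count | Sectors currently waiting to be remapped (sectors the disk has flagged as about to fail) | 🔴 Critical | = 0 (a value above 0 means the disk is close to failing) |
198 - Offline Uncorrectable Sector Count | Bad sectors that could not be recovered | 🔴 Critical | = 0 (any occurrence indicates physical damage) |
199 - UDMA CRC Error Count | Data transfer errors (possibly the SATA cable or the disk) | 🟠 Warning | = 0 (occasional errors often point to the data cable) |
9 - Power-On Hours (POH) | Total time the disk has been powered on, in hours | 🟡 Informational | < 30,000 hours (about 3.4 years of continuous operation; watch the disk more closely beyond that) |
194 - Temperature (Celsius) | Disk temperature | 🟡 Informational | 30 to 45 °C (above 50 °C, improve cooling) |
1 - Read Error Rate | Read error rate (some vendors do not document the raw value format) | 🟡 Informational | The value should not keep increasing |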
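The thresholds in the table above can be turned into a small checker. This is a sketch only: smart_verdict is a hypothetical helper, and it expects a SMART attribute ID plus its raw value. Depending on the ESXi version, esxcli may print attribute names and normalized values rather than IDs and raw counts, so confirm which number you are reading before feeding it in.

```shell
#!/bin/sh
# smart_verdict ID RAW: classify one raw SMART value using the thresholds
# from the table. Prints OK, WARN, CRITICAL, or UNKNOWN.
smart_verdict() {
    id=$1
    raw=$2
    case "$id" in
        5|187|197|198)   # critical counters: any non-zero value is bad
            if [ "$raw" -gt 0 ]; then echo CRITICAL; else echo OK; fi ;;
        188|199)         # warning counters: non-zero deserves a look
            if [ "$raw" -gt 0 ]; then echo WARN; else echo OK; fi ;;
        9)               # power-on hours: flag at 30,000 hours or more
            if [ "$raw" -ge 30000 ]; then echo WARN; else echo OK; fi ;;
        194)             # temperature in Celsius: flag above 50
            if [ "$raw" -gt 50 ]; then echo WARN; else echo OK; fi ;;
        *)               # attribute 1 and others: track the trend by hand
            echo UNKNOWN ;;
    esac
}
```

For example, smart_verdict 197 3 prints CRITICAL, while smart_verdict 194 47 prints OK. Attribute 1 has no fixed threshold in the table, so it falls through to UNKNOWN and should be compared against earlier readings instead.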
3 Check NVMe SSDs
3.1 List all NVMe devices
esxcli nvme device list
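To avoid typing one SMART command per adapter, the adapter names can be extracted from the list. A rough sketch, assuming the command prints a table with a header row, a dashed separator row, and the HBA name (for example vmhba4) in the first column. Check this against your own output first. list_nvme_adapters is a hypothetical helper name.

```shell
#!/bin/sh
# list_nvme_adapters: read `esxcli nvme device list` output on stdin and
# print the first column of every data row.
# Assumption: rows 1 and 2 are the header and the dashed separator.
list_nvme_adapters() {
    awk 'NR > 2 && NF > 0 { print $1 }'
}

# On an ESXi host (not executed here):
#   for hba in $(esxcli nvme device list | list_nvme_adapters); do
#       esxcli nvme device log smart get -A "$hba"
#   done
```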
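View the SMART information for each SSD. The -A option takes the adapter name of the SSD (for example vmhba4). In the output, Percentage Used (for example "Percentage Used: 97 %") shows how much of the drive's rated endurance has been consumed; once it reaches or exceeds 100 %, the drive should be replaced.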
esxcli nvme device log smart get -A vmhba4
esxcli nvme device log smart get -A vmhba5
SMART And Health Info:
Available Spare Space Below Threshold: false
Temperature Warning: false
NVM Subsystem Reliability Degradation: false
Read Only Mode: false
Volatile Memory Backup Device Failure: false
Composite Temperature: 320 K
Available Spare: 100 %
Available Spare Threshold: 10 %
Percentage Used: 26 %
Data Units Read: 0x6b463c9
Data Units Written: 0x7dd173b
Host Read Commands: 0x37792882
Host Write Commands: 0x33d682e1
Controller Busy Time: 0xea8
Power Cycles: 0x434
Power On Hours: 0x16ab
Unsafe Shutdowns: 0x67
Media Errors: 0x0
Number of Error Info Log Entries: 0x4d
Warning Composite Temperature Time: 0 Mins
Critical Composite Temperature Time: 0 Mins
Temperature Sensor 1: 320 K
Temperature Sensor 2: 0 K
Temperature Sensor 3: 0 K
Temperature Sensor 4: 0 K
Temperature Sensor 5: 0 K
Temperature Sensor 6: 0 K
Temperature Sensor 7: 0 K
Temperature Sensor 8: 0 K
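Several fields in the sample output above are awkward to read directly: temperatures are in Kelvin and counters are hexadecimal. The sketch below shows one way to extract a field and convert it. All four helper names are hypothetical, the Kelvin conversion is rounded to whole degrees by subtracting 273, and the replacement rule uses the 100 % endurance limit.

```shell
#!/bin/sh
# nvme_field NAME: read `esxcli nvme device log smart get` output on stdin
# and print the value after "NAME:" with any unit (K, %, Mins) dropped.
nvme_field() {
    awk -v name="$1" -F': *' '
        { key = $1; sub(/^[ \t]+/, "", key) }   # tolerate indented lines
        key == name { split($2, parts, " "); print parts[1]; exit }
    '
}

# kelvin_to_celsius K: whole-degree approximation (K - 273).
kelvin_to_celsius() {
    echo $(( $1 - 273 ))
}

# hex_to_dec 0xNN: convert a hexadecimal counter to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# endurance_verdict PCT: REPLACE once the rated endurance is used up.
endurance_verdict() {
    if [ "$1" -ge 100 ]; then echo REPLACE; else echo OK; fi
}

# On an ESXi host (not executed here):
#   pct=$(esxcli nvme device log smart get -A vmhba4 | nvme_field 'Percentage Used')
#   endurance_verdict "$pct"
```

Applied to the sample, Composite Temperature 320 K is about 47 °C, Power On Hours 0x16ab is 5803 hours, and Percentage Used 26 % gives OK.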
3.2 List the namespaces on a given NVMe adapter
esxcli nvme device namespace list -A vmhba5