ASM Health Checker Found 1 New Failures
While the "1 new failure" could technically be anything, it usually falls into one of these three categories:

A. Disk Corruption or Metadata Inconsistency
SELECT * FROM V$ASM_OPERATION;
ALTER DISKGROUP DATA DROP DISK DATA_0001;
ALTER DISKGROUP DATA ADD DISK '/dev/mapper/asm_data_new' NAME DATA_0001;
/sbin/udevadm trigger --subsystem-match=block
Troubleshooting "ASM Health Checker Found 1 New Failures"

Oracle Automatic Storage Management (ASM) is the backbone of storage management for Oracle Databases. When Oracle Grid Infrastructure generates an alert stating "ASM Health Checker found 1 new failures", it means the internal ASM Health Checker (AMHC) framework has detected a specific vulnerability, corruption, or performance degradation within your storage tier.
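As a rough illustration of catching this alert early, the sketch below scans log text for the message and extracts the reported failure count. The wording of the log line is an assumption based on the alert text in this article's title. The function name `count_new_failures` and the sample lines are hypothetical and not part of any Oracle tooling.

```python
import re

# Assumed wording, taken from the alert text in this article's title;
# a real log line may carry a timestamp or extra fields around it.
ALERT_PATTERN = re.compile(
    r"ASM Health Checker found (\d+) new failures?", re.IGNORECASE
)


def count_new_failures(log_text):
    """Return the failure count from each matching line, in order."""
    counts = []
    for line in log_text.splitlines():
        match = ALERT_PATTERN.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


# Hypothetical sample text, not copied from a real alert log.
sample = (
    "some unrelated log line\n"
    "ASM Health Checker found 1 new failures\n"
    "another unrelated log line\n"
)
print(count_new_failures(sample))
```

A script like this could run from a scheduler against a copy of the alert log, so that a non-empty result prompts the diagnostic steps in this guide.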
If a disk stops responding to cluster heartbeats, the cluster initiates a "dirty detach" to protect data integrity.
This alert is not a sign of imminent database doom, but it is a critical early warning that requires a structured response. This guide covers how to understand, diagnose, and resolve the health checker alert, along with best practices to prevent it from recurring.
In conclusion, the ASM health checker’s finding of one new failure should not be dismissed as a minor anomaly nor greeted with alarmist dread. Instead, it should be received with professional respect. It is a precise, actionable signal in a sea of ambient noise. It reminds us that in the architecture of high-availability systems, the smallest crack, left unexamined, can propagate through the structure. By investigating, resolving, and learning from that single failure, an organization does more than fix a disk—it strengthens the resilience of its entire data ecosystem. The silent alarm was never meant to be ignored; it was meant to be heard by those who understand that vigilance is the price of reliability.
When a physical drive develops bad sectors, or the SAN controller drops a drive configuration, the operating system registers a hard kernel fault. Oracle ASM tracks these via asynchronous write errors (osderr1, osderr2), prompting the health checker to flag a failure.

2. Storage Area Network (SAN) Disconnects
If the disk remains offline, drop it and add a replacement:
Ensure the IAM role running the health checker has explicit permissions to access the secret.
ALTER DISKGROUP <disk_group_name> CHECK;
ALTER DISKGROUP DATA ONLINE DISK DATA_0001 POWER 3;
-- wait for rebalance to complete
SELECT * FROM v$asm_operation;
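The "wait for rebalance to complete" step can be automated with a simple polling loop. The following is a minimal sketch, assuming the caller supplies a function that runs the v$asm_operation query and returns its rows. `wait_for_rebalance` and `fetch_operations` are hypothetical names, and treating an empty result as "no operation in progress" is a simplification.

```python
import time


def wait_for_rebalance(fetch_operations, poll_seconds=30, max_polls=120,
                       sleep=time.sleep):
    """Poll until fetch_operations() returns no rows.

    fetch_operations is a caller-supplied (hypothetical) function that runs
    SELECT * FROM v$asm_operation and returns the rows as a list.
    Returns True once the result is empty, False if max_polls is exhausted.
    """
    for attempt in range(max_polls):
        if not fetch_operations():
            return True
        # Skip the pause after the final attempt.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False


# Stand-in for a real query: two polls show a running operation, then none.
fake_results = iter([[("REBAL", "RUN")], [("REBAL", "RUN")], []])
done = wait_for_rebalance(lambda: next(fake_results), poll_seconds=0)
print(done)
```

Passing the query function in as an argument keeps the loop independent of any particular database driver, which also makes it easy to exercise with canned results as shown.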
Identify whether the secret uses the default AWS managed key (aws/secretsmanager) or a Customer Managed Key (CMK).