Purpose:
Memory ECC events can be logged in the SEL (System Event Log). This FAQ explains how to decode the SEL information and map it to a DIMM slot.
Audience:
1. FWA-6080 and other Advantech AMD 7003 series platforms.
2. A Linux system with ipmitool installed.
ECC (Error-Correcting Code) Memory:
ECC memory is a type of computer data storage that can detect and correct the most common kinds of internal data corruption. It is used in most computers where data corruption cannot be tolerated, such as those used for scientific or financial computing.
Here's a comparison between ECC and non-ECC (standard) memory:
Explanation:
1. Check whether the SEL contains memory ECC events. Dump the SEL with the following command:
# ipmitool sel elist -vvv
2. In each ECC event, check the value of "Event Data". In the two examples below, the Event Data values are "a00000" and "a04100".
SEL Record ID : 0023
Record Type : 02
Timestamp : 10/26/2024 21:17:59
Generator ID : 0021
EvM Revision : 04
Sensor Type : Memory
Sensor Number : 00
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : a00000
Description : Correctable ECC

SEL Record ID : 0056
Record Type : 02
Timestamp : 10/28/2024 05:43:46
Generator ID : 0021
EvM Revision : 04
Sensor Type : Memory
Sensor Number : 00
Event Type : Sensor-specific Discrete
Event Direction : Assertion Event
Event Data : a04100
Description : Correctable ECC
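3. Take Event Data "a04100" as an example. Reading the hex digits from left to right shows that the correctable ECC error was reported for DIMM slot H1.
- The first two digits give the event type: "a0" indicates a correctable ECC event, while "a1" indicates an uncorrectable ECC event.
- The third digit, "4", indicates UMC4. The DIMM is physically located at slot "H".
- The fourth digit, "1", indicates slot 1.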
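
The decoding in step 3 can be sketched as a small shell helper. This is only an illustrative sketch, not an Advantech tool: the function names decode_event_data and decode_sel_stream are hypothetical, and the parser assumes the SEL text has the same "Field : value" layout as the samples in step 2. Only the UMC4-to-"H" pairing is taken from this FAQ; other UMC-to-letter pairings are board-specific and are deliberately left unmapped.

```shell
#!/bin/sh
# Hypothetical helpers (not part of ipmitool) that apply the layout from
# step 3: digits 1-2 = event type, digit 3 = UMC number, digit 4 = slot.

decode_event_data() {
    data=$1
    # Accept exactly six hex digits, e.g. a04100.
    case $data in
        [0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]) ;;
        *) echo "invalid Event Data: $data" >&2; return 1 ;;
    esac

    type_byte=$(printf '%s' "$data" | cut -c1-2 | tr 'A-F' 'a-f')
    umc=$(printf '%s' "$data" | cut -c3)
    slot=$(printf '%s' "$data" | cut -c4)

    case $type_byte in
        a0) kind="correctable ECC" ;;
        a1) kind="uncorrectable ECC" ;;
        *)  kind="unknown type 0x$type_byte" ;;
    esac

    # Assumption: only UMC4 -> "H" is stated in this FAQ; consult the
    # board documentation before adding other pairings.
    case $umc in
        4) letter=H ;;
        *) letter= ;;
    esac

    if [ -n "$letter" ]; then
        echo "$kind: UMC$umc, slot $slot, DIMM $letter$slot"
    else
        echo "$kind: UMC$umc, slot $slot"
    fi
}

# Read SEL text on stdin and decode the Event Data of every record whose
# Sensor Type is Memory. Assumes the line format shown in step 2.
decode_sel_stream() {
    awk -F' *: *' '
        $1 ~ /^ *Sensor Type/          { is_mem = ($2 ~ /^Memory/) }
        $1 ~ /^ *Event Data/ && is_mem { print $2 }
    ' | while read -r value; do
        decode_event_data "$value"
    done
}
```

With these functions loaded in the current shell, running "decode_event_data a04100" prints "correctable ECC: UMC4, slot 1, DIMM H1", and the output of the command from step 1 can be piped into decode_sel_stream to decode every memory record in one pass.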