This article explains how to decode EDAC messages to identify which memory device is reporting errors.
It applies to all users of Advantech server-grade products running a Linux kernel that supports EDAC functions.
1. First, search the system log for a string of the form "CPU_SrcID#x_MC#x_Chan#x_DIMM#x".
2. Decode each field using the conversion table below:
SrcID#x: x = CPU location (x=0 is the 1st CPU, CPU0)
MC#x: x = memory controller (x=0 is the 1st controller)
*Please contact your Advantech representative for the memory controller diagram of the processor in use.
Chan#x: x = channel number (x=0 is the 1st channel)
DIMM#x: x = DIMM location (x=0 is the 1st DIMM)
3. For example, consider the following message:
ADV-node-2 kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#0_MC#1_Chan#0_DIMM#0 (channel:0 slot:0 page:0x1d51264 offset:0xc00 grain:32 syndrome:0x0 - err_code:0x0080:0x0090 SystemAddress:0x1d51264c00 ProcessorSocketId:0x0 MemoryControllerId:0x1 ChannelAddress:0x734499200 ChannelId:0x0 RankAddress:0x734499200 PhysicalRankId:0x0 DimmSlotId:0x0 Row:0x3aa26 Column:0x240 Bank:0x3 BankGroup:0x0 ChipSelect:0x0 ChipId:0x0)
CPU_SrcID#0 == CPU0
MC#1 == 2nd controller, which serves Channel C or D
Chan#0 == 1st channel of that controller, so the channel is C
DIMM#0 == 1st DIMM of that channel
Summary: the memory module at "CPU0 Channel C1" is the suspect.
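
The decoding steps above can also be sketched as a small script. This is only an illustration, not an Advantech tool: the function name decode_edac_label and the constant CHANNELS_PER_MC are hypothetical, and the channel lettering assumes two channels per memory controller (MC#0 = A/B, MC#1 = C/D), which matches the example above but may differ on other processors. Check the memory controller diagram of your platform before relying on the slot name it prints.

```python
import re

# Label format taken from the article: CPU_SrcID#x_MC#x_Chan#x_DIMM#x
LABEL_RE = re.compile(r"CPU_SrcID#(\d+)_MC#(\d+)_Chan#(\d+)_DIMM#(\d+)")

# ASSUMPTION: two channels per memory controller, lettered A, B, C, ...
# in controller order. This matches the article's example (MC#1 -> C or D)
# but must be verified against the platform's memory controller diagram.
CHANNELS_PER_MC = 2


def decode_edac_label(line):
    """Return the decoded fields of the first EDAC label in line, or None."""
    match = LABEL_RE.search(line)
    if match is None:
        return None
    cpu, mc, chan, dimm = (int(group) for group in match.groups())
    # Channel index counted across all controllers of this CPU.
    channel_letter = chr(ord("A") + mc * CHANNELS_PER_MC + chan)
    return {
        "cpu": cpu,
        "mc": mc,
        "chan": chan,
        "dimm": dimm,
        # DIMM#0 is the 1st DIMM, so the printed slot number is dimm + 1.
        "slot": f"CPU{cpu} Channel {channel_letter}{dimm + 1}",
    }


if __name__ == "__main__":
    # Shortened form of the log line shown in the example above.
    sample = ("ADV-node-2 kernel: EDAC MC1: 1 CE memory read error on "
              "CPU_SrcID#0_MC#1_Chan#0_DIMM#0 (channel:0 slot:0)")
    print(decode_edac_label(sample)["slot"])  # CPU0 Channel C1
```

To scan a whole log, call decode_edac_label on each line and skip the lines for which it returns None.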