I'm running CentOS, but I doubt there is much difference from Debian for this kind of problem. Over the past week, mcelog has reported several "Hardware Error" events. Here is the content of the latest one:
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: It has been corrected by h/w and requires no further action
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: event severity: corrected
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: Error 0, type: corrected
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: section_type: memory error
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: physical_address: 0x0000005693d91b00
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: physical_address_mask: 0x00003fffffffffc0
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: node: 1 card: 0 module: 0 rank: 1 column: 912
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: error_type: 2, single-bit ECC
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: DIMM location: not present. DMI handle: 0x0000
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: Error 1, type: corrected
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: fru_text: Card02, ChnA, DIMM0
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: section_type: memory error
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: error_status: 0x0000000000000000
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: physical_address: 0x0000005693d93f80
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: node: 1 card: 0 module: 0 rank: 1 bank: 0 row: 50910 column: 1016
Dec 2 01:40:04 kw60340 kernel: {15}[Hardware Error]: DIMM location: not present. DMI handle: 0x0000
Dec 2 01:40:04 kw60340 kernel: mce: [Hardware Error]: Machine check events logged
Dec 2 01:40:04 kw60340 kernel: EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5693d91 offset:0xb00 grain:32 syndrome:0x0 - err_code:0000:009f socket:0 imc:1 rank:1 bg:2 ba:0 row:1c6de col:390)
Dec 2 01:40:04 kw60340 kernel: EDAC MC1: 0 CE memory read error on CPU_SrcID#0_MC#1_Chan#0_DIMM#0 (channel:0 slot:0 page:0x5693d93 offset:0xf80 grain:32 syndrome:0x0 - err_code:0000:009f socket:0 imc:1 rank:1 bg:2 ba:0 row:1c6de col:3f8)
Dec 2 01:40:04 kw60340 mcelog: Hardware event. This is not a software error.
Dec 2 01:40:04 kw60340 mcelog: MCE 0
Dec 2 01:40:04 kw60340 mcelog: CPU 111 BANK 1 TSC 80cccf0fa16f6
Dec 2 01:40:04 kw60340 mcelog: ADDR 5693d91b00
Dec 2 01:40:04 kw60340 mcelog: TIME 1543704004 Sun Dec 2 01:40:04 2018
Dec 2 01:40:04 kw60340 mcelog: MCG status:
Dec 2 01:40:04 kw60340 mcelog: MCi status:
Dec 2 01:40:04 kw60340 mcelog: Corrected error
Dec 2 01:40:04 kw60340 mcelog: Error enabled
Dec 2 01:40:04 kw60340 mcelog: MCi_ADDR register valid
Dec 2 01:40:04 kw60340 mcelog: MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Dec 2 01:40:04 kw60340 mcelog: Transaction: Memory read error
Dec 2 01:40:04 kw60340 mcelog: STATUS 940000000000009f MCGSTATUS 0
Dec 2 01:40:04 kw60340 mcelog: MCGCAP f000814 APICID 7d SOCKETID 1
Dec 2 01:40:04 kw60340 mcelog: PPIN c691d0b8595dc287
Dec 2 01:40:04 kw60340 mcelog: CPUID Vendor Intel Family 6 Model 85
Dec 2 01:40:04 kw60340 mcelog: Hardware event. This is not a software error.
Dec 2 01:40:04 kw60340 mcelog: MCE 1
Dec 2 01:40:04 kw60340 mcelog: CPU 111 BANK 1 TSC 80cccf0faad50
Dec 2 01:40:04 kw60340 mcelog: ADDR 5693d93f80
Dec 2 01:40:04 kw60340 mcelog: TIME 1543704004 Sun Dec 2 01:40:04 2018
Dec 2 01:40:04 kw60340 mcelog: MCG status:
Dec 2 01:40:04 kw60340 mcelog: MCi status:
Dec 2 01:40:04 kw60340 mcelog: Corrected error
Dec 2 01:40:04 kw60340 mcelog: Error enabled
Dec 2 01:40:04 kw60340 mcelog: MCi_ADDR register valid
Dec 2 01:40:04 kw60340 mcelog: MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Dec 2 01:40:04 kw60340 mcelog: Transaction: Memory read error
Dec 2 01:40:04 kw60340 mcelog: STATUS 940000000000009f MCGSTATUS 0
Dec 2 01:40:04 kw60340 mcelog: MCGCAP f000814 APICID 7d SOCKETID 1
Dec 2 01:40:04 kw60340 mcelog: PPIN c691d0b8595dc287
Dec 2 01:40:04 kw60340 mcelog: CPUID Vendor Intel Family 6 Model 85
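The `STATUS 940000000000009f` value that mcelog prints can be decoded by hand. Below is a minimal sketch, assuming the IA32_MCi_STATUS bit layout and the memory-controller compound error code format (`000F 0000 1MMM CCCC`) described in the Intel SDM, vol. 3B. The function names and the `MMM_NAMES` table are hypothetical helpers written for illustration, not part of mcelog.

```python
# Hypothetical helpers: decode an IA32_MCi_STATUS value as printed by mcelog.
# Bit positions are assumed from the Intel SDM (vol. 3B, machine-check architecture).

MMM_NAMES = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_mci_status(status: int) -> dict:
    """Return the architectural flag bits and the 16-bit MCA error code."""
    return {
        "VAL": bool((status >> 63) & 1),    # register holds valid error information
        "OVER": bool((status >> 62) & 1),   # a previous error was overwritten
        "UC": bool((status >> 61) & 1),     # uncorrected error (0 means corrected)
        "EN": bool((status >> 60) & 1),     # error reporting was enabled
        "MISCV": bool((status >> 59) & 1),  # MCi_MISC register is valid
        "ADDRV": bool((status >> 58) & 1),  # MCi_ADDR register is valid
        "PCC": bool((status >> 57) & 1),    # processor context corrupt
        "mca_code": status & 0xFFFF,        # bits 15:0, MCA error code
    }


def decode_memory_controller_code(code: int) -> str:
    """Decode a memory-controller compound code of the form 000F 0000 1MMM CCCC."""
    if (code & 0xEF80) != 0x0080:  # ignore the F (filtering) bit, require bit 7 set
        raise ValueError("not a memory controller error code")
    mmm = (code >> 4) & 0b111      # transaction type
    channel = code & 0xF           # 0b1111 means "channel not specified"
    transaction = MMM_NAMES.get(mmm, "reserved")
    where = "channel unspecified" if channel == 0xF else f"channel {channel}"
    return f"{transaction}, {where}"


status = decode_mci_status(0x940000000000009F)
print(status)
print(decode_memory_controller_code(status["mca_code"]))
```

With the value from the log, VAL, EN and ADDRV are set while UC is clear, which lines up with mcelog's "Corrected error", "Error enabled" and "MCi_ADDR register valid"; code `0x009f` decodes to a memory read with no channel specified, i.e. the `RD_CHANNELunspecified_ERR` / "Memory read error" lines.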
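The APEI, EDAC and mcelog lines above appear to describe the same two corrected errors. A small sketch to cross-check them, assuming EDAC's `page`/`offset` is the physical address split on 4 KiB pages (it matches the values logged here). The row comparison (APEI `row` equal to the low 16 bits of EDAC's `row`) is only an observation about these particular lines, not documented behaviour. All names are hypothetical.

```python
# Hypothetical cross-check between the APEI, mcelog and EDAC fields of this log.
PAGE_SHIFT = 12  # assumption: 4 KiB pages, consistent with the EDAC page/offset values


def split_address(addr: int) -> tuple:
    """Split a physical address into (page frame number, offset within the page)."""
    return addr >> PAGE_SHIFT, addr & ((1 << PAGE_SHIFT) - 1)


def mask_granularity(mask: int) -> int:
    """Lowest set bit of physical_address_mask, i.e. the reported granularity in bytes."""
    return mask & -mask


def edac_matches_apei(apei_row: int, apei_col: int, edac_row: int, edac_col: int) -> bool:
    """Observation for this log only: columns are equal and the APEI row equals the
    low 16 bits of the EDAC row."""
    return apei_col == edac_col and apei_row == (edac_row & 0xFFFF)


# Values copied from the log: ADDR lines, physical_address_mask, row/column fields.
print(split_address(0x5693D91B00))  # EDAC reports page:0x5693d91 offset:0xb00
print(split_address(0x5693D93F80))  # EDAC reports page:0x5693d93 offset:0xf80
print(mask_granularity(0x00003FFFFFFFFFC0))
print(edac_matches_apei(50910, 1016, 0x1C6DE, 0x3F8))
```

Both errors therefore land in two adjacent 4 KiB pages, on the same rank and row, which is consistent with EDAC pointing at the same `DIMM#0` on `Chan#0` of `MC#1` for both events.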