Em falla un disc del raid1

Panic! Terror! Aaaghh!

El dimecres passat vaig rebre un email del daemon que controla el raid 1 de dos discs de 120 Gb que tinc montat com a /home del servidor de casa, on està allotjat aquest blog, informant-me de que un dels discs havia fallat:

From: 	mdadm monitoring
Subject: 	Fail event on /dev/md0:s0
Date: 	Wed, 27 Jul 2005 23:12:17 +0200
	
This is an automatically generated mail message from mdadm
running on s0
	
A Fail event had been detected on md device /dev/md0.
	
Faithfully yours, etc.

Fins avui no havia tingut temps de mirar-m’ho amb calma… i sorpresa la mia, estic funcionant només amb un disc!! Això vol dir que si peta el disc, em quedo sense totes les dades que tinc al /home… ara sortiré corrents cap a Rda. St. Antoni a comprar un disc, aixi que el servidor estarà parat aquesta nit mentre el canvii… aprofitaré el downtime per posar-li també una unitat de CD (ara no en té) i una altra X100P per connectar-hi un trunk.

Aquí deixo constància dels passos per comprovar el raid…. com es pot veure hi ha un disk que falla (/dev/hdg):

s0 ~ # lsraid -D -l -a /dev/md0
[dev 33, 0] /dev/hde:
        md version              = 0.90.0
        superblock uuid         = 2484EF77.198495F3.E0BD280C.A13A89EF
        md minor number         = 0
        created                 = 1068606386 (Wed Nov 12 04:06:26 2003)
        last updated            = 1123003057 (Tue Aug  2 19:17:37 2005)
        raid level              = 1
        chunk size              = 4 KB
        apparent disk size      = 120060800 KB
        disks in array          = 1
        required disks          = 2
        active disks            = 1
        working disks           = 1
        failed disks            = 1
        spare disks             = 0
        position in disk list   = 0
        position in md device   = 0
        state                   = good
	
s0 ~ # lsraid -D -a /dev/md0 -d /dev/hdg
[dev 33, 0] /dev/hde:
        md device       = [dev 9, 0] /dev/md0
        md uuid         = 2484EF77.198495F3.E0BD280C.A13A89EF
        state           = good
	
[dev 34, 0] /dev/hdg:
        old md device   = [dev 9, 0]
        old md uuid     = 2484EF77.198495F3.E0BD280C.A13A89EF
        state           = unknown

Update: He reiniciat el server sense cambiar cap disc, i he vist que el raid arrancava només amb un disc (/dev/hde), però l’altre disc l’ha detectat correctament la BIOS y el sistema operatiu i el mdadm m’ha informat de que el raid està degradat, es a dir, la informació dels dos discs no està sincronitzada:

s0 ~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Nov 12 04:06:26 2003
     Raid Level : raid1
     Array Size : 120060800 (114.50 GiB 122.94 GB)
    Device Size : 120060800 (114.50 GiB 122.94 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent
	
    Update Time : Tue Aug  2 21:17:24 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
	
           UUID : 2484ef77:198495f3:e0bd280c:a13a89ef
         Events : 0.16568598
	
    Number   Major   Minor   RaidDevice State
       0      33        0        0      active sync   /dev/hde
       1       0        0        -      removed

Així que com aparentment el disc no està cascat, he forçat la reconstrucció del raid:

s0 ~ # raidhotadd /dev/md0 /dev/hdg
	
s0 ~ # dmesg
md: bind<hdg>
RAID1 conf printout:
 --- wd:1 rd:2
 disk 0, wo:0, o:1, dev:hde
 disk 1, wo:1, o:1, dev:hdg
.<6>md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwith (but not more than 200000 KB/sec)
 for reconstruction.
md: using 128k window, over a total of 120060800 blocks.
	
s0 ~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Nov 12 04:06:26 2003
     Raid Level : raid1
     Array Size : 120060800 (114.50 GiB 122.94 GB)
    Device Size : 120060800 (114.50 GiB 122.94 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
	
    Update Time : Tue Aug  2 21:41:43 2005
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1
	
 Rebuild Status : 12% complete
	
           UUID : 2484ef77:198495f3:e0bd280c:a13a89ef
         Events : 0.16569389
	
    Number   Major   Minor   RaidDevice State
       0      33        0        0      active sync   /dev/hde
       1       0        0        -      removed
	
       2      34        0        1      spare rebuilding   /dev/hdg
	
s0 ~ # lsraid -D -a /dev/md0 -d /dev/hdg
[dev 34, 0] /dev/hdg:
        md device       = [dev 9, 0] /dev/md0
        md uuid         = 2484EF77.198495F3.E0BD280C.A13A89EF
        state           = spare
	
[dev 33, 0] /dev/hde:
        md device       = [dev 9, 0] /dev/md0
        md uuid         = 2484EF77.198495F3.E0BD280C.A13A89EF
        state           = good

De moment porta un 12% i sembla que va tot bé… de tota manera he comprat un Maxtor de 200 Gb (7200 RPM 8Mb) per 82 euros (6 euros menys que el que em va costar fa gairebé un any el mateix disc), que igualment posaré al server només per fer backups 😛

Update 2: El raid s’ha acabat de reconstruir correctament, suposo que només devía ser un fallo de sync i que el disc no està afectat.

s0 ~ # dmesg
md: md0: sync done.
RAID1 conf printout:
 --- wd:2 rd:2
 disk 0, wo:0, o:1, dev:hde
 disk 1, wo:0, o:1, dev:hdg
	
s0 ~ # mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Nov 12 04:06:26 2003
     Raid Level : raid1
     Array Size : 120060800 (114.50 GiB 122.94 GB)
    Device Size : 120060800 (114.50 GiB 122.94 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent
	
    Update Time : Wed Aug  3 02:27:38 2005
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
	
           UUID : 2484ef77:198495f3:e0bd280c:a13a89ef
         Events : 0.16578408
	
    Number   Major   Minor   RaidDevice State
       0      33        0        0      active sync   /dev/hde
       1      34        0        1      active sync   /dev/hdg
	
s0 ~ # lsraid -D -l -a /dev/md0
[dev 33, 0] /dev/hde:
        md version              = 0.90.0
        superblock uuid         = 2484EF77.198495F3.E0BD280C.A13A89EF
        md minor number         = 0
        created                 = 1068606386 (Wed Nov 12 04:06:26 2003)
        last updated            = 1123028858 (Wed Aug  3 02:27:38 2005)
        raid level              = 1
        chunk size              = 4 KB
        apparent disk size      = 120060800 KB
        disks in array          = 2
        required disks          = 2
        active disks            = 2
        working disks           = 2
        failed disks            = 0
        spare disks             = 0
        position in disk list   = 0
        position in md device   = 0
        state                   = good
	
[dev 34, 0] /dev/hdg:
        md version              = 0.90.0
        superblock uuid         = 2484EF77.198495F3.E0BD280C.A13A89EF
        md minor number         = 0
        created                 = 1068606386 (Wed Nov 12 04:06:26 2003)
        last updated            = 1123028928 (Wed Aug  3 02:28:48 2005)
        raid level              = 1
        chunk size              = 4 KB
        apparent disk size      = 120060800 KB
        disks in array          = 2
        required disks          = 2
        active disks            = 2
        working disks           = 2
        failed disks            = 0
        spare disks             = 0
        position in disk list   = 1
        position in md device   = 1
        state                   = good
This entry was posted in linux, pofHQ. Bookmark the permalink.

5 Responses to Em falla un disc del raid1

  1. Juanjo says:

    Jo mai he montat un raid, tota la vida treballant amb només un disc, i MAI m’ha fallat.

    Com es que s’ha fotut el disc? Te molta carrega? Era vell? Ara tinc por!

  2. pof says:

    Pues no se perqué ha fallat, ara ho miraré amb mes calma i faré un update del post quan tingui el sistema restaurat. Era un seagate de 120 Gb que vaig comprar a principis de Novembre del 2003. L’altre disc del raid és exactament igual, comprat la mateixa data i funciona bé.

  3. esteve says:

    Uii quina por!! no facis patir així a la gent! hehe.

  4. pof says:

    tranquil…. q de moment no s’ha perdut cap dada… i después del susto aumentaré la redundància 😉

  5. David says:

    Per desgracia tots els discs durs acaben fallant, només és qüestió de temps.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>