[[ch-troubleshooting]]
== Troubleshooting and error recovery

This chapter describes tasks to be performed in the event of hardware
or system failures.

[[s-hard-drive-failure]]
=== Dealing with hard drive failure

indexterm:[drive failure]indexterm:[disk failure]How to deal with hard
drive failure depends on the way DRBD is configured to handle disk I/O
errors (see <<s-handling-disk-errors>>), and on the type of meta data
configured (see <<s-metadata>>).

NOTE: For the most part, the steps described here apply only if you
run DRBD directly on top of physical hard drives. They generally do
not apply if you are running DRBD layered on top of:
* an MD software RAID set (in this case, use `mdadm` to manage drive
  replacement),
* device-mapper RAID (use `dmraid`),
* a hardware RAID appliance (follow the vendor's instructions on how
  to deal with failed drives),
* some non-standard device-mapper virtual block devices (see the
  device mapper documentation).

[[s-detach-hard-drive-manual]]
==== Manually detaching DRBD from your hard drive

indexterm:[drbdadm]If DRBD is <<fp-io-error-pass-on,configured to pass
on I/O errors>> (not recommended), you must first detach the DRBD
resource, that is, disassociate it from its backing storage:

----------------------------
drbdadm detach <resource>
----------------------------

You can now run the `drbdadm dstate` command to verify that the
resource is in
indexterm:[diskless mode]indexterm:[diskless (disk state)]indexterm:[disk state]_diskless mode_:

----------------------------
drbdadm dstate <resource>
Diskless/UpToDate
----------------------------
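
For scripted checks, the local half of the `drbdadm dstate` output can
be tested directly. The following is a minimal sketch, assuming the
`local/peer` output format shown above; the function names
`local_dstate` and `is_diskless` are illustrative and not part of DRBD.

```shell
# Print the local disk state from a "local/peer" dstate string,
# e.g. "Diskless/UpToDate" -> "Diskless".
local_dstate() {
    printf '%s\n' "${1%%/*}"
}

# Succeed (exit 0) only if the local disk state is Diskless.
is_diskless() {
    [ "$(local_dstate "$1")" = "Diskless" ]
}
```

A possible invocation, with `r0` as a hypothetical resource name, is
`is_diskless "$(drbdadm dstate r0)" && echo "r0 is detached"`.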

If the disk failure has occurred on your primary node, you may combine
this step with a switch-over operation.

[[s-detach-hard-drive-auto]]
==== Automatic detach on I/O error

If DRBD is <<fp-io-error-detach,configured to automatically detach
upon I/O error>> (the recommended option), DRBD should have
automatically detached the resource from its backing storage already,
without manual intervention. You may still use the `drbdadm dstate`
command to verify that the resource is in fact running in diskless
mode.

[[s-replace-disk-internal-metadata]]
==== Replacing a failed disk when using internal meta data

If you are using <<s-internal-meta-data,internal meta data>>, it is
sufficient to bind the DRBD device to the new hard disk. If the new
hard disk is addressed by a different Linux device name than the
defective disk, update the DRBD configuration file accordingly.

This process involves creating a new meta data set, then re-attaching
the resource: indexterm:[drbdadm]

----------------------------
drbdadm create-md <resource>
v08 Magic number not found
Writing meta data...
initialising activity log
NOT initializing bitmap
New drbd meta data block sucessfully created.

drbdadm attach <resource>
----------------------------

Full synchronization of the new hard disk starts immediately and
automatically. You can monitor the synchronization's progress via
+/proc/drbd+, as with any background synchronization.
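
To follow resynchronization from a script rather than by eye, the
progress figure can be extracted from +/proc/drbd+. This sketch assumes
the DRBD 8.x layout, in which a resynchronizing device shows a progress
line containing `sync'ed:` followed by a percentage; the function name
`sync_percent` is illustrative.

```shell
# Read /proc/drbd-style text on stdin and print the percentage from
# each "sync'ed: NN.N%" progress line (one per resyncing device).
# Prints nothing when no resynchronization is in progress.
sync_percent() {
    sed -n "s/.*sync'ed: *\([0-9][0-9.]*\)%.*/\1/p"
}
```

It would typically be called as `sync_percent < /proc/drbd`.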

[[s-replace-disk-external-metadata]]
==== Replacing a failed disk when using external meta data

When using <<s-external-meta-data,external meta data>>, the procedure
is basically the same. However, DRBD cannot detect on its own that the
hard drive was swapped, so an additional step is required.

----------------------------
drbdadm create-md <resource>
v08 Magic number not found
Writing meta data...
initialising activity log
NOT initializing bitmap
New drbd meta data block sucessfully created.

drbdadm attach <resource>
drbdadm invalidate <resource>
----------------------------

Here, the `drbdadm invalidate` command triggers synchronization. Again,
sync progress may be observed via +/proc/drbd+.
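
The two procedures differ only in the final `drbdadm invalidate` step.
As a minimal sketch, the following shell function prints the command
sequence for either meta data type without running anything. The
function name `replace_disk_commands` and its arguments are
illustrative, not part of DRBD.

```shell
# Print (do not run) the drbdadm commands for re-attaching a replaced
# disk. $1 is the meta data type ("internal" or "external"), $2 the
# resource name. Returns 1 for an unknown meta data type.
replace_disk_commands() {
    case "$1" in
        internal|external) ;;
        *) echo "unknown meta data type: $1" >&2; return 1 ;;
    esac
    echo "drbdadm create-md $2"
    echo "drbdadm attach $2"
    # External meta data: DRBD does not notice the swap by itself,
    # so a resynchronization must be requested explicitly.
    if [ "$1" = "external" ]; then
        echo "drbdadm invalidate $2"
    fi
}
```

For example, `replace_disk_commands external r0` (with `r0` as a
hypothetical resource name) lists three commands, the last being
`drbdadm invalidate r0`. Printing the commands instead of executing
them leaves the decision to run each step with the administrator.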

[[s-node-failure]]
=== Dealing with node failure

indexterm:[node failure]When DRBD detects that its peer node is down
(either by true hardware failure or manual intervention), DRBD changes
its connection state from +Connected+ to +WFConnection+ and waits for
the peer node to re-appear. The DRBD resource is then said to operate
in _disconnected mode_. In disconnected mode, the resource and its
associated block device are fully usable, and may be promoted and
demoted as necessary, but no block modifications are being replicated
to the peer node. Instead, DRBD stores internal information on which
blocks are being modified while disconnected.

[[s-temp-node-failure-secondary]]
==== Dealing with temporary secondary node failure

indexterm:[node failure]If a node that currently has a resource in the
secondary role fails temporarily (due to, for example, a memory
problem that is subsequently rectified by replacing RAM), no further
intervention is necessary, apart from repairing the failed node and
bringing it back online. When that happens, the two nodes simply
re-establish connectivity upon system start-up. DRBD then replicates
to the secondary node all modifications made on the primary node in
the meantime.

IMPORTANT: At this point, due to the nature of DRBD's
re-synchronization algorithm, the resource is briefly inconsistent on
the secondary node. During that short time window, the secondary node
cannot switch to the Primary role if the peer is unavailable. Thus,
the period in which your cluster is not redundant consists of the
actual secondary node down time, plus the subsequent
re-synchronization.

[[s-temp-node-failure-primary]]
==== Dealing with temporary primary node failure

indexterm:[node failure]From DRBD's standpoint, failure of the primary
node is almost identical to a failure of the secondary node. The
surviving node detects the peer node's failure, and switches to
disconnected mode. DRBD does _not_ promote the surviving node to the
primary role; it is the cluster management application's
responsibility to do so.

When the failed node is repaired and returns to the cluster, it does
so in the secondary role. Thus, as outlined in the previous section,
no further manual intervention is necessary. Again, DRBD does not
change the resource role back; it is up to the cluster manager to do
so (if so configured).

DRBD ensures block device consistency in case of a primary node
failure by means of its activity log. For a detailed discussion,
refer to <<s-activity-log>>.

[[s-perm-node-failure]]
==== Dealing with permanent node failure

indexterm:[node failure]If a node suffers an unrecoverable problem or
permanent destruction, you must take the following steps:

* Replace the failed hardware with hardware of similar performance and
  disk capacity.

NOTE: Replacing a failed node with one that has worse performance
characteristics is possible, but not recommended. Replacing a failed
node with one that has less disk capacity is not supported, and will
cause DRBD to refuse to connect to the replacement node.

* Install the base system and applications.
* Install DRBD and copy +/etc/drbd.conf+ and all of +/etc/drbd.d/+
  from the surviving node.
* Follow the steps outlined in <<ch-configure>>, but stop short of
  <<s-initial-full-sync>>.

Manually starting a full device synchronization is not necessary at
this point; it will commence automatically upon connection to the
surviving primary node.

[[s-resolve-split-brain]]
=== Manual split brain recovery

indexterm:[split brain]DRBD detects split brain at the time
connectivity becomes available again and the peer nodes exchange the
initial DRBD protocol handshake. If DRBD detects that both nodes are
(or were at some point, while disconnected) in the primary role, it
immediately tears down the replication connection. The tell-tale sign
of this is a message like the following appearing in the system log:

----------------------------
Split-Brain detected, dropping connection!
----------------------------
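
A monitoring script could watch for this message. The following is a
minimal sketch that matches only the text shown above; the function
name `split_brain_logged` is illustrative, and the location of the
system log (or the use of `dmesg`) depends on your distribution.

```shell
# Succeed (exit 0) if the log text on stdin contains DRBD's
# split brain message; fail otherwise.
split_brain_logged() {
    grep -q 'Split-Brain detected'
}
```

One way to use it is `dmesg | split_brain_logged && echo "split brain"`.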

After split brain has been detected, at least one node will always
have the resource in the indexterm:[StandAlone (connection
state)]indexterm:[connection state]+StandAlone+ connection state. The
other might either also be in the +StandAlone+ state (if both nodes
detected the split brain simultaneously), or in
indexterm:[WFConnection (connection state)] indexterm:[connection
state]+WFConnection+ (if the peer tore down the connection before the
other node had a chance to detect split brain).

At this point, unless you configured DRBD to automatically recover
from split brain, you must manually intervene by selecting one node
whose modifications will be discarded (this node is referred to as the
indexterm:[split brain]_split brain victim_). To do so, issue the
following commands on the split brain victim:

[NOTE]
===========================
The split brain victim must be in the +StandAlone+ connection state,
or the following commands will return an error.
You can ensure that it is in this state by issuing:

----------
drbdadm disconnect <resource>
----------
===========================

----------------------------
drbdadm secondary <resource>
drbdadm connect --discard-my-data <resource>
----------------------------

On the other node (the indexterm:[split brain]_split brain survivor_),
if its connection state is also +StandAlone+, you would enter:

----------------------------
drbdadm connect <resource>
----------------------------

You may omit this step if the node is already in the
+WFConnection+ state; it will then reconnect automatically.

If the resource affected by the split brain is a
<<s-three-nodes,stacked resource>>, use `drbdadm --stacked` instead
of just `drbdadm`.

Upon connection, your split brain victim immediately changes its
connection state to +SyncTarget+, and has its modifications
overwritten by the remaining primary node.

NOTE: The split brain victim is not subjected to a full device
synchronization. Instead, it has its local modifications rolled back,
and any modifications made on the split brain survivor propagate to
the victim.

After re-synchronization has completed, the split brain is considered
resolved and the two nodes form a fully consistent, redundant
replicated storage system again.
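
The manual recovery commands described in this section depend on the
node's part in the recovery (victim or survivor) and on its current
connection state. As a minimal sketch, the following shell function
prints the commands for one node without running them. The function
name `split_brain_commands` and its arguments are illustrative, not
part of DRBD, and stacked resources are not handled.

```shell
# Print (do not run) the commands to issue on one node.
# $1: "victim" or "survivor"; $2: that node's current connection
# state (e.g. StandAlone, WFConnection); $3: resource name.
split_brain_commands() {
    case "$1" in
        victim)
            # The victim must be StandAlone before reconnecting.
            [ "$2" = "StandAlone" ] || echo "drbdadm disconnect $3"
            echo "drbdadm secondary $3"
            echo "drbdadm connect --discard-my-data $3"
            ;;
        survivor)
            # In WFConnection the survivor reconnects automatically.
            [ "$2" = "StandAlone" ] && echo "drbdadm connect $3"
            return 0
            ;;
        *)
            echo "unknown node type: $1" >&2
            return 1
            ;;
    esac
}
```

On the victim it could be called as
`split_brain_commands victim "$(drbdadm cstate r0)" r0`, where `r0` is
a hypothetical resource name. Review the printed commands before
running them, since `--discard-my-data` throws away that node's
modifications.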
