[[ch-fundamentals]]
== DRBD Fundamentals

The Distributed Replicated Block Device (DRBD) is a software-based,
shared-nothing, replicated storage solution mirroring the content of
block devices (hard disks, partitions, logical volumes etc.) between
hosts.

DRBD mirrors data

* *in real time*. Replication occurs continuously while applications
  modify the data on the device.

* *transparently*. Applications need not be aware that the data is stored on
  multiple hosts.

* *synchronously* or *asynchronously*. With synchronous mirroring, applications
  are notified of write completions after the writes have been carried out on
  all hosts. With asynchronous mirroring, applications are notified of write
  completions when the writes have completed locally, which usually is before
  they have propagated to the other hosts.


[[s-kernel-module]]
=== Kernel module

DRBD's core functionality is implemented by way of a Linux kernel
module. Specifically, DRBD constitutes a driver for a virtual block
device, so DRBD is situated right near the bottom of a system's I/O
stack. Because of this, DRBD is extremely flexible and versatile,
which makes it a replication solution suitable for adding high
availability to just about any application.

DRBD is, by definition and as mandated by the Linux kernel
architecture, agnostic of the layers above it. Thus, it is impossible
for DRBD to miraculously add features to upper layers that these do
not possess. For example, DRBD cannot auto-detect file system
corruption or add active-active clustering capability to file systems
like ext3 or XFS.

[[f-drbd-linux-io-stack]]
.DRBD's position within the Linux I/O stack
image::drbd-in-kernel[scaledwidth="100%"]

[[s-userland]]
=== User space administration tools ===

DRBD comes with a set of administration tools which communicate with the
kernel module in order to configure and administer DRBD resources.

.+drbdadm+
The high-level administration tool of the DRBD program suite.  Obtains all DRBD
configuration parameters from the configuration file +/etc/drbd.conf+ and acts
as a front-end for +drbdsetup+ and +drbdmeta+.  +drbdadm+ has a _dry-run_ mode,
invoked with the +-d+ option, that shows which +drbdsetup+ and +drbdmeta+ calls
+drbdadm+ would issue without actually calling those commands.

.+drbdsetup+
Configures the DRBD module loaded into the kernel. All parameters to
+drbdsetup+ must be passed on the command line. The separation between
+drbdadm+ and +drbdsetup+ allows for maximum flexibility.  Most users will
rarely need to use +drbdsetup+ directly, if at all.

.+drbdmeta+
Allows to create, dump, restore, and modify DRBD meta data structures. Like
+drbdsetup+, most users will only rarely need to use +drbdmeta+ directly.

[[s-resources]]
=== Resources ===

In DRBD, _resource_ is the collective term that refers to all aspects of
a particular replicated data set. These include:

.Resource name
This can be any arbitrary, US-ASCII name not containing whitespace by
which the resource is referred to.

.Volumes
Any resource is a replication group consisting of one of more
_volumes_ that share a common replication stream. DRBD ensures write
fidelity across all volumes in the resource. Volumes are numbered
starting with +0+, and there may be up to 65,535 volumes in one
resource. A volume contains the replicated data set, and a set of
metadata for DRBD internal use.

At the +drbdadm+ level, a volume within a resource can be addressed by the
resource name and volume number as <resource>/<volume>.

// At the +drbdsetup+ level, a volume is addressed by its device minor number.
// At the +drbdmeta+ level, a volume is addressed by the name of the underlying
// device.

// FIXME: Users don't care which major device number is assigned to DRBD.
// Likewise, they don't care about minor device numbers if they don't have to.
// We refer to device as /dev/drbdX almost everywhere, so do we have to mention
// minors here at all?

.DRBD device
This is a virtual block device managed by DRBD. It has a device major
number of 147, and its minor numbers are numbered from 0 onwards, as
is customary. Each DRBD device corresponds to a volume in a
resource. The associated block device is usually named
+/dev/drbdX+, where +X+ is the device minor number. DRBD also allows
for user-defined block device names which must, however, start with
+drbd_+.

NOTE: Very early DRBD versions hijacked NBD's device major number
43. This is long obsolete; 147 is the
http://www.lanana.org/docs/device-list/[LANANA-registered] DRBD device
major.

.Connection
A _connection_ is a communication link between two hosts that share a
replicated data set.  As of the time of this writing, each resource involves
only two hosts and exactly one connection between these hosts, so for the most
part, the terms +resource+ and +connection+ can be used interchangeably.

At the +drbdadm+ level, a connection is addressed by the resource name.

// At the +drbdsetup+ level, a connection is addressed by its two replication
// endpoints identified by address family (optional), address (required), and
// port (optional).

[[s-resource-roles]]
=== Resource roles ===

In DRBD, every <<s-resources,resource>> has a role, which may be
_Primary_ or _Secondary_.

NOTE: The choice of terms here is not arbitrary. These roles were
deliberately not named "Active" and "Passive" by DRBD's
creators. Primary vs. secondary refers to a concept related to
availability of _storage_, whereas active vs. passive refers to the
availability of an _application_. It is usually the case in a
high-availability environment that the primary node is also the active
one, but this is by no means necessary.

* A DRBD device in the primary role can be used unrestrictedly for
  read and write operations. It may be used for creating and mounting
  file systems, raw or direct I/O to the block device, etc.

* A DRBD device in the secondary role receives all updates from the
  peer node's device, but otherwise disallows access completely. It
  can not be used by applications, neither for read nor write
  access. The reason for disallowing even read-only access to the
  device is the necessity to maintain cache coherency, which would be
  impossible if a secondary resource were made accessible in any way.

The resource's role can, of course, be changed, either by
<<s-switch-resource-roles,manual intervention>> or by way of some
automated algorithm by a cluster management application. Changing the
resource role from secondary to primary is referred to as _promotion_,
whereas the reverse operation is termed _demotion_.
