An Introduction to Storage Terminology and Concepts in Linux

1 year ago

科, 雅


Introduction

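Linux has robust systems and tooling for managing hardware devices, including storage drives. In this article, we'll cover, at a high level, how Linux represents these devices and how raw storage is turned into usable space on a server.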

What Is Block Storage?

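Block storage is another name for what the Linux kernel calls a block device. A block device is a piece of hardware that can be used to store data, such as a traditional spinning hard disk drive (HDD), a solid-state drive (SSD), or a flash memory stick. It is called a block device because the kernel interacts with the hardware by referencing fixed-size blocks, or chunks of space.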
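To make the idea of fixed-size blocks concrete, here is a minimal Python sketch that reads one block by turning a block number into a byte offset. The 4096-byte block size and the read_block helper are illustrative assumptions: real devices report their own logical block size, and opening a real device node such as /dev/sda normally requires root privileges, so the demo uses an ordinary file as a stand-in.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; real devices report their own (often 512 or 4096)


def read_block(path: str, index: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read block number `index` from `path` (a device node or a disk image file)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Block number times block size gives the byte offset to read from.
        return os.pread(fd, block_size, index * block_size)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # An ordinary file stands in for a block device in this demo.
    with tempfile.NamedTemporaryFile(delete=False) as image:
        image.write(b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
    print(read_block(image.name, 1)[:4])  # b'BBBB'
    os.unlink(image.name)
```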

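In other words, block storage is what you think of as regular disk storage on a computer. Once it is set up, it acts as an extension of the current filesystem tree, and you can read from and write to each drive as needed.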

What Is Disk Partitioning?

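Partitioning is a way to split a storage drive into smaller usable units. A partition is a section of a drive that can be treated in much the same way as a drive itself.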

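Partitioning lets you segment the available space and use each partition for a different purpose. This gives the user a lot of flexibility, allowing a single disk to be divided between multiple operating systems, swap space, or specialized filesystems.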

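Although disks can be formatted and used without partitioning, operating systems usually expect to find a partition table, even if only a single partition is written to the disk. It is generally recommended to partition new drives for greater flexibility.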

MBR vs. GPT

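When partitioning a disk, it is important to know which partitioning format will be used. This generally comes down to a choice between MBR (Master Boot Record) and GPT (GUID Partition Table).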

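MBR is over 30 years old, and because of its age it has some serious limitations. For instance, it cannot be used for disks larger than 2 TB (with the common 512-byte sector size), and it can have at most four primary partitions.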
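Both limits follow from the on-disk layout. The Python sketch below parses the partition table from the first 512-byte sector, assuming the classic layout: four 16-byte entries starting at byte 446 and the 0x55AA signature at byte 510. Because each entry stores its start and length as 32-bit sector counts, the largest addressable size is 2^32 * 512 bytes, or 2 TiB (about 2.2 TB). The names parse_mbr and MbrPartition are hypothetical, and the sketch ignores extended/logical partitions.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512   # assumed logical sector size
ENTRY_OFFSET = 446  # the partition table starts right after the boot code
ENTRY_SIZE = 16
MAX_PRIMARY = 4     # the table has exactly four slots

# Start and length are 32-bit sector counts, which caps what MBR can address.
MAX_MBR_BYTES = (2 ** 32) * SECTOR_SIZE  # 2 TiB with 512-byte sectors


class MbrPartition(NamedTuple):
    bootable: bool
    type_id: int
    first_lba: int
    num_sectors: int


def parse_mbr(sector: bytes) -> List[MbrPartition]:
    """Return the used primary partition entries from an MBR sector."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    parts = []
    for i in range(MAX_PRIMARY):
        start = ENTRY_OFFSET + i * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        status, type_id = entry[0], entry[4]
        # Bytes 8-15: first LBA and sector count, little-endian uint32 each.
        first_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if type_id != 0:  # type 0 marks an unused slot
            parts.append(MbrPartition(status == 0x80, type_id, first_lba, num_sectors))
    return parts
```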

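GPT is a more modern partitioning scheme that solves some of the problems inherent to MBR. Systems running GPT can have many more partitions per disk; usually the limit is set only by the operating system itself. Additionally, GPT has no practical disk size limit, and it keeps partition table information in more than one location (a primary copy near the start of the disk and a backup at the end) to guard against corruption. GPT also writes a "protective MBR" for compatibility with MBR-only tools, which then see the disk as in use rather than empty.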

In most cases, GPT is the better choice unless your operating system prevents you from using it.
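A rough way to see these differences on disk is to look for the protective MBR and the GPT header. The Python sketch below assumes 512-byte sectors and the layout defined in the UEFI specification: a protective MBR entry of type 0xEE, the "EFI PART" signature at the start of LBA 1, and the backup header's location stored as a 64-bit little-endian value at byte 32 of that header (normally the disk's last LBA). It skips the CRC32 checks a real tool would perform, and the function names are hypothetical.

```python
import struct

SECTOR_SIZE = 512       # assumed logical sector size
PROTECTIVE_TYPE = 0xEE  # MBR partition type used by a protective MBR
GPT_SIGNATURE = b"EFI PART"


def partition_scheme(disk: bytes) -> str:
    """Classify the first two sectors of a disk image as 'gpt', 'mbr', or 'none'."""
    mbr = disk[:SECTOR_SIZE]
    if len(mbr) < SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        return "none"
    # Partition type byte of each of the four MBR slots.
    types = [mbr[446 + 16 * i + 4] for i in range(4)]
    header = disk[SECTOR_SIZE:2 * SECTOR_SIZE]  # the GPT header lives in LBA 1
    if PROTECTIVE_TYPE in types and header[:8] == GPT_SIGNATURE:
        return "gpt"
    return "mbr"


def backup_header_lba(disk: bytes) -> int:
    """Read the 'alternate LBA' field: where the backup GPT header is stored."""
    header = disk[SECTOR_SIZE:2 * SECTOR_SIZE]
    (alternate_lba,) = struct.unpack_from("<Q", header, 32)
    return alternate_lba
```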

Formatting and Filesystems

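While the Linux kernel can recognize a raw disk, the drive must be formatted before it can be used. Formatting is the process of writing a filesystem to the disk and preparing it for file operations. A filesystem is the system that structures data and controls how information is written to and retrieved from the underlying disk. Without a filesystem, you cannot use the storage device for any standard file operations.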

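There are many different filesystem formats, each with trade-offs, including differences in operating system support. They all present the user with a similar representation of the disk, but the features they support and the platforms they work on can differ considerably.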

Some of the more popular filesystems for Linux are:

Ext4: The most popular default filesystem is Ext4, a successor to Ext2 and Ext3. The Ext4 filesystem is journaled, backwards compatible with legacy systems, stable, and has mature support and tooling. It is a good choice if you have no specialized needs.
XFS: XFS specializes in performance and large data files. It formats quickly and has good throughput characteristics when handling large files and large disks. It does not provide native snapshots, but it can be frozen (with xfs_freeze) so that consistent snapshots can be taken at the volume level. XFS journals only metadata rather than both metadata and data. This leads to fast performance, but can lead to data loss or corruption in recently written files in the event of an abrupt power loss.
Btrfs: Btrfs is a modern, feature-rich copy-on-write filesystem. This architecture allows some volume management functionality, including snapshots and cloning, to be integrated into the filesystem layer. It is used by default on some consumer and commercial NAS (network-attached storage) hardware, and is popular for dedicated multi-disk arrays.
ZFS: ZFS is another copy-on-write filesystem and volume manager with a robust and mature feature set. It competes fairly directly with Btrfs: it has data integrity features, can handle large filesystem sizes, has typical volume features like snapshotting and cloning, and can organize volumes into RAID and RAID-like arrays for redundancy and performance. ZFS has a controversial history due to licensing concerns, but when commercial support is taken into account it is roughly as popular as Btrfs.
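Tools such as blkid tell these filesystems apart by looking for a magic number that the formatting tool wrote into the superblock. The Python sketch below checks three well-known signatures: 0xEF53 at byte 1080 for the ext2/3/4 family, "XFSB" at byte 0 for XFS, and "_BHRfS_M" at byte 65600 for Btrfs. It is a simplified illustration, not how blkid is implemented; it cannot distinguish ext4 from ext2/ext3 (that needs the superblock feature flags) and does not attempt ZFS. The detect_filesystem name is hypothetical.

```python
from typing import Optional

# name -> (byte offset from the start of the partition, magic bytes)
SIGNATURES = {
    "ext2/3/4": (1024 + 0x38, b"\x53\xef"),  # s_magic 0xEF53, little-endian
    "xfs": (0, b"XFSB"),                      # sb_magicnum, big-endian
    "btrfs": (0x10000 + 0x40, b"_BHRfS_M"),   # superblock at 64 KiB
}


def detect_filesystem(path: str) -> Optional[str]:
    """Return the first matching filesystem family, or None if no magic matches."""
    with open(path, "rb") as dev:
        for name, (offset, magic) in SIGNATURES.items():
            dev.seek(offset)
            # Reading past the end of a small image just returns fewer bytes.
            if dev.read(len(magic)) == magic:
                return name
    return None
```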

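In addition, Windows primarily uses NTFS and ExFAT, while macOS primarily uses HFS+ and APFS. It is usually possible to read, and sometimes write, these filesystem formats on other platforms, but doing so may require additional compatibility tools.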

How Linux Manages Storage Devices

Device Files in /dev

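In Linux, almost everything is represented by a file somewhere in the filesystem hierarchy. This includes hardware like storage drives, which are represented on the system as files in the /dev directory. Typically, files representing storage devices start with sd or hd followed by a letter. For instance, the first drive on a server is usually something like /dev/sda. (NVMe drives use a different scheme, such as /dev/nvme0n1.)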

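Partitions on these drives also have files within /dev, represented by appending the partition number to the end of the drive name. For example, the first partition on the drive from the previous example would be /dev/sda1.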
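The naming rule fits in a few lines of Python. This sketch relies on one detail beyond the text above: when a disk name already ends in a digit, as with NVMe (nvme0n1) or SD cards (mmcblk0), the kernel inserts a "p" before the partition number. The helper names partition_path and disk_index are hypothetical, and the letter-to-index mapping only reflects detection order, which is not guaranteed to stay the same across boots.

```python
def partition_path(disk: str, number: int) -> str:
    """Return the device path for partition `number` on `disk`."""
    if number < 1:
        raise ValueError("partition numbers start at 1")
    # Names ending in a digit (nvme0n1, mmcblk0) take a 'p' separator;
    # others get the number appended directly (sda -> sda1).
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"


def disk_index(disk: str) -> int:
    """Map /dev/sda -> 0, /dev/sdb -> 1, ... (single-letter sd/hd names only)."""
    name = disk.rsplit("/", 1)[-1]
    if len(name) != 3 or name[:2] not in ("sd", "hd") or not "a" <= name[2] <= "z":
        raise ValueError(f"not a single-letter sd/hd name: {disk}")
    # The letter reflects detection order, which can change between boots.
    return ord(name[2]) - ord("a")
```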

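While the /dev/sd* and /dev/hd* device files are the traditional way to refer to drives and partitions, relying on these values alone has a significant disadvantage. The Linux kernel decides which name each device receives on every boot, so device names can change between boots and cause confusion.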

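To work around this, the /dev/disk directory contains subdirectories corresponding to different, more persistent ways of identifying disks and partitions on the system. These subdirectories contain symbolic links, created at boot, that point back to the correct /dev/sd* or /dev/hd* files. Each link is named after the characteristic its directory is based on (for example, links in /dev/disk/by-partlabel are named after partition labels). These links always point to the correct device, so they can be used as static identifiers for storage.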
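Because these entries are ordinary symbolic links (often relative ones such as ../../sda1), resolving one is just a matter of following it. A minimal Python sketch, assuming the standard /dev/disk layout; resolve is a hypothetical helper name.

```python
import os

DISK_DIR = "/dev/disk"


def resolve(kind: str, name: str, disk_dir: str = DISK_DIR) -> str:
    """Follow e.g. /dev/disk/by-uuid/<uuid> to the current kernel device node."""
    link = os.path.join(disk_dir, kind, name)
    if not os.path.islink(link):
        raise FileNotFoundError(link)
    # realpath also handles relative targets like ../../sda1.
    return os.path.realpath(link)
```

For example, resolve("by-uuid", some_uuid) would return a path such as /dev/sdb1 on a system where that UUID currently belongs to sdb1.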

Some or all of the following subdirectories may exist under /dev/disk:

by-label: Most filesystems have a labeling mechanism that allows the assignment of arbitrary user-specified names for a disk or partition. This directory consists of links named after these user-supplied labels.
by-uuid: UUIDs, or universally unique identifiers, are a long, unique string of letters and numbers that can be used as an ID for a storage resource. These are generally not very human-readable, but are almost always unique, even across systems. As such, it might be a good idea to use UUIDs to reference storage that may migrate between systems, since naming collisions are less likely.
by-partlabel and by-partuuid: GPT tables offer their own set of labels and UUIDs, which can also be used for identification. This functions in much the same way as the previous two directories, but uses GPT-specific identifiers.
by-id: This directory contains links named after the hardware's own serial numbers and the hardware they are attached to. This is not entirely persistent, because the way the device is connected to the system may change its by-id name.
by-path: Like by-id, this directory relies on a storage device's connection to the system itself. The links here are constructed from the system's interpretation of the hardware used to access the device. This has the same drawback as by-id: connecting a device to a different port can change this value.

Usually, by-label or by-uuid are the best options for persistently identifying specific devices.
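One way to apply this advice programmatically is a reverse lookup: given a device node, find a by-uuid or by-label link that points at it. The Python sketch below walks /dev/disk in a fixed order of preference; that order (UUID first, then label, then the GPT and hardware-based names) is an assumption based on the recommendation above, and persistent_name is a hypothetical helper.

```python
import os
from typing import Optional

# Assumed preference order: filesystem UUID/label first, connection-based names last.
PREFERENCE = ("by-uuid", "by-label", "by-partuuid", "by-partlabel", "by-id", "by-path")


def persistent_name(device: str, disk_dir: str = "/dev/disk") -> Optional[str]:
    """Return the most preferred /dev/disk/by-* link that points at `device`."""
    target = os.path.realpath(device)
    for kind in PREFERENCE:
        subdir = os.path.join(disk_dir, kind)
        if not os.path.isdir(subdir):
            continue  # not every system has every subdirectory
        for entry in sorted(os.listdir(subdir)):
            link = os.path.join(subdir, entry)
            if os.path.realpath(link) == target:
                return link
    return None
```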

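Note

Silicon Cloud block storage volumes control the device serial numbers reported to the operating system. This makes the by-id names reliably persistent on this platform. It is the preferred way to refer to Silicon Cloud storage volumes, since it is both persistent and predictable from first boot.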

Mounting Block Devices

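In Linux and other Unix-like operating systems, the entire system, regardless of how many physical devices are involved, is represented by a single unified file tree. When a filesystem on a drive or partition is to be used, it must be attached to the existing tree. Mounting is the process of attaching a formatted partition or drive to a directory within the Linux filesystem. The drive's contents can then be accessed from that directory.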

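Drives are almost always mounted on dedicated empty directories; mounting on a non-empty directory means that the directory's usual contents will be inaccessible until the drive is unmounted. There are many mount options that can be set to alter the behavior of the mounted device. For example, a drive can be mounted in read-only mode to ensure that its contents cannot be modified.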
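Before mounting, it can be worth checking that the target directory is empty and not already a mount point, so that no existing files get hidden. This Python sketch uses os.path.ismount and os.scandir; the check_mount_point name and the choice to raise exceptions are assumptions for illustration.

```python
import os


def check_mount_point(path: str) -> None:
    """Raise if `path` is a poor target for mounting a new filesystem."""
    if not os.path.isdir(path):
        raise NotADirectoryError(path)
    if os.path.ismount(path):
        raise RuntimeError(f"{path} already has a filesystem mounted on it")
    with os.scandir(path) as entries:
        if any(True for _ in entries):
            # Mounting here would hide these files until the filesystem is unmounted.
            raise RuntimeError(f"{path} is not empty")
```

Once the check passes, the mount itself is usually done with the mount command, for example mount -o ro /dev/sdb1 /mnt/data to mount read-only; the device and directory there are placeholders.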

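The Filesystem Hierarchy Standard recommends using /mnt or a subdirectory under it for temporarily mounted filesystems. It makes no recommendation on where to mount more permanent storage, so you can choose whichever scheme you like. In many cases, /mnt or its subdirectories are used for more permanent storage as well.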

Making Mounts Permanent with /etc/fstab

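Linux systems read a file called /etc/fstab (filesystem table) to determine which filesystems to mount during the boot process. A filesystem without an entry in this file will not be mounted automatically unless some other software is scripted to do so.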

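Each line of the /etc/fstab file represents a different filesystem that should be mounted. The line specifies the block device, the mount point, the filesystem type, and the mount options, followed by two numeric fields that control backups with dump and the order in which filesystems are checked at boot.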
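As an illustration of that line format, here is a Python sketch that parses fstab text into the six whitespace-separated fields documented in fstab(5): device, mount point, filesystem type, comma-separated options, the dump flag, and the fsck pass number (the last two default to 0 when omitted). Spaces and tabs inside a field are written as the octal escapes \040 and \011. The sample line mounts a filesystem by UUID with the nofail option; that UUID and the /mnt/data mount point are placeholders, and parse_fstab is a hypothetical helper, not a system API.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str         # fs_spec, e.g. UUID=... or /dev/sdb1
    mount_point: str    # fs_file
    fs_type: str        # fs_vfstype
    options: List[str]  # fs_mntops, split on commas
    dump: int           # fs_freq
    fsck_pass: int      # fs_passno


def _unescape(field: str) -> str:
    # fstab encodes spaces and tabs inside a field as \040 and \011.
    return field.replace("\\040", " ").replace("\\011", "\t")


def parse_fstab(text: str) -> List[FstabEntry]:
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"malformed fstab line: {line!r}")
        # The last two fields are optional and default to 0.
        dump = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append(FstabEntry(_unescape(fields[0]), _unescape(fields[1]),
                                  fields[2], fields[3].split(","), dump, passno))
    return entries


SAMPLE = """
# <device>                                 <mount point>  <type>  <options>        <dump> <pass>
UUID=3e6be9de-8139-11d1-9106-a43f08d823a6  /mnt/data      ext4    defaults,nofail  0      2
"""
```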

More Complex Storage Management

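While the core functionality above covers many use cases, other options are available for more complex management paradigms that combine multiple disks, most notably RAID.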

What Is RAID?

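RAID stands for redundant array of independent disks. RAID is a storage management and virtualization technology that lets you group drives together and manage them as a single unit with additional capabilities.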

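The characteristics of a RAID array depend on its RAID level, which defines how the disks in the array relate to each other. Some of the more common levels are: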

RAID 0: This level indicates drive striping. As data is written to the array, it is split into chunks and distributed across the disks in the set. This offers a performance boost, since multiple disks can be written to or read from simultaneously. The downside is that a single drive failure can lose all of the data in the array, because no single disk contains enough information about the contents to rebuild it. For this reason RAID 0 is rarely used in production, though it can be useful as a point of comparison.
RAID 1: RAID 1 indicates drive mirroring. Anything written to a RAID 1 array is written to multiple disks. Its main advantage is data redundancy, which allows data to survive hard drive loss on either side of the mirror. Because multiple drives will contain the exact same data, your usable capacity is reduced by at least half.
RAID 5: RAID 5 stripes data across multiple drives, similar to RAID 0. However, this level also implements a distributed parity across the drives. This means that if a drive fails, the remaining drives can rebuild the array using the parity information shared between them. Usually, this is enough to rebuild one disk, meaning the array can survive any one disk loss. RAID 5 reduces the available space in an array by the capacity of one disk.
RAID 6: RAID 6 has the same properties as RAID 5, but provides double parity. This means that RAID 6 arrays can withstand the loss of any 2 drives. The capacity of the array is again affected by the parity amount, meaning that the usable capacity is reduced by two disks worth of space.
RAID 10: RAID 10 is a combination of levels 1 and 0. First, disks are paired into mirrored sets (at least two of them). Then, data is striped across those sets. This creates an array with some redundancy while still providing good performance. It requires at least four drives, however, and the total usable capacity is at most half of the combined disk space.
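The capacity rules above, and the parity trick that lets RAID 5 rebuild a lost disk, can be sketched in Python. This assumes equal-sized disks and uses plain XOR for parity, which is how RAID 5's single parity works; RAID 6's second parity uses a different calculation that is not shown. The minimum disk counts are common conventions, and the even-disk requirement for RAID 10 is a simplification (some implementations, such as Linux md, also accept odd counts). Function names are hypothetical.

```python
from functools import reduce
from typing import List

MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}  # conventional minimums


def usable_capacity(level: int, disks: int, disk_size: float) -> float:
    """Usable space for `disks` equal-sized drives at a given RAID level."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level {level}")
    if disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        return disks * disk_size        # striping: no redundancy
    if level == 1:
        return disk_size                # every disk holds a full copy
    if level == 5:
        return (disks - 1) * disk_size  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size  # two disks' worth of parity
    if disks % 2:
        raise ValueError("this RAID 10 sketch assumes an even number of disks")
    return disks // 2 * disk_size       # data striped across mirrored pairs


def xor_parity(blocks: List[bytes]) -> bytes:
    """RAID 5-style parity: byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))
```

For example, four 2 TB disks give 6 TB in RAID 5 and 4 TB in either RAID 6 or RAID 10. If one data block is lost, XOR-ing the parity block with the surviving data blocks reproduces it.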

Where to Go from Here

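If you have a new storage device that you want to use in your Linux system, the basic process is to partition it, format it, and mount the new filesystem. This should be sufficient for most use cases where you are mainly concerned with adding capacity. To learn how to perform storage administration tasks, see "How to Perform Basic Storage Device Management Tasks in Linux".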