What are disk partitions?
Status: Done
Confidence: Very likely
The first time someone tries to install Linux they often have a
moment of panic when they are faced with partitioning disks. They are
asked about `/dev/sda1`, filesystems, and whether the disk should be
labelled MBR or GPT. This is a lot of gobbledygook for what are fairly
simple concepts.
Let’s start with the `/dev/sda` part. A computer has some number of
disks attached to it by various means. Each operating system refers to
these disks differently, but that is purely a question of labels. For
example, the first three disks attached to a system would be referred
to as:
FreeBSD, macOS | Linux | Windows |
---|---|---|
/dev/disk0 | /dev/sda [1] | \\?\Device\Harddisk0 |
/dev/disk1 | /dev/sdb | \\?\Device\Harddisk1 |
/dev/disk2 | /dev/sdc | \\?\Device\Harddisk2 |
… | … | … |
You might think that the BSD and Windows naming schemes seem more sensible than Linux’s, and you would be right, but our industry is full of cruft. Even this notion of partitions, as we will see later, is a kind of historical cruft.
When a computer is turned on and detects a disk, what does it know about it? The disk is divided up into sectors, numbered from 0 to whatever. Nearly all modern hard disks have 4096 bytes per sector, while older hard disks have 512 and CDs and DVDs have 2048. The total number of sectors varies with the size of the disk (and the size of the sector on the disk, obviously). When the computer reads from the disk, it transfers a whole sector into RAM. Writing to the disk transfers one sector of data from RAM onto a sector of the disk.
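To make the sector model concrete, here is a minimal Python sketch that treats an in-memory buffer as a disk and only ever reads or writes whole sectors. The names `total_sectors`, `read_sector`, and `write_sector` are made up for illustration, and the 4096-byte sector size is an assumption; a real disk reports its own sector size.

```python
import io

SECTOR_SIZE = 4096  # assumed bytes per sector (typical of modern hard disks)


def total_sectors(disk_bytes, sector_size=SECTOR_SIZE):
    """Number of whole sectors that fit on a disk of the given size."""
    return disk_bytes // sector_size


def read_sector(disk, number, sector_size=SECTOR_SIZE):
    """Read one whole sector from a file-like object standing in for a disk."""
    disk.seek(number * sector_size)  # sector n starts at byte n * sector_size
    return disk.read(sector_size)


def write_sector(disk, number, data, sector_size=SECTOR_SIZE):
    """Write exactly one sector of data at the given sector number."""
    if len(data) != sector_size:
        raise ValueError("a write must cover exactly one sector")
    disk.seek(number * sector_size)
    disk.write(data)


# A toy in-memory "disk" of 8 sectors, all zeroes.
disk = io.BytesIO(bytes(8 * SECTOR_SIZE))
write_sector(disk, 3, b"\x01" * SECTOR_SIZE)
```

Reading sector 3 back with `read_sector(disk, 3)` returns the 4096 bytes just written, while every other sector is still all zeroes.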
When we turn on a computer and, while looking for an operating system to boot, it finds a disk, how does it know what to do with it? We need to leave a description of what is where on the disk somewhere that the computer knows to look for it. On nearly every machine that description is written at sector 0, and this special sector is called the boot sector. The exact form of the description depends on the kind of computer, which is why a Windows machine cannot make heads or tails of a floppy disk formatted for an old Macintosh, and why a hard disk configured on an old Sun workstation is unreadable on a MacBook Pro.
The space on disks is partitioned up into contiguous ranges of sectors called partitions (or, on some systems, volumes), and the start and end sector of each partition, plus some other information about it, is stored in the boot sector. The partitions are usually numbered sequentially, so the first three partitions on the third disk would be referred to as
FreeBSD, macOS | Linux | Windows |
---|---|---|
/dev/disk2s1 | /dev/sdc1 | \\?\Device\Harddisk2\Partition1 |
/dev/disk2s2 | /dev/sdc2 | \\?\Device\Harddisk2\Partition2 |
/dev/disk2s3 | /dev/sdc3 | \\?\Device\Harddisk2\Partition3 |
… | … | … |
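The Linux column follows a simple rule: disks are lettered in order and partitions are numbered from 1. A rough Python sketch of that rule follows. The function names are hypothetical, and it covers only the `sd` style of name; other device types, such as NVMe drives, are named differently.

```python
import string


def linux_disk_name(index):
    """Map a zero-based disk index to an sd-style name: 0 -> /dev/sda."""
    letters = ""
    n = index + 1
    while n > 0:
        # Letters behave like spreadsheet columns: ..., y, z, aa, ab, ...
        n, remainder = divmod(n - 1, 26)
        letters = string.ascii_lowercase[remainder] + letters
    return "/dev/sd" + letters


def linux_partition_name(disk_index, partition_number):
    """Partitions are numbered from 1 and appended to the disk name."""
    return f"{linux_disk_name(disk_index)}{partition_number}"


for number in (1, 2, 3):
    print(linux_partition_name(2, number))  # partitions on the third disk
```

This prints `/dev/sdc1`, `/dev/sdc2`, and `/dev/sdc3`, matching the table.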
The computer, after we turned it on, has found the boot sector and partitions. How does it boot an operating system? One of the things recorded about each partition in the boot sector is whether it is bootable, that is, whether the first sector of the partition is of a form that the computer can use to start booting an operating system. If it is, then the computer can load that first sector of the partition into memory and start executing it, and the instructions therein lay out how to load the operating system and finish booting the computer into a useful environment.
This is as much as the computer without an operating system can do. It has no notion of files or directories. To interpret the raw sequence of sectors that make up a partition as files and directories, we need some convention to be able to interpret the data in the sectors. That convention is called a filesystem, and there have been many, many filesystems over the years.
Some important filesystem lineages that you might see:
- FAT: This is the family of filesystems used for floppy disks, and later for Windows 95, 98, and Windows ME machines. FAT32 is still a common format for external hard disks.
- NTFS: Windows NT’s filesystem, which became the default for all Windows systems starting with Windows 2000.
- ISO-9660: This is the filesystem used on CD-ROMs.
- ext: The family of filesystems used by Linux. As of 2021, ext4 is the current incarnation.
- UFS: The traditional Unix filesystem, still common on FreeBSD and other BSD systems, and supported by early versions of Mac OS X.
- HFS: HFS and HFS+ were the filesystems for the classic (pre-OS X) Macintosh, and HFS+ remained the default on Mac OS X for many years.
- APFS: Apple File System, which replaced HFS+ on macOS starting in 2017.
- XFS: A high performance filesystem developed by Silicon Graphics, Inc., which was common for database deployments, though ext4 now often matches its performance.
The boot sector and first sector of a partition are particular to a computer architecture. A disk partitioned for an old Sun workstation can’t be used by an IBM PC to boot an operating system. Filesystems, on the other hand, are implemented by the operating system, so they can cross machine architectures.
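Because a filesystem is only a convention for interpreting bytes, several of the filesystems listed above can be recognised by a fixed signature near the start of the partition. The sketch below checks a few of them in Python. The function name `guess_filesystem` is hypothetical, and real tools check many more formats and validate far more than one magic value.

```python
# (name, byte offset from the start of the partition, expected bytes)
SIGNATURES = [
    ("NTFS", 3, b"NTFS    "),          # OEM ID field in the first sector
    ("XFS", 0, b"XFSB"),               # superblock magic at the very start
    ("ext2/3/4", 1080, b"\x53\xef"),   # 0xEF53, little-endian, in the superblock
    ("ISO-9660", 32769, b"CD001"),     # volume descriptor at 2048-byte sector 16
]


def guess_filesystem(partition_start):
    """Guess the filesystem from the first bytes of a partition."""
    for name, offset, magic in SIGNATURES:
        if partition_start[offset:offset + len(magic)] == magic:
            return name
    return "unknown"


# Build a fake, otherwise empty partition that carries only the ext magic.
fake_partition = bytearray(40000)
fake_partition[1080:1082] = b"\x53\xef"
print(guess_filesystem(bytes(fake_partition)))
```

Nothing here depends on the machine architecture, which is why the same check works wherever the partition is read.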
Now, if your operating system is tiny and fits in a sector, putting
it in the first sector of a partition is fine, but most operating
systems today are not a raw sector. They are a bunch of files. So how do
we get from the computer executing the first sector of a partition to
loading files? Operating systems today use a small program called a
bootloader. It usually can read from the filesystem,[2] and looks in a
conventional path, such as `/boot/vmlinuz` on Linux, to find the
operating system’s kernel to run, and the operating system defines its
own convention for how the bootloader should start the rest of the
system.
So, suppose we run `cfdisk` or some other disk partitioning program
and see output like this:
    Device     Start     End       Sectors   Size  Type
    /dev/sda1  1         12207000  12207000  50G   Linux filesystem
    /dev/sda2  12207001  14160120  1953120   8G    Linux swap
    /dev/sda3  14160121  50781120  36621000  150G  Linux filesystem
    /dev/sda4  50781121  87402120  36621000  150G  NTFS
We can easily interpret it. These are four partitions on the first
disk attached to the computer. They have their start and end sectors,
and the size expressed in gigabytes. “Type” is a hint about what the
partition holds, roughly its filesystem. “Linux filesystem” on modern
machines usually means the latest filesystem in the ext lineage (today
it is ext4). “Linux swap” is a partition that is used by the Linux
kernel for swap.[3] When you see two Linux filesystem partitions, one
larger than the other, usually the smaller one holds the operating
system and programs, and the larger one is where data is stored. It
might be mounted at `/home` so that the operating system on
`/dev/sda1` can be replaced without losing the users’ data on
`/dev/sda3`, or if this is a database server or a log aggregation
server, it might be where those systems keep their data. The last
partition is NTFS, almost certainly a Windows installation.
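The Size column follows from the Sectors column once you know the sector size. The sketch below reproduces the sizes in the listing, assuming 4096-byte sectors and decimal gigabytes (10^9 bytes). Both are assumptions chosen because they make the numbers work out, and `size_in_gb` is a made-up helper name.

```python
SECTOR_SIZE = 4096  # assumed; this is what makes the listing's sizes add up


def size_in_gb(sectors, sector_size=SECTOR_SIZE):
    """Partition size in decimal gigabytes, rounded to the nearest whole GB."""
    return round(sectors * sector_size / 1e9)


# (device, sector count, type) taken from the listing above
rows = [
    ("/dev/sda1", 12207000, "Linux filesystem"),
    ("/dev/sda2", 1953120, "Linux swap"),
    ("/dev/sda3", 36621000, "Linux filesystem"),
    ("/dev/sda4", 36621000, "NTFS"),
]

for device, sectors, kind in rows:
    print(f"{device}: {size_in_gb(sectors)}G {kind}")
```

Run as written, this reports 50G, 8G, 150G, and 150G.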
There are a couple of other peculiar entries you may encounter. The first is an EFI System partition.
    Device     Start    End       Sectors   Size   Type
    /dev/sda1  2048     1050623   1048576   512M   EFI System
    /dev/sda2  1050624  32281630  31231007  14.9G  Linux filesystem
It is usually a few hundred megabytes in size (512 MB here) and typically starts at sector 2048. This is storage for information about booting the system for systems with UEFI firmware (which is most recent PCs). Leave it alone.
On older systems you may see a type called Extended.
    Device       Start     End       Sectors   Size  Type
    /dev/sda1    1         12207000  12207000  50G   Linux filesystem
    /dev/sda2    12207001  50781120  38574120  158G  Extended
    |-/dev/sda5  12207001  14160120  1953120   8G    Linux swap
    |-/dev/sda6  14160121  50781120  36621000  150G  Linux filesystem
    /dev/sda4    50781121  87402120  36621000  150G  NTFS
Notice that, visually, it looks like `/dev/sda5` and `/dev/sda6` are inside `/dev/sda2`. This is exactly what’s going on. On older PCs, the boot sector is a crufty old format called the Master Boot Record (MBR), which only has slots for four partitions. The Extended partition type isn’t really a partition. Instead, you put a special entry in one of those partition slots that lets you split up its space into more partitions. So `/dev/sda5` and `/dev/sda6` occupy the space allocated for `/dev/sda2`. Partitions that the MBR understands are called primary partitions, and the ones inside extended partitions are called logical partitions. Also note that the logical partitions are numbered 5 and 6. The numbers 1 to 4 are reserved for the primary partitions, and the logical partitions are numbered sequentially after that.
Fortunately most disks today use a boot sector format called GUID Partition Table (GPT), which doesn’t have this problem.
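To make the four-slot limit of the MBR concrete, here is a rough Python sketch that reads the partition table out of a 512-byte MBR sector. The layout is the classic one: four 16-byte entries starting at byte 446, followed by a two-byte signature. The helper names `parse_mbr` and `make_entry` and the sample sector are invented for illustration, and the sketch does not follow extended partitions to find the logical ones.

```python
import struct

TABLE_OFFSET = 446          # the partition table starts here in sector 0
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, first sector, count
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes per slot
EXTENDED_TYPES = {0x05, 0x0F}


def parse_mbr(sector0):
    """Return the used primary partition slots of an MBR boot sector."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("sector 0 does not carry an MBR signature")
    partitions = []
    for slot in range(4):  # the MBR has room for exactly four entries
        status, _, ptype, _, first, count = struct.unpack_from(
            ENTRY_FORMAT, sector0, TABLE_OFFSET + slot * ENTRY_SIZE)
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({
            "number": slot + 1,
            "bootable": status == 0x80,
            "extended": ptype in EXTENDED_TYPES,
            "start": first,
            "end": first + count - 1,
        })
    return partitions


def make_entry(bootable, ptype, first, count):
    """Build one 16-byte table entry (CHS fields left as zeroes)."""
    status = 0x80 if bootable else 0x00
    return struct.pack(ENTRY_FORMAT, status, bytes(3), ptype, bytes(3), first, count)


# A made-up boot sector: one bootable Linux partition and one extended partition.
sector0 = bytearray(512)
sector0[446:462] = make_entry(True, 0x83, 2048, 1000)
sector0[462:478] = make_entry(False, 0x05, 3048, 5000)
sector0[510:512] = b"\x55\xaa"
partitions = parse_mbr(bytes(sector0))
```

The bootable flag discussed earlier is the first byte of each entry, and the entry stores a starting sector and a sector count rather than an explicit end sector.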
Epilogue: This layering of boot sectors defining partitions and partitions containing file systems isn’t the only way to do it. To see another approach, look at ZFS, the Zettabyte Filesystem.
If you use ZFS (or its kin btrfs) as the filesystem to hold your
operating system, then it lives in a partition like ext4 or XFS. But ZFS
was designed, among other things, to store data on large numbers of
disks. In this case you would have the disk with your operating system
that is partitioned as we have described,[4] and then a set of other
disks for data. Those disks are not partitioned at all. The whole, raw
disk is given to ZFS, including sector 0. There is nothing special
about sector 0 physically. It’s just a sector on the disk. So ZFS
ignores what a computer trying to boot expects to see. Partitioning
tools like `cfdisk` and `gparted` can’t do anything with sector 0 on
these disks, since there is no boot sector for them to interpret.
Instead, ZFS takes all the disks you allocate to it and treats them as pools of storage. You define logical volumes on these pools. For example, you can tell ZFS that you want this volume to have three replicas on different disks in case one fails, and that volume should have its data cached on a fast SSD, and a whole variety of different things. The layers of boot sector, partitions, and filesystem are entirely merged. This turns out to be an awesome idea.
[1] `sd` stands for “SCSI disk.” SCSI is a long-lived standard for attaching disks to computers. On old systems you might also see `hd` for IDE hard disks.
[2] Reading from a filesystem is much simpler than writing to it, so this doesn’t actually add that much size to the bootloader.
[3] Linux conventionally uses a separate partition for its swap. Windows conventionally uses a file in one of its partitions with filesystems for its swap. Both can be configured to do the opposite, and it’s fine. What is not fine is when someone configures the swap to be on a network-mounted filesystem, and the system hangs waiting for a network request every time it tries to swap data in and out of memory.
[4] There is nothing fundamental about booting from partitions. Sun Microsystems, the company that created ZFS, set up the machines they sold to be able to read ZFS and boot from it without a layer of partitions underneath. The x86 and ARM machines we use today, though, were designed to boot from partitions, so that is what we continue to do.