What are disk partitions?

The first time someone tries to install Linux they often have a moment of panic when they are faced with partitioning disks. They are asked about /dev/sda1, filesystems, and whether the disk should be labelled an MBR or a GPT. This is a lot of gobbledygook for what are fairly simple concepts.

Let’s start with the /dev/sda part. A computer has some number of disks attached to it by various means. Each operating system refers to these disks differently, but that is purely a question of labels. For example, the first three disks attached to a system would be referred to as:

FreeBSD, macOS	Linux	Windows
`/dev/disk0`	`/dev/sda`¹	`\\?\Device\Harddisk0`
`/dev/disk1`	`/dev/sdb`	`\\?\Device\Harddisk1`
`/dev/disk2`	`/dev/sdc`	`\\?\Device\Harddisk2`
…	…	…

You might think that the BSD and Windows naming schemes seem more sensible than Linux’s, and you would be right, but our industry is full of cruft. Even this notion of partitions, as we will see later, is kind of historical cruft.

When a computer is turned on and detects a disk, what does it know about it? The disk is divided up into sectors, numbered from 0 to whatever. Basically all modern hard disks have 4096 bytes per sector, but older hard disks (512 bytes) and CDs and DVDs (2048 bytes) have smaller ones. The total number of sectors varies with the size of the disk (and the size of the sector on the disk, obviously). When the computer reads from the disk, it transfers a whole sector into RAM. Writing to the disk transfers one sector of data from RAM onto a sector of the disk.
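To make the “whole sectors only” idea concrete, here is a minimal sketch that treats an in-memory buffer as a pretend disk. The names `read_sector` and `write_sector` are made up for illustration, and the 4096-byte sector size is an assumption taken from the paragraph above.

```python
import io

SECTOR_SIZE = 4096  # bytes per sector; an assumption, older disks use 512


def read_sector(disk, number, sector_size=SECTOR_SIZE):
    """Return the bytes of one whole sector from a seekable binary file."""
    disk.seek(number * sector_size)
    data = disk.read(sector_size)
    if len(data) != sector_size:
        raise ValueError(f"sector {number} is past the end of the disk")
    return data


def write_sector(disk, number, data, sector_size=SECTOR_SIZE):
    """Overwrite one whole sector; partial sectors are not allowed."""
    if len(data) != sector_size:
        raise ValueError("a write must cover exactly one sector")
    disk.seek(number * sector_size)
    disk.write(data)


# A pretend disk of 8 zero-filled sectors held in memory.
disk = io.BytesIO(bytes(8 * SECTOR_SIZE))
write_sector(disk, 3, b"\xAB" * SECTOR_SIZE)
sector = read_sector(disk, 3)
```

The only arithmetic involved is that sector number times sector size gives the byte offset; everything else about the disk is just a long run of bytes.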

When we turn on a computer and it finds a disk while looking for an operating system to boot, how does it know what to do with it? We need to leave a description of what is where on the disk somewhere that the computer knows to look for it. The exact form of that description depends on the kind of computer. That is why a Windows machine cannot make heads or tails of a floppy disk formatted for an old Macintosh, and why a hard disk configured on an old Sun workstation is unreadable on a MacBook Pro. On nearly every machine, though, the description is written at sector 0. This special sector is called the boot sector.

The space on disks is divided up into contiguous ranges of sectors called partitions (or, on some systems, volumes), and the start and end sector of each partition, plus some other information about it, is stored in the boot sector. The partitions are usually numbered sequentially, so the first three partitions on the third disk would be referred to as:

FreeBSD, macOS	Linux	Windows
`/dev/disk2s1`	`/dev/sdc1`	`\\?\Device\Harddisk2\Partition0`
`/dev/disk2s2`	`/dev/sdc2`	`\\?\Device\Harddisk2\Partition1`
`/dev/disk2s3`	`/dev/sdc3`	`\\?\Device\Harddisk2\Partition2`
…	…	…
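Since the names are purely labels, they can be generated mechanically from a disk index and a partition number. This is a small sketch of the Linux and macOS columns of the tables above; the function names are made up, and it only covers the first 26 disks.

```python
import string


def linux_disk(index):
    """Name of the index-th disk (counting from 0) in the Linux sd scheme."""
    if not 0 <= index < 26:
        raise ValueError("this sketch only covers sda through sdz")
    return "/dev/sd" + string.ascii_lowercase[index]


def linux_partition(disk_index, partition):
    """Linux appends the partition number directly to the disk name."""
    return f"{linux_disk(disk_index)}{partition}"


def macos_partition(disk_index, partition):
    """macOS separates disk index and partition number with an 's'."""
    return f"/dev/disk{disk_index}s{partition}"


third_disk = [(linux_partition(2, n), macos_partition(2, n)) for n in (1, 2, 3)]
```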

The computer, after we turned it on, has found the boot sector and partitions. How does it boot an operating system? One of the things recorded about each partition in the boot sector is whether it is bootable, that is, whether the first sector of the partition is of a form that the computer can use to start booting an operating system. If it is, then the computer can load that first sector of the partition into memory and start executing it, and the instructions therein lay out how to load the operating system and finish booting the computer into a useful environment.
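As one concrete instance, the classic PC boot sector (the MBR format) keeps four 16-byte partition entries starting at byte 446 and ends with the two signature bytes 0x55 0xAA. The sketch below builds a fake sector 0 and reads the entries back, including the bootable flag. The helper names `make_entry` and `parse_mbr` are made up for illustration; other machine architectures lay out their boot sectors differently.

```python
import struct

MBR_SIZE = 512           # the classic MBR occupies the first 512 bytes
TABLE_OFFSET = 446       # the four 16-byte entries start here
SIGNATURE = b"\x55\xAA"  # last two bytes of a valid MBR
# status, 3 bytes CHS start, type, 3 bytes CHS end, first sector, sector count
ENTRY_FORMAT = "<B3xB3xII"


def make_entry(bootable, ptype, start, count):
    """Pack one partition entry (CHS fields left as zero)."""
    return struct.pack(ENTRY_FORMAT, 0x80 if bootable else 0x00, ptype, start, count)


def parse_mbr(sector):
    """Return a list of dicts for the non-empty partition entries."""
    if len(sector) < MBR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not an MBR boot sector")
    partitions = []
    for slot in range(4):
        begin = TABLE_OFFSET + 16 * slot
        status, ptype, start, count = struct.unpack(ENTRY_FORMAT, sector[begin:begin + 16])
        if ptype == 0:  # type 0 marks an unused slot
            continue
        partitions.append({
            "number": slot + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "start": start,
            "sectors": count,
        })
    return partitions


# A fake sector 0 with a bootable Linux partition (type 0x83) and swap (0x82).
sector = bytearray(MBR_SIZE)
sector[446:462] = make_entry(True, 0x83, 2048, 1048576)
sector[462:478] = make_entry(False, 0x82, 1050624, 2097152)
sector[510:512] = SIGNATURE
table = parse_mbr(bytes(sector))
```

A firmware doing this for real would then load the first sector of the entry marked bootable and jump into it.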

This is as much as the computer without an operating system can do. It has no notion of files or directories. To interpret the raw sequence of sectors that make up a partition as files and directories, we need some convention to be able to interpret the data in the sectors. That convention is called a filesystem, and there have been many, many filesystems over the years.

Some important filesystem lineages that you might see:

FAT: This is the family of filesystems used for floppy disks, and later for Windows 95, 98, and Windows ME machines. FAT32 is still a common format for external hard disks.
NTFS: Windows NT’s filesystem, which became the default for all Windows systems starting with Windows 2000.
ISO-9660: This is the filesystem used on CD-ROMs.
ext: The family of filesystems used by Linux. As of 2021, ext4 is the current incarnation.
UFS: The traditional Unix filesystem, still common on FreeBSD and other BSD systems, and early versions of MacOS X.
HFS: HFS and HFS+ were the filesystems for classic (pre-OS X) Macintosh, and HFS+ remained the default on Mac OS X for many years afterwards.
APFS: Apple File System, which replaced HFS+ on macOS.
XFS: A high performance filesystem developed by Silicon Graphics, Inc., which was common for database deployments, though ext4 now often matches its performance.
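Because a filesystem is just a convention for interpreting sectors, most of them stamp a recognizable marker at a fixed byte offset inside the partition. The sketch below guesses the filesystem from a few well-known markers. The function name and the table are illustrative and far from complete, and the ext marker cannot tell ext2, ext3, and ext4 apart.

```python
SIGNATURES = [
    # (name, byte offset within the partition, expected bytes)
    ("XFS", 0, b"XFSB"),
    ("NTFS", 3, b"NTFS    "),
    ("FAT32", 82, b"FAT32   "),
    ("ext", 1080, b"\x53\xEF"),      # magic 0xEF53 inside the superblock at 1024
    ("ISO-9660", 32769, b"CD001"),
]


def guess_filesystem(partition):
    """Return the first filesystem whose marker appears in the raw bytes."""
    for name, offset, magic in SIGNATURES:
        if partition[offset:offset + len(magic)] == magic:
            return name
    return None


# A blank pretend partition with only the ext marker written into it.
image = bytearray(40000)
image[1080:1082] = b"\x53\xEF"
guess = guess_filesystem(bytes(image))
```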

The boot sector and first sector of a partition are particular to a computer architecture. A disk partitioned for an old Sun workstation can’t be used by an IBM PC to boot an operating system. Filesystems, on the other hand, are implemented by the operating system, so they can cross machine architectures.

Now, if your operating system is tiny and fits in a sector, putting it in the first sector of a partition is fine, but most operating systems today are not a raw sector. They are a bunch of files. So how do we get from the computer executing the first sector of a partition to load files? Operating systems today use a small program called a bootloader. It usually can read from the filesystem², and looks in a conventional path, such as /boot/vmlinuz on Linux, to find the operating system’s kernel to run, and the operating system defines its own convention for how the bootloader should start the rest of the system.

So, if we run cfdisk or some other disk partitioning program and see output like

Device         Start       End    Sectors    Size  Type
/dev/sda1          0  12207000   12207000     50G  Linux filesystem
/dev/sda2   12207001  14160120    1953120      8G  Linux swap
/dev/sda3   14160121  50781120   36621000    150G  Linux filesystem
/dev/sda4   50781121  87402120   36621000    150G  NTFS

we can easily interpret it. These are four partitions on the first disk attached to the computer. They have their start and end sectors, and the size expressed in gigabytes. “Type” is roughly the filesystem the partition is expected to hold. “Linux filesystem” on modern machines usually means the latest filesystem in the ext lineage (today it is ext4). “Linux swap” is a partition that is used by the Linux kernel for swap.³ When you see two Linux filesystem partitions, one larger than the other, usually the smaller one is the operating system and programs, and the larger one is where data is stored. It might be mounted at /home so that the operating system on /dev/sda1 can be replaced without losing the users’ data on /dev/sda3, or if this is a database server or a log aggregation server, it might be where those systems keep their data. The last partition is NTFS, almost certainly a Windows installation.
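The Size column can be reproduced from the Sectors column. This is a minimal sketch, assuming 4096-byte sectors and decimal gigabytes; both are assumptions chosen because they match the listing, and a real tool may report 512-byte sectors or binary units instead.

```python
SECTOR_SIZE = 4096  # assumed bytes per sector

# (device, start, end, sectors) copied from the listing above
ROWS = [
    ("/dev/sda1", 0, 12207000, 12207000),
    ("/dev/sda2", 12207001, 14160120, 1953120),
    ("/dev/sda3", 14160121, 50781120, 36621000),
    ("/dev/sda4", 50781121, 87402120, 36621000),
]


def size_gb(sectors, sector_size=SECTOR_SIZE):
    """Size in decimal gigabytes (10**9 bytes)."""
    return sectors * sector_size / 10**9


sizes = {device: round(size_gb(sectors)) for device, _, _, sectors in ROWS}

# Each partition starts on the sector right after the previous one ends.
contiguous = all(nxt[1] == prev[2] + 1 for prev, nxt in zip(ROWS, ROWS[1:]))

for device, size in sizes.items():
    print(f"{device}  {size}G")
```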

There are a couple of other peculiar entries you may encounter. The first is an EFI System partition.

Device         Start       End    Sectors    Size  Type
/dev/sda1       2048   1050623    1048576    512M  EFI System
/dev/sda2    1050624  32281630   31231007   14.9G  Linux filesystem

It’s usually a few hundred megabytes in size (512MB here) and conventionally starts at sector 2048. This is storage for information about booting the system for systems with UEFI firmware (which is most recent PCs). Leave it alone.

On older systems you may see a type called Extended.

Device         Start       End    Sectors    Size  Type
/dev/sda1          0  12207000   12207000     50G  Linux filesystem
/dev/sda2   12207001  50781120   38574120    158G  Extended
|-/dev/sda5 12207001  14160120    1953120      8G  Linux swap
|-/dev/sda6 14160121  50781120   36621000    150G  Linux filesystem
/dev/sda4   50781121  87402120   36621000    150G  NTFS

Notice that, visually, it looks like /dev/sda5 and /dev/sda6 are inside /dev/sda2. This is exactly what’s going on. On older PCs, the boot sector is a crufty old format called the Master Boot Record (MBR), which only has slots for four partitions. The Extended partition type isn’t really a partition. Instead, you put a special entry in one of those partition slots that lets you split up its space into more partitions. So /dev/sda5 and /dev/sda6 occupy the space allocated for /dev/sda2. Partitions that the MBR understands are called primary partitions, and the ones inside extended partitions are called logical partitions. Also note that the logical partitions are numbered 5 and 6. 1 to 4 are reserved for the primary partitions, then the logical partitions are numbered sequentially after that.
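The nesting and the numbering rule can both be checked mechanically. This sketch uses the start and end sectors from the listing above; the `Partition` tuple and the helper names are made up for illustration.

```python
from collections import namedtuple

Partition = namedtuple("Partition", "number start end kind")

# Rows copied from the listing above.
PARTS = [
    Partition(1, 0, 12207000, "Linux filesystem"),
    Partition(2, 12207001, 50781120, "Extended"),
    Partition(5, 12207001, 14160120, "Linux swap"),
    Partition(6, 14160121, 50781120, "Linux filesystem"),
    Partition(4, 50781121, 87402120, "NTFS"),
]


def is_logical(part):
    """Numbers 1 to 4 are the primary slots; 5 and up are logical."""
    return part.number >= 5


def container(part, parts):
    """Return the extended partition whose range holds `part`, or None."""
    for candidate in parts:
        if (candidate.kind == "Extended"
                and candidate.start <= part.start
                and part.end <= candidate.end):
            return candidate
    return None


logical = [p for p in PARTS if is_logical(p)]
holders = {p.number: container(p, PARTS).number for p in logical}
```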

Fortunately most disks today use a boot sector format called GUID Partition Table (GPT), which doesn’t have this problem.
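Telling the two formats apart takes only a few bytes. A GPT disk carries the text "EFI PART" at the start of its second sector, and it also keeps a stand-in MBR in sector 0 for the benefit of old tools, so the check for GPT has to come first. This is a minimal sketch, assuming 512-byte sectors; the function name is made up.

```python
GPT_SIGNATURE = b"EFI PART"
MBR_SIGNATURE = b"\x55\xAA"


def partition_scheme(disk, sector_size=512):
    """Classify the start of a raw disk image as 'GPT', 'MBR', or None."""
    # The GPT header lives in sector 1, right after the boot sector.
    if disk[sector_size:sector_size + len(GPT_SIGNATURE)] == GPT_SIGNATURE:
        return "GPT"
    # Otherwise fall back to the signature at the end of a classic MBR.
    if disk[510:512] == MBR_SIGNATURE:
        return "MBR"
    return None


# Two pretend disks, each just two sectors long.
mbr_disk = bytearray(1024)
mbr_disk[510:512] = MBR_SIGNATURE

gpt_disk = bytearray(mbr_disk)          # GPT disks keep a stand-in MBR too
gpt_disk[512:520] = GPT_SIGNATURE

schemes = (partition_scheme(bytes(mbr_disk)), partition_scheme(bytes(gpt_disk)))
```

On a disk with 4096-byte sectors the header would sit at byte 4096 instead, which is why the sector size is a parameter.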

Epilogue: This layering of boot sectors defining partitions and partitions containing file systems isn’t the only way to do it. To see another approach, look at ZFS, the Zettabyte Filesystem.

If you use ZFS (or its kin btrfs) as the filesystem to hold your operating system, then it lives in a partition like ext4 or XFS. But ZFS was designed, among other things, to store data on large numbers of disks. In this case you would have the disk with your operating system that is partitioned as we have described,⁴ and then a set of other disks for data. Those disks are not partitioned at all. The whole, raw disk is given to ZFS, including sector 0. There is nothing special about sector 0 physically. It’s just a sector on the disk. So ZFS ignores what a computer trying to boot expects to see. Partitioning tools like cfdisk and gparted can’t do anything with sector 0 on these disks, since there is no boot sector for them to interpret.

Instead, ZFS takes all the disks you allocate to it and treats them as pools of storage. You define logical volumes on these pools. For example, you can tell ZFS that you want this volume to have three replicas on different disks in case one fails, and that volume should have its data cached on a fast SSD, and a whole variety of different things. The layers of boot sector, partitions, and filesystem are entirely merged. This turns out to be an awesome idea.

¹ sd stands for “SCSI disk.” SCSI is a long-lived standard for attaching disks to computers. On old systems you might also see hd for IDE hard disks.
² Reading from a filesystem is much simpler than writing to it, so this doesn’t actually add that much size to the bootloader.
³ Linux conventionally uses a separate partition for its swap. Windows conventionally uses a file in one of its partitions with filesystems for its swap. Both can be configured to do the opposite, and it’s fine. What is not fine is when someone configures the swap to be on a network mounted filesystem, and the system hangs waiting for a network request every time it tries to swap data in and out of memory.
⁴ There is nothing fundamental about booting from partitions. Sun Microsystems, the company that created ZFS, set up the machines they sold to be able to read ZFS and boot from it without a layer of partitions underneath. The x86 and ARM machines we use today, though, were designed to boot from partitions, so that is what we continue to do.
