How do I store images in my database?

Or how do I store documents or any other blobs of data in my database? The short answer:

Why?

There are two criteria in the chart above: how big are your images and how many of them will you be storing.

Why does it matter how big the images are?

Databases are designed for fast, independent reads and updates of lots of structured data. That takes lots of different forms, but in general it means that you have staging areas where updates are accumulated and as the updates reach a certain size they are consolidated. The staging areas are sized for small data, measured in tens or hundreds of bytes. If you start shoving thirty megapixel landscapes through, you will force nearly continuous consolidation, which is worse than just writing the files to disk outside the database.

How big a blob your database can accept without causing problems depends on the database. For Cassandra, the recommendation is at most one to two megabytes. A MySQL expert I know used to recommend at most 4kb for values using Facebook’s MyRocks storage engine. The exact value depends on your database, the hardware it is running on, and how you have tuned it, but as a rule of thumb unless you have reason to think otherwise, a few kilobytes is a good dividing line.

Why it matters how many images there are

This is a bit more subtle. We can generally store gigabytes of plain files more cheaply than gigabytes in a database. It’s not that databases somehow use more disk to store the same data. Often they use less. But we can often use cheaper disks for files than we do for our database without sacrificing much performance.

To understand why, we need to know a little bit about how disks work. All disks read and write a block at a time, and a block is usually at least 4kb in size. On traditional hard drives, it has to rotate the magnetic platters in the disk to reach the right block before reading or writing it (the time to do this is called seek time). That rotation is the limiting factor. If we barely have to rotate, such as when we read a whole sequence of neighboring positions on a platter, hard disks are limited by how fast they can shove data back to the rest of the computer. Solid state drives, on the other hand, don’t have rotation time. Every position is as fast to access as any other.

File systems generally try to put the blocks of the file’s data in sequence on the platters. For each file we pay a rotation time, but after that the speed won’t be significantly different than on a solid state drive. On the other hand, a database generally accesses a few pieces of data in each lots of different blocks to handle all the queries it is running, and rotation time has a bigger impact on performance.

To put some numbers on it, say we read a 10MB image file (which is a typical image produced by a smartphone today) from disk. A modern hard disk has a seek time of four to ten milliseconds. The file is around 2500 4kb blocks. So for that one file, we will pay one seek time. If a database is accessing 2500 random blocks on the disk, it will pay a seek time for each, at worst an extra ten to twenty five seconds. It’s not actually that bad in practice because we aren’t always as far as possible on the disk from the next block to use, and many blocks get used repeatedly in a short period so the database keeps them in memory and doesn’t go back to the disk every time, and there are lots of other tricks databases play to reduce the amount of seeking they have to do.

Loading lots of small files, such as operating systems loading icons, image previews, and lots of shared libraries, benefits from solid state drives the same way as a database does. When solid state drives first came out (and were very expensive), it was common to have a small solid state drive for the operating system, and a much larger spinning disk for the files you were actually working on.

So: if you have a hundred thousand images of a megabyte or so each (a few hundred blocks), you can save a lot of money on hardware by putting them on spinning disks while you keep your database on solid state drives.

How to put images in external file storage

In most cases the flow chart above will recommend that you store your images on a file system or in a blob store and store the path to the image in your database. This needn’t be complicated. Let’s consider a very simple setup where your web server and database run on a single machine. Nothing changes in principle as you go from a local filesystem to blob storage like AWS S3 or to the database served somewhere else. We’ll go through it in MySQL as an example, but the principles are the same in any database.

Create a directory, perhaps /images, and make sure you are backing it up regularly. Then in your database, we’ll create a table images

CREATE TABLE images (
    id INTEGER AUTOINCREMENT PRIMARY KEY,
    path TEXT NOT NULL
);

But what do we provide for path? If two users upload difference image files named my_image.png, and we write one to /images/my_image.png, then overwrite it with the other, now both users see one user’s value. This is nearly always the worst possible thing to do, and may have consequences ranging from contract violations to law suits and even jail time depending on the data.

We could do something like /images/<username>/my_image.png and isolate each user’s data in a subdirectory. This helps, but:

You now have a more complicated way of referring to the image (user plus filename).
If, as is common, you introduce the notion of groups and ways of sharing data between users in the future, this naming scheme is likely to get complex in all kinds of crazy ways.
If there is a problem where names of files get messed up, matching them back to database records is nearly impossible.

There’s a nice trick to get around this. We can calculate a hash of the image, and use that as both the filename of the image and the key of the image in the database. Which has we use isn’t very important. We don’t need a cryptographically secure hash, just one that is fast and produced a big enough key to make collisions negligible. For our examples we’ll use SHA1. It’s a cryptographic hash, which is unnecessary, but it’s ubiquitous, including being built into many databases. For example, if we are going to insert the following image (ubiquitous in computer vision):

We calculate its SHA1 hash CAD34412CC1B88823ECDB68B8CC18A5F54A5208B and put it in a table for images of the form:

CREATE TABLE images (
    id BINARY(20) PRIMARY KEY
);

We insert an image into the database with something like this Python code:

import mysql.connector

db = mysql.connector.connct(...)
cursor = db.cursor()

with open('lena.png') as f:
    imageHash = 'CAD34412CC1B88823ECDB68B8CC18A5F54A5208B'
    shutil.copyfile('lena.png', f'/images/{imageHash}')
    cursor.execute(
        "INSERT INTO images(id) VALUES (UNHEX(%s))",
	(imageHash,)
    )
    db.commit()

cursor.close()
db.close()

Note that we call UNHEX to turn our hex-encoded hash into much more compact binary. When we want to get the filename, we call HEX on the id column.

If two users upload different images named lena.png, there will be no conflict. On the other hand, two users upload the same file, even under different names, it will map to the same file on disk. Even if the same image is uploaded a million times (a popular meme on social media?), there will only be one copy on disk.

When we use hashes as filenames like this, we have to deal with two other things:

A hash by itself is not very useful. We need some context that we might otherwise have gotten from the filename.
If two users upload the same image and one deletes it, we don’t want to delete the file. If they both do, we do want to delete it. This requires more coordination.

Let’s address context first. For example, if users are uploading profile pictures, we might directly add the hash as a column in the users table

CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    profile_picture BINARY(20)
);

Anywhere the image is used, we would have a column giving the hash, and when we select a row to use, we get the image hash which lets us refer to the file on disk.

Next, we have the problem of deletion. If users upload the same image a hundred times, and seventy five of them delete it, we cannot delete it. As soon as the last one deletes it, we probably need to delete it, both to save storage space and because we may be required legally to do so depending on what the data is and what legal regime we’re operating under.

There are two approaches we can use:

Run a select statement on a regular schedule to find all images that are no longer used, and delete them from the file system.
Track the number of instances of an image being used and adjust it every time there is an upload or deletion of it (what is called reference counting).

Recurring job approach

We are going to want a way to track all the images we use, across all the various tables. For example, say we have the users table above, and another of messages with optional image attachments. We want to be able to easily select all images in the system and compare them to those that are still referenced by users or messages.

CREATE TABLE images(
    id BINARY(20) PRIMARY KEY
);

CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    profile_picture BINARY(20) REFERENCES images(id)
);

CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    posted_by INTEGER NOT NULL REFERENCES users(id),
    content TEXT,
    attachment BINARY(20) REFERENCES images(id)
);

Then we can select all rows in the images table that have no corresponding row in the users or messages tables. We do so by construction a table where we will have a row for every row in the images table, and columns for that image’s appearance in the users and messages tables. For a particular image, if there is no corresponding row in images or messages that refers to that image, those columns will be null. We filter for only the images where both joined columns are null to find the ones we need to delete.

SELECT images.id FROM images
LEFT JOIN users on users.profile_picture = images.id
LEFT JOIN messages on messages.attachment = images.id
WHERE users.id IS NULL AND messages.attachment IS NULL;

This gives us a list of image files we then go delete from our filesystem. As we add more tables that refer to images, we must remember to update this query.

Reference counting approach

The other approach is to keep a reference count. We add another column to our images table:

CREATE TABLE images (
    id BINARY(20) PRIMARY KEY,
    count INTEGER NOT NULL DEFAULT 1
);

When we insert into another table with an image, we either insert the image and set count to 1 if the image does not yet exist, or increment the count.

BEGIN;

INSERT INTO images(id, count)
VALUES (UNHEX('CAD34412CC1B88823ECDB68B8CC18A5F54A5208B'), 1)
ON DUPLICATE KEY UPDATE count = count + 1;

INSERT INTO messages (posted_by, content, attachment)
VALUES (24, 'Look at this!',
  UNHEX('CAD34412CC1B88823ECDB68B8CC18A5F54A5208B'));

-- Return the new id created by the autoincrement column
-- on messages.
SELECT LAST_INSERT_ID();

COMMIT;

When we delete a row, we decrement the count, and delete the image if its count as reached zero. For example, deleting the messages with id 5226.

BEGIN;

-- Get the image id from the message so we don't
-- have to keep inserting subqeries. We add FOR UPDATE
-- to go ahead and lock the row and avoid deadlocks.
SELECT attachment FROM messages
WHERE id=5226 INTO @imageid FOR UPDATE;

UPDATE images SET count=count-1 WHERE id=@imageid;

DELETE FROM images WHERE id=@imageid AND count=0;

DELETE FROM messages WHERE id=5226;

-- Return whether we deleted the image, so we
-- know whether to go delete the image on the
-- filesystem.
SELECT count(id)=0 AS deleted
FROM images WHERE id=@imageid;

COMMIT;

If the deleted column in the last query is true, we delete the file on our file storage.

If we update a row to replace the image, we need to decrement the count of the old one and increment it for the new one.

BEGIN;

SELECT profile_picture FROM users
WHERE id=26 INTO @imageid FOR UPDATE;

UPDATE images SET count=count-1 WHERE id=@imageid;

INSERT INTO images(id, count) VALUES (UNHEX('...'), 1)
ON DUPLICATE KEY UPDATE count=count+1;

-- This must happen after both the last two
-- queries are run, since they will cancel out
-- if the image hash has not changed. If we
-- run it before the insert, we will erroneously
-- delete images.
DELETE FROM images WHERE id=@imageid AND count=0;

UPDATE users SET profile_picture=UNHEX('...') WHERE id=26;

-- Return whether we deleted the old image, so we
-- know whether to go delete the image on the
-- filesystem. Also return the id of the old
-- image so we know which one to delete.
SELECT count(id)=0 AS deleted, @imageid FROM images WHERE id=@imageid;
COMMIT;

and again we delete the file if the deleted column is true.

This adds complexity, but means we won’t retain images at all after their deletion is requested. Every time we add an image column to a table, we must make sure we only insert, update, and delete with the reference counting in place.

Summary

Create a directory on your file system or a bucket in your blob storage to hold the images or other documents.
Calculate a hash of each file and use that as the filename.
Use the hash as key to refer to images in your database.
Use a separate table for images with foreign key references into it from tables where they are used.
Either run a recurring job to delete images or implement reference counting for them.

Putting images in inline bytestrings

If the result of the flow chart is that you’re going to write your images or other blobs inline, then the process shifts slightly. Instead of just storing the hash, we’ll store the bytes of the image file as well. We’re demonstrating in MySQL, but for whichever database you use, read the documentation around its binary data types in detail.

We will use the MEDIUMBLOB type which can hold up to 16MB. Once again we will use a hash as our primary key, but now we can automatically calculate it in MySQL. We use MySQL’s built-in SHA1 hash function. It returns a string of hex digits, which we use UNHEX to turn back into binary.

CREATE TABLE images (
    id BINARY(20) DEFAULT (UNHEX(SHA1(contents))) PRIMARY KEY,
    contents MEDIUMBLOB NOT NULL
);

Then, using Python for our example, we can insert the same image as above with this variation on the code:

import mysql.connector

db = mysql.connector.connect(...)
cursor = db.cursor()

with open('lena.png') as f:
    hex_string = ''.join(f'{c:02x}' for c in f.read())
    cursor.execute(
        "INSERT INTO images(contents) VALUES (UNHEX(%s))",
	(hex_string,)
    )
    db.commit()

cursor.close()
db.close()

When we query the table, we find

mysql> select hex(id) as id,
     >        hex(contents) as contents
     > from images;
+------------------------------------------+--------------------------+
| id                                       | contents                 |
+------------------------------------------+--------------------------+
| CAD34412CC1B88823ECDB68B8CC18A5F54A5208B | 89504E470D0A1A0A00000... |
+------------------------------------------+--------------------------+
1 row in set (0.00 sec)

We use these images by referring to them from other tables via foreign keys, for example a user table (identical to that in the section on storing images outside the database):

CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    profile_picture BINARY(20) REFERENCES images(id)
);

If we only have this one kind of small image, we may not care about saving the storage we get from a separate images table, and we can simplify things by moving the binary blob directly into the users table and skipping the hash entirely.

CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    profile_picture MEDIUMBLOB
);

With this we don’t have to worry about reference counting or recurring cleanup jobs, which removes many possible points of error in the code.

Aside from this, handling inline bytestrings works the same as external files.

« Back to Programming | Home