What is a file?

These days, we take files for granted. They’re just… there. We click them, and they open. We download them, we upload them, we move them around.

But do you actually know what’s inside a file? Beyond the data that delivers the action we expect when we use them, files are full of headers, data blocks, metadata, and other technical elements that make the file, well, a file!

You may not care that much about how files are designed or work until the file fails. Files can fail for a variety of reasons, often when their data becomes corrupted. Even small changes, such as altering the file’s internal structure or using the wrong file extension, can make files unusable.

Knowing how files work can help you use them better, no matter your needs. So let’s take a closer look at how files are designed to operate (and why they can get messed up when we least expect it!)

Ask someone what a file is, and they’ll probably tell you: a file is a file!

At its lowest level, a file is a sequence of bytes stored on a device. Alongside those bytes, the operating system keeps a few extra pieces of information about the file, such as its name and timestamps.
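
To see the “sequence of bytes” idea for yourself, here’s a minimal Python sketch. It writes a three-character text file to a temporary folder and reads it back as raw bytes. The file name `hello.txt` and its contents are made up for illustration.

```python
import tempfile
from pathlib import Path


def read_raw_bytes(path: Path) -> bytes:
    """Return the file's contents exactly as stored: a sequence of bytes."""
    return path.read_bytes()


# Hypothetical example file, created in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "hello.txt"
    path.write_bytes("Hi!".encode("utf-8"))

    raw = read_raw_bytes(path)
    print(list(raw))     # [72, 105, 33]
    print(raw.hex(" "))  # 48 69 21
```

Each character ends up as one number on disk. Everything else a file can do is built on top of rows of numbers like these.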

Bytes

Software relies on specific patterns within a file’s bytes to recognize what the file is and what to do with it once it’s opened.

These patterns usually include a header, along with other structured sections that programs rely on to read the file properly.

File Headers and Magic Numbers

A file’s header is usually found at the beginning of a file. The header stores the metadata of the file, which can tell the software a lot about the file itself:

• The file’s format version
• The content’s dimensions or length (an image’s width and height, say, or an audio clip’s duration)
• Where later sections exist in the file

The header is usually very small, but it’s still incredibly important for the file to work properly. If the header is missing or damaged, the software may misinterpret what the file is and fail to use it as expected.
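
Here’s a rough sketch of how a program might read a header like the one described above. The layout is invented for this example and doesn’t belong to any real format: a version number, a width, a height, and an offset that says where the content starts. Python’s built-in `struct` module does the byte decoding.

```python
import struct
from typing import NamedTuple

# Hypothetical layout, invented for this example (little-endian):
#   2 bytes: format version
#   4 bytes: image width
#   4 bytes: image height
#   4 bytes: offset where the pixel data begins
HEADER_FORMAT = "<HIII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 14 bytes


class Header(NamedTuple):
    version: int
    width: int
    height: int
    data_offset: int


def parse_header(file_bytes: bytes) -> Header:
    """Decode the fixed-size header at the start of the file."""
    if len(file_bytes) < HEADER_SIZE:
        raise ValueError("file is too short to contain a header")
    return Header(*struct.unpack_from(HEADER_FORMAT, file_bytes, 0))


# Build a tiny fake file: a header followed by 6 bytes of "pixel" data.
payload = bytes([10, 20, 30, 40, 50, 60])
fake_file = struct.pack(HEADER_FORMAT, 1, 3, 2, HEADER_SIZE) + payload

header = parse_header(fake_file)
print(header)
print(fake_file[header.data_offset:])  # the content the header points to
```

Notice that the program never guesses where the content begins. It trusts the offset stored in the header, which is exactly why a damaged header can throw everything else off.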

Many file formats start with a short, identifying pattern of bytes known as a magic number or signature, which makes it easy for software to identify the file type. Think “Hello! I’m a JPEG!” or “Hi! I’m an MP3!”.

Many formats include a magic number as part of the header, though not all headers rely on magic numbers alone.
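
As a small illustration, here’s one way software might check a magic number. The signatures in the table are the well-known opening bytes of each format, but the function name `sniff_file_type` and the short list of formats are just choices made for this sketch. Real tools know many more signatures.

```python
from typing import Optional

# Well-known signatures found at the very start of these formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def sniff_file_type(first_bytes: bytes) -> Optional[str]:
    """Return a format name if the bytes start with a known signature."""
    for signature, name in SIGNATURES.items():
        if first_bytes.startswith(signature):
            return name
    return None


print(sniff_file_type(b"%PDF-1.7\n"))            # PDF document
print(sniff_file_type(b"\xff\xd8\xff\xe0\x00"))  # JPEG image
print(sniff_file_type(b"just some plain text"))  # None
```

The check looks only at the bytes and never at the file’s name, so renaming a PDF to `photo.jpg` wouldn’t fool it. It also has limits: ZIP-based formats such as .docx start with the same bytes as a plain ZIP archive, so software often needs to look further into the file to be sure.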

Data Blocks

A file’s data blocks (also referred to as a file’s payload) are located after the header. This is where the meat of the file lives. Think pixel data for images, letters and words for documents, and sound samples for audio files.

You can think of a file’s structure as a book. The data blocks are the chapters, and they’re often broken up into chunks that the software reads in order. This chunked layout lets software work with large files without having to read every single byte before acting.
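
Reading a file piece by piece can be sketched in a few lines of Python. This version simply uses fixed-size chunks, which is a simplification: real formats define their own chunk boundaries. The helper name `read_in_chunks` and the tiny chunk size are picked for the example.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the stream's contents one chunk at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty result means we reached the end
            break
        yield chunk


# An in-memory stand-in for a large file on disk.
fake_file = io.BytesIO(b"abcdefghij")

for chunk in read_in_chunks(fake_file, chunk_size=4):
    print(chunk)
# b'abcd'
# b'efgh'
# b'ij'
```

Because only one chunk is held in memory at a time, the same loop works whether the file is ten bytes or ten gigabytes.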

What Is Metadata?

Files also have metadata. Think of metadata as “data about data”. It’s the extra information that describes your file so that systems can identify it and use it accurately.

It’s not the contents of a letter that arrives in the mail, but the address and postage that tell the post office where to send it.

There are two types of metadata:

• Embedded Metadata: Some metadata lives inside the file itself, alongside its content. A great example is an image file that records the image’s dimensions, the camera model used, the lens settings, and the GPS location where the picture was taken.

• File System Metadata: There’s also metadata that lives outside the file, in an operating system’s file system. These are things like the file’s name, its creation and modification dates, and the permissions that control who can open or edit it. This makes it easier to search for specific files based on details like these.
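
To make the second type concrete, here’s a small Python sketch that asks the file system what it knows about a file. The helper name `file_system_metadata` and the example file `notes.txt` are made up for illustration, and the permission bits it reports will look different from one operating system to the next.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def file_system_metadata(path: Path) -> dict:
    """Collect details the file system keeps about a file."""
    info = os.stat(path)  # asks the file system, not the file's contents
    return {
        "name": path.name,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
        "permissions": oct(info.st_mode & 0o777),
    }


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "notes.txt"  # hypothetical example file
    path.write_bytes(b"hello")

    meta = file_system_metadata(path)
    contents = path.read_bytes()

for key, value in meta.items():
    print(f"{key}: {value}")
print("contents:", contents)  # just b'hello', with no name or date inside
```

The name and the modification date never appear in the file’s five bytes. They’re kept by the file system, which is why renaming a file doesn’t change what’s inside it.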

How Do Files Get Corrupted?

You’ve likely experienced the frustration of trying to open a file, only to get an error message telling you that the file is “corrupted” or “unusable.”

But files rarely just “go bad” on their own; there’s usually a structural issue behind the problem. File corruption happens when the bytes in a file no longer match what the format expects.

There are various reasons for this, such as interrupted saves, failing storage hardware, malware, or even bugs in the software that used the file. Since the headers and structural metadata of a file are so important to how the file is interpreted, even a tiny change to those bytes can have a huge impact.
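
Here’s a quick way to watch this happen. The sketch below uses Python’s built-in `zlib` compression as a stand-in for “a format with expectations,” which is an assumption made for the demo and not a claim about how any particular file type works. It damages the same data in two ways: changing a single header byte, and cutting the data short the way an interrupted save might.

```python
import zlib


def try_open(file_bytes: bytes) -> str:
    """Try to decode zlib-compressed bytes and report what happened."""
    try:
        return zlib.decompress(file_bytes).decode("utf-8")
    except zlib.error as problem:
        return f"corrupted: {problem}"


original = zlib.compress(b"Dear reader, this is the content of the file. " * 20)

# Damage 1: overwrite the very first byte, which is part of the header.
bad_header = b"\x00" + original[1:]

# Damage 2: cut the data in half, as an interrupted save might.
truncated = original[: len(original) // 2]

print(try_open(original)[:12])  # Dear reader,
print(try_open(bad_header))     # reports a corruption error
print(try_open(truncated))      # reports a corruption error
```

One changed byte out of dozens is enough for the decoder to give up, because that byte sat in the part of the data that tells the decoder how to read everything else.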

What Are Partially Readable Files?

File corruption doesn’t always mean that the file is completely unusable. Sometimes only pieces of a file are damaged or missing, and the software can still read the parts that remain structurally intact.

If you’ve ever seen half of an image render or experienced skips in an audio file, you may have had a partially readable file on your hands.
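
The idea can be sketched with a toy format. In this made-up layout, the file is a series of records, and each record carries its own length and a CRC-32 checksum. A reader can then verify each record separately and keep the ones that still check out. The layout and the function names are assumptions for this example, not a real file format.

```python
import struct
import zlib
from typing import List

# Hypothetical record layout, invented for this example (big-endian):
#   4 bytes: payload length
#   4 bytes: CRC-32 checksum of the payload
#   N bytes: payload
RECORD_HEADER = ">II"
RECORD_HEADER_SIZE = struct.calcsize(RECORD_HEADER)  # 8 bytes


def pack_record(payload: bytes) -> bytes:
    """Prefix a payload with its length and checksum."""
    return struct.pack(RECORD_HEADER, len(payload), zlib.crc32(payload)) + payload


def read_intact_records(file_bytes: bytes) -> List[bytes]:
    """Return every record whose checksum still matches its payload."""
    intact = []
    position = 0
    while position + RECORD_HEADER_SIZE <= len(file_bytes):
        length, checksum = struct.unpack_from(RECORD_HEADER, file_bytes, position)
        start = position + RECORD_HEADER_SIZE
        payload = file_bytes[start : start + length]
        if len(payload) < length:
            break  # the file ends mid-record; nothing more to read
        if zlib.crc32(payload) == checksum:
            intact.append(payload)
        position = start + length  # move on, even past a damaged record
    return intact


records = [b"first part", b"second part", b"third part"]
damaged = bytearray(b"".join(pack_record(r) for r in records))

# Flip one byte inside the second record's payload.
second_payload_start = len(pack_record(records[0])) + RECORD_HEADER_SIZE
damaged[second_payload_start] ^= 0xFF

print(read_intact_records(bytes(damaged)))  # [b'first part', b'third part']
```

The damaged record is skipped, and the records on either side of it survive. This only works while the length fields are intact, though. If one of those is damaged, the reader loses its place, which is another reminder of how much depends on a file’s structural bytes.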

Once You Know How A File Works, Your Work Gets Easier

Files are incredibly important, and once you know how they work, you’ll better appreciate how crucial they are to nearly every piece of technology we use in our daily lives.

As a reminder, here’s a quick way to remember what’s in a file:

Most files come down to three things: a header (identity and structure), data blocks (content), and metadata (context).

Now you know why changing a file’s extension or opening a file with the wrong software can be incredibly frustrating. And with this knowledge, you can start to learn how to fix or manage broken files, no matter what you use files for!

The most important thing about files? They’re a great reminder that some of the most important things come in small packages (and small mistakes can have huge consequences!)