Understanding Executables: Demystifying Binary Files
Have you ever wondered what happens behind the scenes when you compile and run a program? Do you think executables are impenetrable and impossible to understand? Well, think again! In this blog post, we will dive deep into the world of executable file formats and explore how to investigate them. We will specifically focus on ELF binaries, commonly used in Linux systems. So, let’s get started!
Symbols, Sections, and Segments: Unveiling the Secrets of Executables
When we compile a program, we often end up with a binary file that seems unreadable. However, with the right tools and knowledge, we can uncover the hidden information within. Let’s take a simple C program called “hello.c” as an example. After compiling it using the command “gcc -o hello hello.c”, we obtain a binary file named “hello”.
If we try to view the contents of the binary using a command like “cat hello”, we might see some text, such as “Penguin!” and “ELF”. The “ELF” marker near the start of the file indicates that the binary follows the ELF format. However, most of the bytes appear as unprintable characters, making it difficult to understand the binary’s structure.
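The “ELF” text that shows up at the start of the file is part of a four-byte magic number. Here is a minimal sketch of checking for it ourselves; the function name describe_elf_ident is made up for this post, and it assumes the standard ELF identification layout (magic in bytes 0 to 3, word size in byte 4, byte order in byte 5).

```python
ELF_MAGIC = b"\x7fELF"  # 0x7f followed by the letters E, L, F


def describe_elf_ident(data: bytes) -> str:
    """Summarise the identification bytes at the start of a file."""
    if data[:4] != ELF_MAGIC:
        return "not an ELF file"
    if len(data) < 6:
        return "ELF, but too short to read the class and byte order"
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    bits = {1: "32-bit", 2: "64-bit"}.get(data[4], "unknown class")
    # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
    order = {1: "little-endian", 2: "big-endian"}.get(data[5], "unknown byte order")
    return f"ELF, {bits}, {order}"


# Hand-written identification bytes standing in for a real file.
print(describe_elf_ident(b"\x7fELF\x02\x01\x01" + bytes(9)))  # ELF, 64-bit, little-endian
```

To try it on a real binary, open the file in binary mode, read the first 16 bytes, and pass them to the function.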
To explore the binary further, we can use a tool called “readelf”. By running the command “readelf --symbols hello”, we can obtain information about the symbols present in the binary. The output lists quite a few symbols, and three of them are worth a closer look: “main”, which represents the address of the main() function, “puts”, which is an undefined reference to the puts function in libc (the compiler will often turn a simple printf call into a call to puts), and “_start”, which is where execution begins before main() is called.
But what exactly are symbols? When we write a program, each function we define, like “main”, is labeled with a symbol. These symbols allow the program to link with libraries and locate the corresponding code. For example, if we call a function like printf, the binary needs to find the code for that function in the C standard library (libc).
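To make the idea of a symbol more concrete, here is a rough sketch of decoding raw symbol table entries by hand. It assumes the little-endian ELF64 entry layout (name offset, info, other, section index, value, size) and a separate string table of NUL-terminated names. The helper name read_symbols and the sample bytes, including the address for “main”, are made up for illustration and are not taken from a real binary.

```python
import struct

# Elf64_Sym, little-endian: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM = struct.Struct("<IBBHQQ")
SHN_UNDEF = 0  # section index 0 means "defined somewhere else"


def read_symbols(symtab: bytes, strtab: bytes):
    """Yield (name, address, is_defined) for each entry in a raw symbol table."""
    for offset in range(0, len(symtab) - SYM.size + 1, SYM.size):
        name_off, _info, _other, shndx, value, _size = SYM.unpack_from(symtab, offset)
        # Names are stored as NUL-terminated strings inside the string table.
        end = strtab.index(b"\0", name_off)
        name = strtab[name_off:end].decode("ascii", "replace")
        yield name, value, shndx != SHN_UNDEF


# Illustrative data only: one defined symbol and one undefined one.
strtab = b"\0main\0puts\0"
symtab = SYM.pack(1, 0x12, 0, 14, 0x1149, 26) + SYM.pack(6, 0x12, 0, 0, 0, 0)

for name, address, defined in read_symbols(symtab, strtab):
    status = "defined" if defined else "undefined, resolved by the dynamic linker"
    print(f"{name:6} {address:#x} {status}")
```

An undefined entry such as “puts” carries no address of its own. The dynamic linker fills it in from libc when the program is loaded or when the function is first called.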
To find symbols in libc, we can use tools like “nm” or “objdump”. In this case, running “objdump -tT /lib/x86_64-linux-gnu/libc-2.15.so” gives us a list of symbols present in the library. This helps us understand how dynamic linking works, as we can see the functions available in libc and their locations.
Instead of opening the binary in a text editor, which provides limited information, we can use “objdump -s hello” to get a better view. For each section, it displays the bytes as hexadecimal values on the left and their ASCII translations on the right. This allows us to analyze the binary more effectively.
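The hex-on-the-left, ASCII-on-the-right layout is easy to reproduce. Here is a small sketch of the idea; the hexdump function is our own helper, and its exact column layout is not meant to match objdump byte for byte.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


print(hexdump(b"Penguin!\x00"))
```

Strings such as “Penguin!” are readable in the right-hand column, while the terminating zero byte shows up as a dot.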
Within the binary, we can identify various sections that hold different types of data. Some important sections include .text (executable code), .rodata (read-only data, such as string constants), and .data (initialized read-write data). Each section’s flags tell the linker and loader how its contents should be treated once they end up in memory.
To gain more detailed information about the sections, we can use “readelf --sections hello”. This command provides metadata about each section, such as its type, address, size, and flags. For example, we can see that .text is executable and not writable, while .data is read-write.
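The permissions that readelf reports come from a bit field stored with each section. Here is a minimal sketch of decoding the three most common bits into the letters readelf uses (W for writable, A for allocated in memory, X for executable). The function name section_flags is ours, and the many less common flag bits are ignored.

```python
# Flag bits from the ELF section header (sh_flags).
SHF_WRITE = 0x1      # writable at run time
SHF_ALLOC = 0x2      # occupies memory while the program runs
SHF_EXECINSTR = 0x4  # contains machine instructions


def section_flags(sh_flags: int) -> str:
    """Turn a section's flag bits into readelf-style letters."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters


print(section_flags(SHF_ALLOC | SHF_EXECINSTR))  # AX, typical for .text
print(section_flags(SHF_ALLOC))                  # A, typical for .rodata
print(section_flags(SHF_WRITE | SHF_ALLOC))      # WA, typical for .data
```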
If we want to explore the assembly code within the binary, we can use a disassembler. Although understanding assembly requires advanced knowledge, we can still get a glimpse of the code’s structure. Running “objdump -d hello” displays the assembly instructions in a human-readable format.
Finally, we come to segments, which are described by program headers and determine how different parts of the program are mapped into memory. Segments are used when the program is loaded and run, while sections are used at link time. By running “readelf --segments hello”, we can examine the segments present in the binary and their respective permissions.
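For the curious, here is a rough sketch of reading the program headers directly. It assumes a little-endian 64-bit ELF file, where the header stores the program header table offset at byte 32 and the entry size and count at bytes 54 and 56. The function name read_segments and the "RWX" permission string are our own choices, not readelf’s output format.

```python
import struct

# Elf64_Phdr, little-endian: type, flags, offset, vaddr, paddr, filesz, memsz, align
PHDR = struct.Struct("<IIQQQQQQ")
PT_NAMES = {1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE", 6: "PHDR"}
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def read_segments(elf: bytes):
    """Return (type, virtual address, size in memory, permissions) for each segment."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (phoff,) = struct.unpack_from("<Q", elf, 32)            # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", elf, 54)   # e_phentsize, e_phnum
    segments = []
    for i in range(phnum):
        fields = PHDR.unpack_from(elf, phoff + i * phentsize)
        p_type, p_flags, vaddr, memsz = fields[0], fields[1], fields[3], fields[6]
        perms = (
            ("R" if p_flags & PF_R else "-")
            + ("W" if p_flags & PF_W else "-")
            + ("X" if p_flags & PF_X else "-")
        )
        # Unknown segment types are shown as their raw hex value.
        segments.append((PT_NAMES.get(p_type, hex(p_type)), vaddr, memsz, perms))
    return segments
```

To try it, read a binary with open("hello", "rb") and pass the bytes to read_segments. On a typical dynamically linked program we would expect at least one LOAD segment that is readable and executable (the code) and one that is readable and writable (the data).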
Unveiling the Magic of Executables
Executables may seem like magical black boxes, but they are just files following a specific format. By using tools like readelf, nm, and objdump, we can unravel the secrets hidden within these binary files. So, why not give it a try? Explore your own Linux binaries and have fun dissecting them!
Special thanks to Allison Kaptur and Dan Luu for their valuable feedback and insights while reviewing this blog post.