Linux Architecture
Operating systems are a core part of software. They act as an intermediary layer between hardware and applications, ensuring system integrity and portability across different hardware components.
In this chapter we discuss the common software stack of modern computing systems, focusing on Linux and the interfaces provided to the user and the developer.
After this chapter you will have a better understanding of the need for operating systems and the roles they play. The focus is on the user/developer interface, with the aim of becoming a more proficient user of the CLI shell. You will understand the roles of the layers in the software stack and become familiar with the use cases for each application programming interface (API). Furthermore, you will gain system investigation skills such as hardware inspection, process investigation and resource consumption monitoring; these are important in diagnosing problems and identifying bottlenecks.
A computing system consists of hardware and software. It takes data as input, processes it, and submits it to an output channel. Users use applications to benefit from the computing system. Users must be provided an easy-to-use interface, as suitable as possible for their interests: media, gaming, net surfing, reading. From the perspective of users, applications must hide the software and hardware behind them. The rest of the software and the hardware take care of this.
In its simplest form, the hardware designer will provide users with everything they need: a device (physical component) that encodes the functionality desired by the user. The hardware interface activates the physical components needed to serve the interests of users. Such a hardware device only needs the processing circuits and the input / output hardware interface for user interaction.
This form has a major disadvantage: lack of flexibility. It's a machine that does only one thing. It cannot be used for anything else. It is purely hardware / physics.
So it makes sense to use a hardware build that we can use for a larger number of scenarios. That is, to have different instructions that the hardware can execute; that is, programs; that is, software.
The software is removable from the hardware. It can be loaded or unloaded to provide its functionality to the user. The software is loaded into memory and then runs on the processor. It is usually loaded into memory from an external storage device, such as a disk.
An important feature of the software is reusability. A software component can be reused as part of a larger software component. It can be distributed and then linked to other software components for wider functionality.
We call an application a software component that can be used directly by the user: it is loaded into the hardware and started, often providing the means of interaction with the user.
An application has the equivalent of a `main()` function that is run when the application is loaded.
We call a library a software component that is not used directly by the user but only in conjunction with other software components.
A library is a collection of reusable software components that applications link against.
A library does not have the equivalent of a `main()` function.
In the simplest form, applications will be loaded in memory and executed. This raises at least three important topics:
- Who is in charge of loading applications?
- How can multiple applications be run? How are hardware resources going to be split and protected among different applications?
- How can an application run on multiple types of hardware (CPU architectures, network interface cards, disks)?
The answer to these questions is the operating system.
The operating system is the software component that mediates applications' access to hardware. Applications use the operating system as a reusable software component, so that they do not have to implement all the hardware access functionality themselves.
In this way the operating system ensures the portability of applications (hardware abstraction) on different types of hardware. Applications will implement generic, hardware-independent software components. Hardware-dependent software components (such as drivers) are implemented by the operating system.
The core of the operating system is called the kernel. Linux is an operating system kernel. It's the critical component loaded at boot time that's then in charge of providing the required software support to run applications.
The shell is the primary application in an operating system. It can be graphical or text. It's used by users to manage other applications: starting, stopping, configuring and using them.
The `SHELL` environment variable points to the user's default shell, which is usually the shell currently running.
The process ID of the shell is stored in the `$` special variable, accessed with the `$$` construct.
The current terminal is shown by the `tty` command.
Opening a new shell tab creates a new terminal and a new shell running in that.
`Tab` is used for command completion; pressing `Tab` twice lists the available options when there is more than one match.
We use reverse search to look for previous commands: press `Ctrl+r`, then type part of a previous command.
Use `Alt+.` to insert the last argument of the previous command.
Readline is the library that provides command line editing and cursor movement within the command line.
Terminal multiplexers are used to easily create multiple shell sessions, to share screens and to keep remote shell sessions alive.
`tmux`, `screen` and `byobu` are primary examples.
We show use of terminal multiplexers for local sessions and remote sessions.
`tmux` is configured in `~/.tmux.conf`.
Common `tmux` commands:
- `tmux ls` (in shell, outside `tmux` session): list sessions
- `tmux new-session -s <name>` (in shell, outside `tmux` session): create a new session
- `tmux attach -t <name>` (in shell, outside `tmux` session): attach to session
- `Ctrl+b c` (in `tmux` session): open a new window in the `tmux` session
- `Ctrl+b <index>` (in `tmux` session): switch to the window with the given index
- `Ctrl+b d` (in `tmux` session): detach from session
- `Ctrl+b ,` (in `tmux` session): rename the current window
- `Ctrl+b $` (in `tmux` session): rename session
- `Ctrl+d` (in `tmux` session): close the current shell (the session is closed when the last one exits)
- `Ctrl+b [` (in `tmux` session): enter scroll mode; use the arrow keys or other navigation keys for scrolling
- `Ctrl+c` (while in scroll mode): exit scroll mode
- `Ctrl+v` / `Alt+v` (while in scroll mode): page down, page up
- `<Esc>:q!` (in Vim session): close Vim session
The core of the operating system is the kernel.
The Linux kernel exposes a configuration and inspection interface through the procfs filesystem (mounted at `/proc`), which is used by process, I/O and memory inspection commands such as `ps`, `free`, `top` and `lsof`.
The kernel is configured through the files in `/proc/sys` or by using the `sysctl` command.
There are commands for inspecting the operating system: `uname`, `cat /etc/issue`, `lsb_release -a`, `getconf -a`.
Commands for inspecting hardware and hardware use: `lshw`, `lscpu`, `lspci`, `lsusb`, `lsblk`, `free`, `uptime`, `inxi`.
The specific implementation of the operating system should not concern applications. That's why the operating system exposes a software interface to applications. This interface is generic and hides the implementation features that may depend on the hardware. This interface is usually called the system call interface: system call API. It describes the appropriate functions / methods and arguments that the operating system exposes to applications. Such a function is called a system call.
In this way, the operating system takes the form of a library, a reusable software component with an interface exposed to applications (system call interface). The system call interface is used by applications to interact with the hardware. The operating system mediates access to hardware and ensures the portability of applications for different types of hardware.
However, the functions exposed by the system call interface are not only for interacting with the hardware. The operating system provides software services to applications, services that would otherwise have to be implemented differently by each application: memory allocation, file systems, process scheduling, the network stack, caching of frequently used data. For this reason, system calls are also called system services, because they are functions that request services from the operating system.
The downside is that the system call API is tied to the operating system. The operating system provides portability across different types of hardware. But what if we want to provide portability across different OSes? This is where the standard C library comes into play.
The standard C library is the core software library. Nearly all other applications and libraries depend on it. The standard C library provides the ANSI/ISO C API, a standard portable API available on all OSes (e.g. Windows, Linux). It has the advantage of portability but doesn't provide all features (e.g. threads, IPC, asynchronous I/O).
In Linux, the system API is closely connected to the POSIX API, a standard API on Unix-like OSes (i.e. POSIX-compliant OSes, which are a subset of all OSes). On Linux, both POSIX and ANSI C are implemented in the standard C library.
Tracing is used for inspection and for debugging: what happens when a process crashes or blocks. We can trace applications for calls in the software stack.
`ltrace` is used to trace library calls.
`strace` is used to trace system calls.
Executables may be static or dynamic. Static (or statically-linked) executables are built with all the required code included in the executable itself. Dynamic (or dynamically-linked) executables rely on the dynamic linker/loader to load the dynamic libraries (also called shared libraries) they use. For dynamic executables, dynamic libraries are referenced in the executable but are not part of it.
`file` gives information on executables.
`ldd` is used to list dynamic library dependencies for executables.
`pmap` can be used to list the memory regions of a running process (spawned from an executable).
CLI use, keyboard shortcuts
tmux
Inspecting applications: static analysis, dynamic analysis
We trace a running shell and see what happens when it runs a command.
We show multiple `fopen()` calls and what system calls they make by tracing them using `strace` and `ltrace`.
We use `01-linux-architecture/fopen-perm/`.
We show the correspondence between the POSIX API and the system API, and cases where they don't match: creating a process and a thread.
We use `01-linux-architecture/posix-vs-syscall`.
Both the `fork()` and `pthread_create()` POSIX calls use the `clone()` system call.
But different arguments to the system call give different outcomes.
Use different commands to discover information about the current system: OS version, hardware / CPU, uptime, logged-in users:
lsb_release -a
cat /etc/os-release
cat /etc/issue
uname -a
hostname
arch
lscpu
uptime
who
ps -ef
Use the command below to look for commands showing system information:
apropos system | grep -i info
What does the `inxi` command do?
Run the command with the `-v 1`, then the `-v 2`, then the `-v 3` arguments.
What does the `lslogins` command do?
Run `tmux` and create three panes.
In one pane run `vim`, in another go to the `/etc/` folder, and in the third start an HTTP server using the command below (on Python 2 the equivalent is `python -m SimpleHTTPServer`):
python3 -m http.server
Detach from the `tmux` session.
Create a new `tmux` session named `sleep-<name>`, where `<name>` is your name.
Create two panes: in one pane start `sleep 100`; in the other list all processes using `ps -ef`.
Detach from the `tmux` session named `sleep-<name>`.
List all `tmux` sessions; attach to and detach from each of them.
Enter the `01-linux-architecture/libc-syscall/` folder.
Build the `caller-32` and `caller` executables by running:
make
Investigate the executables with `file` and `ldd`.
Trace the library and system calls by running `ltrace` and `strace` on the executables.
Explain the similarities and differences between the 32-bit and the 64-bit executables.
For this task you need the `nasm` package.
Install it using:
sudo apt install nasm
Enter the `01-linux-architecture/syscall-libc-call-asm/` folder.
Build the `syscall-libc-call` and `syscall-libc-call-static` executables by running:
make
Investigate the executables with `file`, `ldd` and `nm`.
Trace the library and system calls by running `ltrace` and `strace` on the executables.
Explain the similarities and differences between the dynamic and static executables.
Enter the `01-linux-architecture/libc-syscall/` folder.
Compile a statically-linked and a dynamically-linked executable:
gcc -static -o caller-static caller.c
gcc -o caller caller.c
Investigate the executables with `file`, `ldd` and `nm`.
Trace the library and system calls by running `ltrace` and `strace` on the executables.
Add a `sleep(100)` call at the end of the C program.
Rebuild the statically-linked and dynamically-linked executables.
Run each program in a terminal and, in another terminal, run
pmap $(pgrep -f caller)
See the differences between the address spaces of the two processes.
Enter the `01-linux-architecture/syscall-nolibc-asm/` folder.
Build the `syscall-nolibc` executable by running:
make
This is the simplest executable, with no reliance on the standard C library (libc).
Investigate the executable with `file`, `nm`, `ldd`, `ltrace` and `strace`.
Explain the outputs.
Enter the `01-linux-architecture/printf-puts/` folder.
Build the `printf-puts` executable by running:
make
Investigate the executable with `ltrace` and `strace`.
What are the library calls made by `printf()` and `puts()`?
What are the system calls made by `printf()` and `puts()`?
What is the difference between `printf()` and `puts()`?
Enter the `01-linux-architecture/syscall-trace/` folder.
See the `syscall-trace.c` file.
Add valid calls to the functions `sleep()`, `system()`, `strdup()` and `strcpy()`.
You can check the manual page for each function or use the Internet / Google.
After that, build the `syscall-trace` executable by running:
make
Use `ltrace` and `strace` to see the library calls and system calls made by each function call.
Enter the `01-linux-architecture/page-size/` folder.
See the `page-size.c` file.
Make the appropriate calls to `sysconf()` and `getpagesize()` to find out the system page size (`getconf` is the command-line counterpart of `sysconf()`).
You can check the manual page for `sysconf` / `getpagesize` or use the Internet / Google.
After that, build the `page-size` executable by running:
make
Use `ltrace` and `strace` to see the library calls and system calls made.
Enter the `01-linux-architecture/ansi-to-posix/` folder.
See the `ansi-to-posix.c` file.
Build the `ansi-to-posix` executable by running:
make
Use `ltrace` and `strace` to see the library and system calls made.
Comment out the current contents of the `main()` function and replace them with their POSIX equivalents.
That is, replace `fopen()` with `open()`, `fwrite()` with `write()` and so on.
Enter the `01-linux-architecture/posix-to-syscall/` folder.
See the `posix-to-syscall.c` file.
Build the `posix-to-syscall` executable by running:
make
Use `ltrace` and `strace` to see the library and system calls made.
This sequence in the source code file:
puts("POSIX");
write(STDOUT_FILENO, "POSIX\n", 6);
syscall(SYS_write, STDOUT_FILENO, "POSIX\n", 6);
shows the equivalent of doing an ANSI library call, a POSIX call and a pure system call.
Comment out the other sequence of POSIX calls:
fd = open("a.txt", O_CREAT | O_WRONLY, 0644);
write(fd, "hello\n", 6);
close(fd);
and replace the POSIX calls with equivalent pure system calls.