Linux Systems for Platform Engineers
Master Linux fundamentals and internals for platform engineering interviews.
🎯 Interview Focus Areas​
Most Asked Topics​
- Process Management - How processes work, debugging, signals
- Memory Management - Virtual memory, OOM killer, troubleshooting
- File Systems - Permissions, inodes, performance
- Networking - TCP/IP stack, troubleshooting, iptables
- System Performance - Identifying bottlenecks, optimization
Essential Commands​
Process Management​
# View processes
ps aux # All processes with details
ps -ef --forest # Process tree
top -H -p <pid> # Show threads for process
htop # Interactive process viewer
# Process control
kill -l # List all signals
kill -TERM <pid> # Graceful termination (15)
kill -KILL <pid> # Force kill (9)
killall <name> # Kill by process name
pkill -f pattern # Kill by pattern
# Process debugging
strace -p <pid> # Trace system calls
strace -c command # Count system calls
ltrace command # Library call tracing
lsof -p <pid> # Files opened by process
pstack <pid> # Print stack trace
# Background jobs
command & # Run in background
jobs # List background jobs
fg %1 # Bring to foreground
bg %1 # Continue in background
nohup command & # Survive logout
disown %1 # Remove from job table
Memory Management​
# Memory information
free -h # Human-readable memory
vmstat 1 # Virtual memory stats
cat /proc/meminfo # Detailed memory info
smem -rs swap # Memory by swap usage
pmap -x <pid> # Process memory mapping
# Memory debugging
valgrind --leak-check=full ./program
dmesg | grep -i "out of memory"
cat /proc/<pid>/status | grep -i vm
echo 3 > /proc/sys/vm/drop_caches # Clear cache
File Systems & Storage​
# Disk usage
df -h # File system usage
df -i # Inode usage
du -sh /* # Directory sizes
du -sh * | sort -h # Sorted by size
ncdu / # Interactive disk usage
# File operations
find / -type f -size +100M 2>/dev/null # Large files
find / -type f -mtime +30 -delete # Delete old files
lsof +L1 # Deleted but open files
stat file.txt # File metadata
# File permissions
chmod 755 file # rwxr-xr-x
chmod u+s file # Set SUID
chmod g+s dir # Set SGID
chmod +t dir # Sticky bit
umask 022 # Default permissions
# I/O monitoring
iostat -xz 1 # Extended I/O stats
iotop -o # I/O by process
blktrace -d /dev/sda # Block layer tracing
Networking​
# Network configuration
ip addr show # Show IP addresses
ip route show # Routing table
ip link show # Network interfaces
ss -tulpn # Listening ports (modern)
netstat -tulpn # Listening ports (legacy)
# Connection debugging
ss -an | grep ESTAB # Established connections
lsof -i :80 # Process on port
tcpdump -i eth0 -nn port 80 # Packet capture
tcpdump -w capture.pcap -s0 # Save full packets
nc -zv host 80 # Port connectivity
# DNS troubleshooting
dig +trace google.com # DNS resolution path
nslookup google.com # Simple DNS lookup
host -t A google.com # Specific record type
cat /etc/resolv.conf # DNS servers
# Network performance
iperf3 -s # Server mode
iperf3 -c server # Client test
mtr google.com # Combined ping/traceroute
ping -c 4 -i 0.2 host # Fast ping
System Internals​
Boot Process​
1. BIOS/UEFI → POST
2. Bootloader (GRUB2) → Load kernel
3. Kernel initialization → Mount root filesystem
4. Init system (systemd) → Start services
5. Login prompt
Interview Question: "What happens when a Linux system boots?"
Process States​
# Process states in ps output
R - Running or runnable
S - Sleeping (interruptible)
D - Sleeping (uninterruptible, usually I/O)
T - Stopped (SIGSTOP or Ctrl+Z)
Z - Zombie (terminated but not reaped)
# Check process state
ps aux | awk '$8 ~ /^Z/ { print }' # Find zombies
Virtual Memory​
# Memory layout
cat /proc/<pid>/maps # Process memory map
pmap -x <pid> # Detailed memory usage
# Important files
/proc/sys/vm/swappiness # Swap tendency (0-100)
/proc/sys/vm/overcommit_memory # Memory overcommit
/proc/sys/vm/min_free_kbytes # Reserved memory
Performance Troubleshooting​
High CPU Usage​
# 1. Identify the culprit
top -b -n 1 | head -20
ps aux --sort=-%cpu | head
# 2. Analyze the process
strace -c -p <pid> # System call profile
perf top -p <pid> # CPU profiling
cat /proc/<pid>/status
# 3. Check for issues
dmesg | tail # Kernel messages
journalctl -u service-name # Service logs
Memory Issues​
# 1. Check memory pressure
free -h
vmstat 1 5
cat /proc/pressure/memory
# 2. Find memory hogs
ps aux --sort=-%mem | head
smem -rs swap -k
# 3. Investigate OOM
dmesg | grep -i "killed process"
journalctl -u systemd-oomd
Disk I/O Problems​
# 1. Check I/O wait
top # Look for %wa
iostat -xz 1 # Detailed I/O stats
# 2. Find I/O intensive processes
iotop -o
pidstat -d 1
# 3. Analyze specific disk
hdparm -tT /dev/sda # Disk speed test
smartctl -a /dev/sda # Disk health
Network Issues​
# 1. Check connectivity
ping -c 4 8.8.8.8 # Internet connectivity
traceroute google.com # Path to destination
mtr --report google.com # Detailed path analysis
# 2. Check DNS
time nslookup google.com # DNS response time
dig @8.8.8.8 google.com # Test specific DNS
# 3. Check ports/firewall
sudo iptables -L -n -v # Firewall rules
ss -tulpn # Listening ports
telnet host port # Test specific port
Common Interview Questions​
Q1: "How do you find what process is using a file?"​
lsof /path/to/file
fuser -v /path/to/file
Q2: "How do you debug a process that won't die?"​
# Check state
ps aux | grep <process>
# If in D state (uninterruptible sleep)
ls -l /proc/<pid>/fd/ # Check file descriptors
cat /proc/<pid>/stack # Kernel stack trace
# Force kill parent
pstree -p <pid> # Find parent
kill -9 <parent_pid>
Q3: "How do you find deleted but open files?"​
lsof +L1 # Files with 0 links
lsof | grep deleted
df -h # Compare with du
Q4: "System is slow, how do you investigate?"​
# 1. Check load and CPU
uptime
top
# 2. Check memory
free -h
vmstat 1
# 3. Check I/O
iostat -xz 1
iotop
# 4. Check network
ss -s
netstat -i
# 5. Check logs
dmesg | tail
journalctl -xe
Q5: "How do you optimize Linux performance?"​
# CPU
# - Use nice/renice for priority
# - CPU affinity with taskset
# - Disable unnecessary services
# Memory
echo 10 > /proc/sys/vm/swappiness # Reduce swapping
echo 3 > /proc/sys/vm/drop_caches # Clear cache
# Disk
# - Use noatime mount option
# - Adjust I/O scheduler
echo deadline > /sys/block/sda/queue/scheduler
# Network
# - Increase socket buffers
sysctl -w net.core.rmem_max=134217728
# - Enable TCP fast open
sysctl -w net.ipv4.tcp_fastopen=3
Advanced Topics​
Systemd​
# Service management
systemctl start/stop/restart service
systemctl enable/disable service
systemctl status service
systemctl daemon-reload
# Logs
journalctl -u service
journalctl -f # Follow
journalctl --since "1 hour ago"
journalctl -p err # Error priority
# Analyze boot
systemd-analyze
systemd-analyze blame # Boot time by service
systemd-analyze critical-chain
Security​
# SELinux (RHEL/CentOS)
getenforce
setenforce 0/1
ausearch -m avc # Audit logs
sealert -a /var/log/audit/audit.log
# AppArmor (Ubuntu)
aa-status
aa-complain /path/to/profile
aa-enforce /path/to/profile
# Capabilities
getcap /usr/bin/ping
setcap cap_net_raw+ep /usr/bin/ping
Kernel Parameters​
# View/modify kernel parameters
sysctl -a # All parameters
sysctl net.ipv4.ip_forward # Specific parameter
sysctl -w net.ipv4.ip_forward=1 # Temporary change
# Permanent changes in /etc/sysctl.conf
net.ipv4.ip_forward = 1
vm.swappiness = 10
# Apply changes
sysctl -p
Hands-On Practice​
Exercise 1: Process Investigation​
- Start a CPU-intensive process:
yes > /dev/null &
- Find its PID using different methods
- Check its CPU usage
- Trace its system calls
- Change its priority
- Kill it gracefully
Exercise 2: Memory Analysis​
- Write a program that leaks memory
- Monitor system memory while it runs
- Identify the leak using tools
- Find when OOM killer activates
- Analyze the dmesg output
Exercise 3: Performance Tuning​
- Create I/O load:
dd if=/dev/zero of=test bs=1M count=1000
- Monitor I/O performance
- Change I/O scheduler
- Compare performance
- Clean up test files
Quick Reference​
Emergency Commands​
# System hanging
Alt + SysRq + REISUB # Safe reboot
# Disk full
du -sh /* | sort -h # Find large directories
find / -size +1G # Find large files
lsof +L1 # Deleted open files
# High load
ps aux --sort=-%cpu | head
kill -STOP <pid> # Pause process
kill -CONT <pid> # Resume process
# Network down
ip link set eth0 up
systemctl restart NetworkManager
dhclient -r && dhclient
Resources for Deep Learning​
- 📚 The Linux Programming Interface - The definitive guide
- 📖 Linux Performance by Brendan Gregg
- 🎥 Linux System Programming
- 🎓 Linux Foundation Training
Pro tip: In interviews, always explain your troubleshooting methodology. Show how you systematically narrow down issues rather than just memorizing commands.