Diagnosing High CPU Usage on Dedicated Server

By Raman Kumar

Updated on May 02, 2025

In this tutorial, we diagnose high CPU usage on a dedicated server.

High CPU usage on a dedicated server can degrade performance, disrupt services, and ultimately impact user experience. By following a structured troubleshooting process, we can pinpoint the root causes and apply targeted remedies. In this guide, we’ll walk through each diagnostic step in detail, drawing on industry-proven tools and techniques.

Prerequisites

  • A dedicated server running any Linux distribution.
  • Root access, or a regular user with sudo privileges.
  • Basic knowledge of Linux commands.

Diagnosing High CPU Usage on Dedicated Server

1. Gather Baseline Metrics with System Monitoring Tools

Before diving into specific processes, it’s essential to establish how our server normally behaves.

Install htop (top is usually preinstalled):

sudo apt-get install htop   # Debian/Ubuntu
sudo yum install htop       # RHEL/CentOS (may require the EPEL repository)

Run htop:

htop

– Observe overall CPU utilization, load averages, and per-core usage.
– Note any spikes or consistently high load values (e.g., load average > number of CPU cores).
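For a baseline we can record from a script rather than read off an interactive screen, overall utilization can be derived from two readings of the aggregate cpu line in /proc/stat. The following is a minimal sketch, not a replacement for htop: the function name cpu_busy_pct is hypothetical, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal) and counts iowait as idle time.

```shell
# Compute the overall CPU busy percentage between two "cpu" lines
# taken from /proc/stat at different moments.
#   $1 = earlier "cpu ..." line, $2 = later "cpu ..." line
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # Fields 2-9: user nice system idle iowait irq softirq steal
      total = 0
      for (i = 2; i <= 9; i++) total += $i
      idle = $5 + $6                 # idle + iowait (assumption: iowait is not "busy")
      if (NR == 1) { t0 = total; i0 = idle }
      else         { t1 = total; i1 = idle }
    }
    END {
      dt = t1 - t0
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Example use on a live system (one-second window):
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$a" "$b"
```

Logging this value at regular intervals during normal operation gives a reference point to compare against when the server later misbehaves.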

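Leverage sar for sampled and historical data:

sudo apt-get install sysstat   # Debian/Ubuntu
sar -u 1 10                    # 10 samples at 1-second intervals

– Shows whether CPU usage is sustained or intermittent over the sampling window.
– Once sysstat's periodic data collection is enabled, running sar -u without an interval reports the current day's history, which helps correlate usage patterns with scheduled tasks or user activity.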
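To pick out the busy samples from sar output without scanning it by eye, a small filter can help. This is a sketch under stated assumptions: the function name busy_samples is hypothetical, it assumes %idle is the last column of sar -u output, and it expects dot decimal separators (running sar with LC_ALL=C is one way to get them).

```shell
# Print sar samples whose busy percentage (100 - %idle) meets a threshold.
# Reads `sar -u` output on stdin.
#   $1 = threshold in percent (default 80)
busy_samples() {
  threshold=${1:-80}
  awk -v t="$threshold" '
    $1 == "Average:" { next }              # skip the summary row
    {
      # Keep only rows for the aggregate "all" CPU entry
      is_all = 0
      for (i = 1; i <= NF; i++) if ($i == "all") is_all = 1
      if (!is_all || $NF !~ /^[0-9.]+$/) next
      busy = 100 - $NF
      if (busy >= t) printf "%s busy=%.1f%%\n", $1, busy
    }'
}

# Example use:
#   LC_ALL=C sar -u 1 10 | busy_samples 80
```

If only one or two samples are printed, the load is likely intermittent; a long run of consecutive samples points to sustained pressure.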

2. Identify Top CPU-Consuming Processes

Once abnormal CPU consumption is apparent, isolating the processes responsible is critical.

Use htop or top:

– Sort by CPU% (press Shift+P in top).
– Note process IDs (PIDs) and commands.

Drill down with ps:

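ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head -n 11

– Provides a snapshot of the top 10 CPU-intensive processes (the first line of output is the header).
– Note that ps reports %cpu averaged over each process's lifetime, so a short recent spike shows up more clearly in top or htop.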
– Confirms whether a rogue application, background job, or system daemon is the culprit.
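When a service runs many worker processes (a web server or PHP pool, for example), no single PID may stand out even though the service as a whole dominates the CPU. The sketch below groups ps output by command name. The function name cpu_by_command is hypothetical, and it expects input in the form produced by ps -eo pcpu=,comm= (a percentage followed by the command name).

```shell
# Sum CPU percentage per command name.
# Expects lines of "<pcpu> <command name>" on stdin.
# Prints "<total pcpu> <process count> <command name>", highest total first.
cpu_by_command() {
  awk '
    NF < 2 { next }                  # ignore blank or malformed lines
    {
      cpu = $1
      $1 = ""                        # drop the number, keep the full name
      sub(/^ +/, "")
      total[$0] += cpu
      count[$0]++
    }
    END {
      for (name in total)
        printf "%.1f %d %s\n", total[name], count[name], name
    }' | sort -rn
}

# Example use:
#   ps -eo pcpu=,comm= | cpu_by_command | head -n 10
```

The same caveat applies as for ps itself: the percentages are lifetime averages, so this view is better at showing which service is heavy overall than at catching a brief spike.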

Inspect threads within a process (for Java, Python, etc.):

top -H -p <PID>

– Reveals which thread consumes the most CPU, guiding us toward problematic code paths.
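For Java in particular, the thread ID shown by top -H is decimal, while JVM thread dumps have traditionally printed the native thread ID as a hexadecimal nid= value. A small helper makes the conversion explicit. This is a sketch: the function name tid_to_nid is hypothetical, and it assumes a JDK whose thread dump prints nid in hexadecimal (some newer releases print it in decimal, in which case no conversion is needed).

```shell
# Convert a decimal thread ID (the PID column of `top -H`) to the
# hexadecimal form used in the nid= field of a JVM thread dump.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Example use, with <PID> and <TID> taken from top -H:
#   jstack <PID> | grep -A 10 "nid=$(tid_to_nid <TID>)"
```

Matching the hot thread to its stack trace in this way usually narrows the search to a specific method or loop.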

3. Examine Load Average vs. CPU Cores

On Linux, load average is the average number of processes that are running, waiting for CPU time, or blocked in uninterruptible sleep (usually on disk I/O). By comparing load averages to core count, we gauge saturation.

If load average consistently exceeds the number of CPU cores, multiple processes are queuing for CPU time.

Investigate I/O wait (wa column in top) to determine if disk latency contributes to high load.
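The comparison described above can be reduced to a single ratio: load average divided by core count. The following sketch prints that ratio and a rough verdict. The function name load_per_core and the threshold of 1.0 per core are illustrative assumptions; a ratio above 1 can also come from I/O-bound processes, so it should be read together with the wa value.

```shell
# Print the load-per-core ratio and a rough verdict.
#   $1 = load average (e.g. the 1-minute value), $2 = number of CPU cores
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    if (cores <= 0) { print "invalid core count" > "/dev/stderr"; exit 1 }
    ratio = load / cores
    verdict = (ratio > 1) ? "saturated" : "ok"
    printf "%.2f %s\n", ratio, verdict
  }'
}

# Example use on a live system:
#   load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

For instance, a load of 8.00 on a 4-core server gives a ratio of 2.00, meaning roughly twice as much work is queued as the cores can run at once.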

4. Analyze System and Application Logs

Logs often hold clues to runaway processes or recurring errors:

System logs (/var/log/syslog on Debian/Ubuntu, /var/log/messages on RHEL/CentOS):

sudo grep -i error /var/log/syslog | tail -n 50

– Searches for kernel or service errors that coincide with CPU spikes.

Application-specific logs:

– Web servers (e.g., /var/log/nginx/error.log).
– Databases (e.g., PostgreSQL’s log directory, named pg_log in older releases and log in newer ones).
– Scheduled jobs (e.g., cron logs in /var/log/cron).

Correlate timestamps between log events and observed CPU surges to isolate offending components.
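One way to do this correlation is to cut the log down to the time window of a known spike. The sketch below assumes the traditional syslog layout ("Mon DD HH:MM:SS host ..."), where the third whitespace-separated field is the time of day; logs written with ISO 8601 timestamps would need a different field. The function name log_window is hypothetical.

```shell
# Print log lines whose time of day falls within [start, end].
#   $1 = start HH:MM:SS, $2 = end HH:MM:SS, $3 = log file
# Assumes the time is the third field, as in traditional syslog lines.
log_window() {
  awk -v s="$1" -v e="$2" '$3 >= s && $3 <= e' "$3"
}

# Example use, for a spike observed between 14:00 and 14:10:
#   sudo sh -c '. ./log_window.sh; log_window 14:00:00 14:10:00 /var/log/syslog'
# (log_window.sh is a hypothetical file holding the function above)
```

Because the comparison is on time of day only, the function does not distinguish between dates; on a log that spans several days, filter by date first. On systemd-based systems, journalctl with its --since and --until options offers a similar time-window view.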

5. Investigate Kernel Activity and Interrupts

In some cases, hardware interrupts or kernel threads can monopolize CPU:

Check interrupt counts:

cat /proc/interrupts

– Each row is an interrupt source (IRQ) and each column a CPU. A count that grows rapidly for one IRQ, or lands almost entirely on one CPU, may indicate misbehaving hardware or drivers.
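Since /proc/interrupts can run to dozens of rows on a many-core server, it can help to total each row across CPUs and list the largest. This is a minimal sketch: the function name irq_totals is hypothetical, and it assumes the usual layout of a header line naming the CPUs followed by one row per interrupt source.

```shell
# Total each interrupt source across all CPUs and print the largest.
#   $1 = file in /proc/interrupts format, $2 = rows to show (default 5)
irq_totals() {
  awk '
    NR == 1 { ncpu = NF; next }      # header: one field per CPU
    {
      sum = 0
      for (i = 2; i <= ncpu + 1 && i <= NF; i++)
        if ($i ~ /^[0-9]+$/) sum += $i
      desc = ""
      for (i = ncpu + 2; i <= NF; i++) desc = desc " " $i
      print sum, $1 desc
    }' "$1" | sort -rn | head -n "${2:-5}"
}

# Example use:
#   irq_totals /proc/interrupts 10
```

The counters are cumulative since boot, so a single reading shows which sources have been busiest overall. To see what is busy right now, take two readings a few seconds apart and compare the totals.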

Use vmstat and mpstat (part of the sysstat package) for system-wide and per-CPU breakdowns:

vmstat 1 10
mpstat -P ALL 1 5

– Shows user/kernel/I/O wait time per core.
– A kernel-mode (system) spike suggests deep system-level activity.

6. Profile with Advanced Tools

For complex or persistent issues, advanced profilers can reveal inefficiencies:

perf for kernel and user-land profiling:

sudo perf top

– Live view of CPU cycles by function, aiding in pinpointing hot spots.

Kernel lock contention or runaway threads:

sudo perf record -a -- sleep 30   # sample all CPUs for 30 seconds
sudo perf report                  # browse the recorded profile (perf.data)

NUMA imbalance (multi-socket boards):

numactl --hardware

– An uneven distribution of memory or processes across NUMA nodes can produce apparent CPU pressure.

strace for syscall analysis:

sudo strace -p <PID> -c

– Attaches to the running process; press Ctrl+C to detach.
– On exit, prints a summary of system calls and time spent, highlighting I/O or network bottlenecks.

Application profilers:

– Java: VisualVM or YourKit.
– Python: cProfile or py-spy.

7. Apply Targeted Remediations

Once we've diagnosed the root cause, we can take steps to mitigate the high CPU usage. The specific actions will depend on our findings.

Optimize Code or Queries: If inefficient code or database queries are the issue, we'll need to refactor the code or optimize the queries. This might involve rewriting sections of code, adding indexes to database tables, or restructuring database queries.

Scale Resources: If the high CPU usage is due to legitimate high traffic or workload, we might need to consider scaling our server resources (e.g., upgrading to a server with more CPU cores).

Address Security Issues: If we suspect a malicious attack, we need to take immediate security measures, such as blocking malicious IP addresses, patching vulnerabilities, or cleaning up malware.

Tune Application or Database Configurations: Sometimes, tweaking the configuration settings of our web server, database, or application can improve performance and reduce CPU usage. This might involve adjusting caching settings, memory allocation, or the number of worker processes.

Kill or Restart Problematic Processes: As a temporary measure, we might need to kill or restart a runaway process. However, we should always investigate the underlying cause to prevent it from happening again.

Update Software and Drivers: Keeping our operating system, software, and drivers up to date ensures we have the latest performance improvements and security patches.

8. Establish Proactive Monitoring and Alerts

To prevent future surprises, integrate continuous monitoring:

Prometheus + Grafana: Collect CPU, memory, and application metrics; visualize trends.

Alertmanager or PagerDuty: Trigger notifications when CPU utilization exceeds, for example, 80% for more than five minutes.

Automated remediation scripts: Restart stuck services or throttle processes upon threshold breaches.
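The "for more than five minutes" part of such an alert rule matters: it keeps a single brief spike from paging anyone. The logic can be sketched as a check for a run of consecutive samples above a limit. The function name sustained_breach is hypothetical, and the source of the samples (one CPU percentage per line, for example one per minute) is left open.

```shell
# Exit 0 (alert) if the input contains `need` consecutive samples above
# `limit`; exit 1 otherwise. Expects one numeric sample per line on stdin.
#   $1 = limit (e.g. 80), $2 = consecutive samples required (e.g. 5)
sustained_breach() {
  awk -v limit="$1" -v need="$2" '
    {
      run = ($1 > limit) ? run + 1 : 0   # reset the run on any sample at or below the limit
      if (run >= need) { hit = 1; exit }
    }
    END { exit (hit ? 0 : 1) }'
}

# Example use, with samples.txt as a hypothetical file of per-minute values:
#   if sustained_breach 80 5 < samples.txt; then echo "CPU alert"; fi
```

Dedicated tools such as Alertmanager express the same idea declaratively; a script like this is only a fallback for servers where a full monitoring stack is not yet in place.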

By systematically applying these steps (establishing baselines, isolating processes, inspecting logs, profiling, and remediating), we can diagnose high CPU usage with precision. Continuous monitoring and tuning complete the cycle, helping us maintain optimal performance and reliability on our dedicated servers.
