Linux System Resource Monitoring Commands

View the system release

root@cf0c6032ba2f:/# lsb_release -a
No LSB modules are available.
Distributor ID:    Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:    14.04
Codename:    trusty

top(cpu)

The Cpu(s) line provides information on the current CPU operation:

Cpu(s): 11.4%us, 29.6%sy, 0.0%ni, 58.3%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st

  • us: User CPU time. The percentage of CPU time spent running un-niced user processes ("nicing" a process means changing its priority relative to other processes).

  • sy: System CPU time. The percentage of CPU time spent running the kernel and kernel processes.

  • ni: Nice CPU time. If you have changed the priority of some processes, this value tells you the percentage of CPU time they use.

  • id: CPU idle time. This is one of the metrics you want to see high. It is the percentage of time the CPU is idle. If the system is running slowly but this value is high, the cause of the problem is not high CPU load.

  • wa: I/O wait. The percentage of CPU time spent waiting for I/O operations to complete. This is a very valuable metric when troubleshooting a slow system, because if this value is low you can easily rule out disk or network I/O problems.

  • hi: Hardware interrupts. The percentage of time the CPU spends servicing hardware interrupts.

  • si: Software interrupts. The percentage of time the CPU spends servicing software interrupts.

  • st: Steal time. If you are running in a virtual machine, this is the percentage of CPU time taken from it by the hypervisor to run other tasks, such as other virtual machines.
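
As a rough illustration, the idle and I/O-wait fields can be pulled out of a Cpu(s) line with awk. The function name cpu_summary is hypothetical, and the sketch assumes the older "11.4%us" field format shown above; newer versions of top print the fields differently.

```shell
# cpu_summary (hypothetical helper): read a "Cpu(s):" line in the older
# top format on stdin and print the idle and I/O-wait percentages.
cpu_summary() {
    awk '/^Cpu\(s\):/ {
        for (i = 2; i <= NF; i++) {
            field = $i
            gsub(/,/, "", field)       # drop the trailing comma
            split(field, kv, "%")      # kv[1] = value, kv[2] = label
            val[kv[2]] = kv[1]
        }
        printf "idle=%s wa=%s\n", val["id"], val["wa"]
    }'
}

# Sample line taken from the output above
echo "Cpu(s): 11.4%us, 29.6%sy, 0.0%ni, 58.3%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st" | cpu_summary
```

For the sample line this prints idle=58.3 wa=0.7, which suggests the CPU is neither saturated nor stalled on I/O.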

View the CPU count

The CPU layout of a Linux server can be read from /proc/cpuinfo using the following rules:

  • Logical CPUs with the same core id (within the same physical id) are hyper-threads of the same core.

  • Logical CPUs with the same physical id are threads or cores packaged in the same physical CPU.

  • The command to display the number of physical CPUs is as follows:

cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l

  • The command to display the number of cores in each physical CPU is as follows:

cat /proc/cpuinfo | grep "cpu cores" | uniq 

The command results are shown as follows: cpu cores : 1

  • The command to display the number of logical CPUs is as follows:

cat /proc/cpuinfo | grep "processor" | wc -l 

The command results are shown as follows: 4

In theory, the following equation should hold: number of physical CPUs x number of cores per CPU = number of logical CPUs. If the logical CPU count is larger than this product, the server's CPU supports hyper-threading and it is enabled. When configuring applications on the server, use the number of logical CPUs.
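
The three counts and the hyper-threading check can be combined in one pass over /proc/cpuinfo. This is only a sketch: the function name cpu_topology is made up for illustration, and it assumes the x86-style field names used above ("physical id", "cpu cores", "processor"), which some architectures and virtual machines do not report.

```shell
# cpu_topology (hypothetical helper): summarize /proc/cpuinfo-style text
# read from stdin.
cpu_topology() {
    awk -F: '
        /^physical id/ { gsub(/[ \t]/, "", $2); sockets[$2] = 1 }
        /^cpu cores/   { gsub(/[ \t]/, "", $2); cores = $2 }
        /^processor/   { logical++ }
        END {
            n = 0
            for (s in sockets) n++     # number of distinct physical ids
            printf "physical=%d cores_per_cpu=%d logical=%d\n", n, cores, logical
            # More logical CPUs than physical x cores suggests hyper-threading
            if (n * cores > 0 && logical > n * cores)
                print "hyper-threading appears to be enabled"
        }'
}

if [ -r /proc/cpuinfo ]; then
    cpu_topology < /proc/cpuinfo
fi
```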

uptime(Average load)

Sometimes the system responds very slowly and the reason is not obvious. In that case, check the load average to see whether a large number of processes are waiting in the run queue. The average number of processes in the run queue over a given time interval reflects how busy the system is, so the system load (that is, the CPU load average) is usually the first thing to check when a website or system slows down. The simplest command for this is uptime:

uptime 

The command displays the following results:

11:31:11 up 11 days, 19:01, 2 users, load average: 0.02, 0.01, 0.00 

At present, mainstream servers have two quad-core CPUs, which is quite powerful. When they provide general application services, the load of the Linux system is rarely a concern.

Pay attention to the load average values. As a rule, these three values should not exceed the number of logical CPUs in the system. For example, the system in this example has four logical CPUs. If all three load average values stay above 4 for a long time, the CPU is very busy and the load is very high, which may affect system performance. If the value only occasionally exceeds 4, there is no need to worry, and system performance is generally not affected. Conversely, if the load average is below the number of CPUs, the CPUs still have idle capacity. The output in this example shows a nearly idle system.

At this point, the vmstat command can be used to judge whether the system is too busy. If it is, consider replacing the server or adding CPUs. In summary: if r is often greater than 3 or 4 and id is often less than 50, the CPU load is very heavy.
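
The comparison between the load averages and the logical CPU count can be sketched as a small filter. The name load_check and its three labels are invented for this example, and it assumes load averages are printed with a decimal point, as in the output above.

```shell
# load_check NCPU (hypothetical helper): read an uptime line on stdin and
# compare the 1-, 5- and 15-minute load averages with NCPU logical CPUs.
load_check() {
    awk -v ncpu="$1" '{
        sub(/.*load averages?: */, "")   # keep only the three numbers
        gsub(/,/, " ")
        n = split($0, load, " ")
        over = 0
        for (i = 1; i <= n; i++)
            if (load[i] + 0 > ncpu + 0) over++
        if (over == 3)     print "sustained high load"
        else if (over > 0) print "partly above CPU count"
        else               print "ok"
    }'
}

# Sample line from the output above, on a machine with 4 logical CPUs
echo "11:31:11 up 11 days, 19:01, 2 users, load average: 0.02, 0.01, 0.00" | load_check 4
```

On a live system the same function could be fed from uptime itself, for example uptime | load_check 4, with 4 replaced by the logical CPU count found earlier.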

top(mem)

Mem: 1024176k total, 997408k used, 26768k free, 85520k buffers 
Swap: 1004052k total, 4360k used, 999692k free, 286040k cached 

The first line shows how much physical memory is installed, how much is used, how much is free, and how much is used for buffers.
The second line gives the same information for swap space, plus how much RAM is used by the Linux file cache (cached).

To find out how much RAM processes actually use, subtract the file cache from the used RAM. In the sample output, of the 997408 KB of RAM used, 286040 KB is occupied by the file cache, so only 711368 KB is actually used by processes. Looking at the file cache is therefore a good way to tell whether RAM is really exhausted.

If used memory minus the file cache is very large and swap usage is also very high, there is probably a memory problem.
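
The subtraction can be scripted directly against the two summary lines. This is a sketch under the assumption that top prints the older layout shown above, with values such as "997408k"; the function name real_used is hypothetical.

```shell
# real_used (hypothetical helper): read the "Mem:" and "Swap:" summary
# lines of the older top format on stdin and print used memory minus the
# file cache, in KB.
real_used() {
    awk '
        /^Mem:/  { used = $4 + 0 }      # "997408k" -> 997408
        /^Swap:/ { cached = $8 + 0 }    # "286040k" -> 286040
        END      { print used - cached }'
}

# Sample lines from the output above
printf '%s\n' \
    "Mem: 1024176k total, 997408k used, 26768k free, 85520k buffers" \
    "Swap: 1004052k total, 4360k used, 999692k free, 286040k cached" | real_used
```

For the sample lines this prints 711368, matching the figure worked out by hand.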

free -m(Memory)

This command shows current memory usage. The -m option displays the values in megabytes. This form of the command applies to Linux; it is not available on FreeBSD. The output looks like this:

             total       used       free     shared    buffers     cached
Mem:          3949       1397       2551          0        268        917
-/+ buffers/cache:        211       3737
Swap:         8001          0       8001

Details of each parameter in the above results are as follows:

  • Total: total memory.

  • Used: The amount of memory that has been used.

  • Free: the amount of free memory.

  • Shared: Total amount of memory shared by multiple processes.

  • Buffers and cached: the sizes of the buffer cache and the page cache, i.e. the disk cache.

  • -buffers/cache: the amount of memory actually used, i.e. used - buffers - cached.

  • +buffers/cache: the amount of memory available, i.e. free + buffers + cached. The formula is therefore available memory = free + buffers + cached, i.e. 2551MB + 268MB + 917MB ≈ 3737MB.
    Note that the two sides of this equation do not match exactly (the sum is 3736), but this does not matter: with -m, each value is rounded to a whole number of megabytes. If in doubt, run free without the -m option to see the exact values in kilobytes.

It can be seen that -buffers/cache reflects the memory actually occupied by programs, while +buffers/cache reflects the total amount of memory that can be reclaimed for use.
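
Both formulas can be checked against the Mem: line with a one-line awk filter. This sketch assumes the older six-column layout shown above (newer versions of free merge buffers and cached into one column), and buffers_cache is a name chosen only for this example.

```shell
# buffers_cache (hypothetical helper): read `free -m` output in the older
# six-column layout on stdin and print two numbers in MB:
#   used - buffers - cached   and   free + buffers + cached
buffers_cache() {
    awk '/^Mem:/ { print $3 - $6 - $7, $4 + $6 + $7 }'
}

# Sample Mem: line from the output above
echo "Mem:     3949    1397    2551     0    268    917" | buffers_cache
```

For the sample line this prints 212 3736, each one megabyte off from the 211 and 3737 reported by free because of the rounding described above.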

vmstat(io)

vmstat is a fairly comprehensive performance analysis tool. It reports process status, memory usage, virtual memory usage, disk I/O, interrupts, context switches, CPU usage, and more. It is well worth mastering.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0 519024  74732 4606568    0    0     3     9    5   10 27  5 68  0
 2  0      0 519664  74732 4606568    0    0     0     0 1847 1244 20 17 63  0
 1  0      0 517296  74732 4606568    0    0     0   284 2092 1617 37 17 47  0
 3  0      0 515440  74732 4606568    0    0     0   164 1620  718 26 17 57  0

Among them:
(1) procs. r: number of processes waiting to run. b: number of processes in uninterruptible sleep.
(2) memory. swpd: amount of virtual memory used (KB). free: free memory (KB). buff: memory used as buffers (KB). cache: memory used as cache (KB).
(3) swap. si: amount of memory swapped in from disk to memory (KB/s). so: amount of memory swapped out from memory to disk (KB/s).
(4) io. bi: number of blocks received from a block device (blocks/second). bo: number of blocks sent to a block device (blocks/second).
(5) system. in: number of interrupts per second, including clock interrupts. cs: number of context switches per second.
(6) cpu, shown as percentages of total CPU time. us: user time. sy: system time. id: idle time. wa: time spent waiting for I/O.

Under normal conditions, r and b should satisfy r < 5 and b ≈ 0. If user% + sys% < 70%, system performance is good; if user% + sys% >= 85%, performance is poor and the system should be checked thoroughly. Here user% is the percentage of time the CPU spends in user mode, and sys% is the percentage of time it spends in system mode.
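
These rules of thumb can be applied to each sample line automatically. The following is a minimal sketch, assuming the 16-column layout shown above (us and sy in columns 13 and 14). The function name vmstat_flags and its labels are invented for illustration.

```shell
# vmstat_flags (hypothetical helper): read vmstat output on stdin and flag
# samples that break the rules of thumb (r >= 5, b > 0, us + sy >= 85).
vmstat_flags() {
    awk '$1 ~ /^[0-9]+$/ {             # skip the two header lines
        busy = $13 + $14               # us + sy
        note = ""
        if ($1 >= 5)    note = note " run-queue"
        if ($2 > 0)     note = note " blocked"
        if (busy >= 85) note = note " cpu"
        printf "r=%d b=%d us+sy=%d%s\n", $1, $2, busy, (note == "" ? " ok" : note)
    }'
}

# First sample row from the output above
echo "2 0 0 519024 74732 4606568 0 0 3 9 5 10 27 5 68 0" | vmstat_flags
```

On a live system it could be used as vmstat 1 5 | vmstat_flags to sample once per second for five seconds.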

ps auxf(Process)

To view all processes running on the system, use the following options with the ps command:
a (show processes of all users)
u (use the user-oriented format, which shows the user who owns each process)
x (also show processes without a controlling tty or terminal; combined with a, this shows every process)

ps aux 

Please note that "ps -aux" is different from "ps aux". The POSIX and UNIX standards require "ps -aux" to print all processes owned by a user named "x", as well as all processes that would be selected by the -a option. If a user named "x" does not exist, ps may interpret the command as "ps aux" instead and print a warning. This behavior is intended to ease the transition of old scripts and habits. It is fragile, subject to change, and should not be relied on.

To view the process tree, add the f option (its name comes from the ASCII-art "forest") to the a, u, and x options described above.

ps auxf

ps -ef(Process)

ps aux displays the results in BSD format.
ps -ef uses the System V full-format listing, which shows each command with its full path.

One difference that affects usage is that aux truncates the command column while -ef does not. Therefore, when the output is piped into grep, ps -ef is preferred to avoid missed matches.

netstat(Network)

The netstat command displays information about network connections, routing tables, and network interfaces, so you can see which network connections are currently active. Its main options are described below. These descriptions follow the traditional UNIX netstat; the Linux net-tools version assigns different meanings to some letters (for example, -t selects TCP sockets and -p shows the owning PID and program name), so check man netstat on your system.
-A: Displays the address of any associated protocol control block. Mainly used for debugging.
-a: Displays the status of all sockets. Without it, sockets associated with server processes are normally not shown.
-i: Displays the status of auto-configured interfaces. Interfaces configured after the initial boot of the system are not listed in the output.
-m: Prints network memory usage.
-n: Prints numeric addresses instead of resolving them to symbolic host and network names.
-r: Prints the routing table.
-f address-family: Prints statistics and control block information for the named address family. So far, the only supported address family is inet.
-I interface: Prints the status of the named interface only.
-p protocol-name: Prints statistics and protocol control block information for the named protocol only.
-s: Prints statistics for each protocol.
-t: Replaces the queue length information in the output with timer information.

The two options used most often are -a and -n, combined as netstat -an:

netstat -an | grep -v unix
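
A common follow-up is to count TCP sockets per connection state. The sketch below assumes the Linux netstat -an column layout, in which the state is the sixth field of each tcp line; tcp_states is a hypothetical name.

```shell
# tcp_states (hypothetical helper): read `netstat -an` output on stdin and
# count TCP sockets per connection state, most frequent first.
tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ }
         END { for (s in count) print count[s], s }' | sort -rn
}

# Made-up sample input in the Linux netstat -an layout
printf '%s\n' \
    "Proto Recv-Q Send-Q Local Address Foreign Address State" \
    "tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN" \
    "tcp 0 0 10.0.0.5:22 10.0.0.9:51000 ESTABLISHED" \
    "tcp 0 0 10.0.0.5:22 10.0.0.9:51001 ESTABLISHED" \
    "udp 0 0 0.0.0.0:68 0.0.0.0:*" | tcp_states
```

On a live system this would be run as netstat -an | tcp_states. An unusually large count of one state, such as TIME_WAIT, is often a useful starting point for investigation.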

lsof(File)

lsof (list open files) is a tool that lists the files currently open on the system. In a UNIX environment, everything exists in the form of files: not only regular data but also network connections and hardware are accessed through files. For each of these, including Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) sockets, the system assigns the application a file descriptor behind the scenes. Regardless of the nature of the file, the file descriptor provides a common interface between the application and the underlying operating system. Because the list of an application's open file descriptors reveals a great deal about the application, viewing this list with lsof is very helpful for system monitoring and troubleshooting. Incidentally, this tool first appeared on UNIX systems and was later ported to Linux.

The -i option is the one used most in day-to-day work. It can be used to check a specific port. For example, lsof -i:22 shows which programs are using port 22.

fdisk -l(Hard disk partition)

Check the hard disk and partition information. The fdisk -l command displays the following results:

Disk /dev/sda: 160.0 GB, 160040803840 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1          13      104391   83  Linux
/dev/sda2              14        3200    25599577+  83  Linux
/dev/sda3            3201        3582     3068415   82  Linux swap / Solaris
/dev/sda4            3583       19457   127515937+   5  Extended
/dev/sda5            3583       19457   127515906   83  Linux

The above results show that this is a 160GB server hard disk.

df(Hard disk space)

Check the disk space usage of the file systems. The df -h command displays the following results:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        24G  5.9G   17G  26% /
/dev/sda5       118G  8.8G  103G   8% /data
/dev/sda1        99M   20M   75M  21% /boot
tmpfs           859M     0  859M   0% /dev/shm
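
To spot nearly full file systems without reading the table by eye, the Use% column can be filtered against a threshold. This is a sketch only: df_over is a made-up name, and it assumes each file system occupies a single line with no spaces in the mount point (df -P helps keep long device names from wrapping).

```shell
# df_over LIMIT (hypothetical helper): read df output on stdin and print
# the mount point and Use% of every file system above LIMIT percent.
df_over() {
    awk -v limit="$1" 'NR > 1 {        # skip the header line
        pct = $5
        sub(/%/, "", pct)              # "26%" -> "26"
        if (pct + 0 > limit + 0) print $6, $5
    }'
}

# Sample rows from the output above, with a deliberately low 20% limit
printf '%s\n' \
    "Filesystem      Size  Used Avail Use% Mounted on" \
    "/dev/sda2        24G  5.9G   17G  26% /" \
    "/dev/sda5       118G  8.8G  103G   8% /data" \
    "/dev/sda1        99M   20M   75M  21% /boot" \
    "tmpfs           859M     0  859M   0% /dev/shm" | df_over 20
```

In practice the limit would be something like 90, for example df -hP | df_over 90.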

du(Directory size)

Checking the size of a directory is a common task on Linux systems. Use the following command:

du -sh directory_name

For example, the du -sh /data command displays: 8.6G    /data/

Check whether any partition has a high usage rate (for example, over 90%). If a partition is running out of space, change to its mount point and use the following command to list the ten files or directories that occupy the most space, in descending order:

du -sh * | sort -hr | head -n 10
