Linux Server File Delete Policy

  devops, linux

Disk space full

Linux has no recycle bin, so on this production server every file slated for deletion is first moved to /tmp, and /tmp is then cleaned out on a regular schedule. The policy itself is sound, but a check showed that /tmp is not a separate partition on this server, so everything under /tmp actually consumes space on the root partition. With the cause found, the fix is to delete some of the large data files under /tmp. First, check the three largest entries under /tmp:

du -sh /tmp/* | sort -hr | head -3

The output shows a 66GB file named access_log in /tmp. It is evidently an access log generated by Apache, and judging by its size, the Apache logs have not been cleaned up for a long time. This file is almost certainly what filled the root partition. After confirming that it can be deleted, remove it:

rm /tmp/access_log 

Then check whether the space on the root partition has been released. The output shows that it has not. Why?
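The two checks used in this section, whether /tmp sits on the root filesystem and how much root space is in use, can be sketched as follows. This is a minimal sketch that assumes GNU coreutils (stat -c) and a POSIX df; same_filesystem and used_kb are hypothetical helper names introduced here for illustration, not part of the original setup.

```shell
# same_filesystem PATH1 PATH2
# Succeeds when both paths have the same device number, i.e. they live on
# the same filesystem. Assumes GNU stat (-c %d prints the device number).
same_filesystem() {
    dev1=$(stat -c %d "$1") || return 2
    dev2=$(stat -c %d "$2") || return 2
    [ "$dev1" = "$dev2" ]
}

# used_kb PATH
# Prints the used space, in KB, of the filesystem that holds PATH.
# df -P forces the one-line-per-filesystem POSIX layout; column 3 is "Used".
used_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

if same_filesystem /tmp /; then
    echo "/tmp is on the root filesystem"
else
    echo "/tmp is a separate filesystem"
fi
echo "root filesystem used: $(used_kb /) KB"
```

Running used_kb / before and after the rm makes it easy to see whether the deletion actually released anything.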

Deleting a file does not free the space

Normally, deleting a file releases its space, but there are exceptions, for example when a process still holds the file open or is still writing to it. Understanding this requires knowing how Linux stores files.

The data part and the pointer part of a file

A file is stored in the file system in two parts: a data part and a pointer part. The pointer lives in the file system's metadata, and the data lives in blocks on disk. When a file is deleted, its pointer is cleared from the metadata. The data itself is not erased, but once the pointer is gone, the space the data occupied can be overwritten with new content.

The space was not released after access_log was deleted because the httpd process was still writing to it. rm removed the file's name from the directory, but the kernel keeps the file's metadata (the inode) and its data blocks allocated for as long as any process has the file open. From the kernel's point of view the file has not been deleted yet, so its space is not returned.
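The behaviour described above can be reproduced safely in a scratch directory. The following is a minimal sketch, assuming a Linux system with /proc mounted; the directory and file are throwaway stand-ins, and the shell itself plays the role of the process that keeps the log open.

```shell
demo_dir=$(mktemp -d)
demo_file="$demo_dir/access_log"       # hypothetical stand-in for the real log

printf 'line one\n' > "$demo_file"
exec 3>>"$demo_file"                   # keep the file open for appending, as a logger would
rm "$demo_file"                        # removes the name, not the open file

printf 'line two\n' >&3                # writing through the descriptor still works

# Child commands inherit descriptor 3, so /proc/self/fd/3 inside them
# names the same open (and now nameless) file.
link_target=$(readlink /proc/self/fd/3)
contents=$(cat /proc/self/fd/3)

echo "descriptor 3 points to: $link_target"
echo "file contents:"
echo "$contents"

exec 3>&-                              # closing the last descriptor lets the kernel free the blocks
rmdir "$demo_dir"
```

The link target ends in "(deleted)", which is the same marker lsof prints, and both lines are still readable even though the directory no longer contains the file.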

Find deleted files still held open by a process

Since df shows that the space has not been released, the next step is to see whether any process is still writing to access_log. The lsof command can list deleted files that are still held open by a process:

lsof | grep delete

The output shows that /tmp/access_log is held open by the httpd process, which is still writing log data to it. Column 7 shows that the log file is about 70GB, while the root partition is only 100GB in total, so this file is the main culprit behind the full root partition. The “deleted” marker in the last column indicates that the file has been deleted, but its space has not been freed because the process still has it open.
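Where lsof is not installed, roughly the same list can be read straight from /proc. The function below is a sketch under that assumption; list_deleted_open_files is a hypothetical name, and it only sees processes whose descriptors the current user may read, so it should be run as root to cover the whole system. With lsof available, lsof +L1 (open files whose link count is below 1) is a narrower alternative to grepping the full output.

```shell
# list_deleted_open_files
# Prints one line per open descriptor that refers to a deleted file:
#   PID <tab> FD <tab> SIZE_IN_BYTES <tab> ORIGINAL_PATH (deleted)
list_deleted_open_files() {
    for fd_path in /proc/[0-9]*/fd/*; do
        # Processes can exit mid-scan and unreadable entries fail; skip both.
        target=$(readlink "$fd_path" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)')
                pid=${fd_path#/proc/}      # "1234/fd/7"
                pid=${pid%%/*}             # "1234"
                fd_num=${fd_path##*/}      # "7"
                # -L follows the link to the file itself to get its size.
                size=$(stat -L -c %s "$fd_path" 2>/dev/null) || size='?'
                printf '%s\t%s\t%s\t%s\n' "$pid" "$fd_num" "$size" "$target"
                ;;
        esac
    done
}

list_deleted_open_files
```

Sorting the output by the third column (sort -k3,3nr) puts the largest deleted-but-open files first.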

Empty files correctly

There are several ways to solve this kind of problem. The simplest is to stop or restart the httpd process, which closes the file and releases the space; rebooting the operating system works too. Neither is the best option on a production server. A better way to free the disk space is to empty the file in place while the service keeps running:

[root@localhost ~]# : > /tmp/access_log

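This releases the disk space immediately and lets the process keep writing to the log. It is commonly used to clean up log files from Apache, Tomcat, Nginx and other web services while they are running. Note that the redirection truncates whatever file currently exists at that path, so it should be used instead of rm, while the path still refers to the file the process has open. If the file has already been deleted, as in this case, the redirection only creates a new empty file. The deleted file can still be truncated through /proc/PID/fd/FD, where PID and FD are the process ID and descriptor number that lsof reports.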
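Both situations, truncating a log that still exists and truncating one that was already removed, can be tried out on a throwaway file. This is a minimal sketch assuming Linux with /proc and GNU coreutils; the shell stands in for httpd by holding the file open in append mode on descriptor 3. On a real server, /proc/self/fd/3 would be replaced by the PID and descriptor number taken from the lsof output.

```shell
demo_dir=$(mktemp -d)
log="$demo_dir/access_log"             # hypothetical stand-in for the real log
head -c 1048576 /dev/zero > "$log"     # 1 MiB of filler
exec 3>>"$log"                         # the "writer": holds the log open in append mode

# Case 1: the file still exists at its path, so truncate it in place.
: > "$log"
size_truncated=$(stat -c %s "$log")    # 0 bytes: space released, name kept
printf 'new entry\n' >&3               # the writer carries on into the same file
size_after_write=$(stat -c %s "$log")  # 10 bytes

# Case 2: the file has already been removed with rm.
rm "$log"
: > "$log"                             # only creates a new, unrelated empty file
size_held=$(stat -L -c %s /proc/self/fd/3)        # deleted file still holds 10 bytes
: > /proc/self/fd/3                    # truncates the deleted file itself
size_held_after=$(stat -L -c %s /proc/self/fd/3)  # 0 bytes

echo "in place: $size_truncated bytes, after next write: $size_after_write bytes"
echo "deleted file: $size_held bytes before, $size_held_after bytes after"

exec 3>&-
rm -r "$demo_dir"
```

Append mode matters here: a writer that opened the file without it would keep its old offset after the truncation and produce a sparse file instead. Apache opens its logs for appending, which is why the in-place approach works for it.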