MongoDB standby node's data directory differs in size from the primary node's

  mongodb, question

Asking the experts to help clear this up:
I have two MongoDB nodes in active/standby mode. Checking the data directories on the two nodes, I found the following.
The primary node's data directory is 23 GB, with the following details:

The standby node's data directory is 11 GB, with the following details:

The amount of data reported by db.collection.stats() is the same on both nodes. What is the reason for the difference? Are there any relevant resources?

A side note first: MongoDB historically had master/slave replication (in fact, it still exists today). Strictly speaking, "active/standby" usually refers to that. What we basically use now is a replica set.

As for your situation, it is actually normal. The principle is the same as a disk becoming fragmented after long use, especially if you have deleted data on a large scale. A simple explanation: suppose your collection has 4 documents in total, doc1/doc2/doc3/doc4. The order of storage on disk is:
doc1|doc2|doc3|doc4
Now you delete doc2, and the space usage on disk becomes:
doc1| (blank) |doc3|doc4
The system has no way to free this blank space unless you defragment and move the blank space to the end:
doc1|doc3|doc4| (blank)
Then the system can truncate the end of the file and free this space. As you can see, moving the blank to the end of the file is time-consuming and laborious. The simplest way is to move all the subsequent documents forward, in order, to fill the gap left by doc2 (doc3/doc4 are moved forward as shown above). However, this involves a large amount of disk I/O and seriously affects performance. There are many other ways to defragment, but whichever one you pick causes serious I/O impact, so in general we do not defragment.
The way to defragment is the compact command. As mentioned, this operation is generally performed only during a maintenance window because of its serious performance impact.
Even if you don't do this, the system knows which places are blank. When new documents come in, it tries to reuse these blank parts to maximize space utilization. However, no matter how good the algorithm is, space reuse can never reach 100%, because new documents are rarely exactly the same size as previously deleted ones. A new document can only go into a blank that is at least as large as it is, which leaves behind a smaller fragment that is even harder to reuse.
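The delete, reuse, and compact behaviour described above can be illustrated with a small simulation. This is only a toy model under stated assumptions: the file is a list of slots, sizes are made-up byte counts, free space is reused with a simple first-fit rule, and the function names are hypothetical. It is not how MongoDB's storage engine is actually implemented.

```python
# Toy model of a data file: a list of [owner, size] slots.
# owner is a document name, or None for a blank (free) slot.
# All sizes and names below are hypothetical, for illustration only.

def delete(layout, name):
    """Mark the document's slot as blank; the file does not shrink."""
    for slot in layout:
        if slot[0] == name:
            slot[0] = None

def insert_first_fit(layout, name, size):
    """Reuse the first blank slot that is big enough, else grow the file."""
    for i, (owner, slot_size) in enumerate(layout):
        if owner is None and slot_size >= size:
            leftover = slot_size - size
            layout[i] = [name, size]
            if leftover:
                # The unused remainder becomes a smaller, harder-to-reuse blank.
                layout.insert(i + 1, [None, leftover])
            return
    layout.append([name, size])

def compact(layout):
    """Rewrite the file with documents packed together and no blanks."""
    return [slot for slot in layout if slot[0] is not None]

def file_size(layout):
    return sum(size for _, size in layout)

layout = [["doc1", 100], ["doc2", 100], ["doc3", 100], ["doc4", 100]]

delete(layout, "doc2")
size_after_delete = file_size(layout)      # still 400: the blank stays in the file

insert_first_fit(layout, "doc5", 70)       # fits in the blank, leaves a 30-byte fragment
size_after_reuse = file_size(layout)       # still 400

insert_first_fit(layout, "doc6", 50)       # 30-byte fragment is too small, file grows
size_after_growth = file_size(layout)      # 450, of which 30 bytes are wasted

size_after_compact = file_size(compact(layout))  # 420: only live documents remain
```

In this model the file holds 420 bytes of live documents but occupies 450 bytes until it is compacted, which is the same kind of gap as the one between the two data directories.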
Another alternative is to delete the node's data and resynchronize. Synchronization amounts to fetching every document once and rewriting them to disk one by one, so the documents are laid out compactly on disk once it completes, which is equivalent to defragmentation. Moreover, it is mainly the secondary node that is affected, and it does not serve external requests while synchronizing, so the impact on production is minimal. Note, however, that the primary node is affected too: all the data has to be read from it, so a rise in I/O on the primary is inevitable.

Finally, back to your question: the above should make it clear why the secondary node can be smaller than the primary node.
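If you want to check this on your own nodes, one rough approach is to compare the collection statistics from each node. The sketch below assumes you have already fetched the `collStats` output from both nodes as dictionaries (for example with pymongo's `db.command("collStats", name)`) and compares the `count`, `size`, and `storageSize` fields. The function name and the sample numbers are hypothetical and are not taken from the nodes in the question. With a compressing storage engine `storageSize` can be lower than `size`, so treat the difference between nodes, not the absolute value, as the signal.

```python
def compare_nodes(primary, secondary):
    """Compare two collStats-style dicts taken from two nodes.

    Returns whether the logical data matches and how many more bytes
    the primary has allocated on disk than the secondary.
    """
    same_data = (
        primary["count"] == secondary["count"]
        and primary["size"] == secondary["size"]
    )
    extra = primary["storageSize"] - secondary["storageSize"]
    return {"same_data": same_data, "extra_storage_bytes": extra}

GB = 1024 ** 3

# Hypothetical sample values, for illustration only.
primary_stats = {"count": 1_000_000, "size": 10 * GB, "storageSize": 22 * GB}
secondary_stats = {"count": 1_000_000, "size": 10 * GB, "storageSize": 10 * GB + GB // 2}

report = compare_nodes(primary_stats, secondary_stats)
print(report)
```

If the logical data matches while the primary has allocated far more storage, the extra bytes are most likely blank space of the kind described above.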