How does MongoDB load index data?



I created a test collection holding 100 million documents, with about 6.3 GB of document data, and built three indexes on it. As shown in the figure above, each index is roughly 1 GB, and the three together total 3.2 GB.

The first query I ran on this collection filtered on non-indexed fields, so it performed a full collection scan over all 100 million documents. Memory usage climbed continuously during the scan and grew by about 6 GB in total.

I then stopped the mongod process, restarted it, and queried on an indexed field. The target document was found instantly, but memory usage barely changed: it rose by only about 100 MB, which is roughly what starting the mongod process adds on its own. Yet each of the three indexes is about 1 GB. Did MongoDB load the index data into memory at all?

How exactly does MongoDB use index data? If it loads an index into memory, why did memory usage stay basically unchanged? For the three indexes in the figure above, if the query only uses the index on field C, does MongoDB load just that index's 1.1 GB, or all 3.2 GB of the three indexes?

This is mostly a question of operating system fundamentals. When a file is read, the operating system keeps its contents in otherwise free memory, so the next time a program reads the same part of the file, the data can be served straight from memory without touching the disk, which is far faster. This cache is the file system cache (on Linux, the page cache).
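To make the mechanism concrete, here is a toy model of a read-through file system cache. It is a simplified illustration, not how the kernel actually implements its page cache; `SimulatedDisk`, `FileSystemCache` and the 4096-byte page size are hypothetical choices for the sketch. It counts how many reads actually reach the "disk" when the same pages are read repeatedly.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical for this sketch)


class SimulatedDisk:
    """Hypothetical stand-in for a data or index file; counts physical reads."""

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.reads = 0

    def read_page(self, page_no: int) -> bytes:
        if not 0 <= page_no < self.num_pages:
            raise IndexError(f"page {page_no} is outside the file")
        self.reads += 1
        # Fake page contents: the page number repeated to fill the page.
        return page_no.to_bytes(8, "little") * (PAGE_SIZE // 8)


class FileSystemCache:
    """Read-through cache: every page read from disk is kept in memory."""

    def __init__(self, disk: SimulatedDisk) -> None:
        self.disk = disk
        self.pages: dict[int, bytes] = {}
        self.hits = 0

    def read(self, page_no: int) -> bytes:
        if page_no in self.pages:
            self.hits += 1                    # served from memory, no disk I/O
            return self.pages[page_no]
        data = self.disk.read_page(page_no)   # miss: fetch from disk once
        self.pages[page_no] = data
        return data


disk = SimulatedDisk(num_pages=1000)
cache = FileSystemCache(disk)

for _ in range(3):                            # read pages 0..99 three times
    for page in range(100):
        cache.read(page)

print(disk.reads, cache.hits)                 # 100 200
```

Only the first pass touches the disk; the second and third passes are served entirely from memory. The cache here is unbounded for simplicity, whereas a real one is limited by free RAM and has to evict pages.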
It is easy to see why: memory that sits empty is simply wasted, so why not cache something in it? Whatever gets cached is a pure gain, since every hit saves a disk read. How much you gain depends on what you choose to keep in the limited memory and how often that cached content gets hit. That topic has nothing to do with your question, so I won't go into detail; if you are interested, read up on operating system principles.
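The point about choosing what to keep shows up as soon as the cache is bounded and needs an eviction policy. This sketch assumes plain LRU (evict the least recently used page), a common textbook policy; real operating systems use more elaborate variants. `LRUCache` and `hit_ratio` are hypothetical names. It runs two access patterns against the same 100-page cache.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: OrderedDict[int, None] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, page_no: int) -> None:
        if page_no in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_no)   # mark as most recently used
            return
        self.misses += 1
        self.pages[page_no] = None
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used


def hit_ratio(pattern: list[int], capacity: int) -> float:
    """Fraction of accesses in `pattern` served from the cache."""
    cache = LRUCache(capacity)
    for page in pattern:
        cache.access(page)
    return cache.hits / len(pattern)


# Workload 1: a small hot set of 50 pages, read over and over.
hot = [p % 50 for p in range(10_000)]
# Workload 2: repeated sequential scans over 200 pages, more than fits.
scan = [p % 200 for p in range(10_000)]

print(round(hit_ratio(hot, capacity=100), 3))   # 0.995
print(hit_ratio(scan, capacity=100))            # 0.0
```

A hot set that fits in the cache is hit almost every time, while a cyclic scan slightly larger than the cache gets no hits at all under LRU, even though the cache size is identical in both cases.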
Returning to your question: when you restart the MongoDB instance, the memory held by the MongoDB process is of course released. However, the data and indexes are still in the file system cache, because both are read from data files and index files on disk (as long as nothing else has needed that memory in the meantime). Indexes are also loaded on demand, which you can work out by simple reasoning: suppose your index were 10 GB. Would the first read have to wait for all 10 GB to be loaded into memory? And what if the index is larger than the available memory? Loading every index in full up front is clearly unreasonable. Even within a single index, only the parts that are needed get loaded, not the whole thing. So your query needs only a small part of that 1 GB. Remember that an index lookup takes O(log n) time: to find one entry among 100 million, a binary search needs at most 27 comparisons in the worst case (log2(100,000,000) ≈ 26.6), which of course finishes in an instant.
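The on-demand argument can be illustrated with a toy model. This is not MongoDB's storage engine: real indexes are B-trees with many keys per node, so they touch even fewer pages than the flat binary search used here. Three hypothetical indexes named `a`, `b` and `c` each hold 100 million sorted keys split into pages of an assumed 4096 entries; a page is materialized only when the search first touches it. Keys are generated (`key = 2 * position`) rather than stored, and `LazyIndex` is a hypothetical name.

```python
import math

N = 100_000_000          # documents in the collection (from the question)
ENTRIES_PER_PAGE = 4096  # assumed number of index entries per page


class LazyIndex:
    """Sorted index whose pages are loaded only when first touched."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.size = size
        self.loaded_pages: dict[int, list[int]] = {}

    def _page(self, page_no: int) -> list[int]:
        if page_no not in self.loaded_pages:          # on-demand page load
            start = page_no * ENTRIES_PER_PAGE
            stop = min(start + ENTRIES_PER_PAGE, self.size)
            self.loaded_pages[page_no] = [2 * i for i in range(start, stop)]
        return self.loaded_pages[page_no]

    def key_at(self, pos: int) -> int:
        page_no, offset = divmod(pos, ENTRIES_PER_PAGE)
        return self._page(page_no)[offset]

    def search(self, target: int) -> tuple[int, int]:
        """Binary search; returns (position or -1, comparisons made)."""
        lo, hi, comparisons = 0, self.size - 1, 0
        while lo <= hi:
            mid = (lo + hi) // 2
            key = self.key_at(mid)
            comparisons += 1
            if key == target:
                return mid, comparisons
            if key < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1, comparisons


indexes = {name: LazyIndex(name, N) for name in ("a", "b", "c")}
pos, comparisons = indexes["c"].search(2 * 12_345_678)  # query uses index c only

print(pos, comparisons)
print({name: len(ix.loaded_pages) for name, ix in indexes.items()})
print(math.floor(math.log2(N)) + 1)   # worst-case comparisons: 27
```

Indexes `a` and `b` load no pages at all, and `c` loads at most one page per comparison, a few dozen out of roughly 24,400 pages. The last line is the worst-case bound floor(log2 n) + 1 for binary search over n = 100 million keys.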