How does MongoDB select an index for a query?

  mongodb, question

Problem description

The collection has the following two indexes. The first is the single-field index on {_id}.
The second is a compound index on {chat_id, _id}. When I run the query
db.collection.find({"chat_id" : ObjectId("*********"), "the_time":{"$lt" : ISODate("2010-01-01T00:00:00Z")}}).sort({"the_time":-1}).explain("executionStats")
why do I get the following explain output?

......
 "winningPlan" : {
     "stage" : "SORT",
     "sortPattern" : {
         "the_time" : -1
     },
     "inputStage" : {
         "stage" : "SORT_KEY_GENERATOR",
         "inputStage" : {
             "stage" : "FETCH",
             "filter" : {
                 "the_time" : {
                     "$lt" : ISODate("2010-01-01T00:00:00Z")
                 }
             },
             "inputStage" : {
                 "stage" : "IXSCAN",
                 "keyPattern" : {
                     "chat_id" : 1,
                     "_id" : 1
                 },
                 "indexName" : "chat_id_1__id_1"
 ......

Why is the compound index on {chat_id, _id} used?

My understanding of compound indexes is as follows.

An index prefix is a beginning subset of the fields of a compound index.

If the following index exists
 
 { "item": 1, "location": 1, "stock": 1 }
 
 it has the following index prefixes
 { item: 1 }
 { item: 1, location: 1 }
 
 MongoDB can use the index for queries on
 the item field
 the item field + the location field
 the item field + the location field + the stock field
 the item field + the stock field (the index is used, but less efficiently than an index on only item and stock)
 
 MongoDB cannot use the index for queries that filter only on
 the location field
 the stock field
 the location field + the stock field
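
The prefix rule listed above can be sketched in a few lines of JavaScript. This is only a simplified model for illustration, not MongoDB's actual planner; the function name usablePrefixLength is made up here. It counts how many leading index fields the query filters on, which is the part of the index that can narrow the scan.

```javascript
// Simplified model of the prefix rule; NOT MongoDB's real planner logic.
// indexFields: ordered field names of a compound index.
// queryFields: field names that appear in the query filter.
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let n = 0;
  // Walk the index fields in order and stop at the first one
  // that the query does not filter on.
  while (n < indexFields.length && wanted.has(indexFields[n])) {
    n += 1;
  }
  return n;
}

const index = ["item", "location", "stock"];

console.log(usablePrefixLength(index, ["item"]));                      // 1
console.log(usablePrefixLength(index, ["item", "location"]));          // 2
console.log(usablePrefixLength(index, ["item", "location", "stock"])); // 3
console.log(usablePrefixLength(index, ["item", "stock"]));             // 1 (only item matches a prefix)
console.log(usablePrefixLength(index, ["location", "stock"]));         // 0 (no prefix matches)
```

A result of 0 corresponds to the "cannot use the index" cases in the list, and any positive result means that at least the leading field can narrow the scan.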

Question 1: Is it true that with the compound index {chat_id, _id}, a query on {chat_id, the_time} will still use this index, because chat_id is in the index even though the_time is not?

Question 2: I then added another index, {chat_id, third_field}, and ran the same query on {chat_id, the_time}. The explain output now showed {chat_id, third_field} being used instead of {chat_id, _id}. What is MongoDB's strategy for index selection?

Question 1

A query can be executed along one of two paths:

  1. If the query can use an index, the index scan (IXSCAN) directly yields the locations of all documents that match the indexed conditions. Because it yields locations rather than the documents themselves, an additional step (FETCH) is needed to load the actual documents from those locations.
  2. If no index supports the query, every document that might satisfy the conditions has to be loaded into memory and checked one by one to produce the result set (COLLSCAN).

Taking the second path is obviously expensive in both resources and time, so queries should hit an index whenever possible, or at least partially hit one. Your question may come from not knowing what a partial hit is. Suppose your collection has 1,000,000 records and the query criteria are {"chat_id" : ObjectId("*********"), "the_time":{"$lt" : ISODate("2010-01-01T00:00:00Z")}}:

  • In the worst case there is no index support, so all 1,000,000 records have to be examined to see which ones satisfy both conditions.
  • But now the index {chat_id: 1, _id: 1} exists. Although it cannot fully satisfy your query, it does satisfy the first condition. Assuming that condition filters out 900,000 records (only 100,000 remain), the remaining condition only has to be checked against those 100,000 records. Isn't that better than traversing 1,000,000 records?

Choosing {_id: 1} does not help the query at all and is the same as the first case. Choosing {chat_id: 1, _id: 1} helps at least somewhat, so why not choose the latter?
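
The difference between the two paths can be illustrated with a toy model in plain JavaScript. It assumes made-up data (1,000 documents, 10 chats, one document per day) and a Map standing in for the index on chat_id; it only counts how many documents each strategy has to examine and says nothing about how MongoDB stores its indexes.

```javascript
// Toy data, invented for illustration: 1,000 documents, 10 distinct chats,
// one document per day starting 2009-01-01 (UTC).
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({
    _id: i,
    chat_id: i % 10,
    the_time: new Date(Date.UTC(2009, 0, 1 + i)),
  });
}
const cutoff = new Date("2010-01-01T00:00:00Z");

// Path 2 (COLLSCAN): every document is examined against both conditions.
function collScan(chatId) {
  let examined = 0;
  let returned = 0;
  for (const d of docs) {
    examined++;
    if (d.chat_id === chatId && d.the_time < cutoff) returned++;
  }
  return { examined, returned };
}

// Stand-in for an index whose leading field is chat_id.
const byChatId = new Map();
for (const d of docs) {
  if (!byChatId.has(d.chat_id)) byChatId.set(d.chat_id, []);
  byChatId.get(d.chat_id).push(d);
}

// Path 1 with a partial hit (IXSCAN + FETCH with a filter):
// the index narrows the candidates by chat_id, and the_time is checked afterwards.
function indexScanThenFilter(chatId) {
  const candidates = byChatId.get(chatId) || [];
  const returned = candidates.filter((d) => d.the_time < cutoff).length;
  return { examined: candidates.length, returned };
}

console.log(collScan(3));            // { examined: 1000, returned: 37 }
console.log(indexScanThenFilter(3)); // { examined: 100, returned: 37 }
```

Both strategies return the same documents; the partial hit simply examines one tenth as many of them in this made-up data set.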

Question 2

First, keep one thing in mind: even when the same query runs twice with the same index, the execution time is not necessarily the same, because the load on the server varies. The two indexes you mention, {chat_id, third_field} and {chat_id, _id}, help this query to almost exactly the same degree, since each one only narrows the scan by chat_id. The planner tries the candidate plans and keeps whichever performs better, and when two plans are practically equivalent either one can win. Once a plan has been chosen, the plan cache keeps it in use until the server restarts, instead of re-evaluating the candidates on every execution (which would waste resources). So there is no point in comparing the two indexes here.
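
If you want to check for yourself that the two indexes perform about the same, one option is to force each index with hint() and compare the execution statistics. The helper below is a hypothetical sketch meant for the mongo shell; the function name compareIndexes is made up, "collection" is the placeholder collection name from the question, and the index names you pass in must match whatever your deployment actually reports.

```javascript
// Hypothetical helper for the mongo shell; `db` is the shell's database handle.
// For each index name, run the same query with that index forced via hint()
// and collect a few numbers from explain("executionStats").
function compareIndexes(db, filter, sort, indexNames) {
  return indexNames.map((name) => {
    const stats = db
      .getCollection("collection") // placeholder collection name
      .find(filter)
      .sort(sort)
      .hint(name) // force this index instead of letting the planner choose
      .explain("executionStats").executionStats;
    return {
      index: name,
      keysExamined: stats.totalKeysExamined,
      docsExamined: stats.totalDocsExamined,
      millis: stats.executionTimeMillis,
    };
  });
}

// Usage in the shell (the second index name is an assumed default name):
// compareIndexes(db, <your filter>, { the_time: -1 },
//                ["chat_id_1__id_1", "chat_id_1_third_field_1"]);
```

For two indexes that both narrow the scan only by chat_id, the examined counts should come out nearly identical, and the remaining difference in milliseconds is mostly server load.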

Other

db.collection.find({"chat_id" : ObjectId("*********"), "the_time":{"$lt" : ISODate("2010-01-01T00:00:00Z")}}).sort({"the_time":-1}).explain("executionStats")

The best index for this query would be {chat_id: 1, the_time: -1}.
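
Why this index fits can be shown with a small toy model in plain JavaScript. It assumes invented rows with numeric the_time values and represents the index as an array sorted by chat_id ascending, then the_time descending. A real index is a B-tree and seeks to the start position instead of searching linearly, but the ordering argument is the same: matching entries are adjacent and already in the requested sort order, so neither a post-fetch filter on the_time nor a separate SORT stage is needed.

```javascript
// Invented rows; the_time is a plain number to keep the example short.
const rows = [
  { chat_id: "a", the_time: 20 },
  { chat_id: "b", the_time: 5 },
  { chat_id: "a", the_time: 40 },
  { chat_id: "b", the_time: 30 },
  { chat_id: "a", the_time: 10 },
  { chat_id: "a", the_time: 30 },
];

// Toy "index" on {chat_id: 1, the_time: -1}:
// ordered by chat_id ascending, then the_time descending.
const indexEntries = [...rows].sort(
  (x, y) => x.chat_id.localeCompare(y.chat_id) || y.the_time - x.the_time
);

function scan(chatId, upperBound) {
  const out = [];
  // Find the first entry for chatId whose the_time is below the bound
  // (a linear search here; a real index would seek in the B-tree).
  let i = indexEntries.findIndex(
    (e) => e.chat_id === chatId && e.the_time < upperBound
  );
  // Read forward until the chat_id changes. Entries arrive in
  // descending the_time order, so no extra sort is required.
  while (i !== -1 && i < indexEntries.length && indexEntries[i].chat_id === chatId) {
    out.push(indexEntries[i].the_time);
    i++;
  }
  return out;
}

console.log(scan("a", 35)); // [ 30, 20, 10 ]
```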