Mongo’s Dense Index and Sparse Index


Dense index

Mongo’s indexes are intensive by default. In other words, in an indexed collection, each document will have a corresponding index item, even if there are no indexed keys in the document. For example, for a product set in an e-commerce data model, suppose you build an index on the product attribute category_ids. Now assume that some products are not assigned to any category, and for each product without a category, there will still be a null entry like this in the category_ids index. You can query null values like db.products.find ({category _ ids: null})
When querying all products that lack classification, the query optimizer can still use the index on category_ids to locate the corresponding products.

Sparse index

In a sparse index, only documents that are valued by the index key will appear. If you want to create a sparse index, specify {sparse: true}. For example, you can create a unique sparse index on the sku as follows:
db.products.ensureIndex({sku: 1}, {unique: true, sparse: true})

How to choose

However, there are two situations where it is inconvenient to use dense indexes.

One is when you want to add a unique index to fields that do not appear in all documents in the collection. For example, you explicitly want to add a unique index to the sku field of each product. However, for some reasons, it is assumed that the product has joined the system before sku has been allocated. If there is a unique index on the sku field and you want to insert multiple products without skus, the first insertion will succeed, but subsequent insertions will fail because there is already an item with a null SKU in the index. In this case, dense index is not suitable, what you need is sparse index.

Another case for sparse indexing: a large number of documents in the collection do not contain indexed keys. For example, suppose anonymous comments on e-commerce websites are allowed. In this case, half of the comments may lack the user_id field. if there is an index on that field, then half of the items in that index will be null. For two reasons, the efficiency of this situation will be very poor. First, this will increase the size of the index. Second, updating the index is also required when adding and deleting documents with null value user_id field.
If anonymous comments are rarely (or not) queried, you can choose to build a sparse index on user_id.