#### aggregate + group + sum over millions of documents is extremely slow, seeking a solution

1. Example of Document Structure

```js
{
  _id: xxxx,
  user: 'xiaoming',
  level: 5,
  from: 'iPhone',
  info: 'something wrong'
}
```

2. Scenario: there are 6-7 million documents with `user: 'xiaoming'`

3. Question: how can the aggregate + group + sum query below be made faster?

```js
aggregate([
  {$match: {user: 'xiaoming', info: {$regex: 'wrong'}}},
  {$group: {_id: null, count: {$sum: 1}}}
])
```

The pipeline above counts xiaoming's documents whose `info` contains "wrong". The result:

``{"_id": null, "count": 2299999 }``

It takes 30-40s. I have tried three indexes: on `user`, on `info`, and on `user + info`, and the speed did not improve.
Searching Baidu and Google only turns up "conditional counting is slow", with no solution.
How can I improve efficiency? Can it be done within 10s?

The first thing to clarify is that expectations for OLAP-style operations should not be set too high. Scanning a large volume of data involves far more I/O than a normal OLTP operation, so demanding OLTP-level speed and concurrency is unrealistic and meaningless. That does not mean there is no room for optimization, though.
Let's start with the index. Without any index, how long does it take to find 6 million `{user: "xiaoming"}` documents? With a full collection scan (`COLLSCAN`), finding 6 million out of 7 million documents and finding 6 million out of 100 million are obviously two different things. With an index hit (`IXSCAN`), that difference shrinks to almost negligible. So the claim that the `{user: 1}` index has no effect is wrong; most likely the collection is small enough relative to the matched set that no difference is visible. Incidentally, whether an index helps should be judged from the execution plan, not from the execution time, which is an inaccurate measure.
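To see which plan is actually used, run `explain()` against a running MongoDB instance (the collection name `coll` here follows the examples below and is an assumption):

```js
// Requires a live mongod; run in the mongo shell.
// Look at queryPlanner.winningPlan: "IXSCAN" means the index was used,
// "COLLSCAN" means a full collection scan.
db.coll.find({user: "xiaoming", info: {$regex: "wrong"}})
  .explain("executionStats");
// executionStats.totalDocsExamined shows how many documents were
// actually scanned and filtered, regardless of how many matched.
```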
With the `user` index in place, the match still returns 6 million documents, and the remaining condition is the `$regex`. An unanchored regex cannot use an index, so whether or not there is an index on `info` is irrelevant: after the 6 million documents are found, a `filter` still has to run over all 6 million of them. The only thing that might help this operation is a full-text index. Full-text search cannot completely replace regular expressions, however, so check the documentation for your specific case. If full-text indexing is feasible, create a compound index:

```js
db.coll.createIndex({
  user: 1,
  info: "text"
});
```

Then query with `$text` instead of `$regex`:

```js
db.coll.aggregate([
  {$match: {user: 'xiaoming', $text: {$search: 'wrong'}}},
  {$group: {_id: null, count: {$sum: 1}}}
]);
```

If that is still not fast enough, consider pre-aggregation:
1. Maintain the count incrementally: every time a document's `info` is inserted or updated, check whether it matches the condition and adjust a counter accordingly.
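A minimal sketch of that counter idea, with a plain in-memory object standing in for what would be a MongoDB counters collection (the key scheme and function names here are illustrative assumptions, not from the original answer):

```javascript
// Pre-aggregation: keep the count up to date on every write,
// so reading it is O(1) instead of a multi-million-document scan.
const counters = {}; // stand-in for a counters collection, e.g. { "xiaoming|wrong": 123 }

function matches(info) {
  return /wrong/.test(info); // the condition we would otherwise $regex-filter on
}

// Call alongside every insert.
function onInsert(doc) {
  if (matches(doc.info)) {
    const key = `${doc.user}|wrong`;
    counters[key] = (counters[key] || 0) + 1;
  }
}

// Call alongside every update that changes info.
function onUpdate(user, oldInfo, newInfo) {
  const key = `${user}|wrong`;
  const delta = (matches(newInfo) ? 1 : 0) - (matches(oldInfo) ? 1 : 0);
  if (delta !== 0) counters[key] = (counters[key] || 0) + delta;
}

// The expensive aggregate is replaced by a constant-time lookup.
function countWrong(user) {
  return counters[`${user}|wrong`] || 0;
}
```

In MongoDB these updates would typically be `$inc` operations on a counters document, performed in the application code next to each write; the trade-off is extra work per write in exchange for instant reads.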