PyMongo count is slow

  mongodb, question

Thirty thousand documents, each containing only one random number: {"digit": random number}
Requirement: count the number of occurrences of each number
Database table

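import time

from pymongo import MongoClient

# Assumed connection; the database and collection names are placeholders.
table = MongoClient()['test']['table']


def main():
    # Collect every value, then reduce to the distinct ones.
    digits = []
    for d in table.find():
        n = d['digit']
        digits.append(n)
    dig = set(digits)

    news = []
    i = 0
    for d in dig:
        # One separate query per distinct value; this is the slow part.
        # (Cursor.count() exists only in PyMongo 3.x; it was removed in 4.0.)
        c = table.find({"digit": d}).count()
        zz = (d, c)
        news.append(zz)
        print(i)
        i += 1


if __name__ == '__main__':
    start = time.time()
    main()
    print('Cost: {}'.format(time.time() - start))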

It takes five or six minutes to run once. Running it with 100 threads is not much faster. The fan also gets very loud. …
What is the correct approach?

The correct approach is to use aggregation.

db.table.aggregate([
    {$group: {_id: "$digit", count: {$sum: 1}}}, // count the occurrences of each number
    {$sort: {count: -1}},                        // sort by count, descending
    {$limit: 1}                                  // take the first record
]);
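
Since the question uses PyMongo, the same pipeline can probably be written in Python roughly as below. This is only a sketch: the helper name count_digits and its top parameter are made up for illustration, and it assumes table is a PyMongo collection like the one in the question. Leaving top unset returns the count for every number, which is what the question asks for; top=1 matches the $limit stage above.

```python
def count_digits(table, top=None):
    """Return (digit, count) pairs, most frequent first, using one aggregation.

    `table` is assumed to be a PyMongo collection; `top` is a hypothetical
    convenience parameter that appends a $limit stage when given.
    """
    pipeline = [
        # Group documents by their "digit" value and count each group.
        {"$group": {"_id": "$digit", "count": {"$sum": 1}}},
        # Sort by count, descending.
        {"$sort": {"count": -1}},
    ]
    if top is not None:
        pipeline.append({"$limit": top})
    return [(doc["_id"], doc["count"]) for doc in table.aggregate(pipeline)]


# Example usage (assumes a local mongod; "test" and "table" are placeholders):
#   from pymongo import MongoClient
#   table = MongoClient()["test"]["table"]
#   print(count_digits(table, top=1))
```

This sends a single request to the server instead of one query per distinct number, so the grouping happens inside MongoDB.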

For how to use $group, refer to the documentation.
Note that this kind of requirement is uncommon in practice, so I assume this is an exercise. Even with aggregation, MongoDB still has to traverse every document in the collection to find the most frequently occurring number. When the collection is large, a full scan like this cannot be fast. Such queries usually belong to OLAP scenarios, where response time tends to matter less. So in principle the aggregation framework is the right tool, but the actual requirement still needs to be analyzed case by case.
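
To make the "every record must be visited once" point concrete, here is a client-side sketch of what $group with $sum does conceptually. It is not how MongoDB implements it; the function name count_in_one_pass is made up, and docs is assumed to be any iterable of dicts with a "digit" key, for example the cursor from table.find({}, {"digit": 1, "_id": 0}).

```python
from collections import Counter


def count_in_one_pass(docs):
    """Count occurrences of each "digit" value in a single traversal."""
    # Each document is read exactly once, so the work grows linearly with
    # the number of documents, however the counting is done.
    counts = Counter(doc["digit"] for doc in docs)
    # most_common() returns (value, count) pairs sorted by count, descending.
    return counts.most_common()
```

Either way the cost is one pass over all the data, which is why the result cannot be instant on a large collection.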