Sorting Millions of Records in Node.js

  node.js, question

Sorting millions of records. The environment is Node.js. There is an object whose contents are very simple: just key => value pairs.

let keys = Object.keys(data);

This step alone took me one second.

Traversing the keys in a loop took another 1.5 seconds.

Is there anything more I can do?
I don't know how other people manage to sort and search data like this in under a second. It's amazing.
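
For reference, here is a minimal sketch of the measurement being described, on synthetic data (the key names and the value shape are my assumptions; timings vary by machine and Node.js version):

    // Build a plain object with 1,000,000 entries as a stand-in for the real data.
    const data = {};
    for (let i = 0; i < 1000000; i++) {
      data['key' + i] = { dateline: i }; // hypothetical value shape
    }

    console.time('Object.keys'); // the step reported to take ~1 s
    let keys = Object.keys(data);
    console.timeEnd('Object.keys');

    console.time('loop'); // the traversal reported to take ~1.5 s
    let count = 0;
    for (const k of keys) {
      if (data[k]) count++; // trivial per-key work
    }
    console.timeEnd('loop');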

-Supplement-

The following is a supplement focused on the requirements the answers addressed; let me explain them in more detail.

1. First of all, the Node.js environment is not a real front end. I didn't say this above, but it doesn't affect how the problem is solved.

2. What angle are the answers below taking? "Just show a few pages at a time"? I never said I wanted to display a million records; I only want to sort and search them. For example, if I search key=abc, jump to record 100, and list 50 records from there in key order, don't you have to search and sort across all 1,000,000 records first? Can that really be done by operating on just the first 50?

I operate on 1,000,000 records; I didn't say I display 1,000,000. That is the part that needs to be read carefully.

3. The specific situation is this: the underlying storage is LevelDB, which lives on the hard disk, and a single-key lookup is quite fast. But to sort and paginate, it has to read all 1,000,000 records off the disk one by one before anything else can happen.

At that rate, a single full read is estimated to take about 4 minutes. Reading from disk is bound to be slow, but it has an advantage: the memory footprint is small. Otherwise you would pull all 10 GB of items into memory at once and operate on them there; you can imagine what that would be like …

So the solution is to generate an index yourself. Put the index in a file and load it into memory in one go at query time, which is far more efficient than pulling 1,000,000 records off the disk one by one to sort them.
For example, an id => dateline index stores just the id and the timestamp. To sort by time, you sort the index, take out, say, 50 ids, and then read only those records from the disk.

Because the index file is very small, holding only data that could not be simpler, a million entries add up to only a few tens of MB, so memory usage stays very low as well.
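
As a rough sketch of that approach, assuming the classic level/levelup API (createReadStream, promise-returning get) and JSON-encoded values carrying a dateline field; both are assumptions about the setup:

    // Sketch: build an id => dateline index over LevelDB, persist it to a file,
    // and answer "50 records sorted by time" queries from the in-memory index.
    const level = require('level');
    const fs = require('fs');

    const db = level('./data');

    // One full scan builds the index; this is the slow, once-only step.
    function buildIndex() {
      return new Promise((resolve, reject) => {
        const index = []; // [{ id, dateline }]
        db.createReadStream()
          .on('data', ({ key, value }) => {
            const record = JSON.parse(value); // assumes JSON-encoded values
            index.push({ id: key, dateline: record.dateline });
          })
          .on('error', reject)
          .on('end', () => {
            // A million { id, dateline } pairs serialize to tens of MB, not 10 GB.
            fs.writeFileSync('./index.json', JSON.stringify(index));
            resolve(index);
          });
      });
    }

    // Query: sort the small in-memory index, take one page of ids,
    // then read only those records from disk.
    async function page(index, offset, limit) {
      index.sort((a, b) => a.dateline - b.dateline);
      const ids = index.slice(offset, offset + limit).map((e) => e.id);
      return Promise.all(ids.map((id) => db.get(id)));
    }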

4. Given all this, using Redis to store the index should be a good solution, but my environment doesn't allow it. If the environment permitted, I would simply use MySQL or some other store.
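
For completeness, if the environment did allow Redis, a sorted set maps directly onto this kind of index (a sketch using the node-redis v4 API; the key and id names are made up):

    // Sketch: the same id => dateline index as a Redis sorted set (score = dateline).
    const { createClient } = require('redis');

    async function demo() {
      const client = createClient();
      await client.connect();

      // Index one record: the member is the id, the score is its timestamp.
      await client.zAdd('by_dateline', { score: 1700000000, value: 'id:123' });

      // First 50 ids ordered by time; fetch only those from the main store.
      const ids = await client.zRange('by_dateline', 0, 49);
      await client.quit();
      return ids;
    }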

let keys = Object.keys(data);

This statement amounts to traversing all the elements, so its time complexity is O(n·x), where x is the average cost of locating one element. The JS {} type presumably uses a balanced tree internally, and the best lookup complexity of a balanced tree is O(log n), which would make the statement above O(n log n). But since the {} type has to support other operations as well, a balanced tree is not optimal for everything, so the actual running time is bound to be higher than the theoretical O(n log n) estimate.

However, your current situation is: data is a collection on the order of 1,000,000 entries, you need the keys sorted, and JS takes 1.5 s to execute this one statement.
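
As a point of comparison, here is a quick sketch of the sort step itself on synthetic keys (times are machine-dependent):

    // Time a comparison sort of 1,000,000 keys (the O(n log n) step).
    const keys = [];
    for (let i = 0; i < 1000000; i++) keys.push('key' + i);

    console.time('sort');
    keys.sort(); // default lexicographic comparison
    console.timeEnd('sort');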

Solutions:

  1. JS is slow; handle your index set in another language, such as C++. The drawback is that if you need the keys in a different order, it all has to be recomputed each time, and if the stored data is modified or deleted, the index has to be rebuilt as well.

  2. A database can also build indexes to speed up searches.

  3. In fact, the data stored in {} is kept in order (see: binary search tree). So for the problem of outputting the first 50 of 1,000,000 entries, all you need is:

    var num = 0;
    for (var x in data) {
      if (num < 50) num++;
      else break;
      // your code to process data[x]
    }

Looking at the poster's other comments, the goal is to sort the data by a certain key value and take x of them, with the data volume as high as 1,000,000 records.

Either way, all 1,000,000 records have to be read, so I don't think the speed can be improved much further. A method with O(n log n) algorithmic complexity running in 1.5 s is already close to the theoretical minimum for a comparison sort.
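
Putting it together as one in-memory sketch (the dateline field is an assumed sort key, not something the poster specified):

    // Sort ~1,000,000 in-memory records by a field and take the first x.
    function topX(data, x) {
      const keys = Object.keys(data);                           // O(n)
      keys.sort((a, b) => data[a].dateline - data[b].dateline); // O(n log n)
      return keys.slice(0, x).map((k) => [k, data[k]]);         // O(x)
    }

    // Usage: first page of 50, ordered by dateline.
    // const firstPage = topX(data, 50);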