Hello everyone, let me describe the problem I ran into.
I am writing a crawler. It crawls by userid, and each userid takes two steps (two requests): request one refreshes the cookie, and request two fetches the content. A plain for loop is not feasible because the requests are asynchronous: even if I put the second request inside the first one's callback, the for loop still fires every iteration at once. There is a large number of userids, so the for loop cannot work; on balance, I used recursion.
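To make the recursive approach concrete, here is a minimal sketch of it; `refreshCookie` and `fetchContent` are hypothetical stand-ins for the two real HTTP requests (here they just resolve on the next tick):

```javascript
// Hypothetical stand-ins for the two real HTTP requests; in the real
// crawler these would be actual network calls with cookie handling.
function refreshCookie(userid, callback) {
  setImmediate(() => callback(null, 'cookie-' + userid));
}
function fetchContent(userid, cookie, callback) {
  setImmediate(() => callback(null, 'content-' + userid));
}

const results = [];

// Recurse instead of looping: the next userid only starts after the
// previous one's two requests have both finished.
function crawl(userids, i, done) {
  if (i >= userids.length) return done(results);
  refreshCookie(userids[i], (err, cookie) => {
    if (err) return crawl(userids, i + 1, done); // skip this userid
    fetchContent(userids[i], cookie, (err2, content) => {
      if (!err2) results.push(content);
      crawl(userids, i + 1, done); // on to the next userid
    });
  });
}

crawl(['u1', 'u2', 'u3'], 0, (all) => console.log(all.length)); // → 3
```

Note that each step of the recursion happens inside a callback, so the call stack unwinds between userids; the stack itself does not grow, but anything retained across calls (like `results` here) does.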
However, recursion causes a problem: memory use grows slowly until it hits the maximum that Node.js can use (let's set aside the final recursive call for now). What happens when Node.js runs out of memory? All I saw was that my program seemed to hang there without reporting any error.
In response to problem one:
I tried creating a child process with the cluster module and running the crawler inside it, so that I could restart the child process every so often to free memory. But this approach brings problems of its own. First, the crawler in the child process runs extremely slowly, with very poor performance. Second, when the child process runs out of memory (problem one above), the parent process gets no information about it; neither the error nor the disconnect event tells it anything. Honestly, I don't understand why crawling in the child process is so slow, since I thought the parent and child processes belong to the same process group.
PS: I am on Mac OS X 10.10.5.
I plan to spend the next 3-4 months studying up on electronics, non-technical topics, and operating systems. Thanks for your help.
First of all, I think your approach is wrong. If the for loop doesn't work, find a feasible alternative: I suggest looking at Promises and async/await (or the async library).
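To illustrate the suggestion: with Promises and async/await (available in newer Node versions), a plain for loop handles the sequential two-request flow just fine. A sketch, where `refreshCookie` and `fetchContent` are hypothetical stand-ins for the real requests:

```javascript
// Hypothetical stand-ins for the two HTTP requests; in real code these
// would return Promises from an HTTP client.
function refreshCookie(userid) {
  return Promise.resolve('cookie-' + userid);
}
function fetchContent(userid, cookie) {
  return Promise.resolve('content-' + userid);
}

// With async/await the "impossible" for loop is straightforward: each
// iteration awaits both requests before moving to the next userid.
async function crawlAll(userids) {
  const results = [];
  for (const id of userids) {
    const cookie = await refreshCookie(id);       // request one
    results.push(await fetchContent(id, cookie)); // request two
  }
  return results;
}

crawlAll(['u1', 'u2']).then((r) => console.log(r)); // → ['content-u1', 'content-u2']
```

This keeps the requests strictly sequential without manual recursion, and the call stack stays flat.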
Second, on the memory growth: you need to figure out why it is growing, rather than restarting the process on a timer. That treats the symptom, not the root cause.
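A simple first step in figuring that out is logging `process.memoryUsage()` between crawl batches; if `heapUsed` climbs steadily from batch to batch, something is being retained (for example, an ever-growing results array or closures holding response bodies). A minimal sketch:

```javascript
// Log heap statistics with a label; comparing snapshots taken before
// and after each batch shows whether memory actually keeps growing.
function logMemory(label) {
  const u = process.memoryUsage();
  console.log(
    label,
    'rss:', (u.rss / 1024 / 1024).toFixed(1) + 'MB',
    'heapUsed:', (u.heapUsed / 1024 / 1024).toFixed(1) + 'MB'
  );
}

logMemory('before batch');
// ... run one crawl batch here ...
logMemory('after batch');
```

If it turns out memory is legitimately needed rather than leaked, the heap limit can also be raised with `node --max-old-space-size=<MB>`, but that only postpones a real leak.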