How can MongoDB extract useful data after storing multiple documents inserted via pymongo?

  mongodb, question

There are many documents similar to this one:

{ "_id" : ObjectId("56d06f01c3666e08d0f0c844"),
  "http://tieba.baidu.com/p/4345287300" : "the original words of the author [about the update]",
  "http://tieba.baidu.com/p/4328978430" : "services",
  "http://tieba.baidu.com/p/4372502982" : "『the memory of the dead』 chapter 331: holy east palace",
  "http://tieba.baidu.com/p/4355241530" : "『the memory of the dead』 chapter 322: the power of kirin",
  "http://tieba.baidu.com/p/432950585" : "『the memory of the dead』 chapter 313: begging with tears of blood",
  "http://tieba.baidu.com/p/4343824178" : "happy new year!",
  "http://tieba.baidu.com/p/4328603018" : "does it look good to write novels?",
  "http://tieba.baidu.com/p/4333008061" : "come on, let's have a fight",
  "http://tieba.baidu.com/p/4315565196" : "『the memory of the dead』 chapter 305: taking orders in the face of danger",
  "http://tieba.baidu.com/p/4340906961" : "『the memory of the dead』 chapter 320: capturing the thief and the king",
  "http://tieba.baidu.com/p/4337476352" : "is the new year coming, is it a red bag?"
}

From the data above, I want to find the links whose title matches "the memory of the dead" (Zhu Hun Ji), together with the title text, for example:

"http://tieba.baidu.com/p/432950585" : "『the memory of the dead』 chapter 313: begging with tears of blood"

At the same time, I want to store the matched results in another collection, so that for

"http://tieba.baidu.com/p/432950585" : "『the memory of the dead』 chapter 313: begging with tears of blood"

the link is http://tieba.baidu.com/p/4329505585
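Since the URLs are the field names in these documents, MongoDB cannot easily match on the title values directly; one option is to scan the key/value pairs of each document in Python. A minimal sketch of that filtering step (the dict below is an abridged stand-in for the stored document):

```python
KEYWORD = "the memory of the dead"

def extract_matches(doc):
    """Return the {url: title} pairs whose title contains KEYWORD,
    skipping MongoDB's _id field."""
    return {url: title
            for url, title in doc.items()
            if url != "_id" and KEYWORD in title}

# abridged stand-in for one stored document
doc = {
    "_id": "56d06f01c3666e08d0f0c844",
    "http://tieba.baidu.com/p/432950585": "『the memory of the dead』 chapter 313: begging with tears of blood",
    "http://tieba.baidu.com/p/4343824178": "happy new year!",
}

matches = extract_matches(doc)
# matches now holds only the chapter-313 link and its title
```

Each matching pair could then be written to the second collection with something like `db.matched_chapters.insert_one({"url": url, "title": title})` (the collection name `matched_chapters` is just an example).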

I recently started working on web crawlers and wanted to build something for myself; this is rather urgent.

The following is the Python code:

def craw(self, root_urls):
    for new_url in root_urls:
        html_cont = self.downloader.download(new_url)
        new_chapter_urls, new_linkdatas = self.parser.parselink(new_url, html_cont)
        # build a {url: title} dict from the two parallel lists
        mid_mid_datas = dict(zip(new_chapter_urls, new_linkdatas))
        c = pymongo.MongoClient(host='127.0.0.1', port=27017)
        db = c.spider
        # check_keys=False because the URL keys contain dots
        db.chapter_datas.insert(mid_mid_datas, check_keys=False)
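Querying becomes much simpler if each link is stored as its own document, with the URL as a value rather than a field name: MongoDB can then filter the titles itself with `$regex`, and `check_keys=False` is no longer needed because no field name contains a dot. A sketch under that assumption (the collection name `chapters` is made up, and the pymongo calls are shown as comments since they need a running server):

```python
KEYWORD = "the memory of the dead"

# stand-in for the {url: title} dict built in craw() above
mid_mid_datas = {
    "http://tieba.baidu.com/p/4372502982": "『the memory of the dead』 chapter 331: holy east palace",
    "http://tieba.baidu.com/p/4343824178": "happy new year!",
}

# one document per link: the URL becomes a value, so no dots
# appear in any field name
docs = [{"url": url, "title": title} for url, title in mid_mid_datas.items()]

# with a live connection these would be stored and queried as:
#   db.chapters.insert_many(docs)
#   db.chapters.find({"title": {"$regex": KEYWORD}})
```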

Why don't you filter directly while crawling, according to whether the title in the data contains "the memory of the dead"?

>>> s = "『the memory of the dead』 chapter 331: holy east palace"
>>> "the memory of the dead" in s
True
>>> s = "happy new year!"
>>> "the memory of the dead" in s
False
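Applied to the crawler above, that same substring check can drop non-matching pairs before they ever reach MongoDB; a sketch (the dict literal just stands in for the `mid_mid_datas` built in `craw()`):

```python
KEYWORD = "the memory of the dead"

# stand-in for the {url: title} dict built while crawling
mid_mid_datas = {
    "http://tieba.baidu.com/p/4372502982": "『the memory of the dead』 chapter 331: holy east palace",
    "http://tieba.baidu.com/p/4343824178": "happy new year!",
}

# keep only the pairs whose title mentions the keyword
filtered = {url: title for url, title in mid_mid_datas.items()
            if KEYWORD in title}
# only the chapter link survives the filter
```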