Using a text Index to Find Similar Titles

  mongodb, question

Article data is stored in MongoDB, and the article title is used to decide whether two articles are duplicates. For example:

Global demand resonates, sweeping robots enter thousands of households [monarch's mechanical depth]
 In-depth Report of Service Robot Industry: Global Demand Resonance, Sweeping Robot Enters Thousands of Families
 Global Demand Resonance, Sweeping Robot Enters Thousands of Families-Service Robot Industry
 
 Lovely intelligent robot, the summer vacation gives the babies a most intimate accompanying teacher!  Early learning wins at the starting line!
 Aiyouyouzi intelligent robot, the summer vacation gives the babies a most intimate accompanying teacher!  Early learning wins at the starting line!
 Lovely Youyouzi intelligent robot, the summer vacation gives the babies a most intimate accompanying teacher!  Early learning wins at the starting line!

The articles in each group above are considered duplicates of one another. Creating a text index on the title field makes it possible to find such duplicates, as follows

db.post.createIndex({title: 'text'})
 
 > db.post.find({$text: {$search: 'service robot industry depth report: global demand resonates, sweeping robots enter thousands of households'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})
 { "_id" : ObjectId("5b2f809152993004aaabdacc"), "title" : "service robot industry in-depth report: global demand resonance, sweeping robots enter thousands of households", "score" : 2 }
 { "_id" : ObjectId("5b2f8092993004aaabdacd"), "title" : "global demand resonates, sweeping robots enter millions of households-service robot industry", "score" : 1.3393993 }
 { "_id" : ObjectId("5b2f809152993004aaabdacb"), "title" : "global demand resonates, sweeping robots enter thousands of households [monarch's mechanical depth]", "score" : 1.25 }
 
 > db.post.find({$text: {$search: 'summer vacation gives the babies a most intimate companion teacher! Early learning wins at the starting line!'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})
 { "_id" : ObjectId("5b2f8366529993004aaabdad6"), "title" : "lovely intelligent robot, summer vacation to the baby a most intimate accompany teacher! Early learning wins at the starting line!", "score" : 1.3333333333333333 }
 { "_id" : ObjectId("5b2f8366529993004aaabdad7"), "title" : "aiyouyouzai intelligent robot, summer vacation gives the babies a most intimate companion teacher! Early learning wins at the starting line!", "score" : 1.3333333333333333 }
 { "_id" : ObjectId("5b2f8366529993004aaabdad8"), "title" : "lovely youyouzai intelligent robot, summer vacation gives the babies a most intimate companion teacher! Early learning wins at the starting line!", "score" : 1.3333333333333333 }
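
The two queries above share one pattern: match with $text, attach the textScore, and sort by it. A minimal sketch of that pattern as a reusable aggregation pipeline follows. The helper name buildDuplicatePipeline and the minScore cutoff are assumptions made for illustration, not part of the original question, and a suitable cutoff would have to be tuned against real data.

```javascript
// Hypothetical helper: builds an aggregation pipeline that returns titles
// whose text score against `title` is at least `minScore`.
function buildDuplicatePipeline(title, minScore) {
  return [
    // $text must appear in the first $match stage of the pipeline.
    { $match: { $text: { $search: title } } },
    // Expose the relevance score as a regular field.
    { $project: { title: 1, score: { $meta: "textScore" } } },
    // Keep only candidates above the assumed cutoff.
    { $match: { score: { $gte: minScore } } },
    // Best matches first.
    { $sort: { score: -1 } },
  ];
}

const pipeline = buildDuplicatePipeline("global demand resonates", 1.0);
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell, against the collection used above:
//   db.post.aggregate(buildDuplicatePipeline("global demand resonates", 1.0))
```

The pipeline only filters what the text index already matches, so it cannot recover titles that the index fails to match in the first place.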

However, in some cases articles with clearly similar titles are not found, as follows

Yi Language Intelligent Translation Software Conference Held in Xichang, Sichuan
 Yi Language Intelligent Translation Software Conference of China National Language Translation Bureau Held in Qionghai Hotel, Xichang
 Yesterday, the Yi language intelligent translation software conference of the Chinese National Language Translation Bureau was held in Xichang, Sichuan Province.
 
 # Why isn't the second title found?
 > db.post.find({$text: {$search: 'yi intelligent translation software conference of China national language translation bureau held in Xichang, Sichuan'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})
 { "_id" : ObjectId("5b2f80cd52993004aaabdad3"), "title" : "yi language intelligent translation software conference of China national language translation bureau held in Xichang, Sichuan", "score" : 1.1 }
 { "_id" : ObjectId("5b2f80ce52993004aaabdad5"), "title" : "yesterday, the yi language intelligent translation software release conference of China national language translation bureau was held in Xichang, Sichuan province", "score" : 0.75 }
 
 On June 5, make a statement on the situation of stopping profits
 On the 7th of June, we will make a statement on the situation of stopping profits.
 On June 8th, the company made a statement on the situation of stopping the surplus.
 
 # Why is only one title found?
 > db.post.find({$text: {$search: 'June 5, make a single stop list'}}, {score: {$meta: 'textScore'}}).sort({score: {$meta: 'textScore'}})
 { "_id" : ObjectId("5b2f80ad52993004aaabdace"), "title" : "make a single profit stop list on June 5", "score" : 1.1 }

Why can’t we find some titles that are obviously very similar?

Because MongoDB's text search needs a third-party segmentation engine, RLP (Rosette Linguistics Platform), to split Chinese text into words. This engine is not free, so Chinese text search is available only in MongoDB Enterprise Edition. Without it, a title is split only at spaces and punctuation, so two titles match only when they share an entire segment. Please refer to the documentation for details: https://docs.mongodb.com/manu …
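
To see why segmentation matters, here is a rough sketch that splits titles only at whitespace and punctuation, which is approximately what happens when no Chinese segmenter is available. It is not MongoDB's actual tokenizer (that also applies stop words and stemming), the function names are made up, and the Chinese strings are hypothetical examples written for illustration, not the original titles.

```javascript
// Rough approximation of delimiter-only tokenization (NOT MongoDB's real tokenizer).
function roughTokens(title) {
  return title
    .toLowerCase()
    .split(/[\s\p{P}]+/u) // split at whitespace and Unicode punctuation
    .filter((token) => token.length > 0);
}

// Tokens of `a` that also occur in `b`.
function sharedTokens(a, b) {
  const tokensB = new Set(roughTokens(b));
  return roughTokens(a).filter((token) => tokensB.has(token));
}

// Hypothetical titles with punctuation: whole segments can coincide.
const withPunctuationA = "全球需求共振，扫地机器人进入千家万户";
const withPunctuationB = "服务机器人行业深度报告：全球需求共振，扫地机器人进入千家万户";

// Hypothetical titles without punctuation: each title is a single token.
const unbrokenA = "彝语智能翻译软件发布会在四川西昌举行";
const unbrokenB = "彝语智能翻译软件发布会在西昌邛海宾馆举行";

console.log(sharedTokens(withPunctuationA, withPunctuationB)); // two shared segments
console.log(sharedTokens(unbrokenA, unbrokenB)); // no shared tokens
```

Under this approximation the robot titles match because they share whole comma-delimited segments, while the unpunctuated titles differ by a few characters and therefore share nothing.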