What database support is better for a two-two mentioned query requirement?

  mongodb, question

Demand

What other brands are mentioned when one brand is mentioned in the query statistics
At present, the brand is preserved in MongoDB in this way.

"brands" : [
 "Siemens",
 "ABB",
 "GE"
 ],

Obviously, this structure is difficult to meet the above query requirements. The following structure is required to support this query requirement

"brand_and_mentioned_brands":[
 {
 "main_brand": "Siemens",
 "mentioned_brands": ["ABB","GE"]
 },
 {
 "main_brand": "ABB",
 "Mentalized _ Brands": ["Siemens", "GE"]
 },
 {
 "main_brand": "GE",
 "Mentalized _ Brands": ["Siemens", "ABB"]
 }
 ]

I wonder how other databases besides MongoDB, such as ES, support this kind of query requirement.

My understanding is that this is the requirement of data analysis. The requirement of OLAP depends on a database query to directly produce results. If SQL may consider stored procedures, MongoDB can choose Map/Reduce and Aggregation, which is preferred. Performance issues will not be discussed here first, and we’ll see if we can achieve the desired results. There should be no problem with the second data structure. My solution for the first structure is as follows:

db.test.aggregate([
 {$project: {brands: "$brands", brands2: "$brands"}},
 {$unwind: "$brands"},
 {$unwind: "$brands2"},
 {$project: {pair: ["$brands", "$brands2"]}},
 {$group: {_id: "$pair", count: {$sum: 1}}}
 ]);

This way, one copy is first made.brandsCome out and do it$unwindEquivalent tobrandsAssemble yourself and arrange yourself, and what you need is combination. For example, your sample data:

{"brands": ["Siemens", "ABB", "GE" ]}

The result is:

{ "_id" : [ "GE", "GE" ], "count" : 1 }
 { "_id" : [ "GE", "ABB" ], "count" : 1 }
 {"_id": ["GE ","Siemens "], "count": 1}
 { "_id" : [ "ABB", "ABB" ], "count" : 1 }
 {"_id": ["ABB ","Siemens "], "count": 1}
 { "_id" : [ "ABB", "GE" ], "count" : 1 }
 {"_id": ["Siemens", "GE" ], "count": 1}
 {"_id": ["Siemens", "ABB" ], "count": 1}
 {"_id": ["Siemens", "Siemens" ], "count": 1}

Some additional data, such as[ "ABB", "ABB" ],["Siemens", "ABB" ]/[ "ABB", "Siemens"]. I haven’t thought of a good way to filter these data directly in the aggregation pipeline, but it should not affect your use. If you think of a more thorough way, I’ll come back and add it.

Supplementary answer

After turning to the off-site audience, aggregation is really strong. You may need to check:$ map,$ reduce,$ let,$ rangeUsage of these operators

db.test.aggregate({
 $project: {
 tuples: {
 $reduce: {
 initialValue: [],
 input: {
 $range: [0, {
 $subtract: [{
 $size: "$brands"
 }, 1]
 }]
 },
 in: {
 $let: {
 vars: {
 i1: "$$this"
 },
 in: {
 $concatArrays: ["$$value", {
 $map: {
 input: {
 $range: [{
 $add: [1, "$$i1"]
 }, {
 $size: "$brands"
 }]
 },
 in: [{
 $arrayElemAt: ["$brands", "$$i1"]
 }, {
 $arrayElemAt: ["$brands", "$$this"]
 }]
 }
 }]
 }
 }
 }
 }
 }
 }
 }, {
 $unwind: "$tuples"
 }, {
 $sortByCount: {
 $setUnion: "$tuples"
 }
 })

The essential meaning of this pipeline operation is:

for(var i = 0;   i < array.length - 1;  i++)
 for(var j = i + 1;   j < array.length - 1;  j++) {...}

Results of implementation:

{"_id": ["GE ","Siemens "], "count": 1}
 { "_id" : [ "ABB", "GE" ], "count" : 1 }
 {"_id": ["ABB ","Siemens "], "count": 1}