What is the best plan for cross-service query aggregation in Microservice?

  java, question

In Microservice, each service has its own independent database.
However, there is now a need to generate a real-time report that contains data from two services.
Such as service a and service b. B contains only the primary key id of a as an association.
The search criteria for this report include the fields in service entity A and service entity B.

Existing scheme
1. if the search condition includes the condition of a, first search in service a to obtain the primary key of all results, and then use where A.id IN (ids) to query again in service b.
Idea: when A.id number is large, this query is extremely slow! However, there are many cases where A.id numbers are huge.

2. Use search engines

Idea: Feel like killing chickens with a scalpel

May I ask Daniel if he has a better plan?

In fact, this kind of problem is very common in Microservices. For example, you need to inquire about orders through some information on commodities. Orders and commodities belong to two Microservices respectively. Besides your own two solutions, there are also

  1. The data aggregation is put into the data warehouse, and the data in real-time aggregation a and b are put into another library (not necessarily mysql, but also Hbase). the data pulled by the report are all pulled from the data warehouse

  2. When designing the table, some fields should be appropriately redundant, as you said, some fields of A should be predictably redundant on B.

Method 1 has a fatal disadvantage. Once paging is involved, this method is definitely not feasible. The specific scheme should be decided according to the corresponding order of magnitude of your data. If the corresponding amount of data is not large, method 1 can be adopted. If the speed is relatively slow, several more threads can be opened to scoop up the corresponding data in batches (if the id number is too large, batch queries are all effective solutions to reduce timeout and time). If there is a large amount of data, it is recommended to use data warehouse. The main advantage of using data warehouse is that there will be no pressure on the main database, because the generation of aggregated tables can be obtained through Binlog. Because the report still belongs to the category of offline data, if it really needs to be real-time like the order query, the table with high efficiency is accompanied by the status, and the search conditions are enormous, then the search engine is a good choice.
Therefore, methods 1 and 3 can be adopted according to the actual situation.