Sql left join Result union Right Join Result, How to Solve Repetitive Calculation?

  mysql, question

Simplification of Sliding Computing Scenarios

The database environment isMySQL

Because the original problem scenario is more complex, the following will simplify the description:(Title description may not be relevant)

There are two watches now.t1Andt2, the table structure is the same, but the stored data is different.Primary key aAndField bAll types areIntType. Examples are as follows.

Among them, the tablet1Sum tablet2Existing partPrimary key aSame, butField bData with different values. There are alsoPrimary key valueOn the tablet1Is present in, but tablet2Does not exist in, orPrimary key valueOn the tablet2Is present in, but tablet1Does not exist in.

  • Table t1:

a b
1 10
2 5
3 7
4 7
5 7
  • Table t2:

a b
3 17
4 8
5 3
6 11
7 6

Now, if the primary key value is the same,t1.b - t2.b. IfPrimary key aIf the value of does not exist in the opposite table, thenField btakeZeroValue. Query to get the data in the following table structure.

a t1.b – t2.b
1 10
2 5
3 -10
4 -1
5 4
6 -11
7 -6

What I thinkSqlThe statement reads as follows:

SELECT t1.a, t1.b - t2.b from t1 left  join t2 on t1.a = t2.a
 Union
 SELECT t2.a, t1.b - t2.b from t1 right join t2 on t1.a = t2.a;

However, this oneSqlTwo problems cannot be solved:

  • IfPrimary key aIf the value of does not exist in the opposite table, thenField bTakeZeroValue.

  • TwiceJoinThis will result in double counting of data with primary key values of 3,4,5

How to write sql is better?


The following sql can solve the above problem of transferring to 0, but it still cannot solve the problem of double counting.

SELECT t1.a, ifnull(t1.b, 0) - ifnull(t2.b, 0) from t1 left  join t2 on t1.a = t2.a
 Union
 SELECT t2.a, ifnull(t1.b, 0) - ifnull(t2.b, 0) from t1 right join t2 on t1.a = t2.a;

select fa,fb from(
 SELECT t1.a as fa , ifnull(t1.b, 0) - ifnull(t2.b, 0) as fb from t1 left  join t2 on t1.a = t2.a
 Union
 SELECT t2.a as fa,0-t2.b  as fb from t2 where t2.a not in (select t1.a from t1 )
 ) t order by t.fa

The above questions are over. The following is a simulation and description of the original problem scenario. If you are interested, you can look at it and point out what is incorrect or not good enough.

Sliding Calculation Scene Simulation

  • Table t3:

a b c
1 10 1
2 5 2
3 7 3
4 7 4
5 7 5
3 17 6
4 8 7
5 3 8
6 11 9
7 6 10
# Simple sql Simulation
 select t.a , sum(t.b) from (
 select t3.a , b from t3 where c <=3
 union all
 select t3.a , 0-b from t3 where c > 7
 ) t group by t.a

Sliding calculation scenario description

There are no fields representing primary keys in table 3. one row represents a consumption record, column c represents a consumption timestamp, column a represents a user number, and column b represents a consumption amount.

Real scene: At present, if you want to calculate the total amount of each user’s transaction within 24 hours before the current time every 500 milliseconds, the total number of times. (even mean, variance, standard deviation, etc. note: variance, standard deviation, these statistical methods are not suitable for sliding calculation and are not considered here)

Simulation scene: Now assume that there is only one transaction every time t, and the value in column C in Table 3 represents the total amount from 4t before the current time to the current time every time t is incremented. If the total amount in 4t time is calculated completely each time, the total number of times may be relatively large.

After that, the calculation results need to be persisted into table 4 for each calculation (the user number in table 4 is unique; if there is such a user, the total amount and number of times are updated; otherwise, it is directly inserted).


For example, the current time for the first calculation is 6, which requires calculation.c <= 6 and c > 6-4An example of sql for is:

select t3.a , sum(t3.b) , count(1) from t3 where 1=1 and (c <= 6 and c > 6-4) group by t3.a;
 # Note that a unique constraint on field A needs to be added in Table 4
 insert into t4 (a,sumb,cnt) values (3,    24,    2)  on duplicate key update sumb = 24 , cnt = 2;
 insert into t4 (a,sumb,cnt) values (4,    7,    1)  on duplicate key update sumb = 7  , cnt = 1;
 insert into t4 (a,sumb,cnt) values (5,    7,    1)  on duplicate key update sumb = 7  , cnt = 1;

The current time for the second calculation is 7, which needs calculation.c <= 7 and c > 7-4In reality, the time span (24 hour) represented by 4t is always very large. If calculated according to the above sql, a large number of transaction records will be involved in each calculation, although the calculation burden is on the database rather than the business logic. The calculation time represented by T is relatively small (500 ms). Every T time, the newly added consumption records are always relatively small.

select t3.a , sum(t3.b) , count(1) from t3 where 1=1 and (c <= 7 and c > 7-4) group by t3.a;
 # Query Results: (3,17,1),(4,15,2) ,(5,7,1)
 # Persist query results.  Note: a unique constraint on field a needs to be added to table 4
 insert into t4 (a,sumb,cnt) values (3,17,1)  on duplicate key update sumb = 17 , cnt = 1;
 insert into t4 (a,sumb,cnt) values (4,15,2)  on duplicate key update sumb = 15 , cnt = 2;
 insert into t4 (a,sumb,cnt) values (5,7,1 )  on duplicate key update sumb = 7  , cnt = 1;

Now, the first calculation adopts the above-mentioned total calculation scheme, and if the second calculation can use the result of the first calculation, add the newly added consumption record for the second time (i.e. the record with c=7), and then subtract the consumption record reduced for the second time (i.e. the record with c=3). (Here, how to find new or reduced records will not be introduced again.)

select t3.a , sum(t3.b) , count(1) from t3 where 1=1 and (c <= 6 and c > 6-4) group by t3.a;
 # Query Results: (3, 24, 2),(4, 7, 1) ,(5, 7, 1)
 # Persist query results.  Note: a unique constraint on field a needs to be added to table 4
 insert into t4 (a,sumb,cnt) values (3,24,2) on duplicate key update sumb = 24, cnt = 2;
 insert into t4 (a,sumb,cnt) values (4,7,1) on duplicate key update sumb = 7, cnt = 1;
 insert into t4 (a,sumb,cnt) values (5,7,1) on duplicate key update sumb = 7, cnt = 1;
select t.a , t.sumb, t.cnt from (
 select t3.a as a , sum(t3.b) as sumb, count(1) as cnt from t3 where 1=1 and c=7 group by t3.a
 union all
 select t3.a as a,0-sum(t3.b) as sumb,0-count(1) as cnt from t3 where 1=1 and c=3 group by t3.a
 ) t
 # Query Results: (4,8,1), (3,-7,-1)
 # Persist query results.  Note: a unique constraint on field a needs to be added to 4
 Insert into t4 (a, sumb, cnt) values (4, 8, 1) on duplicate key updatesumb = sumb plus 8,cnt = cnt plus 1;
 insert into t4 (a,sumb,cnt) values (3,-7,-1) on duplicate key update sumb = sumb-7,cnt = cnt-1;

select t.a , sum(t.b) from (
 select t1.a , b from t1
 union all
 select t2.a , 0-b from t2
 ) t group by t.a

The answer is only forSimulation scene simplificationThe answer.
Below the questionSliding Calculation Scene SimulationAndSliding calculation scenario descriptionThere is a more detailed description of the process in. Those who are interested can look at it.