Suspected MongoDB Data Loss Problem

  mongodb, question

While inserting data into MongoDB, we found what appears to be data loss.

Cluster environment: five Windows Server 2008 machines hosting five shards, each shard running as a replica set with one primary, one secondary, and one arbiter.

Scenario description: there are about 40 million documents totaling 62 GB (image data). The InsertMany method of the IMongoCollection interface provided by the MongoDB C# Driver is used for batch insertion, about 100 documents per batch. The program's write concern is configured as WriteConcern.Acknowledged, and journaling is enabled. Each InsertMany call is wrapped in a try/catch, and no exception was caught. After the program finished, some data was found to be missing.
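For reference, the insert pattern described above looks roughly like the following minimal sketch; the connection string, database/collection names, and document contents are placeholders, not from the original program:

    using System;
    using System.Collections.Generic;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class BulkInsertSketch
    {
        static void Main()
        {
            var client = new MongoClient("mongodb://mongos-host:27017");
            var coll = client.GetDatabase("imgdb")
                             .WithWriteConcern(WriteConcern.Acknowledged) // w=1, as configured in the question
                             .GetCollection<BsonDocument>("images");

            var batch = new List<BsonDocument>();
            for (var i = 0; i < 100; i++)        // roughly 100 documents per batch
                batch.Add(new BsonDocument
                {
                    { "seq", i },
                    { "payload", new BsonBinaryData(new byte[16]) } // stand-in for image bytes
                });

            try
            {
                coll.InsertMany(batch);          // ordered insert; throws on failure
            }
            catch (MongoBulkWriteException ex)   // no exception was observed in the original run
            {
                Console.WriteLine("Bulk write failed: " + ex.Message);
            }
        }
    }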

The program parsed 39,821,308 records, while the count reported by the MongoDB database is 39,804,543.

Could any expert help me analyze and explain this? After investigating for about a week, I really have no idea.

Because I work on MongoDB-related services, suspected data loss comes up every few days, but so far none of the cases has turned out to be a real loss.
If you are fully confident there is no problem in your code, the cause is most often one of the following situations:

1. The count result is inaccurate after an unclean shutdown (see the count documentation, "Accuracy after Unexpected Shutdown"):

After an unclean shutdown of a mongod using the Wired Tiger storage engine, count statistics reported by count may be inaccurate.
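The fix the documentation gives for this case is to run validate on the affected collections, which rebuilds the statistics that count relies on. A minimal sketch, assuming placeholder host and collection names:

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class ValidateSketch
    {
        static void Main()
        {
            // Connect directly to the affected mongod (run this on each shard member),
            // then validate each collection whose statistics may be stale.
            var client = new MongoClient("mongodb://shard1-primary:27017");
            var result = client.GetDatabase("imgdb")
                               .RunCommand<BsonDocument>(new BsonDocument("validate", "images"));
            Console.WriteLine(result.ToJson());
        }
    }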

2. The count result is inaccurate in a sharded environment (see the count documentation, "Behavior"):

On a sharded cluster, count can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.

You mentioned above:

The program parsed 39,821,308 records, while the count reported by the MongoDB database is 39,804,543.

Since the program and the shell should see the same data, your case is most likely the second situation above. To obtain an accurate figure, use an aggregation-based count as described in the documentation; a sketch follows.
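A minimal sketch of an aggregation-based count, which actually scans the documents instead of trusting sharded collection metadata (host and names are placeholders carried over from above):

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class AccurateCountSketch
    {
        static void Main()
        {
            var coll = new MongoClient("mongodb://mongos-host:27017")
                           .GetDatabase("imgdb")
                           .GetCollection<BsonDocument>("images");

            // Equivalent of: db.images.aggregate([{ $group: { _id: null, n: { $sum: 1 } } }])
            // Newer drivers (2.7+) also expose CountDocuments(), which runs an
            // aggregation internally and avoids the metadata-based count.
            var doc = coll.Aggregate()
                          .Group(new BsonDocument
                          {
                              { "_id", BsonNull.Value },
                              { "n", new BsonDocument("$sum", 1) }
                          })
                          .FirstOrDefault();
            Console.WriteLine(doc?["n"]);
        }
    }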

Supplement

Based on the situation you described, there are some other scenarios that can lead to missing data:

  1. Did the heavy bulk-insert load overwhelm the nodes and trigger a primary/secondary failover? Check the mongod logs for keywords such as PRIMARY and SECONDARY to see whether an election took place.
  2. Write operations use w=1 by default. Has that default been modified anywhere (for example, lowered to w=0, which does not report failures back to the client)?
  3. There is also a rare but real possibility: was the inserted data deleted afterwards? You can search local.oplog.rs for the _id values of the missing documents to check for delete entries; see the sketch after this list.
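A sketch of the oplog check from item 3, looking for delete entries on the collection; any hit means the document was inserted and later removed. The host, namespace, and _id are placeholders:

    using System;
    using MongoDB.Bson;
    using MongoDB.Driver;

    class OplogCheckSketch
    {
        static void Main()
        {
            // The oplog lives on each shard's replica set, so connect to a
            // shard member directly, not to mongos.
            var client = new MongoClient("mongodb://shard1-primary:27017");
            var oplog = client.GetDatabase("local").GetCollection<BsonDocument>("oplog.rs");

            var missingId = new ObjectId("ffffffffffffffffffffffff"); // _id of a missing document
            var filter = Builders<BsonDocument>.Filter.Eq("op", "d")            // "d" = delete entries
                       & Builders<BsonDocument>.Filter.Eq("ns", "imgdb.images") // target namespace
                       & Builders<BsonDocument>.Filter.Eq("o._id", missingId);

            foreach (var entry in oplog.Find(filter).ToList())
                Console.WriteLine(entry.ToJson());
        }
    }

Note that the oplog is a capped collection, so old entries roll over; this check only works while the relevant window of operations is still retained.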