How should a data model be designed for data that keeps growing?

  mongodb, question

The data model is for a voting post: after a user votes, the vote needs to be recorded so the same user cannot vote again.
The current design stores the ids of users who have voted as an array embedded in the voting post document.

Embedding ever-growing data inside a MongoDB document is bad for performance. How can this be designed better?
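For reference, the embedded design from the question might look like the sketch below (field names are illustrative). In MongoDB the duplicate check would be done atomically with a `$addToSet` update; here the same semantics are modeled with a plain Python dict:

```python
# A post document with voter ids embedded directly in it, as described
# in the question. The array grows with every new voter.
post = {
    "_id": "post-1",
    "title": "Best language?",
    "voters": [],  # embedded, ever-growing array of user ids
}

def add_vote(doc, user_id):
    """Mimic {"$addToSet": {"voters": user_id}}: add only if absent.

    Returns True if the vote was recorded, False if it was a duplicate.
    """
    if user_id in doc["voters"]:  # linear scan, like the server's set check
        return False
    doc["voters"].append(user_id)
    return True
```

This works, but every vote rewrites a document that only gets bigger, which is exactly the concern raised above.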

Before designing, first confirm your data scale:

  1. If the data size is very small and only a small number of people vote (fewer than 1,000), there is no need to worry: even though the array will grow, you can simply keep the voter ids in an embedded array (but note MongoDB’s 16 MB size limit per document);
  2. If the number of voters exceeds 1K and keeps growing toward the tens of thousands, split the data out early and create a separate collection to store the posts’ voting records;
  3. If the number of voters reaches the tens of thousands and votes also come in frequently (or there is malicious vote stuffing), you should consider a cache: store the ids of all voters in a centralized cache, check for duplicate votes through the cache (Redis natively supports the Set structure), and periodically sync the records back to MongoDB in the background;
  4. If the number of voters reaches the millions and the voting frequency is considerable, a cache is no longer optional, and it has to be a distributed cache cluster: map each voter id to a cache server by computation (a simple mod over the id is enough), then handle the rest as in 3;
  5. A variant of 4: route requests by user id through Apache or Nginx in front of the servers, dispatching them to different application servers; the application servers are likewise scaled out horizontally.
    PS: What you have described is only a very small slice of the business scenario. Whether you adopt NoSQL or SQL, once the data scale grows, a single machine will inevitably be unable to keep up, and distributed scaling becomes unavoidable. Note, however, that the complexity grows with it, so choose a reasonable scheme according to your own data scale and technical resources.
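Option 2 above (a separate collection of vote records) can be sketched as follows. A real implementation would create a unique compound index on `(post_id, user_id)` and rely on a duplicate-key error to reject repeat votes; here that contract is modeled in memory, and the collection and field names are assumptions:

```python
# In MongoDB:
#   db.votes.create_index([("post_id", 1), ("user_id", 1)], unique=True)
# and each vote becomes a small document {post_id, user_id}. A repeated
# insert then raises DuplicateKeyError. This in-memory class mirrors that.

class VotesCollection:
    def __init__(self):
        self._docs = []
        self._unique = set()  # stands in for the unique (post_id, user_id) index

    def insert_vote(self, post_id, user_id):
        key = (post_id, user_id)
        if key in self._unique:
            # pymongo would raise DuplicateKeyError here
            raise KeyError("duplicate vote")
        self._unique.add(key)
        self._docs.append({"post_id": post_id, "user_id": user_id})

    def count(self, post_id):
        # In MongoDB: db.votes.count_documents({"post_id": post_id})
        return sum(1 for d in self._docs if d["post_id"] == post_id)
```

The post document itself stays small; only the vote records grow, and they grow as many small documents rather than one huge array.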
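Options 3 and 4 (duplicate checks in a cache, then a sharded cache cluster) can be sketched together. Redis’s `SADD` returns 1 when the member is new and 0 when it already exists, which is exactly a duplicate-vote check; each Python set below stands in for one Redis server’s Set, and the shard count and key layout are assumptions:

```python
# Check duplicates in a cache first; at larger scale, spread voter ids
# over several cache servers by a simple mod over the user id.

N_SHARDS = 4
shards = [set() for _ in range(N_SHARDS)]  # one set per cache server

def shard_for(user_id):
    # Map a user id to a cache server. A plain mod is enough for a
    # sketch; consistent hashing behaves better when servers are added.
    return int(user_id) % N_SHARDS

def try_vote(post_id, user_id):
    """Return True for a first vote, False for a duplicate.

    Roughly: SADD votes:<post_id> <user_id> on the chosen shard,
    with the accepted votes periodically synced back to MongoDB.
    """
    s = shards[shard_for(user_id)]
    member = (post_id, user_id)
    if member in s:
        return False
    s.add(member)
    return True
```

The background sync to MongoDB mentioned in point 3 is not shown; this only covers the hot-path duplicate check.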