For example, how would you design a keyword-based subscription system? If a user subscribes to "small three", they should receive news whose titles contain "small three"; if they subscribe to "Apple", they should receive news whose titles contain "Apple". You may also need to support excluding certain words.
Using a database LIKE query would certainly consume a lot of resources. Is there a mature solution? And how do you push different results to different subscribers?
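To make the requirement concrete, here is a minimal brute-force baseline, roughly what a LIKE-based approach does. It assumes a subscriber receives a title if it contains any of their include words and none of their exclude words; the `Subscription` class and `route` function are hypothetical names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    user_id: str
    include: set  # assumed: match if ANY include word appears in the title
    exclude: set = field(default_factory=set)  # drop the title if ANY of these appear


def matches(title: str, sub: Subscription) -> bool:
    # Plain substring checks, the in-memory equivalent of LIKE '%word%'
    return (any(w in title for w in sub.include)
            and not any(w in title for w in sub.exclude))


def route(titles, subs):
    # Every title is checked against every subscription, so the cost grows
    # with (number of titles) x (number of subscriptions x words per subscription)
    return {sub.user_id: [t for t in titles if matches(t, sub)] for sub in subs}
```

This is what the question is trying to avoid: nothing is precomputed, so every request or every new item triggers a full scan.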
Please describe the problem in more detail first:
What is the source of the news? UGC? Crawled?
How much news is there? Millions? Tens of millions?
Taking crawled news on the order of tens of millions of items as an example, the system roughly breaks down into the following parts:
1. Crawled news passes through a classification module that labels each item before it is stored.
2. For fast access, keep an in-memory data structure that maps each tag to an array of unique news IDs. This can live in Redis or in a separate dedicated service.
3. A table stores each user and the topics they subscribe to.
4. When a user requests messages, look up the latest news IDs in memory for the requested topic words, fetch the news from the database by those IDs, and put a cache layer in front of the database to absorb concurrent reads.
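The four steps above can be sketched in a single process as follows. This is a minimal sketch under stated assumptions: plain dicts stand in for Redis, the database and the cache; tags are assumed to come from the classifier already; news IDs are assumed to increase over time so "newest" means "largest ID". All class and method names are hypothetical. In Redis, one common way to hold each tag's ID list is a sorted set scored by publish time.

```python
from collections import defaultdict


class NewsStore:
    """Stand-in for the database: news_id -> news record."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts database round trips

    def get_many(self, ids):
        self.reads += 1
        return [self.rows[i] for i in ids if i in self.rows]


class NewsService:
    def __init__(self, max_ids_per_tag=1000):
        self.db = NewsStore()
        self.cache = {}                         # news_id -> record (no TTL/eviction here)
        self.tag_index = defaultdict(list)      # step 2: tag -> news IDs, oldest first
        self.subscriptions = defaultdict(set)   # step 3: user_id -> subscribed topics
        self.max_ids = max_ids_per_tag

    def ingest(self, news_id, title, tags):
        # Step 1: store the item; `tags` is assumed to be the classifier's output
        self.db.rows[news_id] = {"id": news_id, "title": title}
        # Step 2: append the ID under each tag, keeping only the newest max_ids
        for tag in tags:
            ids = self.tag_index[tag]
            ids.append(news_id)
            if len(ids) > self.max_ids:
                del ids[0]

    def subscribe(self, user_id, topic):
        self.subscriptions[user_id].add(topic)

    def latest_for_user(self, user_id, limit=10):
        # Step 4: gather the newest IDs across the user's topics from memory
        ids = set()
        for topic in self.subscriptions[user_id]:
            ids.update(self.tag_index[topic][-limit:])
        newest = sorted(ids, reverse=True)[:limit]
        # Only IDs missing from the cache go to the database, in one batch
        missing = [i for i in newest if i not in self.cache]
        if missing:
            for row in self.db.get_many(missing):
                self.cache[row["id"]] = row
        return [self.cache[i] for i in newest if i in self.cache]
```

The database is only ever queried by primary key, and repeated requests for the same hot items are answered from the cache.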
For a search engine, an inverted index is maintained in memory. Before each news item is stored, its text is cleaned and segmented into words, and those words are added to the index. A user's query is then answered by looking up the matching items in the index instead of scanning every record.
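A minimal sketch of such an inverted index, including the include/exclude words from the question. The tokenizer is an assumption: it lowercases and splits on non-word characters, which works for English titles but would need a real word segmenter for Chinese text. A multi-word keyword matches when all of its words appear in the title (word order is not checked). `InvertedIndex` and `tokenize` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed cleaning + segmentation: lowercase, then split on non-word characters
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of news IDs containing it

    def add(self, news_id, title):
        # Done once per item at ingestion time, before storage
        for term in tokenize(title):
            self.postings[term].add(news_id)

    def _phrase_ids(self, phrase):
        # IDs whose titles contain every term of the phrase (set intersection)
        terms = tokenize(phrase)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def search(self, include, exclude=()):
        hits = set()
        for phrase in include:      # union over subscribed keywords
            hits |= self._phrase_ids(phrase)
        for phrase in exclude:      # subtract anything containing an excluded word
            hits -= self._phrase_ids(phrase)
        return sorted(hits)
```

For example, after indexing "Apple releases new iPhone" as item 1 and "Apple rumor: foldable phone" as item 2, `search(["Apple"], exclude=["rumor"])` touches only the posting lists for "apple" and "rumor" and returns item 1, with no full scan.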