I read an article online claiming that databases are not suited to Docker and containerization. Is that true?

  docker, question

Article link
I have just started using Docker and am a little confused.

There are two articles arguing that databases should not be put in Docker: one is the article the original poster linked, the other is this article (translation).

Both make the same point:

When the volume is small you can do whatever you like, but at large volume it no longer works. Traditional databases and Docker are not a natural fit. The recommendation is not to containerize the database directly; if you insist on containerizing it, you need support from a range of systems, including middleware and the container platform.

Docker is a reasonable choice if your database can scale automatically, recover from disasters, fail over, and comes with its own multi-node solution.

If it cannot, don't use Docker.

The original text is also very clear:

Horizontal scaling in Docker can only be used for stateless computing services, not databases.

When traffic is low, anything can be containerized: databases, applications, Hadoop, all kinds of nodes, nginx.

When traffic is high, storage-related services are not suited to containerization, stateless services such as the application layer and business layer are, and memory-intensive services such as caches can be containerized.

In short, there are three problems: disaster tolerance, performance and data consistency.

For a traditional database like MySQL, these are just the problems I can list off the top of my head:

  1. How do you containerize MySQL in the first place?

  2. What if mysqld on the master crashes?

  3. What if dockerd on the master crashes?

  4. What if mysqld on a slave crashes?

  5. What if dockerd on a slave crashes?

  6. Can MySQL be scaled out quickly through containers at peak load? With what scheme?

  7. What is the master-slave failover scheme? How is consistency ensured?

  8. At a large enough peak, a whole physical machine sometimes has only enough capacity for a single MySQL process.

  9. If it is one instance per machine anyway, why not just start MySQL directly?

  10. Why wrap a container layer around it? How much performance is lost?

  11. How do you upgrade MySQL?

  12. Will data volumes lose data? (I have run into corrupted containers many times ...)

That said, MySQL is not entirely impossible to containerize.
Services that are not sensitive to data loss (e.g. product search on JD.com) can be containerized, and sharding can be used to raise throughput by increasing the number of instances.
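
Splitting data across several instances, as described above, comes down to routing each key to one instance. The following is only a minimal sketch of hash-based routing in Python; the instance addresses in `SHARDS` and the function names are hypothetical and are not taken from the articles or from JD.com's setup.

```python
import hashlib

# Hypothetical addresses: one containerized MySQL instance per shard.
SHARDS = ["mysql-0:3306", "mysql-1:3306", "mysql-2:3306", "mysql-3:3306"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would route the same key differently after
    a restart.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def instance_for(key: str) -> str:
    """Return the (hypothetical) instance address that owns this key."""
    return SHARDS[shard_for(key, len(SHARDS))]


if __name__ == "__main__":
    for sku in ("sku-1001", "sku-1002", "sku-1003"):
        print(sku, "->", instance_for(sku))
```

Note that with plain modulo routing, changing the number of instances moves most keys to a different shard, so adding an instance still means migrating data. This is one reason quick scale-out at peak is harder for a database than for a stateless service.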

Some of the points raised in the original article are open to criticism, even though they were clearly thought through. For example, the following questions (about shared data directories) are quite debatable:

Easy to scale horizontally? Do you really want multiple instances to share a data directory? Aren't you worried about concurrent access to the same data and possible data corruption? Wouldn't it be safer to deploy multiple instances, each with its own dedicated data environment, and then set up master-slave replication?

Of the databases I have worked with so far, only Cassandra-style databases are suited to containerization (TiDB and CockroachDB also count, but I know of no use cases from large companies).

Then again, Cassandra itself is almost stateless: it comes with its own disaster recovery, scaling and failover schemes.

Now let's talk about JD.com (Jingdong).

JD.com is something of an outlier, but it too has mentioned similar problems and points to watch:

Prefer compute-oriented and stateless applications; microservices, for example, are especially easy to migrate to the elastic cloud.
When migrating an application to the elastic cloud, choose uniform instance specifications to avoid unbalanced load across instances.
After an application moves from physical machines to the elastic cloud, the number of instances grows, and so does the number of connections to back-end services, database connections in particular, so connection overload must be prevented.
Because disk I/O is shared on the elastic cloud, avoid excessive logging and reduce local file reads and writes. JFS or JIMDB is used for file storage or data sharing.
A container has fewer CPU cores than the original physical machine, so the application needs to size its thread counts and network parameters according to the container's core count.
Modify the underlying layer so that the application can accurately obtain its container's core count at runtime.
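
Two of the points above, sizing thread counts to the container's cores and keeping database connections within limits, can be sketched in Python. This is a minimal illustration, not JD.com's method: it assumes a Linux container using cgroup v2, where the CPU quota is exposed in `/sys/fs/cgroup/cpu.max`, and the function names and the `reserved` parameter are hypothetical.

```python
import math
import os
from pathlib import Path
from typing import Optional


def cpus_from_cgroup_v2(cpu_max_text: str) -> Optional[float]:
    """Parse cgroup v2 `cpu.max` contents: "<quota> <period>" or "max <period>".

    Returns the CPU limit in cores, or None when no quota is set.
    """
    parts = cpu_max_text.split()
    if len(parts) != 2 or parts[0] == "max":
        return None
    quota, period = int(parts[0]), int(parts[1])
    if quota <= 0 or period <= 0:
        return None
    return quota / period


def effective_cpus(cgroup_file: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Core count to size thread pools with: the container quota if one
    is readable, otherwise the host's core count."""
    host_cpus = os.cpu_count() or 1
    try:
        limit = cpus_from_cgroup_v2(Path(cgroup_file).read_text())
    except (OSError, ValueError):
        limit = None  # file missing or unparsable (assumed: not cgroup v2)
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(limit)))


def pool_size_per_instance(db_max_connections: int, instance_count: int,
                           reserved: int = 10) -> int:
    """Largest per-instance pool size that keeps the total number of
    connections under the database limit, leaving `reserved` spare."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    usable = max(0, db_max_connections - reserved)
    return usable // instance_count


if __name__ == "__main__":
    print("worker threads:", effective_cpus())
    print("pool size per instance:", pool_size_per_instance(1000, 30))
```

The connection calculation makes the overload risk concrete: if the same traffic is served by three times as many instances and each keeps its old pool size, the database sees three times as many connections.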

They have even made many customizations to Docker itself.

See what JD.com has shared.