Using an enterprise real-time data platform as an example to explain what agile big data is.

  Agile, Big data

Agile big data means building a set of common platform tools and a complete methodology for the big data application life cycle, guided by agile principles, to support big data practice that is lighter, more flexible and has a lower barrier to entry. This article explains, from a theoretical perspective, what we mean by “agile big data”.

I. Concepts and Principles of Agile Big Data

1.1 Componentization/Platformization/Productization/Localization

Componentization/platformization: by abstracting the stages of the big data processing chain into modules, we form a number of componentized platforms, each with highly cohesive functionality. Each modular platform can be used on its own and integrated with existing platform components, or several can be combined to solve problems that span more stages of the chain.
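A minimal Python sketch of the componentization idea follows. It assumes that every component exposes the same `process` interface over a stream of dictionary records. The names `Component`, `FieldFilter`, `ConstantEnricher` and `Pipeline` are hypothetical and do not come from any specific platform. Each component can run on its own, and a combination of components is itself a component.

```python
from typing import Callable, Dict, Iterable, Protocol

Record = Dict[str, object]


class Component(Protocol):
    """Hypothetical uniform interface that every component exposes."""

    def process(self, records: Iterable[Record]) -> Iterable[Record]: ...


class FieldFilter:
    """Cohesive component: keeps records whose `field` value satisfies `predicate`."""

    def __init__(self, field: str, predicate: Callable[[object], bool]) -> None:
        self.field = field
        self.predicate = predicate

    def process(self, records: Iterable[Record]) -> Iterable[Record]:
        return (r for r in records if self.predicate(r.get(self.field)))


class ConstantEnricher:
    """Cohesive component: adds a fixed key/value pair to every record."""

    def __init__(self, key: str, value: object) -> None:
        self.key = key
        self.value = value

    def process(self, records: Iterable[Record]) -> Iterable[Record]:
        return ({**r, self.key: self.value} for r in records)


class Pipeline:
    """Chains components in order. A Pipeline also satisfies Component,
    so a combination can itself be reused as a building block."""

    def __init__(self, *components: Component) -> None:
        self.components = components

    def process(self, records: Iterable[Record]) -> Iterable[Record]:
        for component in self.components:
            records = component.process(records)
        return records


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}]
large_orders = FieldFilter("amount", lambda amount: amount is not None and amount > 100)
pipeline = Pipeline(large_orders, ConstantEnricher("region", "east"))
result = list(pipeline.process(orders))
```

Because `Pipeline` exposes the same interface as its parts, one chain can be embedded in a larger chain without changes. This mirrors the way platforms can be used independently or combined.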

Productization/localization: by combining different componentized platforms with abstracted business logic models and rule/algorithm models, a product solution for a specific business domain can be built easily. When the solution is actually deployed at a site, it can be localized, mainly through data model adaptation, the introduction of rule sets and the tuning of algorithm model parameters.
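The localization step can be pictured as overlaying site-specific settings on a generic solution template. The sketch below assumes the template is a nested dictionary with three hypothetical sections, one for each kind of adaptation named above: a field mapping for data model adaptation, a list of rule names, and algorithm model parameters. The section names, rule names and values are illustrative only.

```python
import copy
from typing import Any, Dict


def localize(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge site-specific overrides into a generic solution template.
    Nested dicts are merged key by key; any other override value replaces the
    base value. The base template itself is never modified."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = localize(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical generic solution template for a transaction-alert product.
template = {
    "data_model": {"user_id": "user_id", "amount": "amount"},
    "rule_set": ["amount_over_threshold"],
    "model_params": {"threshold": 10000, "window_minutes": 60},
}

# Hypothetical site-specific settings.
site = {
    "data_model": {"user_id": "cust_no"},                     # data model adaptation
    "rule_set": ["amount_over_threshold", "night_transfer"],  # rule set introduction
    "model_params": {"threshold": 5000},                      # parameter adjustment
}

deployed = localize(template, site)
```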

1.2 Unification/Openness/Control

Unification aims to reduce system complexity and strengthen control. Openness aims to improve adaptability and flexibility. The two complement each other, and a reasonable balance must be found between them without losing overall control.

1.3 Standardization/Interfacing/Configuration/Visualization

Standardization/interfacing: a series of standard protocols is defined along the big data processing chain. These include a data namespace protocol, a metadata and data type specification protocol, a data access interface protocol, a query language protocol, a data transmission protocol and a data security protocol. Systems interact with one another through service interfaces and queue interfaces.
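The sketch below makes the first two protocols concrete with a data namespace and a unified type specification. The article does not specify the actual protocol formats. The four-part namespace layout `<store_type>.<instance>.<database>.<table>`, the unified type names and the mapping table are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    """Hypothetical namespace layout: <store_type>.<instance>.<database>.<table>."""

    store_type: str
    instance: str
    database: str
    table: str

    @classmethod
    def parse(cls, text: str) -> "Namespace":
        parts = text.split(".")
        if len(parts) != 4 or not all(parts):
            raise ValueError(f"expected 4 non-empty dot-separated parts, got {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return ".".join((self.store_type, self.instance, self.database, self.table))


# Hypothetical unified type vocabulary; each engine's native types map onto it.
UNIFIED_TYPES = {
    ("mysql", "INT"): "int",
    ("mysql", "BIGINT"): "long",
    ("mysql", "VARCHAR"): "string",
    ("mysql", "DATETIME"): "datetime",
    ("oracle", "NUMBER"): "decimal",
    ("oracle", "VARCHAR2"): "string",
    ("oracle", "DATE"): "datetime",
}


def to_unified_type(store_type: str, native_type: str) -> str:
    """Translate an engine-specific column type into the unified vocabulary."""
    try:
        return UNIFIED_TYPES[(store_type.lower(), native_type.upper())]
    except KeyError:
        raise ValueError(f"no unified mapping for {store_type}:{native_type}") from None


ns = Namespace.parse("mysql.crm_prod.sales.orders")
```

With a shared namespace and type vocabulary, downstream tools can address and interpret a MySQL table and an Oracle table in the same way.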

Configuration/visualization: users interact with the system through configuration and visual interfaces.
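Configuration-driven interaction can be sketched as a small registry of processing steps plus a declarative description that selects and parameterizes them. A visual editor would typically generate that description rather than have users write it by hand. The JSON layout, the step names `filter` and `rename`, and the `build_pipeline` function are all hypothetical.

```python
import json
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Step = Callable[[Iterable[Record]], Iterable[Record]]


def make_filter(field: str, min_value: float) -> Step:
    """Step factory: keep records whose `field` is at least `min_value`."""
    return lambda records: (r for r in records if r[field] >= min_value)


def make_rename(old: str, new: str) -> Step:
    """Step factory: rename key `old` to `new` in every record."""
    return lambda records: ({(new if k == old else k): v for k, v in r.items()} for r in records)


# Hypothetical registry of steps that a configuration or visual editor may reference.
REGISTRY: Dict[str, Callable[..., Step]] = {"filter": make_filter, "rename": make_rename}


def build_pipeline(config_text: str) -> Step:
    """Turn a declarative JSON description into a runnable chain of steps."""
    steps: List[Step] = [
        REGISTRY[spec["type"]](**spec["params"]) for spec in json.loads(config_text)["steps"]
    ]

    def run(records: Iterable[Record]) -> Iterable[Record]:
        for step in steps:
            records = step(records)
        return records

    return run


config = """{"steps": [
  {"type": "filter", "params": {"field": "amount", "min_value": 100}},
  {"type": "rename", "params": {"old": "amount", "new": "amount_cny"}}
]}"""
run = build_pipeline(config)
output = list(run([{"amount": 50}, {"amount": 120}]))
```

Changing the threshold or adding a step only touches the JSON, so the processing logic can be changed without writing or redeploying code.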

1.4 Self-Service/Automation/Intelligence

Modern data applications need to expose their capabilities to users, so that domain users can use platforms and data on a self-service basis to meet business needs within a controlled environment. Routine self-service operations are best supported through automation, and self-service insight analysis is best supported through intelligent capabilities.
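One way to read self-service within a controlled environment, supported by automation, is as an automated policy check. Requests that stay within platform-defined limits are approved immediately, and only the remaining requests need a human reviewer. The policy fields (`max_parallelism`, `allowed_prefixes`), the request shape and the namespace strings below are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical guardrails defined by the platform team."""

    max_parallelism: int
    allowed_prefixes: Tuple[str, ...]  # namespaces a domain team may read


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical self-service job request submitted by a domain user."""

    user: str
    source: str       # namespace of the data to read
    parallelism: int


def review(request: JobRequest, policy: Policy) -> Tuple[bool, str]:
    """Approve requests inside the policy automatically; return the reason otherwise."""
    if not request.source.startswith(policy.allowed_prefixes):
        return False, f"{request.source} is outside the allowed namespaces"
    if request.parallelism > policy.max_parallelism:
        return False, f"parallelism {request.parallelism} exceeds limit {policy.max_parallelism}"
    return True, "auto-approved"


policy = Policy(max_parallelism=8, allowed_prefixes=("kafka.marketing.",))
decision = review(JobRequest("alice", "kafka.marketing.clicks", 4), policy)
```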

1.5 Engine-Driven (Event Engine/Action Engine/Rule Engine)

By introducing advanced engine-driven capabilities, agile big data applications can reach external audiences more quickly, flexibly and proactively. At that point, the big data application itself becomes a powerful engine that drives the business.
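Here is a minimal sketch of how the three engines could cooperate, assuming events are dictionaries. The event engine receives each event. The rule engine selects the rules whose conditions match, and the action engine runs the named actions. The class names, the example rule and the SMS-style action are hypothetical. A production rule engine would also need rule management, ordering and error handling.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # evaluated by the rule engine
    action: str                         # name of a handler in the action engine


class ActionEngine:
    """Maps action names to handlers that reach external audiences."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Event], str]] = {}

    def register(self, name: str, handler: Callable[[Event], str]) -> None:
        self.handlers[name] = handler

    def execute(self, name: str, event: Event) -> str:
        return self.handlers[name](event)


class RuleEngine:
    """Returns the rules whose conditions match an event."""

    def __init__(self, rules: List[Rule]) -> None:
        self.rules = rules

    def match(self, event: Event) -> List[Rule]:
        return [rule for rule in self.rules if rule.condition(event)]


class EventEngine:
    """Entry point: each incoming event is evaluated and matching actions fire."""

    def __init__(self, rule_engine: RuleEngine, action_engine: ActionEngine) -> None:
        self.rule_engine = rule_engine
        self.action_engine = action_engine

    def on_event(self, event: Event) -> List[str]:
        return [self.action_engine.execute(rule.action, event)
                for rule in self.rule_engine.match(event)]


actions = ActionEngine()
actions.register("notify_user", lambda e: f"SMS to {e['user']}: large withdrawal of {e['amount']}")
rules = RuleEngine([
    Rule("large_withdrawal",
         lambda e: e.get("type") == "withdrawal" and e.get("amount", 0) > 10000,
         "notify_user"),
])
engine = EventEngine(rules, actions)
```

Keeping rules as data and actions as named handlers means business users can add or change a rule without touching the event-handling code.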

II. Common Platform Tools That Can Be Abstracted

Taking an enterprise real-time data platform as an example and following the principles of agile big data, we modularized the real-time data platform end to end and defined a series of standard protocols. Finally, based on the principles of unification and openness, we determined which common platform tools to develop, together with their boundaries and interface specifications.

The diagram above shows the conceptual module architecture of the real-time data platform. In subsequent articles, we will use the real-time data platform as a starting point to explain the abstract concepts and architectural design of the common platform tools derived from it.

III. Across the Life Cycle of Big Data Applications

3.1 Requirements Analysis and Verification Phase

In the requirements analysis phase, we need to be able to build a proof-of-concept (POC) prototype of the data application quickly. Once the prototype has been validated, we need to iterate quickly so that all requirements are covered as soon as possible.

The platformization, configuration and visualization capabilities of agile big data let business developers iterate on and validate requirements rapidly through configuration and visual tools. Business developers can focus on the business problems themselves without having to worry much about big data technology.

3.2 Architecture Design and Technology Selection Phase

Many factors must be considered when actually selecting storage and compute engines. Besides meeting SLA and data volume requirements, the choice is also constrained by the limitations and problems of the available open source technologies.

The standardization, interfacing, unification and openness of agile big data provide a set of best practices for architecture selection. Together they greatly reduce the complexity of system design and hide incompatibilities between open source technologies. They also preserve the flexibility to choose different storage and compute engines.
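The unified-interface idea is essentially the adapter pattern. Callers code against one abstract interface, and each storage engine gets its own adapter, which is chosen by configuration. In the sketch below, both backends are in-memory stand-ins and no real database client is used. The names `Sink`, `open_sink`, `kv` and `columnar` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Sink(ABC):
    """Hypothetical unified write interface that callers depend on."""

    @abstractmethod
    def write(self, table: str, rows: List[dict]) -> int:
        """Write rows to `table` and return the number of rows written."""


class KeyValueSink(Sink):
    """In-memory stand-in for a key-value store: each row is stored under 'table:id'."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}

    def write(self, table: str, rows: List[dict]) -> int:
        for row in rows:
            self.store[f"{table}:{row['id']}"] = row
        return len(rows)


class ColumnarSink(Sink):
    """In-memory stand-in for a columnar store: rows are pivoted into per-column lists.
    Assumes every row has the same columns; otherwise the lists would misalign."""

    def __init__(self) -> None:
        self.columns: Dict[str, Dict[str, list]] = {}

    def write(self, table: str, rows: List[dict]) -> int:
        table_columns = self.columns.setdefault(table, {})
        for row in rows:
            for column, value in row.items():
                table_columns.setdefault(column, []).append(value)
        return len(rows)


# Engine selection becomes a configuration value instead of a code change.
SINK_TYPES = {"kv": KeyValueSink, "columnar": ColumnarSink}


def open_sink(kind: str) -> Sink:
    return SINK_TYPES[kind]()


rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}]
sinks = {kind: open_sink(kind) for kind in ("kv", "columnar")}
written = {kind: sink.write("orders", rows) for kind, sink in sinks.items()}
```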

3.3 Implementation, Testing and Tuning Phase

Implementing, testing and tuning custom-developed big data applications is often time-consuming and labor-intensive. The complexity of testing and tuning grows with the length of the processing chain and the diversity of the technologies chosen.

The platformization, interfacing, configuration, visualization, unification and control capabilities of agile big data can turn testing and tuning into an iterative cycle of visual configuration, experimentation and verification. Configuration and visualization hide the complexity of long data processing chains. Unification and control hide the complexity of working with many different technologies.

3.4 Production Deployment and Migration Phase

Deploying and migrating custom-developed big data applications to production is often complicated and error-prone. Even when scripts are used, they can introduce hidden problems because they are inconsistent from one environment to the next and hard to follow. The platformization, configuration, visualization, unification, control and self-service capabilities of agile big data make production deployment and migration easier. All of this stems from the platform’s unification capability, which is exposed to users on a self-service basis.
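Deployment and migration become less error-prone when there is a single pipeline definition and only explicit per-environment bindings vary. The sketch below uses Python's standard `string.Template`, whose `substitute` method raises `KeyError` for any unbound placeholder. An incomplete environment therefore fails before deployment. The pipeline layout, host names and placeholder names are hypothetical.

```python
import json
from string import Template
from typing import Any, Dict

# One shared pipeline definition; only the ${...} placeholders differ per environment.
PIPELINE_TEMPLATE = """{
  "source": {"type": "kafka", "servers": "${KAFKA_SERVERS}", "topic": "orders"},
  "sink": {"type": "jdbc", "url": "${SINK_URL}", "table": "orders_rt"}
}"""

# Hypothetical per-environment bindings.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "test": {"KAFKA_SERVERS": "kafka-test:9092", "SINK_URL": "jdbc:mysql://db-test:3306/rt"},
    "prod": {"KAFKA_SERVERS": "kafka-prod:9092", "SINK_URL": "jdbc:mysql://db-prod:3306/rt"},
}


def render(bindings: Dict[str, str]) -> Dict[str, Any]:
    """Fill in the template. Template.substitute raises KeyError on any unbound
    placeholder, so an incomplete environment is rejected before deployment."""
    return json.loads(Template(PIPELINE_TEMPLATE).substitute(bindings))


test_cfg = render(ENVIRONMENTS["test"])
prod_cfg = render(ENVIRONMENTS["prod"])
```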

3.5 Management, Operations and Monitoring Phase

In enterprises, management, operations and monitoring are usually centralized and tightly controlled. The platformization, control and self-service capabilities of agile big data provide the corresponding functions. Agile big data can also provide automation and intelligent capabilities that further reduce the operations workload. Through its interfacing capability, it can also integrate with the monitoring and operations systems already in place.
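The sketch below illustrates automated monitoring that plugs into an existing operations system. It checks the processing lag of each job against a threshold and passes every alert to a caller-supplied `notify` callback. That callback stands in for whatever interface the existing system exposes. The job names, the lag metric and the two-level severity rule are assumptions.

```python
from typing import Callable, Dict, List

Alert = Dict[str, object]


def check_lag(lags: Dict[str, int], threshold: int) -> List[Alert]:
    """Return one alert per job whose lag exceeds `threshold`. Lag above twice the
    threshold is marked critical, otherwise warning (a hypothetical severity rule)."""
    return [
        {"job": job, "lag": lag, "severity": "critical" if lag > 2 * threshold else "warning"}
        for job, lag in sorted(lags.items())
        if lag > threshold
    ]


def run_check(lags: Dict[str, int], threshold: int, notify: Callable[[Alert], None]) -> int:
    """Run the check and push each alert through the supplied integration callback."""
    alerts = check_lag(lags, threshold)
    for alert in alerts:
        notify(alert)
    return len(alerts)


sent: List[Alert] = []
count = run_check(
    {"orders_etl": 500, "risk_features": 12000, "user_profile": 25000},
    threshold=10000,
    notify=sent.append,  # stand-in for an existing alerting or ticketing interface
)
```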

IV. Putting Agile Big Data into Practice

The figure above summarizes the relationships among the components of agile big data:

Agile Big Data Concept + Agile Big Data Platform Stack + Agile Big Data Methodology → Agile Big Data Practice

This article defines “agile big data”, introduces its concepts and principles, and briefly describes how to build a platform stack and apply the methodology on that basis. In the following articles, we will describe our agile big data journey in detail through concrete practical experience.

Author: Lu Shanwei

Source: Yixin Institute of Technology