What is Agile Big Data and Agile AI?

  Agile, Artificial intelligence, Big data

The main goal of agile big data intelligence is to apply the agile big data approach to AI: develop flexible, lightweight intelligent models and run real-time intelligent processing of data streams on the agile big data platform, ultimately delivering one-stop intelligent big data analysis.

I. Introduction

The birth of artificial intelligence can be traced back to the 1950s, when John McCarthy introduced the concept of AI at the Dartmouth Conference. After the initial excitement, however, the field went through several downturns. Only in the roughly 20 years from the mid-to-late 1990s to the present has artificial intelligence truly entered a golden age. Over the past 10 years in particular, several factors have driven its continued development. In theory, machine learning, especially statistical learning and neural networks, has produced a series of remarkable breakthroughs. In the external environment, advances in software and hardware have provided sufficient computing power to run artificial intelligence models. A further, extremely important factor is data: the development of big data technology has finally freed artificial intelligence from data scarcity, so that models can be improved on the basis of sufficient samples. It is fair to say that the research and development of intelligent models in every field now depends on the support of big data technology.

Conversely, artificial intelligence also plays an extremely important role in big data technology:

  • First, the value of the data collected by big data technology can only be uncovered through intelligent analysis;
  • Second, intelligent analysis of existing data lets us derive additional data features and can even guide what data is produced next.

Therefore, any discussion of how big data is used today inevitably involves concepts such as artificial intelligence and machine learning.

The Agile Big Data Platform Stack is a real-time data infrastructure platform and a product of the further development of big data theory and technology. Naturally, it also includes research and planning on intelligence. The main goal of agile big data intelligence is to apply the agile big data approach to AI: develop flexible, lightweight intelligent models and run real-time intelligent processing of data streams on the agile big data platform, ultimately delivering one-stop intelligent big data analysis.

To achieve these goals, we have researched and analyzed artificial intelligence, machine learning, real-time computing, and related technologies in depth, along with knowledge of the relevant business domains and even product user experience. This series of articles shares our ideas and some of the experience and results we gained along the way.

II. Real-time intelligent data processing

As technology develops, we can obtain data at an unprecedented scale. If we can process this data quickly and efficiently and extract high-value information from it, we can greatly improve an enterprise's adaptability, allowing it to make rapid tactical and even strategic adjustments in complex and changing business scenarios. Real-time data processing has therefore become a main direction for the future development of big data technology, and it inevitably affects the intelligent analysis models that depend on that data. To identify and adapt to changes in the external environment quickly, organizations have begun to combine real-time data processing capabilities with AI capabilities to deliver intelligent data analysis services rapidly.

In fact, intelligent processing of real-time data streams already has precedents in many industries. In Internet live streaming, real-time filters and real-time special-effects algorithms that operate on video streams are widely used in apps such as Kuaishou and Douyin, while overseas live-streaming sites such as Twitch have introduced AI plug-ins, for example for real-time game data analysis, to enhance the broadcast. In sports data, providers such as Opta Sports compute team and player statistics from live match conditions and predict how a match is likely to develop. In transportation, traffic congestion prediction systems based on real-time traffic information have also been deployed. There are many more examples of this kind, and together they show that real-time AI data processing is already widely used across different fields and business scenarios, where it plays an important role.

Many scenarios in the financial field also call for real-time AI data processing, such as real-time risk control, real-time data prediction, real-time anomaly detection, and real-time user analysis. The following figure is a data flow diagram of real-time product recommendation, which can be applied to recommending financial products such as online loans, insurance, funds, and stocks.


The figure describes the following process. On the interaction side, we capture a large volume of behavioral data from different users through event-tracking instrumentation. This data is collected by the enterprise real-time data platform and supplied, together with user data, product data, and other data, to the models in the computing layer, such as the user interest model and the product portrait model. These models characterize users and products, and their output feeds the recommendation model, which produces the final recommendation list after scoring, sorting, and filtering. Throughout this process, the user interest model can be updated and corrected from the real-time stream of user behavior data, so that it tracks in real time what each user is interested in.
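The recommendation flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the platform's implementation: the class `UserInterestModel`, the function `recommend`, the exponential-decay update rule, the `decay` value, and all record fields are hypothetical names and choices introduced here for the example.

```python
from dataclasses import dataclass, field


@dataclass
class UserInterestModel:
    """Hypothetical user interest model: per-user weights over product categories."""
    decay: float = 0.9  # assumed factor; older interests fade on each new event
    weights: dict = field(default_factory=dict)

    def update(self, user_id, category, strength=1.0):
        """Fold one behavior event from the real-time stream into the user's profile."""
        profile = self.weights.setdefault(user_id, {})
        for key in profile:
            profile[key] *= self.decay
        profile[category] = profile.get(category, 0.0) + strength

    def score(self, user_id, category):
        """Return the current interest weight, or 0.0 for unknown users or categories."""
        return self.weights.get(user_id, {}).get(category, 0.0)


def recommend(model, user_id, products, owned, top_n=2):
    """Score candidates, filter out owned and zero-score products, sort, keep top_n."""
    scored = [
        (model.score(user_id, p["category"]), p["id"])
        for p in products
        if p["id"] not in owned
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [product_id for _, product_id in scored[:top_n]]


products = [
    {"id": "f1", "category": "fund"},
    {"id": "f2", "category": "fund"},
    {"id": "i1", "category": "insurance"},
    {"id": "l1", "category": "loan"},
]
model = UserInterestModel()
for category in ["fund", "fund", "insurance"]:  # simulated behavior stream
    model.update("u1", category)
print(recommend(model, "u1", products, owned={"f1"}))  # ['f2', 'i1']

model.update("u1", "loan", strength=3.0)  # a strong new signal arrives
print(recommend(model, "u1", products, owned={"f1"}))  # ['l1', 'f2']
```

Because every event immediately changes the weights, the second call reflects the user's newest interest without any batch retraining, which is the property the text attributes to real-time tracking.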

One process not shown in the figure above is the real-time updating of the product portrait model. Although product feature data is relatively stable compared with user behavior data, in practice many products are highly time-sensitive, and their portrait features must also be maintained in real time; securities market data is one example. Such product data streams can be collected into the enterprise real-time data platform through other channels, supplied to the product portrait model to rebuild product features, and finally passed to the recommendation model. A good real-time product recommendation system captures users' needs sensitively and responds to changes in products. It enables personalized, precise marketing, improves the user experience, and increases both customer acquisition and the number of completed orders, which generates considerable business value.
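As a rough illustration of maintaining a product portrait in real time, the sketch below rebuilds two features from each incoming price tick. Everything in it is an assumption made for the example: the class `ProductPortrait`, the choice of an exponentially weighted moving average as the feature, the `alpha` value, and the simple trend label are hypothetical and are not taken from the platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductPortrait:
    """Hypothetical product portrait holding two features derived from market ticks."""
    product_id: str
    alpha: float = 0.5  # assumed smoothing factor for the moving average
    smoothed_price: Optional[float] = None
    trend: str = "flat"

    def apply_tick(self, price: float) -> None:
        """Rebuild the features from one new price tick."""
        if self.smoothed_price is None:
            self.smoothed_price = price
        else:
            self.smoothed_price = (
                self.alpha * price + (1 - self.alpha) * self.smoothed_price
            )
        # Label the trend by comparing the latest price with the smoothed price.
        if price > self.smoothed_price:
            self.trend = "up"
        elif price < self.smoothed_price:
            self.trend = "down"
        else:
            self.trend = "flat"


portrait = ProductPortrait("stock-001")
for tick in [10.0, 12.0, 14.0, 8.0]:  # simulated market data stream
    portrait.apply_tick(tick)
    print(tick, portrait.smoothed_price, portrait.trend)
```

The portrait keeps only a constant amount of state per product, so it can be refreshed on every tick and read by a recommendation model at any moment.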

In the figure above, the enterprise real-time data platform plays the important role of supplying real-time data to the recommendation model. In an agile data environment, the agile big data platform supports this work well. One implementation architecture is shown in the following figure:


In this figure, DBus and Wormhole connect to many different data sources, acquire data in real time, and so make the source end of the data pipeline real-time. Wormhole also supports on-stream processing, which makes it well suited to hosting the product portrait model and the user interest model so that product and user features are computed in real time. Once these features are stored, Moonbox extracts them on demand and feeds them to the recommendation model, which produces the recommendation list that is finally returned to the interaction side. In addition, with the support of the Davinci BI tool, we can monitor business indicators in real time, which makes it easy to evaluate how well the recommendations perform. The whole process integrates a variety of open source platforms flexibly and conveniently, so real-time data applications can be built quickly. The open source components can also be swapped at any time as needed, which supports fast, iterative trial and error. Combined with existing algorithm models, this architecture can quickly support real-time intelligent product recommendation for users.

III. Agile AI

As mentioned earlier, in real-time AI data processing, the business components built on agile big data can be combined with third-party open source components to orchestrate and deploy the underlying architecture that supports the algorithms quickly, through simple configuration. The one remaining difficulty in the whole system is that the intelligent models must be developed in advance, which still presents a technical barrier for some business organizations. Moreover, for some services, rapid rollout and cost control are the primary considerations, and custom-building an intelligent algorithm model and adapting its call interface so that it can connect to the real-time data architecture is rather cumbersome. Many data analysts, for example, do not need highly precise model performance; what matters more to them is that the analysis system is convenient to set up and that business logic can be implemented quickly.

We have already made data processing more agile, so how can we make data intelligence more agile as well? To solve this problem, we propose the idea of agile AI: on top of the existing agile big data products, we design and develop a series of pluggable real-time intelligent model operators organized around business scenarios. These models cover the common intelligent data analysis requirements of those scenarios and are highly general and reusable. They can plug directly into the real-time data streams on the agile big data platform and write their analysis results back to the platform, from which the results flow to each business endpoint in real time as required, completing an intelligent analysis process based on real-time data streams. With agile big data products and agile AI together, business personnel can quickly assemble, for a given business scenario, the whole intelligent data governance process, from the real-time data processing platform through real-time intelligent analysis to real-time data display, and can adjust it flexibly by trial and error according to the results. This greatly reduces the cost of implementing real-time intelligent business analysis.
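One way to picture a pluggable real-time model operator is sketched below. It is an illustration under stated assumptions, not the interface of any agile big data product: the base class `StreamOperator`, the `register` decorator, the `OPERATORS` registry, the `run_operator` helper, and the example `ZScoreAnomaly` operator are all hypothetical. The example operator scores each value against a running mean and standard deviation maintained with Welford's online algorithm, a technique chosen here only because it needs constant state per stream.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, Type

OPERATORS: Dict[str, Type["StreamOperator"]] = {}  # hypothetical operator registry


def register(name: str):
    """Class decorator that makes an operator selectable by name."""
    def decorator(cls):
        OPERATORS[name] = cls
        return cls
    return decorator


class StreamOperator(ABC):
    """Hypothetical contract: one record in, one enriched record out."""

    @abstractmethod
    def process(self, record: dict) -> dict:
        ...


@register("zscore_anomaly")
class ZScoreAnomaly(StreamOperator):
    """Flags values far from the running mean of the values seen so far."""

    def __init__(self, field: str, threshold: float = 3.0):
        self.field = field
        self.threshold = threshold
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def process(self, record: dict) -> dict:
        value = float(record[self.field])
        # Score the new value against statistics of the earlier values only.
        std = math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0
        is_anomaly = std > 0 and abs(value - self.mean) / std > self.threshold
        # Welford's online update of the mean and squared deviations.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return {**record, "is_anomaly": is_anomaly}


def run_operator(stream: Iterable[dict], name: str, **config) -> Iterator[dict]:
    """Instantiate an operator from configuration and apply it to a record stream."""
    operator = OPERATORS[name](**config)
    for record in stream:
        yield operator.process(record)


amounts = [10, 11, 9, 10, 11, 9, 10, 100]
stream = ({"amount": a} for a in amounts)
for result in run_operator(stream, "zscore_anomaly", field="amount"):
    print(result)
```

Selecting the operator by name and passing its settings as keyword arguments keeps the choice of model in configuration rather than in code, which is the sense in which such an operator would be pluggable.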

Following this idea of agile AI, we have set out to build an agile AI algorithm library: a set of lightweight, general-purpose data models organized by business domain. The design of each model should follow these principles:

  • Lightweight: control the complexity of the model appropriately so that data can be processed in real time;
  • Independent: minimize environmental dependencies, or make the model's deployment environment self-contained, so that introducing a model does not change the dependencies of the whole system;
  • Single-purpose: give each model a single function as far as possible, so that models can work side by side without overlapping;
  • Data-general: apart from a few features that certain models strictly require, each model should accept input data generically and adapt to most business scenarios through some configuration or mapping.
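The last principle, adapting to different input data through configuration or mapping, can be illustrated with a small field-mapping adapter. This is a sketch under assumptions: the function `make_adapter`, the canonical field names `user_id`, `amount`, and `channel`, and the source field names in the two example streams are hypothetical and exist only for this example.

```python
def make_adapter(mapping, defaults=None):
    """Build a function that renames source fields to the names a model expects.

    mapping:  {canonical_name: source_field_name}
    defaults: {canonical_name: value} used when the source field is absent
    """
    defaults = defaults or {}

    def adapt(record):
        adapted = {}
        for canonical, source in mapping.items():
            if source in record:
                adapted[canonical] = record[source]
            elif canonical in defaults:
                adapted[canonical] = defaults[canonical]
            else:
                raise KeyError(f"missing required field {source!r} for {canonical!r}")
        return adapted

    return adapt


# Two business streams with different schemas feed the same hypothetical model input.
loan_adapter = make_adapter(
    {"user_id": "cust_no", "amount": "apply_amt", "channel": "src"},
    defaults={"channel": "unknown"},
)
fund_adapter = make_adapter(
    {"user_id": "uid", "amount": "order_value", "channel": "channel"},
)

print(loan_adapter({"cust_no": "c1", "apply_amt": 5000}))
print(fund_adapter({"uid": "u9", "order_value": 120, "channel": "app"}))
```

With this arrangement the model code only ever sees the canonical fields, and supporting a new business scenario means writing a new mapping rather than changing the model.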

To meet these requirements, we inevitably have to make trade-offs when developing the models. For example, making a model general-purpose costs it some performance. Finding a reasonable compromise between such conflicting goals is itself a design problem. We have already started to develop agile AI models for some fields. After testing and applying them in practice, we will integrate them into the current agile big data product stack in the near future. We may also publish the relevant interfaces and protocols later, so that users can add their own models to the library.

IV. Conclusion

Intelligent analysis of real-time data is one of the important directions for the future development of big data and artificial intelligence technology. Reducing the economic cost, time cost, technical cost, and cost of change involved in implementing it is the key problem that agile big data and agile AI set out to solve. This article proposes a solution built on agile big data products, and we hope that these products can help organizations build their own real-time intelligent big data analysis systems conveniently, quickly, and flexibly.

Source: Yixin Institute of Technology

Author: Jing Yuxin
