This article describes how machine-learning-based NLP technology is applied across Yixin’s business lines and shares the experience gained along the way, including the exploration of intelligent robots for business support and customer service, the construction of user portraits based on semantic analysis of text, and the implementation approach of the NLP algorithm service platform. This installment covers the background.
Author: Jing Yuxin. He graduated from EECS with a doctoral degree. His research interests include computer software and theory, logical reasoning, etc. He currently works in the Yixin Technology Research and Development Center, where he is engaged in research on artificial intelligence, machine learning, natural language processing and knowledge engineering.
Yixin was established in Beijing in 2006. Over 12 years of development, it has launched many products around its two major business sectors, Puhui and Fortune, such as Yiren Loan, Pleasant Wealth, Sincere Use, Bo Cheng Insurance, etc.
In fact, behind these products, AI technology has been widely used in various related business lines.
The financial sub-fields in which Yixin operates can be divided into five areas: intelligent trading, intelligent credit, financial information, financial security and personalized service, each of which is supported by relevant artificial intelligence technologies.
For example, in the field of intelligent trading, there are technologies such as intelligent investment research, quantitative analysis and automatic/assisted trading. In the field of intelligent credit, there are artificial intelligence products for identity verification, user portraits and intelligent risk control. In the field of financial information, we carry out knowledge engineering, knowledge graph analysis, intelligent question answering, etc. In the field of financial security, we carry out anti-fraud analysis. The personalized service field is broader still: we have a series of mature AI products such as behavior analysis, intelligent marketing, recommendation and matching, and intelligent financial advisors.
Let’s look one level deeper. Behind these AI products, we find NLP (Natural Language Processing) technologies. For example, in the field of intelligent trading, a large number of investment research reports must be understood, which calls for NLP techniques for report understanding. In the field of intelligent credit, risk control reports may need to be generated and analyzed, which also requires NLP. In the field of knowledge engineering, knowledge must be extracted from financial information, or relationship extraction and event extraction must be carried out, in order to build a knowledge graph. Intelligent marketing and intelligent financial advisors need intelligent chat and speech extraction technologies.
It can be said that NLP technology runs through AI products in every one of these fields. The direct reason is that our business produces a large amount of natural language data, such as telesales call data, customer analysis summaries, customer service conversations, internal communications and various other text reports. All of this data is stored in natural language. Moreover, storing data as natural language text has advantages that other forms of data cannot match. As shown in Fig. 1, natural language data has rich sources, expresses information in varied ways, retains information completely and conforms to user habits.
Figure 1 Characteristics of Natural Language
However, we should note that, alongside these advantages, natural language data also has disadvantages: unstructured data is not easy to handle, and it may contain ambiguities, grammatical irregularities and unknown language phenomena. In addition, the natural language data in Yixin’s business field has some characteristics of its own: more specialized vocabulary, wider data sources, varied data forms (audio recordings, text dialogues, short/long text reports, summaries, etc.), larger data volume and uneven distribution.
These shortcomings make natural language data hard to process and NLP technology hard to put into production. Why, then, is natural language data receiving more and more attention, and why is NLP technology being deployed more and more widely?
In recent years, enterprises and organizations have begun to pay increasing attention to the high-value information contained in large volumes of unstructured data. Structured data is easier to handle, but after years of development, the information that can still be mined from it is more and more limited. The unstructured data we encounter every day, by contrast, is several times larger in volume than structured data and contains a great deal of high-value information.
Typical unstructured data includes pictures, videos, etc. Another important part is natural language text, from which we can mine a lot of valuable content. For example, from the Yixin natural language data mentioned above we can obtain customer information, product data, public opinion trends, strategy feedback, etc.
In addition, natural language processing has brought us new, conversational ways of interacting. More specifically, user interaction based on natural language understanding and natural language generation is more natural, efficient and attractive, and more in line with user habits. This is what we call Conversational UI. These new ways of interaction are increasingly applied in various fields. For example, the smart speaker Xiao Ai, which many of us have come into contact with, is very impressive.
Therefore, more and more businesses are paying attention to unstructured data and natural language data, which hold a very large amount of valuable information. The characteristics of this data, the interaction methods it enables and its expanding range of forms have made natural language data more and more important and NLP technology more and more necessary.
We position NLP technology simply: it undertakes the tasks of classifying, extracting, converting and generating natural language data in the field, and is one of the important, basic technical services of the business.
NLP Technology in Yixin
Yixin has rich business and product lines, which generate a large demand for artificial intelligence enablement. Since its establishment, the algorithm team has faced a lot of project pressure. Driven by these projects, the team has gradually grown, combining business knowledge of the financial field with a series of skills honed along the way, from rule-based analysis to statistical algorithms to more complex neural networks, together with NLP domain expertise.
Figure 2 Related Algorithm Technology Stack
Specifically, we started out handling basic processing tasks, such as part-of-speech analysis and syntactic analysis using existing rule-based analysis and basic algorithm models. We then moved on to providing model services to the outside, such as text classification, text clustering and information extraction, built on relatively complex neural network models. We are now implementing advanced scenarios such as intelligent chat robots (Chatbot), user portraits and knowledge engineering. Along the way, the technology has shifted toward increasingly capable and structurally complex models such as Transformer, GAN, reinforcement learning and deep learning networks. This progression shows that the technology is continuously improving.
In addition to the continuous development of technology, we have also accumulated a number of valuable corpora. By business link, we have accumulated telesales data, customer service data, accompanying data and collection data. By business field, we have accumulated data on loans (car, house, consumption), financial management (investment, insurance, life, inheritance, public welfare), etc. By data form, we have collected dialogues (telephone calls, written communications) and articles (summaries, news, reports).
These valuable corpora eventually form the professional corpus data within the company, including the company’s product list, business glossary, business entity list and even a knowledge graph of wealth products in the wealth field. Our ultimate goal is to expand, abstract and process this data into a high-value, professional data set for the financial field that can be offered externally. For example, we could provide a financial glossary, a synonym thesaurus of financial terms, ontologies related to the financial field and knowledge bases for various sub-fields.
Our service model has also evolved. In the early days, we followed a project-driven service model, which had some common pain points:
- Numerous products and complicated business requirements;
- Different businesses are intertwined and requirements change constantly;
- Timeliness is demanding: the sooner the better, since a late launch affects the requesting team;
- The R&D team has limited manpower and sometimes also has to handle environment deployment, online model monitoring and maintenance, etc. The team is busy and under great pressure throughout, with no time to effectively optimize the models.
So how do we solve these pain points? After reflection, we took an important step: turning our services into a platform. By building a unified NLP model platform, we provide unified NLP services. Its advantages are:
- Reduce costs and improve efficiency;
- The models on the platform can be flexibly combined to respond quickly to customers’ needs;
- Relevant standards can be unified to facilitate centralized management of the models;
- Platform-based service has moved our work away from the original extensive service mode and improved the output capacity of the AI team.
Figure 3 Platformization of Services
Fig. 3 is a logical functional view of our platform, which is divided, from bottom to top, into a resource layer, a preprocessing layer, a model layer and a scene layer. The resource layer mainly holds resources such as corpora, tags and pre-trained models. The preprocessing layer includes commonly used NLP techniques, such as word segmentation, part-of-speech analysis, syntactic analysis, topic analysis and named entity recognition. The model layer includes algorithm models that can be offered as services to the outside, such as models for text clustering, classification, generation and paraphrasing, and sentiment analysis models. The topmost scene layer builds solutions for advanced, complex scenarios that form a closed loop. For example, for complex scenarios such as intelligent robots and user portraits, we provide a packaged solution for users.
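The layering described for Fig. 3 can be illustrated with a small Python sketch. This is only a toy under stated assumptions, not Yixin’s actual platform: every class, method and lexicon entry below is hypothetical, a whitespace tokenizer stands in for real word segmentation, and a tiny word list stands in for a trained sentiment model. What it tries to show is the dependency direction, where each layer uses only the layer beneath it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceLayer:
    """Bottom layer: corpora, tags, pre-trained models (here, a toy lexicon)."""
    sentiment_lexicon: Dict[str, int] = field(
        default_factory=lambda: {"good": 1, "great": 1, "bad": -1, "slow": -1}
    )


class PreprocessingLayer:
    """Common NLP steps; whitespace splitting stands in for word segmentation."""

    def tokenize(self, text: str) -> List[str]:
        tokens = (raw.strip(".,!?").lower() for raw in text.split())
        return [tok for tok in tokens if tok]


class ModelLayer:
    """Models offered as services, built on resources and preprocessing."""

    def __init__(self, resources: ResourceLayer, pre: PreprocessingLayer) -> None:
        self.resources = resources
        self.pre = pre

    def sentiment(self, text: str) -> str:
        # Sum lexicon scores over tokens; unknown words contribute 0.
        lexicon = self.resources.sentiment_lexicon
        score = sum(lexicon.get(tok, 0) for tok in self.pre.tokenize(text))
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"


class SceneLayer:
    """Top layer: packages model services into a scenario-level solution."""

    def __init__(self, models: ModelLayer) -> None:
        self.models = models

    def user_portrait(self, messages: List[str]) -> Dict[str, int]:
        # A deliberately simple "portrait": sentiment counts over a user's messages.
        counts = {"positive": 0, "negative": 0, "neutral": 0}
        for message in messages:
            counts[self.models.sentiment(message)] += 1
        return counts


def build_platform() -> SceneLayer:
    """Wire the four layers together from bottom to top."""
    return SceneLayer(ModelLayer(ResourceLayer(), PreprocessingLayer()))
```

Because the scene layer only talks to the model layer, a model could in principle be swapped (for example, replacing the lexicon scorer with a neural classifier) without changing the scenario code.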
Figure 4 NLP Platform Architecture
Fig. 4 shows the engineering architecture of the NLP platform. We have built a multi-task scheduling micro-service architecture that supports the Python environment. As the figure shows, the data access layer and the model algorithm layer are built on systems such as Mongo, HDFS, ES and MQ. On top of these two layers, the micro-service layer manages task scheduling for the algorithm models. Externally, we expose Web interfaces and App interfaces. In addition, cutting vertically across the layers, we have integrated permission management and multi-tenant management functions, which can interface with single sign-on, identity authentication, permission control and other systems within the enterprise.
This concludes the background of NLP practice at Yixin for now. Next, we will introduce two application scenarios of NLP technology at Yixin: intelligent chat robots and customer portrait construction. Stay tuned.
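To make the idea of scheduling model tasks in a micro-service layer more concrete, here is a minimal, in-process Python sketch. It is an assumption-laden illustration rather than the platform’s real design: `TaskScheduler`, `register`, `submit` and `result` are hypothetical names, a standard-library thread pool stands in for the MQ-backed scheduling shown in Fig. 4, and the two "models" are trivial placeholder functions.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class TaskScheduler:
    """Hypothetical scheduler: routes requests to registered model services."""

    def __init__(self, max_workers: int = 4) -> None:
        self._registry: Dict[str, Callable[[str], Any]] = {}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks: Dict[str, Future] = {}

    def register(self, name: str, model: Callable[[str], Any]) -> None:
        """Expose a model under a service name."""
        self._registry[name] = model

    def submit(self, name: str, text: str) -> str:
        """Queue a task for the named model and return a task id."""
        if name not in self._registry:
            raise KeyError(f"unknown model service: {name}")
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = self._pool.submit(self._registry[name], text)
        return task_id

    def result(self, task_id: str, timeout: Optional[float] = None) -> Any:
        """Block until the task finishes, then return and forget its result."""
        return self._tasks.pop(task_id).result(timeout=timeout)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


# Placeholder "models" (assumptions for illustration only).
def word_count(text: str) -> int:
    return len(text.split())


def mentions_loan(text: str) -> bool:
    return "loan" in text.lower()


if __name__ == "__main__":
    scheduler = TaskScheduler(max_workers=2)
    scheduler.register("word_count", word_count)
    scheduler.register("mentions_loan", mentions_loan)
    task = scheduler.submit("word_count", "I want a car loan")
    print(scheduler.result(task))
    scheduler.shutdown()
```

In a real deployment the registry, queue and result store would live in separate services; the sketch keeps them in one class only to show the submit-then-fetch flow.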