[Quicksand] Yixin’s Security Data Platform in Practice

  Big data, Security

Introduction: Drawing on its own circumstances, Yixin has built the Quicksand platform, a security data platform that integrates collection, analysis, and storage. This article introduces the platform’s architecture, the optimizations and improvements it makes over OpenSOC, and the lessons learned while deploying it.

Preface

OpenSOC is a security big data analysis framework that Cisco presented at the BroCon conference. It targets network packets and flows, combining big data analysis with security analysis technology. It detects network anomalies in real time and scales out across many nodes. It builds on open source projects: Hadoop for storage, Elasticsearch for real-time indexing, and Storm for online stream analysis.

Yixin has likewise built its own security data platform, the Quicksand platform, which integrates collection, analysis, and storage. The sections below cover its architecture, how it improves on OpenSOC, and what was learned in deployment.

I. Quicksand Platform Architecture

The platform architecture is divided into five layers: the acquisition layer, preprocessing layer, analysis layer, storage layer, and response layer. Where needed, Kafka serves as the message queue between layers so that data is delivered reliably in transit.

1.1 Acquisition Layer

The acquisition layer collects data and sends it to Kafka. The data collected mainly includes:

  • Traffic data: parsed with Packetbeat.
  • Log data: logs in file form are collected with Filebeat; data in syslog form is collected over syslog.
  • Operations data: Metricbeat collects operational metrics from the Quicksand platform’s cluster servers to support troubleshooting and cluster performance monitoring.

In production, we found that Packetbeat had some defects in attack scenarios and implemented fixes for them, for example:

  • Compressed web pages leave the body garbled.
  • If there is no Content-Type field, the packet body is not parsed.
  • Parameters located in the body section are added to the params field.
  • URL parsing errors caused by CONNECT requests.
  • URL parsing errors caused by non-standard URL encoding.
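
The kinds of fixes listed above can be illustrated with a minimal sketch. This is not Packetbeat’s code (Packetbeat is written in Go); it is a Python illustration under stated assumptions: header names arrive lowercased, a missing Content-Type is treated as a URL-encoded form, and the function names are hypothetical.

```python
import gzip
import zlib
from urllib.parse import parse_qsl, urlsplit


def decode_body(headers: dict, body: bytes) -> bytes:
    """Return the body with gzip compression removed when it is declared."""
    if headers.get("content-encoding", "").lower() == "gzip":
        try:
            return gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            return body  # truncated or mislabeled body: keep the raw bytes
    return body


def split_target(method: str, target: str) -> dict:
    """Split a request target; CONNECT uses host:port, not a URL."""
    if method.upper() == "CONNECT":
        host, sep, port = target.rpartition(":")
        if not sep:  # no port given
            host, port = target, ""
        return {"host": host, "port": port, "path": "", "query": ""}
    parts = urlsplit(target)
    return {"host": parts.netloc, "port": "", "path": parts.path, "query": parts.query}


def extract_params(headers: dict, query: str, body: bytes) -> dict:
    """Merge query-string and form-body parameters into one params dict."""
    params = dict(parse_qsl(query, keep_blank_values=True))
    # Assumption: a request without Content-Type is still tried as a form.
    ctype = headers.get("content-type", "application/x-www-form-urlencoded").lower()
    if ctype.startswith("application/x-www-form-urlencoded"):
        text = body.decode("utf-8", errors="replace")
        params.update(parse_qsl(text, keep_blank_values=True))
    return params
```

Invalid percent-escapes such as `%zz` are passed through unchanged by `parse_qsl` rather than raising, which is the tolerant behavior an attack-traffic parser needs.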

1.2 Preprocessing Layer

The Quicksand platform’s preprocessor (hereinafter “ybridge”) is a preprocessing framework written in Go that supports distributed deployment. Inputs, outputs, and the functions applied to the data are all defined through configuration. Each kind of data can be handled independently by writing a plug-in, which covers the varied needs that arise in practice.

Ybridge has the following advantages:

  • Flexible functionality
  • High performance
  • No external dependencies
  • Redundant deployment for high reliability
  • Docker and VM deployment, making it easy to scale out
  • Operational data can be sent to Metricbeat for performance monitoring

Ybridge’s main tasks include:

  • Gzip decoding
  • Data formatting
  • Field extension
  • Field extraction
  • Masking sensitive fields
  • Enriching logs with intelligence
  • Data encryption and decryption
  • Deleting useless data
  • Data compression
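
A few of the tasks above (formatting, field extraction, masking, and deleting useless data) can be pictured as a chain of small plug-in steps. The sketch below is an illustration only: ybridge is written in Go and its configuration format is not described here, so the step functions and the field names (`password`, `id_card`, `beat`, `offset`, `path`) are all hypothetical.

```python
import json
import posixpath
from typing import Callable, Iterable

Event = dict
Step = Callable[[Event], Event]

SENSITIVE_FIELDS = {"password", "id_card"}  # hypothetical field names
USELESS_FIELDS = {"beat", "offset"}         # hypothetical field names


def extract_fields(event: Event) -> Event:
    """Field extraction: derive the lowercase file extension from the path."""
    out = dict(event)
    ext = posixpath.splitext(out.get("path", ""))[1]
    out["ext"] = ext.lstrip(".").lower()
    return out


def mask_sensitive(event: Event) -> Event:
    """Masking: replace sensitive values with a fixed placeholder."""
    out = dict(event)
    for name in SENSITIVE_FIELDS:
        if name in out:
            out[name] = "***"
    return out


def drop_useless(event: Event) -> Event:
    """Deletion: remove fields that downstream layers never read."""
    return {k: v for k, v in event.items() if k not in USELESS_FIELDS}


def run_pipeline(raw: bytes, steps: Iterable[Step]) -> Event:
    """Formatting: parse the raw JSON record, then apply each step in order."""
    event = json.loads(raw)
    for step in steps:
        event = step(event)
    return event
```

Because each step takes and returns a plain event, steps can be reordered or swapped per data type, which is the property the plug-in design is after.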

1.3 Analysis Layer

Data analysis is the core of a big data analysis platform. Analysis can be done in Kibana or in a separately written program, but either way the data must first be pulled from ES and then analyzed. This hurts timeliness, and heavy reliance on the ES cluster undermines platform stability. For this reason, the Quicksand platform implements an analysis engine on Spark that uses Kafka as its data source and stores the analysis results in ES. Rules can be worked out manually in Kibana and then applied to the analysis engine.

The functions implemented at the analysis layer include:

  • Asset discovery
  • Attack discovery
  • Information disclosure
  • Internal threat sources
  • Business risk control
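
To make the idea of a rule applied to a stream concrete, here is a minimal sketch of a threshold rule evaluated over tumbling time windows, the sort of aggregation used for attack discovery. It is a pure-Python stand-in, not the platform’s Spark engine, and the rule shape, field names (`ts`, `src_ip`, `status`), and threshold are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    field: str      # event field to group by, e.g. the source IP
    status: int     # HTTP status code to count
    window_s: int   # tumbling window length in seconds
    threshold: int  # hits per window that trigger an alert


def evaluate(rule: Rule, events: list) -> list:
    """Count matching events per (window, group) and return those over threshold."""
    counts = Counter()
    for event in events:
        if event.get("status") == rule.status:
            window_start = event["ts"] // rule.window_s * rule.window_s
            counts[(window_start, event[rule.field])] += 1
    return [
        {"rule": rule.name, "window_start": w, rule.field: key, "count": n}
        for (w, key), n in sorted(counts.items())
        if n >= rule.threshold
    ]
```

A rule such as `Rule("scan", "src_ip", 404, 60, 5)` flags any source that produces five or more 404 responses within one minute, a simple scanner signature.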

1.4 Storage Layer

The storage layer includes two ES clusters (ES_all and ES_out) and one HBase cluster. Two ES clusters are used because they serve different functions; keeping them separate prevents a single fault from making all ES storage unavailable and improves platform stability.

  • ES_all stores the full raw data, for manual analysis and traceability.
  • ES_out stores the analysis results, for programs to call.

ES stores short-term hot data, while HBase stores long-term cold data.

In HBase, data is organized by timestamp, and each rowkey holds one second of data. Through ybridge, users can replay the cold data for a chosen period from HBase back into Kafka and then run follow-up operations such as analysis or tracing.
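
The one-rowkey-per-second layout and time-range replay can be sketched as follows. This is an illustration under assumptions: the actual rowkey format is not given in this article, an in-memory dictionary stands in for HBase, and the `send` callback stands in for a Kafka producer.

```python
from collections import defaultdict
from typing import Callable, Iterator


def rowkey(ts: float) -> str:
    """One rowkey per whole second, zero-padded so keys sort chronologically."""
    return f"{int(ts):010d}"


class ColdStore:
    """In-memory stand-in for the HBase table (assumption, for illustration)."""

    def __init__(self) -> None:
        self._rows = defaultdict(list)

    def put(self, ts: float, record: dict) -> None:
        self._rows[rowkey(ts)].append(record)

    def scan(self, start_ts: int, end_ts: int) -> Iterator[dict]:
        """Yield records whose second falls in [start_ts, end_ts)."""
        for second in range(int(start_ts), int(end_ts)):
            yield from self._rows.get(rowkey(second), [])


def replay(store: ColdStore, start_ts: int, end_ts: int,
           send: Callable[[dict], None]) -> int:
    """Push every record in the time range to `send`; return how many were sent."""
    count = 0
    for record in store.scan(start_ts, end_ts):
        send(record)
        count += 1
    return count
```

Keying by second keeps a replay request down to a contiguous key range, so replaying a period is a range scan rather than a search.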

1.5 Response Layer

The response layer is where users analyze the data, act on it, and respond. It mainly includes:

Kibana: used for data search, monitoring dashboards, attack tracing, and so on.

Monitoring visualization: the main current risks are displayed as charts on a large screen, giving a more intuitive view of the security threats the enterprise faces.

AlertAPI: once monitoring finds a problem, a follow-up action such as an automatic response or an alert is usually required.

Follow-up actions can be triggered in several ways, such as:

  • A custom program that analyzes the data and raises alerts
  • Watcher (the official Elastic tool)
  • ElastAlert (a free, Python-based alerting framework)

Follow-up actions may include:

  • SMS alerts
  • Email alerts
  • Automatic blocking of malicious IPs
  • Secondary processing of the data, with the results written back to ES

JupyterHub: a multi-user data analysis platform that extracts data from the ES cluster for offline analysis in Python.

II. Comparison with OpenSOC

OpenSOC also stores both traffic data and log data. After collection, the data is sent to Kafka, formatted and extended with extra fields by Storm, and then written to Hive, ES, and HBase. Finally, the data is analyzed through a web service or analysis tools. The Quicksand platform’s architecture is largely the same as OpenSOC’s, but it differs slightly in the several places shown in the figure above, each of which is described below.

2.1 Data Acquisition Using Beats

The Quicksand platform uses Beats extensively for data collection. Beats is an official Elastic product with an active community and strong performance and features.

Packetbeat parses network traffic, Filebeat collects log files, and Metricbeat monitors both system performance and ybridge performance. Beats has the following advantages:

  • High performance
  • Easy to use
  • Shares a technology stack with ybridge, which makes it easy to extend or modify
  • Beats versions are released in step with Elasticsearch

2.2 Splitting the Processing Layer

The real-time processing part of OpenSOC covers parsing, enriching, and analyzing the data. The Quicksand platform splits this part into two layers by function: a preprocessing layer and an analysis layer. The preprocessing layer uses the self-developed ybridge program, which mainly performs ETL and scales horizontally. Its latency and processing speed are good enough that its output can be stored directly or pushed to Kafka for the next stage. The analysis layer uses an analysis framework built on Spark. Spark has higher latency than Storm, but it handles aggregation analysis better and integrates natively with machine learning and graph computation, which suits data analysis well; implementing comparable functions on Storm costs more.

2.3 Dropping Hive

In OpenSOC, each record is stored three times, in Hive, HBase, and ES, which consumes a large amount of storage. Because Hive queries are slow, and because data can be analyzed directly in ES or extracted from ES for analysis, the Quicksand platform does not use Hive. Short-term data is searched directly in ES, and long-term data is retrieved from HBase before use. This architecture is a better fit for small and medium-sized enterprises, balancing functionality against resources.

2.4 Different Data Replay Methods

OpenSOC stores pcap files in HBase, while the Quicksand platform stores the preprocessed data in JSON format, which is more convenient to replay and takes less storage space.

OpenSOC replays data through a web server, while the Quicksand platform replays data through ybridge.

2.5 Close Integration with Threat Intelligence

OpenSOC enriches data in its real-time processing layer, and the Quicksand platform is likewise connected to threat intelligence. Threat intelligence plays an increasingly important role in enterprise security and helps enterprises uncover latent security problems.

The Quicksand platform converts internal alarms into internal intelligence, then combines internal and external intelligence into threat intelligence unique to Yixin. This intelligence is fed back into the logs and traffic data, making analysis and decisions easier for enterprise security analysts.
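
The feedback loop described above (alarms become internal intelligence, which is merged with external feeds and written back onto events) can be sketched in a few lines. The data shapes here are assumptions made for illustration: intelligence is modeled as a mapping from IP address to a tag, and the field names `src_ip`, `rule`, and `threat` are hypothetical.

```python
def alerts_to_intel(alerts: list) -> dict:
    """Turn internal alarms into internal intelligence: offending IP -> rule name."""
    return {alert["src_ip"]: alert["rule"] for alert in alerts}


def merge_intel(internal: dict, external: dict) -> dict:
    """Combine both feeds, keeping every tag and remembering where it came from."""
    merged = {}
    for source, feed in (("external", external), ("internal", internal)):
        for ip, tag in feed.items():
            entry = merged.setdefault(ip, {"tags": set(), "sources": set()})
            entry["tags"].add(tag)
            entry["sources"].add(source)
    return merged


def enrich(event: dict, intel: dict) -> dict:
    """Attach matching intelligence to an event, or None when nothing matches."""
    out = dict(event)
    hit = intel.get(out.get("src_ip"))
    if hit is None:
        out["threat"] = None
    else:
        out["threat"] = {"tags": sorted(hit["tags"]), "sources": sorted(hit["sources"])}
    return out
```

Recording the source of each tag lets an analyst tell whether an address is known only from outside feeds or has already misbehaved against the enterprise itself.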

III. Deployment Experience

3.1 Platform High Availability

The most important job of the Quicksand platform is to provide a stable and reliable data service, so high availability matters a great deal. First, the whole platform is deployed redundantly, except for Beats and Kibana; even the platform’s log-receiving servers run in active-active mode. Second, the preprocessor can be started and stopped at any time, so the program can be upgraded smoothly without users noticing. Finally, to safeguard stability, the platform has a large number of monitoring alarms, such as:

  • ES cluster anomaly monitoring
  • Data loss monitoring
  • Packet loss rate monitoring
  • Service liveness monitoring

3.2 Solving the Packet Loss Problem

When analyzing network traffic, the biggest problem is packet loss. The first step is to detect it: periodically send 100 packets carrying a designated User-Agent (UA), then count how many of them arrive at the ES end. If the number of lost packets exceeds a threshold, raise an alarm.
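
The check itself reduces to simple arithmetic once the received count is known. The sketch below assumes the probe count of 100 from the text; the alarm threshold of 5 lost packets is a hypothetical value, and obtaining `received` (for example, by counting ES documents that carry the probe UA) is left outside the sketch.

```python
PROBE_COUNT = 100  # probes sent per round, as described in the text


def check_probe(received: int, sent: int = PROBE_COUNT, max_lost: int = 5) -> dict:
    """Compare probes sent with probes seen in ES and decide whether to alarm.

    max_lost is a hypothetical threshold; tune it to the environment.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    lost = max(sent - received, 0)  # duplicates must not produce negative loss
    return {
        "sent": sent,
        "received": received,
        "lost": lost,
        "loss_rate": lost / sent,
        "alarm": lost > max_lost,
    }
```

For example, if only 92 of the 100 probes reach ES, `check_probe(92)` reports 8 lost packets, a loss rate of 0.08, and `alarm` set to `True`.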

Packet loss can have many causes; here we focus on loss that occurs in software. There are also many remedies, which generally fall into the following approaches:

  • Improve parsing efficiency (for example, by using pf_ring or DPDK)
  • Hardware traffic splitting
  • Software traffic splitting
  • Choose mirror access points carefully. Tapping core traffic is not recommended; avoiding it greatly reduces the volume of data to analyze.

Choose a solution according to the enterprise’s actual situation. The first approach is the most efficient, but pf_ring is paid software and DPDK usually requires development work, so both come at a cost. The problem can also be solved by splitting traffic in hardware or in software; hardware splitting is recommended because it is simpler and more reliable.

3.3 Centralized Management of Services

Deploying this platform requires a large number of servers, with different programs running on different servers, so a centralized way to manage them is needed. The Quicksand platform uses the following two tools for this:

  • gosuv: a distributed supervisor framework written in Go that lets you manage programs on servers remotely through web pages.
  • Consul: a distributed microservice framework that provides service registration, service discovery, unified configuration management, and other functions.

Together, gosuv and Consul make centralized program management and program status monitoring straightforward.

Summary

Data is the basis of security analysis. With data, threat intelligence, situational awareness, attacker profiling, business risk control, attack tracing, attack identification, and asset discovery are all within reach. Weighing efficiency, cost, and functionality against Yixin’s actual scenarios, the Quicksand platform made a number of improvements to OpenSOC and put the result into production.

With the Quicksand platform, security staff can devote most of their energy to data analysis. The platform compensates for the shortcomings of commercial security products and helps colleagues working on security protection gain a full picture of the enterprise’s security status. The Quicksand platform is not only a data platform but also an important supplement to Yixin’s existing security measures.

Author: Gao Yang, Security Development

First published: Yixin Security Emergency Response Center