Security is a “bottomless pit”. No one in charge of security at an enterprise will claim that their systems are 100% secure, and security is hard to measure and quantify, especially when it comes to judging who does it better and by how much. Sometimes we wonder, or feel confused: “With all these protective measures in place, could we actually withstand a real attack?” “A self-developed security product took half a year to build, was used for half a year, and was then abandoned.” “We have been shouting about SDL for several years, so why can't it be kept running?” “The business came to us for support on its own initiative, but we had no heavy weapons to offer.” ……
This article introduces the thinking and results from the different stages of Yixin’s security construction, the challenges and pitfalls encountered at each stage, and the lessons learned. It also shares the development of Yixin’s internal security products and explores a path for enterprise security construction.
In 2013, the company officially began its security construction, investing resources in a dedicated security team and basic security infrastructure. From then until now, Yixin’s security construction can be roughly divided into three stages:
2013-2016 was the V1 phase. V1 mainly delivered the basic security environment (firewalls, zone isolation, host IDS, network IDS, network access control, anti-virus, etc.). Around the time of the listing, the company gradually established and improved its information security system through compliance inspections, and passed the third-class ISO27001 certification. In 2016, it established its own security emergency response center.
2016-2018 was the V2 phase. Its main achievement was improving security technology and widening the coverage of security work. We took part in some business security work (account security, anti-crawler, SMS interface attacks, human-machine identification, etc.), defined the SDL-related processes (these were not fully implemented; only one or two methods were actually used), and developed some security tools such as vulnerability scanning and GitHub monitoring.
The current phase is V3, which should last for another 1-2 years. At this stage we have begun to build security capabilities that fit the company’s long-term development better and go deeper in specific areas, such as security operations and data security.
II. A few years ago
The figure above summarizes the state and development of information security in the industry, together with the security projects we had completed by 2016. At that stage, the work on the network boundary, IT and related areas laid a relatively solid foundation: network access control, terminal DLP, anti-virus and similar projects had all been rolled out. The effect was most visible in basic security, especially terminal security. It keeps the intranet and the office network relatively safe, so that we do not have to deal with Trojans, ransomware or even APT incidents every day and can free up energy for more meaningful work.
III. SDL Practice
3.1 SDL Process
At this stage we mainly drew on the SDL process shared by the security team of Proview, and selected the key steps that suit our current situation, including training and secure coding standards, to promote within the company. For important projects, the security team also takes part in security requirements reviews, and it has built good working relationships with the business, product, technology and other teams.
It is worth mentioning that, inside an enterprise, cooperation with the security team falls into two types: “shifting the blame to avoid liability” and “win-win cooperation”. On the surface both types come to the security team when a security-related matter arises, but their underlying motives are quite different. The first is mostly about exemption from liability: I have informed you, I have handed the problem to you, and I neither know nor care why I am supposed to consult security or what a secure solution to the requirement looks like. The rest is none of my business, and if something goes wrong, the responsibility is not mine. The second is about genuinely seeking cooperation: I know what security risks may exist and I care about the security requirements of my business, so I need to work with the security team to solve the product’s security problems and keep the system and the business safe after launch. The two teams push each other forward and improve together.
The two types differ completely in how cooperation works, in daily interaction and in the final security outcome. Many factors may contribute, including the overall ability and attitude of non-security staff, the effectiveness of security training, and the professionalism and influence of the security team (whether security has really helped others solve their pain points, and whether the two sides have fought side by side).
3.2 SDL Case
The two diagrams above show something we have done relatively well; you can call it SDL or DevSecOps. It is embedded automatically into the release process and focuses on the security of third-party components. It can quickly look up the third-party components contained in a released software product, and rules can be defined to block components with serious vulnerabilities directly during the build. This work can be fully automated, supports Maven, Gradle, Docker and so on, and does not affect continuous delivery. With unified asset management, code repositories, artifact repositories and a CI/CD platform, it is easier to implement and cheapest to maintain. Of course, none of this is possible without the support of the DevOps team.
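The blocking rule described above can be illustrated with a minimal Python sketch. It assumes the build has already produced a list of third-party components and that an advisory lookup is available; the `Component` type, the severity scale, the `find_blocked` helper, and all component names and advisory IDs are hypothetical and are not taken from Yixin's actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical severity scale; a real pipeline would take this from its advisory source.
SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Component:
    name: str
    version: str


def find_blocked(components, advisories, threshold="high"):
    """Return (component, advisory_id, severity) for advisories at or above the threshold."""
    limit = SEVERITY_ORDER[threshold]
    blocked = []
    for comp in components:
        for advisory_id, severity in advisories.get((comp.name, comp.version), []):
            if SEVERITY_ORDER[severity] >= limit:
                blocked.append((comp, advisory_id, severity))
    return blocked


# Illustrative data only: these component names and advisory IDs are made up.
components = [
    Component("example-json-lib", "1.2.0"),
    Component("example-http-lib", "4.5.1"),
]
advisories = {
    ("example-json-lib", "1.2.0"): [("ADV-0001", "critical")],
    ("example-http-lib", "4.5.1"): [("ADV-0002", "low")],
}

blocked = find_blocked(components, advisories)
build_allowed = not blocked  # the build step would fail when this is False
```

Keeping the threshold as a parameter lets the same check run in warn-only mode at first (for example with `threshold="critical"`) before it is tightened.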
3.3 SDL Threat Modeling
This year we also tried SDL threat modeling and designed modeling rules that suit us, including the data security and audit requirements we care most about. This work is still in a small-scale pilot and exploration stage, and much remains to be solved and optimized, from process to tooling. When conditions are ripe, we will consider investing more security testing and security service staff to roll it out across the company.
3.4 SDL White Box Scanning
For white-box code audit, we also invested a small amount of resources to wrap existing tools into a code audit platform. The core relies on SonarQube and Findbugs Security, and custom rules are supported. We implemented triggered scans, scans of uploaded source code, and automatic submission of vulnerabilities. However, the biggest cost here is maintaining the rules and eliminating false positives, and so far we have not found a better solution. The idea we hear most often is to simplify the rules and, at the start, to “prefer a missed report to a false positive.” At present the platform is used in two ways: security staff upload source code for ad hoc scans, and projects that have been onboarded are scanned weekly and sent a test report.
3.5 SDL Passive Scanning
Another SDL attempt is passive scanning and replay based on the test-environment traffic collected by the Quicksand platform. The general idea has been shared by many people before: by replacing the cookie and request parameters, it tests for security problems such as unauthorized access and vertical and horizontal privilege escalation. The difficulty, again, is tuning the rules and collecting what the company’s businesses have in common (e.g. the wording of error pages). This scan has found several high-risk problems, so the return on investment is quite high. However, achieving higher accuracy, more automation and sustainable operation requires more manpower, so it comes down to what the team chooses to prioritize. Recently we have also seen projects at home and abroad that use AI to make security testing more efficient, or even to replace people in manual security testing. We will not judge whether this can be put into practice in the short term, but we agree that “people get tired, machines don’t”.
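The cookie-replacement idea can be sketched in Python as follows. This is a minimal illustration rather than the actual scanner: it only swaps cookies (not request parameters), the transport is injected as a `send` callable so the logic runs without a network, and `CapturedRequest`, `DENIED_MARKERS`, `replay_checks` and `fake_send` are all hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, str]  # (HTTP status, response body)


@dataclass(frozen=True)
class CapturedRequest:
    method: str
    url: str
    cookies: Dict[str, str]
    params: Dict[str, str]


# Hypothetical wording of common "denied" pages; in practice this list is
# what has to be collected per business line.
DENIED_MARKERS = ("please log in", "permission denied")


def looks_denied(response: Response) -> bool:
    status, body = response
    return status in (401, 403) or any(m in body.lower() for m in DENIED_MARKERS)


def replay_checks(request: CapturedRequest,
                  other_user_cookies: Dict[str, str],
                  send: Callable[[CapturedRequest], Response]) -> List[str]:
    """Replay a captured request with altered cookies and compare with the baseline."""
    baseline = send(request)
    if looks_denied(baseline):
        return []  # the original request did not succeed, nothing to compare
    variants = [
        ("possible unauthenticated access", {}),
        ("possible horizontal privilege escalation", other_user_cookies),
    ]
    findings = []
    for label, cookies in variants:
        response = send(replace(request, cookies=cookies))
        if not looks_denied(response) and response == baseline:
            findings.append(label)
    return findings


def fake_send(request: CapturedRequest) -> Response:
    """Toy backend that checks for a login but not for ownership of the order."""
    if "session" not in request.cookies:
        return (401, "please log in")
    return (200, "order 42 for alice")


captured = CapturedRequest("GET", "/order", {"session": "alice-token"}, {"id": "42"})
findings = replay_checks(captured, {"session": "bob-token"}, fake_send)
```

Exact response equality is the crudest possible comparison; a real scanner would need a similarity measure that tolerates timestamps and tokens in the body, which is part of the rule tuning mentioned above.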
3.6 SDL Vulnerability Management
Vulnerability management mainly relies on the Insight platform, which covers application asset management, vulnerability life cycle management and a security knowledge base. Insight was open-sourced last year and has more users than we expected. Judging from day-to-day exchanges in the community WeChat group, most users are security teams of 1-5 people, from industries such as Internet, manufacturing and logistics. Whenever someone adds us on WeChat for help with deploying, configuring or using the platform, answering takes up some of our working time (and makes us reflect on our software quality), but we are still very happy to be able to really help our security peers.
This has also prompted some reflection on my part:
First, many enterprises have limited security budgets and really need good open source solutions.
Second, getting a product adopted takes some Party B (vendor) thinking. Sometimes a product needs subtraction: a large, all-in-one product is not what everyone needs, and a product can only be used well if it can first be deployed and configured easily.
Insight open source address: https://github.com/creditease …
IV. Insight 2.0
This year we will open-source Insight 2.0.
First, it optimizes the existing interactions, functions and business logic to improve ease of use.
Second, it improves the vulnerability operations data and strengthens reporting, to give a view of the overall security posture.
Third, and the biggest update, it adds the front-office and back-office functions of an SRC, so that enterprises can set up their own customized security emergency response center and manage vulnerabilities from all sources in one place.
The diagram above shows a prototype. Development is currently under way, and security colleagues who need it can look forward to it.
V. Quicksand platform
Over the past year, many other Party A (in-house) security teams have been discussing SOC and SIEM, with options including commercial security big data products, Splunk Enterprise Security and ELK-based open source solutions. We chose the third. So far we have achieved data collection, storage and some less complex computation.
Data comes from switch traffic mirroring, log files, the syslog of each security device, and so on. Our architect designed and implemented a pre-processing program that handles data access configuration, filtering, formatting, assembly, tagging, desensitization, etc. The core code is written in Go for better processing performance.
The figure above shows the architecture of the whole “Quicksand” platform, along with its hardware resources, data volume, write speed, etc. With the data in place, the application scenarios implemented so far include asset discovery, weak password detection and information leakage discovery, all of which can be done with simple rules and no very complex computation.
For details, see: Quicksand: Practice of Yixin Secure Data Platform
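To make the pre-processing steps mentioned above (filtering, formatting, tagging, desensitization) concrete, here is a minimal Python sketch. The production code is written in Go and is not shown in this article; the JSON input format, the `/health` filter, the `source` tag and the phone-number masking rule below are all assumptions chosen only for illustration.

```python
import json
import re

# Assumed desensitization rule: mask the middle four digits of an 11-digit
# mobile number that starts with 1.
PHONE_RE = re.compile(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)")


def mask_phones(text):
    return PHONE_RE.sub(r"\1****\2", text)


def preprocess(raw_line, source):
    """Filter, format, tag and desensitize one raw log line; return None if dropped."""
    try:
        event = json.loads(raw_line)            # formatting: parse into a structured event
    except json.JSONDecodeError:
        return None                             # filtering: drop unparseable lines
    if not isinstance(event, dict):
        return None
    if str(event.get("path", "")).startswith("/health"):
        return None                             # filtering: drop health-check noise
    event["source"] = source                    # tagging: record where the event came from
    return {
        key: mask_phones(value) if isinstance(value, str) else value
        for key, value in event.items()         # desensitization of string fields
    }


sample = preprocess('{"path": "/api/user", "body": "phone=13812345678"}', "nginx-access")
```

Desensitizing at ingestion time, before storage, means downstream rules and analysts never see the raw value, at the cost of not being able to recover it later.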
5.1 Quicksand application: internal control
How to support more complex security analysis and correlation analysis scenarios on top of the Quicksand security big data platform is also the focus of our subsequent development.
The figure above shows an upper-layer application built for internal control, which colleagues have also presented at QCon. It collects, in real time, logins, query operations, online behavior (custom rules), DNS, GitLab, Wiki, DLP alerts, etc. from the company’s internal business systems.
Its first purpose is auditing operations in business operation systems, e.g. recording who accessed which sensitive data at what time, for later tracing.
Its second purpose is analysis: for example, when a person’s operations differ from those of others in the same post, clustering is used to locate high-risk personnel so that attention can be focused on them.
The picture above shows the information we have compiled about a user’s assets.
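The peer comparison described above ("a person's operations differ from those of others in the same post") can be illustrated with a small Python sketch. Note that it swaps in a simpler technique than the clustering the article mentions: a robust z-score based on the median and the median absolute deviation (MAD). The function name, the 3.5 threshold and the sample counts are assumptions for illustration only.

```python
from statistics import median


def flag_outliers(counts, threshold=3.5):
    """Return users whose count is far above their peers, using a median/MAD score."""
    values = list(counts.values())
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # everyone behaves identically, or too little spread to judge
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return sorted(
        user for user, v in counts.items()
        if 0.6745 * (v - med) / mad > threshold
    )


# Made-up daily counts of sensitive-data queries for staff in the same post.
daily_queries = {"u1": 20, "u2": 22, "u3": 19, "u4": 21, "u5": 23, "u6": 20, "u7": 300}
suspects = flag_outliers(daily_queries)
```

The median and MAD are used instead of the mean and standard deviation because a single heavy user would otherwise inflate the very statistics used to detect them.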
VI. Self-developed WAF product
The goal is to gradually replace commercial WAF products with one that has:
- Traditional web security defense capability
- CC attack protection capability
- Anti-crawler capability
- Information leakage prevention capability
- Data analysis and abnormal traffic identification capability
6.1 Yiren Shield
Let’s focus on our self-developed WAF product, Yiren Shield. It took about a year and a half in total and went through three major versions. Eight security team members were responsible for system design, development and protection rule collection, one operations colleague was responsible for building and deploying the installation package, and two test engineers helped with stress testing.
Previously we used a commercial WAF appliance that ranked first in the Gartner quadrant, and we have purchased about 10 units in recent years. The product itself is very good; everyone is proficient with it and it runs stably. However, it still has some shortcomings:
- First, it protects well against traditional rule-based malicious requests but poorly against crawlers, whose detection depends on context within a time window, and turning this feature on drags down overall hardware performance.
- Second, it has to be wired into the network as hardware. Rolling it out for a new service or a new network zone means installing new equipment, and the implementation cycle is long.
- Third, horizontal scaling is weak. When a single unit hits a bottleneck, the only options are to add capacity or split the traffic, and both are major changes.
To sum up, with the commercial products as our baseline, we chose to develop Yiren Shield ourselves, which is in line with the trend toward SDS (software-defined security); we are grateful for the strong support of the company and its leadership. Yiren Shield is built as an extension of OpenResty and consists of three parts: the gateway, the big data analysis platform and the operations back office. All configuration is shared and read through Redis. Yiren Shield provides web protection, CC protection, blacklist protection, semantic identification protection, sensitive data protection and AI protection. Product design and development follow commercial product standards: more than 100 curated basic rules are included, custom rules can be added, black and white lists are split into global and per-domain lists, each protection switch can be turned on independently for each domain, and report analysis and the query of each interception event are separated by domain. Throughout, we polished the usability and interaction of the product.
- Software-defined, horizontally scalable
- Fast onboarding
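The configuration model described above (global and per-domain black and white lists, plus independent per-domain protection switches) can be sketched in Python as follows. This is only an illustration of the data layout: the real gateway runs on OpenResty and reads its configuration from Redis, and the class names, the module names `web` and `cc`, and the precedence of whitelist over blacklist are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DomainPolicy:
    switches: Dict[str, bool] = field(default_factory=dict)   # e.g. {"web": True, "cc": False}
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)


@dataclass
class ShieldConfig:
    global_whitelist: Set[str] = field(default_factory=set)
    global_blacklist: Set[str] = field(default_factory=set)
    domains: Dict[str, DomainPolicy] = field(default_factory=dict)


def decide(config: ShieldConfig, domain: str, client_ip: str,
           triggered_module: Optional[str] = None) -> str:
    """Return "allow" or "block" for one request.

    triggered_module names the protection module whose rule matched, if any.
    Assumed precedence: whitelist, then blacklist, then per-domain switches.
    """
    policy = config.domains.get(domain, DomainPolicy())
    if client_ip in config.global_whitelist or client_ip in policy.whitelist:
        return "allow"
    if client_ip in config.global_blacklist or client_ip in policy.blacklist:
        return "block"
    if triggered_module is not None and policy.switches.get(triggered_module, False):
        return "block"
    return "allow"


# Illustrative configuration using documentation-only IP ranges.
config = ShieldConfig(
    global_blacklist={"203.0.113.9"},
    domains={
        "a.example.com": DomainPolicy(
            switches={"web": True, "cc": False},
            whitelist={"198.51.100.7"},
        ),
    },
)
```

With switches stored per domain, a new site can be onboarded in observe-only mode (all switches off) and have protections enabled one at a time.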
Current progress and operation
- Iteration has lasted about one and a half years.
- At present, all of Yiren Loan has been connected.
- Stress-test peak: 5000 QPS (2C8G)
Yiren Shield is now fully connected to Yiren Loan. Because it is a gateway product with high performance and stability requirements, we ran many stress tests with the support of two testing colleagues. Running on a 2C8G virtual machine (2 cores, 8 GB of memory), Yiren Shield reaches around 5000 QPS, which meets our requirements. We also monitor every service (MQ, Flink, Counting Service, Redis, Full Walkthrough, etc.) and built a system status view in the operations back office, where we can see the domain access status of each Yiren Shield node and alarms for node errors. We also optimized the query of protection events so that queries stay fast even when there are many interceptions.
Using Machine Learning to Identify High and Low Frequency Crawlers
- Serialize the accessed URLs
- Form the access path in time order
- Use a graph to extract the number of loops
- Cluster to find abnormal IPs and SIDs
Identifying CC attacks and crawler behavior that traditional rules do not easily catch is also a key goal of Yiren Shield. Besides UA checks, IP blacklists and per-interface access frequency checks based on IP and SID, we have added algorithms to identify such abnormal access.
As an example, take the “path clustering model” we use, which is implemented on Yiren Shield’s data analysis platform: periodically extract the accesses from the previous time window, serialize the accessed URLs, form the access path, use a graph to extract the number of rings (loops; a single node also counts as a ring), and cluster to find abnormal IPs and SIDs.
For example, as marked in the figure above, the first IP accessed this URL 86 times, and the second IP went through the loop [2821, 2832, 2827] 14 times and then through another loop 36 times. Note that paths are ordered by time, not by Referer. Graph computation uses the NetworkX library, which interested readers can look into. After launch, the model caught behaviors such as bonus hunters crawling financial articles and repeatedly hitting the sign-in interface, which met our expectations.
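The loop-counting step of the path clustering model can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the production implementation (which uses NetworkX on the data analysis platform): a loop is counted whenever a time-ordered path returns to a URL that is still on the current path, and the sketch stops at feature extraction without the final clustering. The function names and the feature pair are hypothetical.

```python
from collections import Counter


def count_loops(path):
    """Count loops in a time-ordered sequence of URL ids.

    A loop is recorded each time the walk returns to a URL that is still on
    the current loop-free path; an immediate repeat is a single-node loop.
    """
    loops = Counter()
    stack = []   # current loop-free path
    index = {}   # URL id -> position in stack
    for url in path:
        if url in index:
            start = index[url]
            loops[tuple(stack[start:])] += 1
            for dropped in stack[start + 1:]:
                del index[dropped]      # forget the nodes inside the closed loop
            del stack[start + 1:]
        else:
            index[url] = len(stack)
            stack.append(url)
    return loops


def loop_features(paths_by_client):
    """Per client (IP or SID): (total loops, repetitions of the most repeated loop)."""
    features = {}
    for client, path in paths_by_client.items():
        loops = count_loops(path)
        features[client] = (sum(loops.values()), max(loops.values(), default=0))
    return features


# Made-up paths: one client cycling through three URLs, one browsing without loops.
paths = {
    "ip-a": [2821, 2832, 2827] * 3 + [2821],
    "ip-b": [101, 102, 103],
}
features = loop_features(paths)
```

Feature vectors like these could then be passed to any clustering or outlier method to separate clients with heavily repeated loops from ordinary browsing.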
VII. What we are currently doing
These are some of the things we have done over the past two years, along with some of my own takeaways:
- Running security development as projects helps iteration: with clear deliverables and goals, efficiency improves a lot.
- A bad plan is better than no plan. During planning, for example in brainstorming sessions, everyone can contribute more innovations and ideas. A plan should also cover risk points, such as how long the investment can last, whether the project risks being cancelled, whether the core staff are stable, and whether the team is good at the chosen architecture or development language, so as to avoid unfinished projects.
- At the security product level, Party B (vendor) products increasingly match Party A’s actual needs and work better and better in practice, and more suitable solutions can be found for niche areas. Except for a few large companies, whether to build something in-house again should be weighed against short-, medium- and long-term changes, so as to serve the long-term development of the enterprise as far as possible.
- In security services, attack and defense, SDL and other areas close to traditional security, many companies still have a lot to practice and optimize. Recently there has been a lot of discussion on these topics, such as the ATT&CK Matrix and Didi’s continued work on SDL.
This year we are also focusing on several projects. Besides Insight 2.0 mentioned above, which is mainly a contribution to the open source community, there are two more important internal projects.
The first project, code-named “Super Scanner”, uses various means (including internal work orders, the CMDB, search engines, CMS fingerprints, etc.) to discover external assets and to monitor GitLab, the dark web and negative public opinion. It also carries the important task of improving the efficiency of security testing and helping to promote SDL. It reuses the previously developed distributed security service orchestration service “sumeru” and the crawler service. “Discover assets like hackers and integrate them into SDL” was the original intention of the project.
The second project, code-named “safety perception”, is the re-integration and expansion of Quicksand, internal control audit and the office network security system. Data security is increasingly discussed as a topic in its own right and is becoming one of the core issues of security. The “Data Security Law” has entered the legislative stage. Whether from the perspective of security construction, compliance or corporate strategy, many industry-leading enterprises have already shifted to a “data-centric security strategy”. The focus of this project is therefore data security. At the application layer, it pays attention to how data is used, brings together all the security-relevant information available, and makes correlation relationships easy to configure. Each type of business system or scenario can set up its own detection model, so the result can be regarded as an intelligent audit product. Of course, this is only one corner of data security governance as a whole. Data security strategy, a data security committee, data classification and grading, operating processes, a reward and punishment system, traditional database desensitization, data leakage prevention, data file ferrying, data maps, big data security and so on together make up the complete picture of data security. There is a great deal to do, and it is complicated.
Author: Wang Zhe
First published by: Yixin Security Emergency Response Center