In-depth understanding of HTTPS working principle


Preface

In recent years the Internet has changed considerably. In particular, the HTTP protocol we have long been accustomed to is gradually being replaced by HTTPS. Driven jointly by browsers, search engines, CAs and large Internet companies, the Internet has entered the “HTTPS encryption era”, and HTTPS is expected to replace HTTP as the mainstream transfer protocol within the next few years.

After reading this article, I hope you can understand:

  • What problems HTTP communication has
  • How HTTPS improves on HTTP, and what problems it has in turn
  • How HTTPS works


First, what is HTTPS

HTTPS is the secure version of HTTP: an SSL/TLS encryption layer is placed beneath HTTP so that the transmitted data is encrypted. It is now widely used for security-sensitive communication on the World Wide Web, such as payment transactions.

HTTPS is mainly used to:

(1) Encrypt data and establish an information security channel to ensure data security during transmission;

(2) Authenticate the website server.

HTTPS is commonly used on login pages and shopping checkout pages. When communicating over HTTPS, the URL begins with https:// instead of http://. In addition, when a browser visits a Web site with valid HTTPS, a lock icon appears in the address bar. How HTTPS is indicated varies from browser to browser.

Second, why do you need HTTPS

HTTP is exposed to security problems such as information theft and identity spoofing, and the HTTPS communication mechanism effectively prevents them. Let’s look at the problems with the HTTP protocol:

  • Communication uses clear text (not encrypted), so the content may be eavesdropped.

Since HTTP itself has no encryption function, it cannot encrypt the communication as a whole (the contents of the requests and responses exchanged over HTTP). That is, HTTP messages are sent in clear text (unencrypted).

This plaintext flaw is a major cause of data leakage, data tampering, traffic hijacking, phishing attacks and other security issues. HTTP cannot encrypt data, so all communication data travels across the network completely exposed. With network sniffing equipment and a few technical means, the contents of HTTP messages can be reconstructed.
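To make the eavesdropping risk concrete, here is a minimal sketch that builds the bytes of an HTTP login request as they would be written to a TCP socket. The host, path and form fields are hypothetical and chosen only for illustration; nothing is sent over the network.

```python
def build_http_request(host: str, path: str, body: str) -> bytes:
    """Serialize a simple HTTP/1.1 POST request into raw wire bytes."""
    data = body.encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(data)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + data


# Hypothetical login form submitted over plain HTTP.
wire_bytes = build_http_request("example.com", "/login", "user=alice&password=secret")

# Whoever captures these bytes on the network can read them directly.
print(wire_bytes.decode("utf-8"))
```

The printed text is exactly what a sniffer on the path would see, including the password field.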

  • The integrity of the message cannot be proved, so it may be tampered with.

Integrity refers to the accuracy of information. If integrity cannot be proved, it is usually impossible to judge whether the information is accurate. Since HTTP cannot prove the integrity of a message, there is no way to tell whether a request or response was tampered with between the moment it was sent and the moment the other party received it.
In other words, there is no way to confirm that the request/response received is the same as the request/response that was sent.

  • The identity of the other party is not verified, so impersonation is possible.

Requests and responses in HTTP do not confirm who the other party is. Because there is no step that verifies the communicating party, anyone can send a request, and the server returns a response to any request it receives, no matter who sent it (provided the Web server does not restrict the sender’s IP address or port number).

HTTP cannot verify the identity of the communicating party. Anyone can set up a fake server to deceive users and carry out “phishing”, which the user cannot detect.

In contrast, HTTPS has the following advantages over HTTP (described in detail below):

  • Data privacy: content is encrypted symmetrically, and each connection generates a unique encryption key
  • Data integrity: transmitted content passes integrity verification
  • Identity authentication: a third party cannot forge the identity of the server (or client)

Third, how does HTTPS solve the above problems of HTTP?

HTTPS is not a new application-layer protocol. Only the HTTP communication interface is replaced by the SSL and TLS protocols.

Normally, HTTP communicates directly with TCP. When SSL is used, HTTP communicates with SSL first, and SSL in turn communicates with TCP. In short, the so-called HTTPS is actually HTTP wrapped in the shell of the SSL protocol.

With SSL, HTTP gains the encryption, certificate and integrity-protection functions of HTTPS. That is to say, HTTPS is HTTP plus encryption, authentication and integrity protection.

The main functions of HTTPS basically depend on the TLS/SSL protocol. TLS/SSL in turn relies on three basic kinds of algorithm: hash functions, symmetric encryption and asymmetric encryption. Asymmetric encryption is used for identity authentication and key negotiation, the symmetric encryption algorithm uses the negotiated key to encrypt data, and the hash function is used to verify the integrity of the information.

1. Solving the problem that the content may be eavesdropped: encryption

Method 1. Symmetric encryption

In this method, encryption and decryption use the same key. Without the key, the ciphertext cannot be decrypted. Conversely, anyone who holds the key can decrypt it.

With symmetric encryption, the key must also be sent to the other party. But how can it be handed over safely? If the communication is monitored while the key is being forwarded over the Internet, the key falls into the hands of the attacker and the encryption becomes meaningless. In addition, the received key must be stored securely.
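The “same key in both directions” property can be sketched with a toy stream cipher. This is an assumption-laden illustration, not what TLS uses: the keystream is derived from SHA-256 in counter mode, and the names `keystream` and `xor_cipher` are invented for this example. Real deployments use vetted ciphers such as AES-GCM.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(32)           # both parties must hold this key
ciphertext = xor_cipher(shared_key, b"card=1234")
recovered = xor_cipher(shared_key, ciphertext)  # same key, same function

# A party with a different key recovers only garbage.
wrong_guess = xor_cipher(secrets.token_bytes(32), ciphertext)
print(recovered, wrong_guess != recovered)
```

The sketch also shows the weakness the text describes: everything depends on getting `shared_key` to the other side without it being observed.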

Method 2. Asymmetric Encryption

Public key encryption uses a pair of asymmetric keys. One is called the private key and the other the public key. As the names imply, the private key must not be known to anyone else, while the public key can be published freely and obtained by anyone.

In public key encryption, the party sending the ciphertext encrypts with the other party’s public key. After receiving the encrypted information, the other party decrypts it with its own private key. In this way, the private key used for decryption never has to be sent, and there is no need to worry about the key being eavesdropped on and stolen by an attacker.

Asymmetric encryption allows one-to-many communication: the server can carry out encrypted communication with multiple clients while maintaining only a single private key.

This method has the following disadvantages:

  • The public key is public, so information encrypted with the private key can be decrypted by an attacker who intercepts it and uses the public key;
  • The public key carries no information about the server. Asymmetric encryption alone cannot establish the legitimacy of the server’s identity, so there is a risk of man-in-the-middle attack: the public key sent by the server to the client may be intercepted and replaced by a man in the middle during transmission;
  • Encrypting and decrypting data with asymmetric encryption takes considerable time, which reduces data transmission efficiency.
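The public-key idea described in Method 2 can be sketched with textbook RSA. Everything here is a simplifying assumption: the key is built from two small, well-known Mersenne primes, there is no padding, and the function names are invented for the example. It is far too weak for real use and only shows which key does what.

```python
# Toy RSA key pair. Real keys are 2048 bits or more and use padding (e.g. OAEP).
P, Q = 2**61 - 1, 2**31 - 1           # two known primes (illustrative only)
N = P * Q                             # modulus, shared by both keys
E = 65537                             # public exponent: (N, E) is the public key
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent (needs Python 3.8+)


def encrypt_with_public_key(message: bytes) -> int:
    """Anyone who knows (N, E) can encrypt a short message."""
    m = int.from_bytes(message, "big")
    if m >= N:
        raise ValueError("message too long for this toy modulus")
    return pow(m, E, N)


def decrypt_with_private_key(ciphertext: int) -> bytes:
    """Only the holder of D can reverse the operation."""
    m = pow(ciphertext, D, N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


ciphertext = encrypt_with_public_key(b"hi server")
plaintext = decrypt_with_private_key(ciphertext)
print(plaintext)
```

Note that the private exponent `D` never leaves the receiver, which is the property the text relies on. The big-integer exponentiation is also the reason this approach is slow compared with symmetric ciphers.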

Method 3. Symmetric encryption+asymmetric encryption (HTTPS adopts this method)

The advantage of symmetric keys is that encryption and decryption are fast. The advantage of asymmetric keys is that the transmitted content cannot be cracked: even if the data is intercepted, it cannot be read without the corresponding private key, just as a safe cannot be opened without its key. HTTPS therefore combines symmetric and asymmetric encryption and makes full use of their respective advantages. Asymmetric encryption is used in the key exchange phase, and symmetric encryption is used afterwards, once the connection is established and messages are being exchanged.

Specifically, the party sending the ciphertext encrypts the “symmetric key” with the other party’s public key, and the other party decrypts it with its own private key to obtain the “symmetric key”. This ensures that the key is exchanged securely, after which both sides communicate using symmetric encryption. HTTPS thus adopts a hybrid encryption mechanism of symmetric and asymmetric encryption.
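The hybrid scheme can be sketched end to end in a few lines. This is a toy model under stated assumptions: textbook RSA with small known primes stands in for the asymmetric step, an XOR keystream built from SHA-256 stands in for the symmetric cipher, and the 64-bit session key is unrealistically short. All names are invented for illustration.

```python
import hashlib
import secrets

# Server's toy RSA key pair (illustrative primes, no padding).
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))     # known only to the server


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHA-256 counter-mode keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Key exchange phase (asymmetric): the client wraps a random session key
# with the server's public key, and the server unwraps it with its private key.
session_key = secrets.token_bytes(8)
wrapped_key = pow(int.from_bytes(session_key, "big"), E, N)
server_key = pow(wrapped_key, D, N).to_bytes(8, "big")

# Communication phase (symmetric): both sides now use the shared session key.
request = xor_stream(session_key, b"GET /account HTTP/1.1")
received = xor_stream(server_key, request)
print(received)
```

Only the short session key pays the cost of the asymmetric operation; the bulk traffic uses the fast symmetric cipher.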

2. Solving the problem that messages may be tampered with: digital signature

Data passes through many intermediate nodes during network transmission. Even if the data cannot be decrypted, it may still be tampered with. How can its integrity be verified? By verifying a digital signature.

Digital signatures have two functions:

  • They confirm that the message was actually signed and sent by the sender, because nobody else can forge the sender’s signature.
  • They confirm the integrity of the message, proving that the data has not been tampered with.

How a digital signature is generated:

A hash function is first applied to the text to produce a message digest. The digest is then encrypted with the sender’s private key to produce the digital signature, which is transmitted to the receiver together with the original text. The next step is for the receiver to verify the digital signature.

How a digital signature is verified:

The receiver decrypts the signature with the sender’s public key to recover the digest (only the sender’s public key can do this), then applies the same hash function to the received original text to produce a second digest, and compares the two. If they are the same, the received information is complete and was not modified in transit. Otherwise, the information has been modified. This is how a digital signature verifies the integrity of information.
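The generate-then-verify procedure above can be sketched with SHA-256 and textbook RSA. As before, this is a toy under stated assumptions: small known primes, no signature padding, and a digest reduced modulo `N` so that it fits the tiny key. The function names and messages are invented for the example.

```python
import hashlib

# Sender's toy RSA key pair (illustrative primes, no padding).
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))     # sender's private exponent


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce the digest so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Sender: transform the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Receiver: recover the digest with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"transfer 100 to Kobe"
signature = sign(message)

print(verify(message, signature))                   # untouched message
print(verify(b"transfer 900 to Kobe", signature))   # tampered in transit
```

The second check fails because the digest of the altered text no longer matches the digest recovered from the signature.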

Suppose messages are passed between Kobe and James. James sends a message to Kobe together with its digital signature. After receiving it, Kobe can check the digital signature to verify that the message really came from James. Of course, this presupposes that Kobe knows James’s public key. The crux of the problem is that, like the message itself, the public key cannot simply be sent to Kobe over an insecure network: how can Kobe prove that the public key he obtained really is James’s?

This is where the Certificate Authority (CA for short) comes in. There are not many CAs, and Kobe’s client has the certificates of all trusted CAs built in. The CA digitally signs James’s public key (and other information) to generate a certificate.

3. Solving the problem that the identity of the other party may be forged: digital certificate

The digital certificate authority is in the position of a third party that both the client and the server can trust.

Let’s introduce the business process of the digital certificate authority:

  • The server operator submits its public key, organization information, personal information (domain name) and other details to the third-party CA and applies for certification.
  • The CA verifies the authenticity of the submitted information through online and offline means, such as whether the organization exists, whether the enterprise is legitimate, and whether the applicant owns the domain name.
  • If the information is approved, the CA issues a certification document, the certificate, to the applicant. The certificate contains the following information in clear text: the applicant’s public key, the applicant’s organization and personal information, information about the issuing CA, the validity period, the certificate serial number and so on. It also contains a signature, generated as follows: a hash function is first used to compute a digest of the public plaintext information, then the digest is encrypted with the CA’s private key, and the resulting ciphertext is the signature.
  • When the client sends a request to the server, the server returns the certificate file.
  • The client reads the plaintext information in the certificate, computes its digest with the same hash function, decrypts the signature with the public key of the corresponding CA, and compares the two digests. If they match, the certificate is legitimate, that is, the server’s public key can be trusted.
  • The client also verifies the domain name, validity period and other information in the certificate. The client has the certificate information (including the public key) of trusted CAs built in. If the CA is not trusted, no matching CA certificate is found and the certificate is judged invalid.
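The issue-and-verify flow in the list above can be sketched as follows. This is a heavily simplified model, not the X.509 format: the certificate is a plain dictionary, the CA key is a toy RSA key built from small known primes, and all field names, the CA name and the dates are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical CA key pair (toy RSA, no padding).
CA_P, CA_Q = 2**61 - 1, 2**31 - 1
CA_N, CA_E = CA_P * CA_Q, 65537                   # built into the client
CA_D = pow(CA_E, -1, (CA_P - 1) * (CA_Q - 1))     # known only to the CA


def fields_digest(fields: dict) -> int:
    """Hash the plaintext certificate fields in a canonical order."""
    canonical = json.dumps(fields, sort_keys=True).encode("utf-8")
    return int.from_bytes(hashlib.sha256(canonical).digest(), "big") % CA_N


def issue_certificate(fields: dict) -> dict:
    """CA side: sign the digest of the fields with the CA private key."""
    return {"fields": fields, "signature": pow(fields_digest(fields), CA_D, CA_N)}


def verify_certificate(cert: dict, hostname: str, now: datetime) -> bool:
    """Client side: check the signature, the domain name and the validity period."""
    fields = cert["fields"]
    signature_ok = pow(cert["signature"], CA_E, CA_N) == fields_digest(fields)
    domain_ok = fields["domain"] == hostname
    not_before = datetime.fromisoformat(fields["not_before"])
    not_after = datetime.fromisoformat(fields["not_after"])
    return signature_ok and domain_ok and not_before <= now <= not_after


cert = issue_certificate({
    "domain": "example.com",
    "public_key": "hypothetical-server-public-key",
    "issuer": "Example Toy CA",
    "not_before": "2024-01-01T00:00:00+00:00",
    "not_after": "2025-01-01T00:00:00+00:00",
    "serial": 1,
})
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

# An attacker who edits the domain cannot produce a matching signature.
forged = {"fields": dict(cert["fields"], domain="evil.example"),
          "signature": cert["signature"]}

print(verify_certificate(cert, "example.com", now))
print(verify_certificate(forged, "evil.example", now))
```

Because the client only needs `CA_N` and `CA_E`, it can validate any certificate issued by that CA without ever contacting the server operator beforehand.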

Fourth, HTTPS workflow

1. Client initiates an HTTPS request (for example, https://juejin.im/user/5a9a9cdcf265da238b7d771c). According to RFC 2818, Client knows that it needs to connect to port 443 (the default) on Server.

2. Server returns the pre-configured public key certificate to Client.

3. Client verifies the public key certificate: for example, whether it is within its validity period, whether the purpose of the certificate matches the site Client requested, whether it appears on the certificate revocation list (CRL), and whether the certificate one level up in the chain is valid. This is a recursive process that continues until the root certificate (built into the operating system or into Client) is verified. If verification passes, the process continues. If it fails, a warning message is displayed.

4. Client uses a pseudo-random number generator to generate the symmetric key used for encryption, encrypts the symmetric key with the public key from the certificate, and sends it to Server.

5. Server uses its own private key to decrypt the message and obtain the symmetric key. At this point, both Client and Server hold the same symmetric key.

6. Server encrypts “plaintext content A” with the symmetric key and sends it to Client.

7. Client decrypts the response ciphertext with the symmetric key to obtain “plaintext content A”.

8. Client initiates another HTTPS request, encrypting the requested “plaintext content B” with the symmetric key, and Server decrypts the ciphertext with the symmetric key to obtain “plaintext content B”.
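For steps 1 and 3, the client-side setup can be sketched with Python’s standard library. The snippet only inspects configuration and opens no connection. It assumes the defaults of `ssl.create_default_context()`, which enables certificate verification against the system’s trusted roots along with host name checking.

```python
import ssl
from urllib.parse import urlsplit

url = "https://juejin.im/user/5a9a9cdcf265da238b7d771c"
parts = urlsplit(url)

# Step 1: no explicit port in the URL, so the scheme decides (443 for https).
port = parts.port or (443 if parts.scheme == "https" else 80)

# Step 3: the default context requires a valid certificate chain and
# checks that the certificate matches the requested host name.
context = ssl.create_default_context()

print(parts.hostname, port)
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

Passing this context to `context.wrap_socket(sock, server_hostname=parts.hostname)` on a connected TCP socket would then carry out the handshake, and verification failures would surface as `ssl.SSLCertVerificationError`.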

Fifth, the difference between HTTP and HTTPS

  • HTTP is a clear text transmission protocol. HTTPS is a network protocol built from SSL+HTTP that supports encrypted transmission and identity authentication, and it is safer than HTTP.

With regard to security, the simplest analogy for the relationship between the two is truck delivery. Under HTTP the truck is open and the goods are exposed. HTTPS, on the other hand, is a closed container truck, which naturally improves security.

  • HTTPS is safer than HTTP, is friendlier to search engines, and benefits SEO. Google and Baidu give priority to indexing HTTPS web pages.
  • HTTPS requires an SSL certificate, but HTTP does not.
  • The standard port for HTTPS is 443, and the standard port for HTTP is 80.
  • HTTP is an application-layer protocol that runs directly on TCP, while HTTPS is HTTP running over the SSL/TLS layer that sits between HTTP and TCP.
  • HTTPS displays a security lock in the browser, but HTTP does not.

Sixth, why don’t all websites use HTTPS

Since HTTPS is so secure and reliable, why don’t all Web sites use HTTPS?

First of all, many people still feel that there is a threshold to adopting HTTPS: an SSL certificate issued by an authoritative CA is required. From selecting and purchasing the certificate to deploying it, the traditional process is time-consuming and labor-intensive.

Secondly, HTTPS is generally believed to cost more performance than HTTP, because compared with plain text communication, encrypted communication consumes more CPU and memory. If every communication is encrypted, a considerable amount of resources is consumed, and the number of requests a single machine can process necessarily decreases. In practice, however, this need not be a problem: performance can be optimized and certificates can be deployed on the SLB or CDN. For example, during the “Double Eleven” period, Taobao and Tmall ran over HTTPS and still provided smooth access, browsing and trading on the website and mobile clients. Tests found that many optimized pages perform the same as over HTTP or even slightly better, so HTTPS is not slow once optimized.

In addition, another reason is to save the cost of purchasing certificates. Certificates are essential for HTTPS communication, and the certificate used must be purchased from a certification authority (CA).

Finally, there is security awareness. Compared with the domestic industry, the Internet industry abroad has relatively mature security awareness and technical practice, and HTTPS deployment there is jointly promoted by society, enterprises and government.
