Java: splitting a string on spaces, except for spaces inside quoted strings. How do you write this regular expression?

  java, question

Sample string

String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";

These four lines are records from an access log; every other record in the log follows one of these four patterns. I would like each record to be split into fields like this:

127.0.0.1
 -
 [05/Nov/2015:15:07:18 +0800]
 "GET /accounts/*** HTTP/1.1"
 200
 2426
 "-"
 "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
 0.031
 0.031
 "192.168.222.251"

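The segmentation above can be sketched as a small tokenizer. This is only an illustration, not the approach described in the rest of this post: the class name `LogTokenizer` and the method `tokenize` are hypothetical, and the pattern assumes that quoted fields never contain an escaped double quote and that bracketed fields never contain a nested `]`. Each token is either a `[...]` run, a `"..."` run, or a run of non-space characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogTokenizer {
    // Alternatives are tried in order: bracketed run, quoted run, then any run of non-space characters.
    // Assumption: no escaped quotes inside "..." and no nested brackets inside [...].
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|\"[^\"]*\"|\\S+");

    // Returns the fields of one log line, keeping the surrounding quotes and brackets.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A shortened record in the same layout as the samples above.
        String line = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" "
                + "404 992 \"-\" \"Mozilla/4.0\" 0.040 0.040 \"192.168.222.35\"";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

Note that the two `-` placeholders after the client address come out as two separate tokens, so a record in this layout yields twelve tokens in total.
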
My initial implementation
Because the quoted fields are complex and their content is unpredictable, I decided to first extract everything inside double quotation marks, using the following code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldExtractor {
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";
        // Match a double-quoted run: word, whitespace, or punctuation characters, excluding the quote itself.
        Pattern p = Pattern.compile("\"[\\w\\s\\p{Punct}&&[^\"]]*\"");
        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);
        for (String str : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(str);
            // Print every quoted field found in the record.
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
}

The output is as follows:

****************************************
"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1"
"-"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
"192.168.222.251"
****************************************
"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"GET /favicon.ico HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1"
"-"
"Mozilla/4.0"
"101.226.62.82"

This gave me the quoted fields; I then extracted the remaining fields with substring().

Obviously this is not best practice. Later, when my team lead reviewed it, he said it did not need to be this complicated, and within a few minutes he had written me a new regular expression.

Improved implementation

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";
        // One capture group per field, 14 in total; bracketed and quoted fields are matched as a whole.
        Pattern p = Pattern.compile(
                "^([\\d.]+) (\\S+) (\\S+) \\[(.+)\\] \"(GET|POST|DELETE|PUT|HEAD) (\\S+) (\\S+)\" (\\d+) (\\d+) \"(\\S+)\" \"(.+)\" (\\S+) (\\S+) \"([\\d.]+)\"");
        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);
        for (String line : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(line);
            if (matcher.find()) {
                System.out.print(matcher.group(4) + " ");   // timestamp
                System.out.print(matcher.group(5) + " ");   // HTTP method
                System.out.print(matcher.group(6) + " ");   // request URI
                System.out.print(matcher.group(8) + " ");   // status code
                System.out.println(matcher.group(14));      // quoted IP address at the end of the record
            }
        }
    }
}

Output results:

****************************************
05/Nov/2015:15:06:34 +0800 GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 200 192.168.222.251
****************************************
05/Nov/2015:15:24:40 +0800 GET /accounts/54fd0571e4b055a0030461fb 200 192.168.222.35
****************************************
05/Nov/2015:15:24:40 +0800 GET /favicon.ico 404 192.168.222.35
****************************************
05/Nov/2015:23:55:11 +0800 POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 200 101.226.62.82

Now you can pick out whichever capture groups you need from the match.
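
Counting group numbers by hand gets error-prone with 14 groups, so one option is to name the groups you care about. The following is a minimal sketch, assuming Java 7 or later (which supports `(?<name>...)`). The group names (`time`, `method`, `uri`, `status`, `lastIp`), the class name `NamedGroupsDemo`, and the method `summarize` are hypothetical labels chosen for illustration; the pattern otherwise has the same structure as the one above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Same field layout as the 14-group pattern; only the groups used below are named.
    // The names are illustrative assumptions, not part of the original pattern.
    static final Pattern LOG = Pattern.compile(
            "^([\\d.]+) (\\S+) (\\S+) \\[(?<time>.+)\\] "
            + "\"(?<method>GET|POST|DELETE|PUT|HEAD) (?<uri>\\S+) (\\S+)\" "
            + "(?<status>\\d+) (\\d+) \"(\\S+)\" \"(.+)\" "
            + "(\\S+) (\\S+) \"(?<lastIp>[\\d.]+)\"");

    // Returns "time method uri status lastIp", or null if the line does not fit the layout.
    static String summarize(String line) {
        Matcher m = LOG.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group("time") + " " + m.group("method") + " " + m.group("uri")
                + " " + m.group("status") + " " + m.group("lastIp");
    }

    public static void main(String[] args) {
        // A shortened record in the same layout as the samples above.
        String line = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" "
                + "404 992 \"-\" \"Mozilla/4.0\" 0.040 0.040 \"192.168.222.35\"";
        System.out.println(summarize(line));
    }
}
```

Named and numbered groups can be mixed, so `m.group(4)` and `m.group("time")` refer to the same text here; naming only changes how the groups are looked up, not what the pattern matches.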