Sample strings
```java
String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";
```
These are four records from an access log; every other line in the log falls into one of these four categories. I want each line split into the following fields:
127.0.0.1
-
[05/Nov/2015:15:07:18 +0800]
"GET /accounts/*** HTTP/1.1"
200
2426
"-"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
0.031
0.031
"192.168.222.251"
My initial implementation
Because the quoted fields (the User-Agent in particular) can contain almost arbitrary content, I decided to first extract everything between each pair of double quotes, using the following code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteExtractor {
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";

        // An opening '"', then any run of word, whitespace or punctuation characters
        // other than '"', then a closing '"'
        Pattern p = Pattern.compile("\"[\\w\\s\\p{Punct}&&[^\"]]*\"");

        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);

        for (String str : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(str);
            // Print every quoted field found in this line
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }
}
```
The output is as follows:
```
****************************************
"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1"
"-"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
"192.168.222.251"
****************************************
"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"GET /favicon.ico HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1"
"-"
"Mozilla/4.0"
"101.226.62.82"
```
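The quoted-field pattern relies on Java's character-class intersection (`&&`). `\p{Punct}` includes the double quote itself, so the `&&[^"]` part is what keeps a match from running past the closing quote. The small demo below compares the pattern with and without the intersection; the class name `QuotedFieldDemo`, the helper `allMatches`, and the shortened sample line (the tail of `text4`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldDemo {
    // (word | whitespace | ASCII punctuation) intersected with "anything but '"'"
    static final Pattern WITH_INTERSECTION = Pattern.compile("\"[\\w\\s\\p{Punct}&&[^\"]]*\"");

    // Same class without the intersection: '"' is allowed inside, so the greedy *
    // runs on to the last quote in the line.
    static final Pattern WITHOUT_INTERSECTION = Pattern.compile("\"[\\w\\s\\p{Punct}]*\"");

    // Returns every substring of line matched by p, in order of appearance.
    static List<String> allMatches(Pattern p, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "200 298 \"-\" \"Mozilla/4.0\" 0.019";
        System.out.println(allMatches(WITH_INTERSECTION, line));    // ["-", "Mozilla/4.0"]
        System.out.println(allMatches(WITHOUT_INTERSECTION, line)); // ["-" "Mozilla/4.0"]
    }
}
```

Without the intersection, the two quoted fields collapse into one match, which is why the exclusion of `"` matters here.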
This gave me the quoted fields, and I then extracted the remaining fields with substring().
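The substring() step itself isn't shown. As a hedged sketch of what it could look like, assuming every line has exactly the layout of the four samples, the unquoted fields can be located with `indexOf` and cut out with `substring`. The class `SubstringSketch` and its methods are hypothetical names, and only the leading fields and the status/bytes pair are covered.

```java
public class SubstringSketch {
    // Hypothetical helper: returns {ip, first "-", second "-", timestamp}, i.e. the
    // fields before the first '"', using only indexOf and substring.
    static String[] leadingFields(String line) {
        int open = line.indexOf('[');
        int close = line.indexOf(']', open);
        // "127.0.0.1 - - " -> ["127.0.0.1", "-", "-"]
        String[] head = line.substring(0, open).trim().split(" ");
        String time = line.substring(open + 1, close);
        return new String[] {head[0], head[1], head[2], time};
    }

    // Hypothetical helper: the status code and byte count sit between the
    // request's closing quote and the next opening quote.
    static String[] statusAndBytes(String line) {
        int requestStart = line.indexOf('"');
        int requestEnd = line.indexOf('"', requestStart + 1);
        int nextQuote = line.indexOf('"', requestEnd + 1);
        return line.substring(requestEnd + 1, nextQuote).trim().split(" ");
    }

    public static void main(String[] args) {
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        System.out.println(String.join(" | ", leadingFields(text3)));  // 127.0.0.1 | - | - | 05/Nov/2015:15:24:40 +0800
        System.out.println(String.join(" | ", statusAndBytes(text3))); // 404 | 992
    }
}
```

Every field needs its own index bookkeeping, which is exactly the fragility that a single anchored regex avoids.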
This is clearly not best practice. When my team lead read it, he said there was no need for anything so complicated, and within a few minutes he wrote me a new regular expression.
Improved implementation
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParser {
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";

        // Groups: 1 IP, 2-3 the two "-" fields, 4 timestamp, 5 method, 6 path, 7 protocol,
        // 8 status, 9 bytes, 10 quoted "-", 11 User-Agent, 12-13 the two trailing numbers,
        // 14 the final quoted IP
        Pattern p = Pattern.compile(
                "^([\\d.]+) (\\S+) (\\S+) \\[(.+)\\] \"(GET|POST|DELETE|PUT|HEAD) (\\S+) (\\S+)\" "
                + "(\\d+) (\\d+) \"(\\S+)\" \"(.+)\" (\\S+) (\\S+) \"([\\d.]+)\"");

        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);

        for (String line : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(line);
            if (matcher.find()) {
                System.out.print(matcher.group(4) + " ");  // timestamp
                System.out.print(matcher.group(5) + " ");  // method
                System.out.print(matcher.group(6) + " ");  // path
                System.out.print(matcher.group(8) + " ");  // status
                System.out.println(matcher.group(14));     // final quoted IP
            }
        }
    }
}
```
Output:
```
****************************************
05/Nov/2015:15:06:34 +0800 GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 200 192.168.222.251
****************************************
05/Nov/2015:15:24:40 +0800 GET /accounts/54fd0571e4b055a0030461fb 200 192.168.222.35
****************************************
05/Nov/2015:15:24:40 +0800 GET /favicon.ico 404 192.168.222.35
****************************************
05/Nov/2015:23:55:11 +0800 POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 200 101.226.62.82
```
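To see which group number holds which field, it can help to dump every group of one matched line. The following is a small sketch using the same pattern as the improved implementation; the class name `GroupDump` and the method `describe` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDump {
    // The same pattern as in the improved implementation, split over two lines.
    static final Pattern LOG = Pattern.compile(
            "^([\\d.]+) (\\S+) (\\S+) \\[(.+)\\] \"(GET|POST|DELETE|PUT|HEAD) (\\S+) (\\S+)\" "
            + "(\\d+) (\\d+) \"(\\S+)\" \"(.+)\" (\\S+) (\\S+) \"([\\d.]+)\"");

    // Returns one "index: value" line per capture group, or "no match".
    static String describe(String line) {
        Matcher m = LOG.matcher(line);
        if (!m.find()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append(i).append(": ").append(m.group(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        System.out.print(describe(text3));
    }
}
```

The greedy `(.+)` for the User-Agent still stops at the right place because the pattern requires the two trailing numbers and the quoted IP after it, so the engine backtracks until that tail fits.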
From here, you can take whichever fields you need from the corresponding capture groups.
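Numbered groups are easy to mix up when the pattern changes. Java regexes (since Java 7) also support named groups `(?<name>...)`, read back with `Matcher.group(String)`. Below is a sketch of the same pattern with names; the names themselves are hypothetical labels chosen here (`ident`, `user` and `referer` assume the usual common/combined log layout, and `extra1`/`extra2` are left neutral because the meaning of the two trailing numbers depends on the server's log format).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSketch {
    // The improved pattern with named groups; the group names are illustrative only.
    static final Pattern LOG = Pattern.compile(
            "^(?<ip>[\\d.]+) (?<ident>\\S+) (?<user>\\S+) \\[(?<time>.+)\\] "
            + "\"(?<method>GET|POST|DELETE|PUT|HEAD) (?<path>\\S+) (?<protocol>\\S+)\" "
            + "(?<status>\\d+) (?<bytes>\\d+) \"(?<referer>\\S+)\" \"(?<agent>.+)\" "
            + "(?<extra1>\\S+) (?<extra2>\\S+) \"(?<quotedIp>[\\d.]+)\"");

    // Returns the named field, or null when the line does not match the pattern.
    static String field(String line, String name) {
        Matcher m = LOG.matcher(line);
        return m.find() ? m.group(name) : null;
    }

    public static void main(String[] args) {
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        // Prints: GET /accounts/54fd0571e4b055a0030461fb 200
        System.out.println(field(text2, "method") + " " + field(text2, "path") + " " + field(text2, "status"));
    }
}
```

`field` re-runs the matcher on every call, which keeps the sketch short; when reading many fields from the same line, matching once and keeping the `Matcher` is cheaper.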