You don’t know regular expressions yet? Look at this!

Regular expressions are a weak spot for many programmers, even developers with years of experience. They are widely seen as hard to learn, remember, and use, yet there is no denying that they are an important skill. Below I summarize the key points of learning and using regular expressions for your reference.

Regular expression syntax differs somewhat between languages. This article uses JavaScript syntax.

What is a regular expression?

A regular expression (regex) is a combination of characters that defines a search pattern. Regular expressions can be used to match, find, and replace text, validate input data, catch misspelled English words, and so on.
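As a quick taste of the three basic uses named above (match, find, replace), here is a minimal sketch. The sample sentence and the variable names are made up for illustration.

```javascript
const text = "The cat sat on the mat.";

// Match: does the text contain "cat" at all?
const hasCat = /cat/.test(text);

// Find: collect every word that ends in "at"
const atWords = text.match(/\b\w+at\b/g);

// Replace: swap the first "cat" for "dog"
const replaced = text.replace(/cat/, "dog");

console.log(hasCat);   // true
console.log(atWords);  // [ 'cat', 'sat', 'mat' ]
console.log(replaced); // The dog sat on the mat.
```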

Debugging tools

Here are some excellent online tools for creating and debugging regular expressions. My personal preference is Regex101, which supports switching between regular expression flavors, explains your expression, displays match information, and provides a quick syntax reference. It is very powerful.

regex101

Regexr

Regexpal

Start

In JavaScript, a regular expression literal begins and ends with /, so something as simple as /hello regexp/ is a regular expression.
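The literal form can be tried out directly. The sketch below also shows the RegExp constructor, which the article does not cover and is included only for comparison. The variable names are illustrative.

```javascript
// A regular expression literal: the pattern sits between the two slashes
const literal = /hello regexp/;

// The same pattern built from a string with the RegExp constructor
const built = new RegExp("hello regexp");

console.log(literal.test("say hello regexp!")); // true
console.log(built.test("say hello regexp!"));   // true
console.log(literal.source === built.source);   // true
```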

Flags

Flags are written after the closing / and affect the matching behavior of the whole regular expression. Common flags are:

  1. g: global matching. By default a regular expression returns only the first match; with the g flag, all matches can be returned
  2. i: case-insensitive matching. The case of English letters is ignored when matching
  3. m: multiline matching. The start and end characters (^ and $) work on each line, i.e. they match at the start and end of every line (lines are split by \n or \r) instead of only at the start and end of the whole input string

Flags can be used in combination, for example gi for a global, case-insensitive match.
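To see what each flag changes, here is a small sketch that runs the same pattern over a three-line string with different flag combinations. The sample string is made up for illustration.

```javascript
const lines = "Cat\ncat\nCAT";

// No flags: only the first exact-case match is returned
const first = lines.match(/cat/);

// g + i: every match, ignoring case
const allIgnoreCase = lines.match(/cat/gi);

// Without m, ^ only matches at the very start of the input
const startOfInput = lines.match(/^cat/gi);

// With m, ^ matches at the start of every line
const startOfLines = lines.match(/^cat/gim);

console.log(first[0], first.index); // cat 4
console.log(allIgnoreCase);         // [ 'Cat', 'cat', 'CAT' ]
console.log(startOfInput);          // [ 'Cat' ]
console.log(startOfLines);          // [ 'Cat', 'cat', 'CAT' ]
```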

Character Sets

A character set matches any one character in the set. Common character sets are:

  1. [xyz]: matches "x", "y", or "z"
  2. [^xyz]: complement set; matches any character other than "x", "y", and "z"
  3. [a-z]: matches any character from "a" to "z"
  4. [^a-n]: complement set; matches any character outside "a" to "n"
  5. [A-Z]: matches any character from "A" to "Z"
  6. [0-9]: matches any digit from "0" to "9"

For example, matching any letter or digit can be written as /[a-zA-Z0-9]/ or /[a-z0-9]/i.
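The character sets above can be checked against a short sample string. This is only a sketch; the string "a1-B2_c3" is an arbitrary example.

```javascript
const sample = "a1-B2_c3";

// Lowercase letters only
const lower = sample.match(/[a-z]/g);

// Complement set: everything that is NOT a lowercase letter
const notLower = sample.match(/[^a-z]/g);

// Letters and digits, written two equivalent ways
const alnumExplicit = sample.match(/[a-zA-Z0-9]/g);
const alnumIgnoreCase = sample.match(/[a-z0-9]/gi);

console.log(lower);           // [ 'a', 'c' ]
console.log(notLower);        // [ '1', '-', 'B', '2', '_', '3' ]
console.log(alnumExplicit);   // [ 'a', '1', 'B', '2', 'c', '3' ]
console.log(alnumIgnoreCase); // [ 'a', '1', 'B', '2', 'c', '3' ]
```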

Quantifiers

In practice we often need to match the same kind of character many times. To match an 11-digit mobile phone number, for example, we should not have to write [0-9] eleven times; quantifiers let us repeat a match instead.

  1. {n}: matches exactly n times
  2. {n,m}: matches n to m times
  3. {n,}: matches n or more times
  4. ?: matches 0 or 1 time
  5. *: matches 0 or more times; equivalent to {0,}
  6. +: matches 1 or more times; equivalent to {1,}
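Here is a short sketch of the quantifiers in action. The sample strings ("color", "abbc", and the 11-digit number) are illustrative only.

```javascript
// {n}: exactly 11 digits, instead of writing [0-9] eleven times
const elevenDigits = /^[0-9]{11}$/;
const ok11 = elevenDigits.test("13812345678"); // 11 digits
const ok10 = elevenDigits.test("1381234567");  // 10 digits

// {n,m}: quantifiers are greedy, so this takes 3 of the 4 "a"s
const greedy = "aaaa".match(/a{2,3}/)[0];

// ?: the "u" may appear zero times or once
const color = /^colou?r$/.test("color");
const colour = /^colou?r$/.test("colour");

// * allows zero "b"s, + needs at least one
const starZero = /^ab*c$/.test("ac");
const plusZero = /^ab+c$/.test("ac");
const plusTwo = /^ab+c$/.test("abbc");

console.log(ok11, ok10);                  // true false
console.log(greedy);                      // aaa
console.log(color, colour);               // true true
console.log(starZero, plusZero, plusTwo); // true false true
```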

Metacharacters

Some letters take on a special meaning in regular expressions when preceded by a backslash; these are called metacharacters. In short, metacharacters are characters that describe other characters: they stand for a class of characters, a position, or a control character rather than for themselves.

Common metacharacters are:

  1. \d: matches any digit; equivalent to [0-9]
  2. \D: matches any non-digit character; the complement of \d
  3. \w: matches any letter or digit of the basic Latin alphabet, or an underscore; equivalent to [A-Za-z0-9_]
  4. \W: matches any character that is not a basic Latin letter, a digit, or an underscore; the complement of \w
  5. \s: matches a whitespace character, including spaces, tabs, form feeds, line feeds, and other Unicode spaces
  6. \S: matches a non-whitespace character; the complement of \s
  7. \b: matches a zero-width word boundary, such as the position between a letter and a space. For example, /\bno/ matches the "no" in "at noon", and /ly\b/ matches the "ly" in "possibly yesterday."
  8. \B: matches a zero-width non-word boundary, such as the position between two letters or between two spaces. For example, /\Bon/ matches the "on" in "at noon", and /ye\B/ matches the "ye" in "possibly yesterday."
  9. \t: matches a horizontal tab
  10. \n: matches a newline (line feed)
  11. \r: matches a carriage return
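The metacharacters listed above can be tried on a sample line. This is a minimal sketch; the line of text is invented, and the two boundary checks reuse the "at noon" example from the list.

```javascript
const line = "Order_42 shipped\ton 2024-01-05";

// \d+: runs of digits
const numbers = line.match(/\d+/g);

// \w+: runs of letters, digits, and underscores
const words = line.match(/\w+/g);

// \s+: split on any whitespace, including the tab
const fields = line.split(/\s+/);

// \b and \B: zero-width positions, so only the index differs
const boundary = "at noon".match(/\bno/).index;
const nonBoundary = "at noon".match(/\Bon/).index;

console.log(numbers); // [ '42', '2024', '01', '05' ]
console.log(words);   // [ 'Order_42', 'shipped', 'on', '2024', '01', '05' ]
console.log(fields);  // [ 'Order_42', 'shipped', 'on', '2024-01-05' ]
console.log(boundary, nonBoundary); // 3 5
```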

Special Characters

Some special characters in regular expressions are not matched literally but carry a special meaning, such as the quantifier characters ?, *, and + mentioned earlier. Other common special characters are:

  1. \: escape character. It gives an ordinary character a special meaning, as in \w, and it turns a special character into a literal one, as in \+, which matches "+"
  2. .: matches any single character except the line terminators \n, \r, \u2028, and \u2029. Inside a character set ([.]) it has no special meaning and matches a literal "."
  3. |: alternation; matches the expression before or after the |. For example, to match both "bear" and "pear" you can use /(b|p)ear/ or /bear|pear/, but not /b|pear/, which matches only "b" or "pear"
  4. ^: matches the start of the input. For example, /^A/ does not match the "A" in "an Apple", but does match the "A" in "An apple"
  5. $: matches the end of the input. For example, /t$/ does not match the "t" in "eater", but does match the "t" in "eat". ^ and $ are often used in form validation, because the whole input must be validated from start to end rather than just some segment of it.
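The examples in the list above can be verified with a few one-liners. This is a sketch; strings such as "1+1" and "abc" are added for illustration.

```javascript
// \ turns the special character + into a literal plus sign
const escaped = /^1\+1$/.test("1+1");
const unescaped = /^1+1$/.test("1+1"); // here + is a quantifier on "1"

// . matches any character except line terminators; [.] is a literal dot
const dotAny = /a.c/.test("abc");
const dotNewline = /a.c/.test("a\nc");
const dotInSet = /a[.]c/.test("abc");

// | alternation: grouping changes what the alternatives are
const grouped = "bear pear".match(/(b|p)ear/g);
const ungrouped = "bear pear".match(/b|pear/g);

// ^ and $ anchor to the start and end of the input
const startLower = /^A/.test("an Apple");
const startUpper = /^A/.test("An apple");
const endEater = /t$/.test("eater");
const endEat = /t$/.test("eat");

console.log(escaped, unescaped);            // true false
console.log(dotAny, dotNewline, dotInSet);  // true false false
console.log(grouped, ungrouped);            // [ 'bear', 'pear' ] [ 'b', 'pear' ]
console.log(startLower, startUpper);        // false true
console.log(endEater, endEat);              // false true
```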

Groups

  1. (xyz): capturing group; matches xyz and captures the match. For example, /(foo)/ matches and captures the "foo" in "foo bar." The captured substrings are available as elements [1], …, [n] of the result array, or through the legacy static properties RegExp.$1, …, RegExp.$9.
  2. (?:xyz): non-capturing group; matches xyz but does not capture the match, so it cannot be accessed afterwards
  3. \n: where n is a positive integer, a backreference to the substring matched by the nth parenthesized group in the expression (counting opening parentheses from the left). For example, /apple(,)\sorange\1/ matches the "apple, orange," in "apple, orange, cherry, peach."
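The three kinds of group can be compared side by side. The sketch below reads captures from the result array; the last line additionally uses $1 and $2 in a replacement string, which the article does not cover and is shown here as an extra. The names and sample strings are illustrative.

```javascript
// Capturing groups: captures land in [1], [2], ... of the result array
const captured = /(foo) (bar)/.exec("foo bar baz");

// Non-capturing group: (?:foo) is matched but takes no slot in the array
const nonCaptured = /(?:foo) (bar)/.exec("foo bar");

// Backreference: \1 must repeat whatever group 1 matched (the comma)
const backref = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach.");

// Extra: captures can also be reused in a replacement string
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");

console.log(captured[0], "|", captured[1], "|", captured[2]); // foo bar | foo | bar
console.log(nonCaptured.length, nonCaptured[1]);              // 2 bar
console.log(backref[0]);                                      // apple, orange,
console.log(swapped);                                         // Smith, John
```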

Assertions

  1. x(?=y): matches x only if x is followed by y. For example, /bruce(?=wayne)/ matches "bruce" only if it is followed by "wayne", and /bruce(?=wayne|banner)/ matches "bruce" only if it is followed by "wayne" or "banner". Neither "wayne" nor "banner" is part of the match result
  2. x(?!y): matches x only if x is not followed by y. For example, /\d+(?!\.)/ matches a number only if it is not followed by ".".

/\d+(?!\.)/.exec('3.141') matches "141" rather than "3", because "3" is followed by ".".
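The lookahead examples above can be run as written. In this sketch, the input "brucelee" is an added counterexample, not one from the article.

```javascript
// Positive lookahead: "wayne" must follow, but is not part of the match
const wayne = /bruce(?=wayne)/.exec("brucewayne");
const banner = /bruce(?=wayne|banner)/.exec("brucebanner");
const lee = /bruce(?=wayne|banner)/.exec("brucelee");

// Negative lookahead: digits that are NOT followed by a dot
const digits = /\d+(?!\.)/.exec("3.141");

console.log(wayne[0]);                // bruce
console.log(banner[0]);               // bruce
console.log(lee);                     // null
console.log(digits[0], digits.index); // 141 2
```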

Application

The syntax and rules listed above help us read and understand what a regular expression does, but combining these rules into expressions that do a specific job still takes practice. Here are a few examples of the rules in use.

1. Match the mobile phone number

Let's start with a relatively simple case, matching mobile phone numbers. At present, domestic mobile phone numbers are 11 digits long and start with 1 followed by 3, 4, 5, 7, or 8, so the regular expression can be broken down into the following parts:

  1. Starts with 1: /^1/
  2. The second digit is one of 3, 4, 5, 7, 8: /[34578]/ or /(3|4|5|7|8)/
  3. The remaining positions, the 3rd through the 11th, are all digits, and the input ends with a digit: /\d{9}$/

Combined, this gives /^1[34578]\d{9}$/ or /^1(3|4|5|7|8)\d{9}$/. Because capturing parentheses add some overhead, the first form is recommended.
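The combined expression can be wrapped in a small validator. This is a sketch only: the function name isMobileNumber and the sample numbers are hypothetical, and the pattern encodes nothing beyond the rules stated above.

```javascript
// The recommended form from the article: 1, then one of 3/4/5/7/8, then 9 digits
const mobilePattern = /^1[34578]\d{9}$/;

// Hypothetical helper: true when the whole input is a valid number
function isMobileNumber(input) {
  return mobilePattern.test(input);
}

console.log(isMobileNumber("13812345678"));  // true
console.log(isMobileNumber("12812345678"));  // false: second digit is 2
console.log(isMobileNumber("1381234567"));   // false: only 10 digits
console.log(isMobileNumber("138123456789")); // false: 12 digits
```

The ^ and $ anchors are what reject the 12-digit input; without them the pattern would match the first 11 digits and report success.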

2. Match email

A standard email address has the form <yourname>@<domain>.<extension><optional-extension>.

The format of each part is as follows (simplified, since the main goal is to show how to write the regular expression):

  1. yourname: any English letter (a-z/A-Z), digit (0-9), underscore (_), period (.), or hyphen (-); length greater than 0
  2. domain: any English letter (a-z/A-Z), digit (0-9), or hyphen (-); length greater than 0
  3. extension: any English letter (a-z/A-Z); length 2-8
  4. optional-extension: starts with ".", followed by any English letters (a-z/A-Z); length 2-8; optional

The regular expression for each part is:

  1. yourname: /[a-z\d._-]+/
  2. domain: /[a-z\d-]+/
  3. extension: /[a-z]{2,8}/
  4. optional-extension: /(\.[a-z]{2,8})?/

Combine them to form the final regular expression: /^([a-z\d._-]+)@([a-z\d-]+)\.([a-z]{2,8})(\.[a-z]{2,8})?$/. To improve readability, each part can be wrapped in "()", and don't forget the start and end characters ^ and $. Since the character sets list only lowercase letters, add the i flag if uppercase letters (A-Z) should be accepted as well.
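Because each part is wrapped in parentheses, the final expression can also split an address into its parts. The sketch below assumes the i flag is added so that the A-Z half of the format rules is honored; the function name parseEmail and the sample addresses are hypothetical.

```javascript
// Final expression from the article, with the i flag added (an assumption)
const emailPattern = /^([a-z\d._-]+)@([a-z\d-]+)\.([a-z]{2,8})(\.[a-z]{2,8})?$/i;

// Hypothetical helper: returns the captured parts, or null if invalid
function parseEmail(input) {
  const match = emailPattern.exec(input);
  if (match === null) return null;
  const [, yourname, domain, extension, optionalExtension] = match;
  return { yourname, domain, extension, optionalExtension };
}

const full = parseEmail("Jane.Doe@example.co.uk");
const short = parseEmail("user@example.com");
const invalid = parseEmail("no-at-sign.com");

console.log(full.yourname, full.domain, full.extension, full.optionalExtension);
// Jane.Doe example co .uk
console.log(short.optionalExtension); // undefined
console.log(invalid);                 // null
```

This simplified pattern is meant to show how the pieces combine. It rejects some real-world addresses, such as those with more than two dot-separated parts after the domain.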

Conclusion

That is all for this introduction to regular expressions. I hope it helps you write regular expressions in the future. For topics not covered in this article, please refer to the following links:

  1. Wikipedia – Regular Expression
  2. MDN – Regular Expression
  3. Microsoft – Regular Expression Reference
  4. W3schools – Regexp

My blog will soon be synchronized to the Tencent Cloud+ Community, and everyone is invited to join: https://cloud.tencent.com/dev …