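Regular Expression in Java - Java Regex Example
Welcome to Regular Expression in Java, also known as Regex in Java. When I started programming, Java regular expressions were a nightmare for me. This tutorial aims to help you master regular expressions in Java. I also come back here to refresh my own Java regex learning.
Regular Expression in Java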
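A regular expression in Java defines a pattern for a String. Regular expressions can be used to search, edit, or manipulate text. They are not language specific, but they differ slightly from language to language. Regular expressions in Java are most similar to those in Perl. The Java regex classes live in the java.util.regex package, which contains three classes:
- Pattern: A Pattern object is the compiled version of a regular expression. The Pattern class has no public constructor; we use its public static method compile to create a pattern object by passing the regular expression as the argument.
- Matcher: Matcher is the Java regex engine object that matches the input String against the pattern object that was created. The Matcher class has no public constructor; we get a Matcher object from the pattern object's matcher method, which takes the input String as its argument. We then use the matches method, which returns a boolean result depending on whether the input String matches the regex pattern.
- PatternSyntaxException: PatternSyntaxException is thrown if the regular expression syntax is not correct.
Let's have a look at a Java regex example program.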
package com.journaldev.util;

import java.util.regex.*;

public class PatternExample {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(".xx.");
        Matcher matcher = pattern.matcher("MxxY");
        System.out.println("Input String matches regex - " + matcher.matches());
        // bad regular expression: the leading '*' has nothing to repeat
        pattern = Pattern.compile("*xx*");
    }
}
When we run this Java regex example program, we get the following output.
Input String matches regex - true
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0
*xx*
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.sequence(Pattern.java:2090)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at com.journaldev.util.PatternExample.main(PatternExample.java:13)
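When a regex comes from user input or configuration, it can be friendlier to catch the exception than to let the program crash. The following is only a minimal sketch: the class name RegexSyntaxCheck and the helper describeError are made up for illustration, while getDescription() and getIndex() are standard PatternSyntaxException accessors.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexSyntaxCheck {

    // Hypothetical helper: returns null when the regex compiles,
    // otherwise a short summary of what went wrong and where.
    static String describeError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getDescription() is the error text, getIndex() the approximate position
            return e.getDescription() + " at index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeError(".xx.")); // valid regex, prints null
        System.out.println(describeError("*xx*")); // invalid regex, prints the summary
    }
}
```

PatternSyntaxException is an unchecked exception, so the compiler does not force this try/catch; whether to catch it is a design choice that mostly matters for patterns not hard-coded in the source.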
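Since Java regular expressions revolve around String, the String class was extended in Java 1.4 to provide a matches method that performs regex pattern matching. Internally it uses the Pattern and Matcher Java regex classes to do the processing, but it obviously reduces the lines of code. The Pattern class also contains a matches method that takes a regex and an input String as arguments and returns a boolean result after matching them. So the code below works fine for matching an input String against a regular expression in Java.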
String str = "bbb";
System.out.println("Using String matches method: "+str.matches(".bb"));
System.out.println("Using Pattern matches method: "+Pattern.matches(".bb", str));
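So if your requirement is just to check whether the input String matches the pattern, you can save time and lines of code by using the simple String matches method. You should use the Pattern and Matcher classes only when you need to manipulate the input String or reuse the pattern. Note that the pattern defined by a regex is applied to the String from left to right, and once a source character has been used in a match, it cannot be reused. For example, the regex "121" matches "31212142121" only twice, as "_121____121".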
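The left-to-right, no-reuse behavior can be observed with Matcher.find(), which looks for the next occurrence instead of requiring the whole input to match. This is a small sketch under that assumption; the class MatchPositions and the method matchStarts are illustrative names, not part of the original example code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPositions {

    // Compiled once and reused for every input, which is the main reason
    // to prefer Pattern/Matcher over String.matches for repeated work.
    private static final Pattern P121 = Pattern.compile("121");

    // Returns the start index of every non-overlapping match of "121".
    static List<Integer> matchStarts(String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = P121.matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // The "121" at index 3 overlaps the first match, so it is skipped.
        System.out.println(matchStarts("31212142121")); // [1, 8]
    }
}
```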
Regular Expression in Java - Common Matching Symbols
Regular Expression | Description | Example |
---|---|---|
. | Matches any single character | ("..", "a%") - true ("..", ".a") - true ("..", "a") - false |
^aaa | Matches regex aaa at the beginning of the line | ("^a.c.", "abcd") - true ("^a", "ac") - false |
aaa$ | Matches regex aaa at the end of the line | ("..cd$", "abcd") - true ("a$", "a") - true ("a$", "aca") - false |
[abc] | Matches any one of the letters a, b or c. [] are known as character classes. | ("^[abc]d.", "ad9") - true ("[ab].d$", "bad") - true ("[ab]x", "cx") - false |
[abc][12] | Matches a, b or c followed by 1 or 2 | ("[ab][12].", "a2#") - true ("[ab]..[12]", "acd2") - true ("[ab][12]", "c2") - false |
[^abc] | When ^ is the first character in [], it negates the class and matches anything except a, b or c | ("[^ab][^12].", "c3#") - true ("[^ab]..[^12]", "xcd3") - true ("[^ab][^12]", "c2") - false |
[a-e1-8] | Matches any character in the range a to e or 1 to 8 | ("[a-e1-3].", "d#") - true ("[a-e1-3]", "2") - true ("[a-e1-3]", "f2") - false |
xx\|yy | Matches regex xx or yy | |
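Each example pair in the table reads as (regex, input), and the result is what a whole-input match returns. As a rough check, a few of the rows can be run through Pattern.matches; the class name CommonSymbolsDemo and its matches helper are illustrative only.

```java
import java.util.regex.Pattern;

class CommonSymbolsDemo {

    // Thin wrapper so each line reads like a table row: (regex, input)
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("..", "a%"));           // true: two arbitrary characters
        System.out.println(matches("..", "a"));            // false: only one character
        System.out.println(matches("[ab].d$", "bad"));     // true
        System.out.println(matches("[^ab][^12].", "c3#")); // true
        System.out.println(matches("[a-e1-3]", "f2"));     // false: f is outside a-e
        System.out.println(matches("xx|yy", "yy"));        // true: second alternative
        System.out.println(matches("xx|yy", "xy"));        // false: neither alternative
    }
}
```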
Java Regex Metacharacters
Java regex has some metacharacters that act as shortcodes for common matching patterns.
Regular Expression | Description |
---|---|
\d | Any digit, short for [0-9] |
\D | Any non-digit, short for [^0-9] |
\s | Any whitespace character, short for [ \t\n\x0B\f\r] |
\S | Any non-whitespace character, short for [^\s] |
\w | Any word character, short for [a-zA-Z_0-9] |
\W | Any non-word character, short for [^\w] |
\b | A word boundary |
\B | A non word boundary |
There are two ways to use metacharacters as ordinary characters in regular expressions.
- Precede the metacharacter with a backslash (\).
- Keep the metacharacter within \Q (which starts the quote) and \E (which ends it).
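Both escaping styles can be tried out in a few lines. Keep in mind that inside a Java string literal every regex backslash has to be written as \\. This sketch also uses Pattern.quote, a standard method that wraps its argument in \Q and \E; the class EscapeDemo and the helper matchesLiterally are hypothetical names.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Hypothetical helper: treats the whole first argument as literal text
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("\\d\\d", "42"));         // true: two digits
        System.out.println(Pattern.matches("a.b", "axb"));           // true: '.' is a metacharacter
        System.out.println(Pattern.matches("a\\.b", "axb"));         // false: '.' is now literal
        System.out.println(Pattern.matches("a\\.b", "a.b"));         // true
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));       // false: '+' is a quantifier
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2")); // true: quoted text
        System.out.println(matchesLiterally("1+1=2", "1+1=2"));      // true: same via Pattern.quote
    }
}
```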
Regular Expression in Java - Quantifiers
Java regex quantifiers specify how many occurrences of a character to match.
Regular Expression | Description |
---|---|
X? | X occurs once or not at all |
X* | X occurs zero or more times |
X+ | X occurs one or more times |
X{n} | X occurs exactly n times |
X{n,} | X occurs n or more times |
X{n,m} | X occurs at least n times but not more than m times |
Java regex quantifiers can also be used with character classes and capturing groups. For example, [abc]+ means a, b, or c, one or more times. (abc)+ means the group "abc" one or more times. We will discuss capturing groups now.
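The quantifiers in the table can be checked one by one with Pattern.matches, which only returns true when the whole input fits the pattern. The class name QuantifierDemo and its matches helper are illustrative and not from the original code.

```java
import java.util.regex.Pattern;

class QuantifierDemo {

    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("ab?", "a"));         // true: b is optional
        System.out.println(matches("ab?", "abb"));       // false: at most one b
        System.out.println(matches("ab*", "abbb"));      // true: zero or more b
        System.out.println(matches("ab+", "a"));         // false: at least one b required
        System.out.println(matches("a{3}", "aaa"));      // true: exactly three
        System.out.println(matches("a{2,}", "aaaa"));    // true: two or more
        System.out.println(matches("a{2,3}", "aaaa"));   // false: more than three
        System.out.println(matches("[abc]+", "cab"));    // true: character class repeated
        System.out.println(matches("(abc)+", "abcabc")); // true: group repeated
        System.out.println(matches("(abc)+", "abcab"));  // false: last group incomplete
    }
}
```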
Regular Expression in Java - Capturing Groups
Capturing groups in Java regex are used to treat multiple characters as a single unit. You can create a group using (). The portion of the input String that matches the capturing group is saved in memory and can be recalled using a backreference. You can use the matcher.groupCount method to find out the number of capturing groups in a Java regex pattern. For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a) and (bc). You can use a backreference in the regular expression with a backslash (\) followed by the number of the group to be recalled. Capturing groups and backreferences can be confusing, so let's understand this with an example.
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
In the first example, at runtime the first capturing group is (\w\d), which evaluates to "a2" when matched against the input String "a2a2" and is saved in memory. So \1 refers to "a2", and hence the statement returns true. For the same reason the second statement prints false: \1 must repeat "a2", but the input continues with "b2". Try to work through statements 3 and 4 yourself. Now we will look at some important methods of the Pattern and Matcher classes.
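Before moving on, the group numbering behind groupCount can be made visible by printing what each group captured. This is a sketch only; GroupDemo and groups are hypothetical names, while groupCount() and group(int) are standard Matcher methods. Groups are numbered by the position of their opening parenthesis, and group 0 always stands for the entire match.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {

    // Returns group 0 (whole match) followed by every capturing group,
    // or an empty array when the input does not match the regex.
    static String[] groups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] result = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            result[i] = matcher.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // groupCount() is 3 here; group 0 is not included in that count
        System.out.println(Arrays.toString(groups("((a)(bc))", "abc"))); // [abc, abc, a, bc]
    }
}
```

With that in mind, here are the methods worth knowing: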
- We can create a Pattern object with flags. For example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.
- The Pattern class also provides a split(String) method that is similar to the String class split() method.
- The Pattern class toString() method returns the regular expression String from which this pattern was compiled.
- The Matcher class has start() and end() index methods that show precisely where the match was found in the input String.
- The Matcher class also provides the String manipulation methods replaceAll(String replacement) and replaceFirst(String replacement).
Let’s look at these java regex methods in a simple example program.
package com.journaldev.util;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExamples {

    public static void main(String[] args) {
        // using pattern with flags
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABcabdAb");
        // using Matcher find(), group(), start() and end() methods
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group()
                    + "\" starting at " + matcher.start()
                    + " index and ending at index " + matcher.end());
        }

        // using Pattern split() method
        pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }

        // using Matcher.replaceFirst() and replaceAll() methods
        pattern = Pattern.compile("1*2");
        matcher = pattern.matcher("11234512678");
        System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
        System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
    }
}
The above Java regex example program produces the following output.
Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678
That's all for regular expressions in Java. Java regex seems hard at first, but once you have worked with it for some time, it becomes easy to learn and use.
You can check out the complete code and more regular expression examples from our GitHub Repository.