Leetcode — 10. Regular Expression Matching

Anj
7 min readSep 11, 2022

The link to the problem: https://leetcode.com/problems/regular-expression-matching/

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*' where:

  • '.' Matches any single character.​​​​
  • '*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1.
Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".
--------------------------------------------------------------------Example 2.
Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Thoughts:

I think this question is very hard to solve if this is your first time to solve this problem. It also took me a while to understand how to do it. My first thought is going to try to match the requested string s and the pattern one by one. However, asterisks can represent 0 or more preceding elements. We cannot just return false if we found the current substring of s cannot implement the substring of p. Since it is possible if we add other characters to s or p, it actually can implement the pattern. There are too many possible different splits and it’s impossible to consider all without using any “helper tools”.

Let’s see if we can split the strings correctly and make it easier.

If the input string s really can implement the pattern string p, it means if we cut s and p at the correct position, the substring of s should be able to also implement the substring of p. Otherwise it is impossible that the whole string s can implement the pattern. Hence we can think this way — we can know whether string s can implement the pattern p from their substrings. If we cannot find any substring of s that can implement any substring of p, then we know the whole string cannot implement the given pattern.

We can also rephrase it to — whether the substring of s, s’, can implement the substring pattern p’ depends on whether the substring before s’ can implement p’ or the substring before p’. If not, then it’s impossible.

Wait, isn’t this type of “using the past knowledge to decide the current result” logic sounds a bit familiar? Yes, we need to use dynamic programming to solve this problem. When we are splitting string s and p into substrings, their length won’t be the same. For example when s = “aaa”, p = “a*” OR when s = “a”, p = “a*”, the length of s can be longer or shorter than the length of the pattern p. Since we need to know which part of the substring in s can implement to which part of the substring in the pattern, we use a 2D array to record their positions. We will let dp[i][j] represent whether the top i characters in string s can implement the top j characters in string p.

For example when s = ”aaabcde”, p = “a*bvc”. dp[2][3] represents whether ”aa” can implement the pattern “a*b”.

Let’s see how we can decide the value of dp[i][j]:

⚠️We will let s[0, i-1] represents the top i characters in string s and p[0, j-1] represents top j characters in the pattern p. (1≤ i ≤length of s, 1≤ j≤length of p)

Inevitable, our first thing to do is to check the jth character in p, curP. We can list 3 different cases of the value of curP. (We assume curS is the ith character in s.)

1. If curP is an alphabet, then the situation is simple. We only need to compare whether curS is the same alphabet as curP. If not, then we know s[0, i-1] cannot implement pattern p[0, j-1]. If curS is the same as curP, then we need to check whether the substring right before curS can implement the substring right before curP. (i.e., check if s[0, i-2] can use the pattern of p[0, j-2]). Since we always store the result in the 2D array, we will decide the value of dp[i][j] depends on dp[i-1][j-1] if curS is the same as curP.

👷For example when s[0, i-1] = "ac", p[0, j-1] = "oac". Although curS has the same alphabet 'c' as curP, s[0,i-2] ("a") cannot use the pattern p[0, j-2] ("oa"). Hence, the value of dp[i][j] is false

2. If curP is a dot ‘.’, then the situation is also simple. Since dot matches any single character, as long as curS is not empty, we can assume curS is the same character as curP. Hence just as we did in the above bullet point, we will decide the value of dp[i][j] depends on dp[i-1][j-1].

3. If curP is an asterisk ‘*’, then there are more things to consider. Since an asterisk is always coming with an alphabet or a dot, we need to pull out the previous character before curP. Let’s named it prevP.

The first step is to compare curS and prevP. If they are considered to be the same (either curS equals prevP or prevP is a dot), we will need to again “use the past knowledge”. However, this time we are going to refer to the value of dp[i-1][j], not dp[i-1][j-1]. (If we refer to dp[i-1][j-1], means we only allow s[0, i-2] only contain one alphabet of prevP at the end of the substring, and this is not what we want when we have an asterisk.) When s[0, i-2] can implement the current pattern p[0, j-1], it means we can append one or more prevP right after s[0, i-2] and the “new string” will still be able to implement the current pattern.

👷 For example when s[0, i-1] = “accc”, p[0, j-1] = “ac*”. 
Since curS and prevP both have the same alphabet ’c’, we want to know whether “acc” can implement the pattern “ac*”.
If yes,then we can append curS after “acc” and we know that the new formed string will still follow the pattern. We can add as many as character c at the end of "acc" and the strings will always match the pattern.
(However, if we check dp[i-1][j-1], i.e., whether "acc" can implement "ac", then we will get a wrong answer.)

Don’t forget the rule of the asterisk — it can represent ZERO or more of the preceding element. Currently, we only consider the cases when curP and prevP have the same alphabet. However, it is possible that there’s no prevP in string s.

Hence, dp[i][j] can be true in 2 different cases:
1. When curS and prevP are considered to be the same character AND if s[0, i-2] can implement p[0, j-1] (i.e., when dp[i-1][j] is true) — when we consider prevP can appear one or more in s.

2. if s[0, i-1] can implement p[0, j-3] (i.e., when dp[i][j-2] is true) — when we ignore prevP and its following asterisk.

👷 For example when s[0, i-1] = “ka”, p[0, j-1] = “ka*c*”. 
Although curS has different alphabet with prevP, don't set dp[i][j] so soon! Let's see if we can ignore c* and let them match. Hence, we need to check if "ka" matches the pattern "ka*". The answer is yes, and we will append "c*" into p[0, j-3] and know that s[0,i-1] will still be able to implement the "new pattern".

Code Logic:

In each iteration of the loop, we will get a string s’ which is also a substring of input s from index 0 to the current position. What are we going to do in this iteration is to keep finding the substring of input p (i.e., the pattern string) that always starts from index 0 to each different position and try to see if s’ can implement the pattern of the substring p’. Also, record these results into the 2D array.

Actually, the whole logic is already mentioned in the above paragraph, however, it is important to know how to set the “initial state” of the dp array:

  1. The default value of the whole array is false since we haven’t checked anything yet.
  2. An empty string should be able to implement an empty pattern string. Hence, dp[0][0] is true.
  3. When the length of the requested string s’ is 0, there’s only one situation that s’ can implement a non-empty pattern p’ — when all the characters (either alphabet or a dot) should always being followed by an asterisk, like “a*b*.*c*”. Hence, we will need an extra for loop to check the status of dp[0][j].
  4. When the length of the pattern string p’ is zero, as long as the requested string is not zero, it should be false.

Below is the 1ms code that beats 100% —

public boolean isMatch(String s, String p) {
int m = s.length(), n = p.length();
//dp[i][j] means if s.substr(0,i) and p.substr(0,j) matches
boolean[][] dp = new boolean[m+1][n+1];
dp[0][0] = true;
for(int j=2;j<=n;j++)
dp[0][j] = p.charAt(j-1)=='*' && dp[0][j-2];
for(int i=1;i<=m;i++)
{
char curS = s.charAt(i-1);
for(int j=1;j<=n;j++)
{
char curP = p.charAt(j-1);
if(curS==curP || curP =='.')
dp[i][j] = dp[i-1][j-1];
else if(curP=='*')
{
if(dp[i][j-2])//totally ignore prevP and curP
dp[i][j] = true;
else
{
//need to include prevP
char prevP = p.charAt(j-2);
if(prevP==curS || prevP =='.')
dp[i][j] = dp[i-1][j];
}
}
}
}

return dp[m][n];
}

--

--