Jul 4, 2010

Notes on regular expressions in JavaScript

The simplest way to tell whether a regular expression matches anywhere in a source string is to use the "test" method.


var reg = /a/;
var found = reg.test("abc");
console.log(found);

On many occasions we use a regular expression to validate user input, for example to test whether an input is in a date format. In that case you need to wrap the pattern in the "^" and "$" characters, so that it must match the whole input rather than just some part of it.
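
As a rough sketch of the anchoring idea, the following checks whether a whole input has the shape of a date. The YYYY-MM-DD format and the names isoDate and looksLikeDate are assumptions made for illustration; the pattern only checks the shape, not whether the date really exists.

```javascript
// Hypothetical pattern: four digits, a dash, two digits, a dash, two digits.
var isoDate = /^\d{4}-\d{2}-\d{2}$/;

function looksLikeDate(input) {
  // "^" and "$" force the pattern to cover the whole input
  return isoDate.test(input);
}

console.log(looksLikeDate("2010-07-04"));       // true
console.log(looksLikeDate("on 2010-07-04 ok")); // false

// Without the anchors, the same pattern matches anywhere inside the input.
console.log(/\d{4}-\d{2}-\d{2}/.test("on 2010-07-04 ok")); // true
```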


To do a simple search in a string, we can use the string.match(regex) syntax. This is useful when we want to know whether there is a match, or how many matches can be found. If you only care about the first match, use a non-global regular expression. In this case, if a match is found, an array object is returned (if nothing matches, null is returned). The first element of the array is the entire match, and elements 1 to (length - 1) are the sub-matches captured by the round brackets "()". The array, or match object, also has the properties "index" and "input". When a regular expression search is performed, the properties of the RegExp object are also updated.


var src = "Please send mail to george@contoso.com and someone@example.com. Thanks!";

// Create a regular expression to search for an e-mail address.
var re_non_global = /(\w+)@(\w+)\.(\w+)/;
var result = src.match(re_non_global);

for (var n in result)
{
  console.log(n + ":" + result[n]);
}
/*
0:george@contoso.com
1:george
2:contoso
3:com
index:20
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
*/

console.log("RegExp properties");
for(var n in RegExp)
{
  console.log(n + ":" + RegExp[n]);
}

/*
RegExp properties
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
multiline:false
lastMatch:george@contoso.com
lastParen:com
leftContext:Please send mail to
rightContext: and someone@example.com. Thanks!
$1:george
$2:contoso
$3:com
$4:
$5:
$6:
$7:
$8:
$9:
*/

If we care about more than the first match, we need to do a global search, which requires a global regular expression (the "g" flag). The returned match object is still an array, but the sub-matches are ignored: each element of the array is one complete match. The RegExp object stores the information of the last match.

var re_global = /(\w+)@(\w+)\.(\w+)/g;
// Because the global flag is included, the matches are in
// array elements 0 through n.
var result = src.match(re_global);
for (var n in result)
{
  console.log(n + ":" + result[n]);
}
/*
0:george@contoso.com
1:someone@example.com
*/

console.log("RegExp properties");
for(var n in RegExp)
{
  console.log(n + ":" + RegExp[n]);
}
/*
RegExp properties
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
multiline:false
lastMatch:someone@example.com
lastParen:com
leftContext:Please send mail to george@contoso.com and
rightContext:. Thanks!
$1:someone
$2:example
$3:com
$4:
$5:
$6:
$7:
$8:
$9:
*/
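
A small sketch of counting matches with a global search. The helper name countMatches and the sample strings are made up for illustration. The one detail to watch is that match returns null, not an empty array, when nothing is found.

```javascript
// Hypothetical helper: count how many times a global regex matches.
function countMatches(text, globalRegex) {
  var found = text.match(globalRegex); // null when nothing matches
  return found === null ? 0 : found.length;
}

console.log(countMatches("a@b.c and d@e.f", /\w+@\w+\.\w+/g)); // 2
console.log(countMatches("no address here", /\w+@\w+\.\w+/g)); // 0
```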


However, string.match(regex) is less powerful than regex.exec(string), which allows you to examine each match object one at a time. To do this you need to turn on the global option of the regular expression. Each time the exec method is called, it continues from the position after the last match, and it returns null when there are no more matches. Because of this, we can use a while loop.


var src = "Please send mail to george@contoso.com and someone@example.com. Thanks!";
var re_global = /(\w+)@(\w+)\.(\w+)/g;

var match;
while(match = re_global.exec(src)){
  console.log("match is found");
  // match is an array with two additional properties, index and input
//  for(var i=0, length = match.length; i < length; i++)
//  {
//    console.log(i + ":" + match[i]);
//  }
  
  for (var n in match) {
    console.log(n + ":" + match[n]);
  }

  console.log("RegExp properties");
  for(var n in RegExp)
  {
     console.log(n + ":" + RegExp[n]);
  }
}
/*

match is found
0:george@contoso.com
1:george
2:contoso
3:com
index:20
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
  
RegExp properties
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
multiline:false
lastMatch:george@contoso.com
lastParen:com
leftContext:Please send mail to
rightContext: and someone@example.com. Thanks!
$1:george
$2:contoso
$3:com
$4:
$5:
$6:
$7:
$8:
$9:

match is found
0:someone@example.com
1:someone
2:example
3:com
index:43
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
  
RegExp properties
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
multiline:false
lastMatch:someone@example.com
lastParen:com
leftContext:Please send mail to george@contoso.com and
rightContext:. Thanks!
$1:someone
$2:example
$3:com
$4:
$5:
$6:
$7:
$8:
$9:
*/  
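
The position that exec continues from is kept in the lastIndex property of the regular expression. A minimal sketch of how it moves, using a made-up sample string and made-up variable names:

```javascript
var text = "a1 b2 c3";
var pairs = /([a-z])(\d)/g;
var positions = [];
var m;

// Each call to exec resumes at pairs.lastIndex.
while ((m = pairs.exec(text)) !== null) {
  positions.push(m.index + "->" + pairs.lastIndex);
}

console.log(positions.join(", ")); // 0->2, 3->5, 6->8
// After the failed final call, lastIndex is reset to 0.
console.log(pairs.lastIndex);      // 0
```

Writing the loop condition as an explicit comparison with null does the same thing as the plain assignment in the earlier loop; it only makes the intent easier to see.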



If the global option is not enabled for the regular expression, each call to regex.exec starts from the beginning of the test string, so you cannot use the previous code to do a global search: the match is always the first match, and the while loop would never end.



var src = "Please send mail to george@contoso.com and someone@example.com. Thanks!";

var re_non_global = /(\w+)@(\w+)\.(\w+)/;

var match = re_non_global.exec(src);

for (var n in match) {
  console.log(n + ":" + match[n]);
}
/*
0:george@contoso.com
1:george
2:contoso
3:com
index:20
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
*/  
  
console.log("RegExp properties");
for(var n in RegExp)
{
   console.log(n + ":" + RegExp[n]);
}

/*
RegExp properties
input:Please send mail to george@contoso.com and someone@example.com. Thanks!
multiline:false
lastMatch:george@contoso.com
lastParen:com
leftContext:Please send mail to
rightContext: and someone@example.com. Thanks!
$1:george
$2:contoso
$3:com
$4:
$5:
$6:
$7:
$8:
$9:
*/
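
To see the restart behaviour in isolation, here is a short sketch with a made-up sample string: calling exec twice on a non-global regular expression gives the same match both times, and lastIndex never moves.

```javascript
var text = "a1 b2 c3";
var firstOnly = /([a-z])(\d)/; // no "g" flag

var one = firstOnly.exec(text);
var two = firstOnly.exec(text);

console.log(one[0] + " at " + one.index); // a1 at 0
console.log(two[0] + " at " + two.index); // a1 at 0
console.log(firstOnly.lastIndex);         // 0
```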

If we want to replace matches with our own text, we can use the str.replace(regexp|substr, newSubStr|function[, Non-standard flags]) method. Make sure the global option of the regular expression is turned on, otherwise only the first match is replaced. We can use special symbols inside newSubStr, such as $1 for the first sub-match or $& for the entire match; the documentation of String.prototype.replace lists them all. We can also pass a function that returns the replacement string dynamically. The function parameters look like the following.


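// $0 is the entire match; $1, $2, ... are the sub-matches;
// offset is the position of the match; source is the whole string being searched.
function replacer($0, $1, $2, /* ..., */ offset, source)
{ return "your new string"; }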
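
A minimal sketch of both replacement styles, reusing the e-mail pattern from above. The replacement texts and the function name maskUser are made up for illustration.

```javascript
var src = "Please send mail to george@contoso.com and someone@example.com. Thanks!";
var re_global = /(\w+)@(\w+)\.(\w+)/g;

// Special symbols: $1, $2 and $3 refer to the sub-matches.
var swapped = src.replace(re_global, "$2.$3 user $1");
console.log(swapped);
// Please send mail to contoso.com user george and example.com user someone. Thanks!

// Function form: called once per match; its return value is the replacement.
function maskUser(match, user, host, tld, offset, source) {
  return user.charAt(0) + "***@" + host + "." + tld;
}
var masked = src.replace(re_global, maskUser);
console.log(masked);
// Please send mail to g***@contoso.com and s***@example.com. Thanks!

// Without the global option only the first match is replaced.
var firstReplaced = src.replace(/(\w+)@(\w+)\.(\w+)/, "<hidden>");
console.log(firstReplaced);
// Please send mail to <hidden> and someone@example.com. Thanks!
```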