10.2 String Methods for Pattern Matching
Until now,
we've been discussing the grammar used to create regular
expressions, but we haven't examined how those regular
expressions can actually be used in JavaScript code. In this section,
we discuss methods of the String object that use regular expressions
to perform pattern matching and search-and-replace operations. In the
sections that follow this one, we'll continue the discussion of
pattern matching with JavaScript regular expressions by discussing
the RegExp object and its methods and properties. Note that the
discussion that follows is merely an overview of the various methods
and properties related to regular expressions. As usual, complete
details can be found in the core reference section of this book.
Strings support four methods that make use of regular expressions.
The simplest is search( ). This method takes a regular expression
argument and returns either the character position of the start of
the first matching substring, or -1 if there is no match. For
example, the following call returns 4:
"JavaScript".search(/script/i);
If the argument to search( ) is not a regular
expression, it is first converted to one by passing it to the
RegExp constructor. search( )
does not support global searches -- it ignores the
g flag of its regular expression argument.
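The behavior described above can be checked with a short sketch. The sample strings and variable names here are illustrative assumptions, not part of the original example:

```javascript
// search() returns the index of the first match, or -1 when nothing matches.
var s = "JavaScript";
var found = s.search(/script/i);   // 4
var missing = s.search(/perl/);    // -1

// The g flag is ignored: only the position of the first match is reported.
var withG = s.search(/a/g);        // 1, the same as s.search(/a/)

// A string argument is converted to a regular expression first,
// so "." acts as a wildcard rather than as a literal period.
var viaString = "a+b.c".search(".");    // 0 (the wildcard matches "a")
var literalDot = "a+b.c".search(/\./);  // 3 (the escaped dot matches the period)
```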
The replace( ) method performs a
search-and-replace operation. It takes
a regular expression as its first argument and a replacement string
as its second argument. It searches the string on which it is called
for matches with the specified pattern. If the regular expression has
the g flag set, the replace( )
method replaces all matches in the string with the replacement
string; otherwise, it replaces only the first match it finds. If the
first argument to replace( ) is a string rather
than a regular expression, the method searches for that string
literally rather than converting it to a regular expression with the
RegExp( ) constructor, as search( ) does. As an example, we could use
replace( ) as follows to provide uniform capitalization of the word
"JavaScript" throughout a string of text:
// No matter how it is capitalized, replace it with the correct capitalization
text.replace(/javascript/gi, "JavaScript");
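As a minimal sketch of the rules just described, the following compares a global replacement, a nonglobal replacement, and a string first argument. The sample strings are made up for illustration. Note that replace( ) returns a new string and leaves the original unchanged:

```javascript
var text = "javascript, Javascript, and JAVASCRIPT";

// With the g flag, every match is replaced.
var all = text.replace(/javascript/gi, "JavaScript");
// all is "JavaScript, JavaScript, and JavaScript"

// Without the g flag, only the first match is replaced.
var firstOnly = text.replace(/javascript/i, "JavaScript");
// firstOnly is "JavaScript, Javascript, and JAVASCRIPT"

// A string first argument is searched for literally, so "." is a plain period.
var literal = "1x5 or 1.5".replace("1.5", "N");   // "1x5 or N"
// The equivalent regular expression treats "." as a wildcard.
var asRegex = "1x5 or 1.5".replace(/1.5/, "N");   // "N or 1.5"
```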
replace( ) is more powerful than this, however.
Recall that parenthesized subexpressions of a regular expression are
numbered from left to right and that the regular expression remembers
the text that each subexpression matches. If a $
followed by a digit appears in the replacement string,
replace( ) replaces those two characters with the
text that matched the specified subexpression. This is a very useful
feature. We can use it, for example, to replace straight quotes in a
string with curly quotes, simulated with ASCII characters:
// A quote is a quotation mark, followed by any number of
// non-quotation-mark characters (which we remember), followed
// by another quotation mark.
var quote = /"([^"]*)"/g;
// Replace the straight quotation marks with "curly quotes,"
// and leave the contents of the quote (stored in $1) unchanged.
text.replace(quote, "``$1''");
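To see the substitution in action, here is a self-contained version of the same idea run on a sample sentence. The sample text and the second example, which swaps two remembered subexpressions, are illustrative assumptions:

```javascript
// The same pattern as above: a quoted run of non-quote characters.
var quote = /"([^"]*)"/g;
var sample = 'He said "hello" and then "goodbye".';

// $1 is replaced by whatever the first parenthesized subexpression matched.
var curly = sample.replace(quote, "``$1''");
// curly is: He said ``hello'' and then ``goodbye''.

// Subexpressions can also be reordered in the replacement string.
var swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1");
// swapped is "John Doe"
```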
The replace( ) method has other important features
as well, which are described in the "String.replace( )"
reference page in the core reference section. Most notably, the
second argument to replace( ) can be a function
that dynamically computes the replacement string.
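A brief sketch of a replacement function follows. The function is called once per match; it receives the matched text first, followed by the text of each parenthesized subexpression, and its return value is used as the replacement. The sample data and variable names are hypothetical:

```javascript
// Double every number that appears in the string.
var prices = "apple: 3, pear: 12";
var doubled = prices.replace(/\d+/g, function (match) {
    return String(Number(match) * 2);
});
// doubled is "apple: 6, pear: 24"

// Capitalize the first letter of each word, using two subexpressions.
var titled = "hello wide world".replace(/\b(\w)(\w*)/g, function (match, first, rest) {
    return first.toUpperCase() + rest;
});
// titled is "Hello Wide World"
```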
The match( ) method is the most general of the String
regular expression methods. It takes a regular expression as its only
argument (or converts its argument to a regular expression by passing
it to the RegExp( ) constructor) and returns an array that contains the
results of the match. If the regular expression has the g flag set, the
method returns an array of all matches that appear in the string. For
example:
"1 plus 2 equals 3".match(/\d+/g) // returns ["1", "2", "3"]
If the regular expression does not have the g flag
set, match( ) does not do a global search; it
simply searches for the first match. However, match( )
returns an array even when it does not perform a global
search. In this case, the first element of the array is the matching
string, and any remaining elements are the parenthesized
subexpressions of the regular expression. Thus, if match( )
returns an array a, a[0] contains the complete match,
a[1] contains the substring that matched the first
parenthesized expression, and so on. To draw a parallel with the
replace( ) method, a[n] holds the contents of $n.
For example, consider parsing a URL with the following code:
var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit my home page at http://www.isp.com/~david";
var result = text.match(url);
if (result != null) {
    var fullurl = result[0];   // Contains "http://www.isp.com/~david"
    var protocol = result[1];  // Contains "http"
    var host = result[2];      // Contains "www.isp.com"
    var path = result[3];      // Contains "~david"
}
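The following sketch contrasts global and nonglobal calls to match( ) with the same URL pattern, and shows why the null check in the code above matters: match( ) returns null when nothing matches. The sample strings are assumptions made for illustration:

```javascript
var sample = "Visit http://www.isp.com/~david or https://example.org/docs";

// With the g flag: an array of every complete match, with no subexpressions.
var allUrls = sample.match(/(\w+):\/\/([\w.]+)\/(\S*)/g);
// allUrls is ["http://www.isp.com/~david", "https://example.org/docs"]

// Without the g flag: the first match, followed by its subexpressions.
var firstUrl = sample.match(/(\w+):\/\/([\w.]+)\/(\S*)/);
// firstUrl[0] is "http://www.isp.com/~david", firstUrl[1] is "http",
// firstUrl[2] is "www.isp.com", and firstUrl[3] is "~david"

// When there is no match at all, the result is null rather than an empty array.
var none = "no links here".match(/(\w+):\/\/([\w.]+)\/(\S*)/);
// none is null
```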
Finally, there is one more feature of the match( )
method that you should know about. The array it returns has a
length property, as all arrays do. When
match( ) is invoked on a nonglobal regular
expression, however, the returned array also has two other
properties: the index property,
which contains the character position within the string at which the
match begins; and the input property,
which is a copy of the target string. So in the previous code, the
value of the result.index property would be 22, since the matched URL
begins at character position 22 (counting from zero) in the text.
The result.input property would hold the same
string as the text variable. For a regular
expression r that does not have the
g flag set, calling s.match(r)
returns the same value as r.exec(s). We'll
discuss the RegExp.exec( ) method a little later
in this chapter.
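As a small illustration of these extra properties, the sketch below uses a different, made-up sample string and compares the result of match( ) with the result of exec( ). The two calls return distinct array objects with the same contents:

```javascript
var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var sample = "See ftp://files.example.com/pub";

var viaMatch = sample.match(url);
var viaExec = url.exec(sample);

// For a nonglobal pattern, both results carry index and input properties.
var where = viaMatch.index;            // 4, where "ftp" begins in sample
var sameInput = (viaMatch.input === sample);  // true
var sameIndex = (viaExec.index === viaMatch.index);  // true

// A global match returns a plain array of matches with no index property.
var words = sample.match(/\w+/g);
var hasIndex = (words.index !== undefined);   // false
```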
The last of the regular expression methods of the String object is
split( ). This method
breaks the string on which it is called into an array of substrings,
using the argument as a separator. For example:
"123,456,789".split(","); // Returns ["123","456","789"]
The split( ) method can also take a regular
expression as its argument. This ability makes the method more
powerful. For example, we can now specify a separator character that
allows an arbitrary amount of whitespace on either side:
"1,2, 3 , 4 ,5".split(/\s*,\s*/); // Returns ["1","2","3","4","5"]
The split( ) method has other features as well.
See the "String.split( )" entry in the core reference
section for complete details.
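The difference between a string separator and a regular expression separator can be sketched as follows. The last line shows the optional second argument, which limits the number of pieces returned; it is included here as one example of the additional features, and the sample strings are illustrative:

```javascript
var list = "1,2, 3 , 4 ,5";

// A plain string separator leaves the surrounding whitespace in the pieces.
var rough = list.split(",");
// rough is ["1", "2", " 3 ", " 4 ", "5"]

// A regular expression separator can absorb the whitespace around each comma.
var clean = list.split(/\s*,\s*/);
// clean is ["1", "2", "3", "4", "5"]

// Splitting on runs of whitespace of any kind.
var words = "one  two\tthree".split(/\s+/);
// words is ["one", "two", "three"]

// An optional second argument limits how many pieces are returned.
var firstTwo = "a,b,c,d".split(",", 2);
// firstTwo is ["a", "b"]
```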