
Investigating Strings

You do not always know everything about the data with which you are working. Strings can arrive from many sources, including user input, databases, files, and Web pages. Before you begin to work with data from an external source, you often need to find out more about it. PHP provides many functions that enable you to acquire information about strings.

A Note About Indexing Strings

We will frequently use the word index in relation to strings. You will have come across the word more frequently in the context of arrays. In fact, strings and arrays are not as different as you might imagine. You can think of a string as an array of characters. So, you can access individual characters of a string as if they were elements of an array:

$test = "scallywag";
print $test[0]; // prints "s"
print $test[2]; // prints "a"

It is important to remember, therefore, that when we talk about the position or index of a character within a string, characters—like array elements—are indexed from 0.
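
As a minimal sketch of zero-based indexing, the following fragment walks through a string one character at a time. The function name describeCharacters() is hypothetical and used only for illustration, and the type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helper: pairs each character of $text with its index.
function describeCharacters(string $text): array
{
    $lines = [];
    // Valid indexes run from 0 to strlen($text) - 1.
    for ($i = 0; $i < strlen($text); $i++) {
        $lines[] = "$i: {$text[$i]}";
    }
    return $lines;
}

print implode(", ", describeCharacters("wag")); // prints "0: w, 1: a, 2: g"
```

The last character sits at index 2, one less than the length of the string.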

Finding the Length of a String with strlen()

You can use strlen() to determine the length of a string. strlen() requires a string and returns an integer representing its length. The length is counted in bytes, which matches the number of characters for single-byte text such as plain ASCII. strlen() is typically used to check the length of user input. The following fragment tests a membership code to ensure that it is four characters long:

if ( strlen( $membership ) == 4 ) {
  print "Thank you!";
} else {
  print "Your membership number must have 4 digits";
}

The user is thanked for the input only if the global variable $membership contains four characters; otherwise, an error message is generated. Note that the test checks only the length, not whether the characters are digits.
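
If the code really must consist of four digits, a length test can be combined with a character test. The following is a sketch only: the function name isValidMembershipCode() is hypothetical, and it assumes PHP 7 or later and the ctype extension, which is normally enabled by default:

```php
<?php
// Hypothetical helper: true only when $code is exactly four decimal digits.
function isValidMembershipCode(string $code): bool
{
    // strlen() checks the length; ctype_digit() checks that every
    // character is a digit from 0 to 9.
    return strlen($code) == 4 && ctype_digit($code);
}

var_dump(isValidMembershipCode("1234")); // bool(true)
var_dump(isValidMembershipCode("12a4")); // bool(false): right length, not all digits
var_dump(isValidMembershipCode("123"));  // bool(false): too short
```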

Finding a Substring Within a String with strstr()

You can use strstr() to test whether a string exists embedded within another string. strstr() requires two arguments: a source string and the substring you want to find within it. The function returns false if the substring is absent; otherwise, it returns the portion of the source string beginning with the substring. For the following example, imagine that we want to treat membership codes that contain the string AB differently from those that do not:

$membership = "pAB7";
if ( strstr( $membership, "AB") ) {
  print "Thank you. Don't forget that your membership expires soon!";
} else {
  print "Thank you!";
}

Because our test variable, $membership, does contain the string AB, strstr() returns the string AB7. This resolves to true when tested, so we print a special message. What happens if our user enters "pab7"? strstr() is case sensitive, so AB is not found. The if statement's test fails, and the default message is printed to the browser. If we want to search for either AB or ab within the string, we must use stristr(), which works in exactly the same way but is not case sensitive.
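
The difference between the two functions can be sketched as follows. The helper containsIgnoringCase() is a hypothetical name introduced for illustration, and the type declarations assume PHP 7 or later. It compares the result with false explicitly, because a returned portion such as "0" would otherwise resolve to false even though the substring was found:

```php
<?php
$membership = "pab7";

// strstr() is case sensitive, so "AB" is not found in "pab7".
var_dump(strstr($membership, "AB"));  // bool(false)

// stristr() ignores case and returns the source string from the match onward.
var_dump(stristr($membership, "AB")); // string(3) "ab7"

// Hypothetical helper: reports whether $needle occurs in $source, ignoring case.
function containsIgnoringCase(string $source, string $needle): bool
{
    return stristr($source, $needle) !== false;
}

var_dump(containsIgnoringCase("pab7", "AB")); // bool(true)
```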

Finding the Position of a Substring with strpos()

strpos() tells you both whether a string exists within a larger string and where it is to be found. strpos() requires two arguments: the source string and the substring you are seeking. The function also accepts an optional third argument, an integer representing the index from which you want to start searching. If the substring does not exist, strpos() returns false; otherwise, it returns the index at which the substring begins. The following fragment uses strpos() to ensure that a string begins with the string mz:

$membership = "mz00xyz";
if ( strpos($membership, "mz") === 0 ) {
  print "hello mz";
}

Notice the comparison we had to use to get the expected result. strpos() finds mz in our string, but it finds it at the first character, index 0. It therefore returns 0, which would resolve to false in an ordinary test. To work around this, we use PHP's identity operator (===), which returns true only if the left and right operands have the same value and the same type.
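
The following sketch shows the three kinds of result you can expect from strpos(), including the optional third argument. The variable names are our own; the return values follow from the indexes of the characters in the test string:

```php
<?php
$membership = "mz00xyz";

var_dump(strpos($membership, "mz"));   // int(0): found at the first character
var_dump(strpos($membership, "q"));    // bool(false): not found
var_dump(strpos($membership, "z"));    // int(1): first "z" in the string
var_dump(strpos($membership, "z", 2)); // int(6): search starts at index 2

// To test only for presence, compare against false with !==,
// so that a match at index 0 is not mistaken for a failure.
if (strpos($membership, "mz") !== false) {
    print "found mz";
}
```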

Extracting Part of a String with substr()

substr() returns a portion of a string based on the start index and length of the portion for which you are looking. substr() requires two arguments: a source string and the starting index. With only these two arguments, it returns all the characters from the starting index to the end of the source string. substr() optionally accepts a third argument, which should be an integer representing the length of the string you want returned. If this argument is present, substr() returns only the number of characters specified from the start index onward:

$test = "scallywag";
print substr($test,6); // prints "wag"
print substr($test,6,2); // prints "wa"

If you pass substr() a negative number as its second (starting index) argument, it counts from the end rather than the beginning of the string. The following fragment writes a specific message to people who have submitted an email address ending in .uk:

$test = "matt@corrosive.co.uk";
// Compare the last three characters; do not assign the result back to $test.
if ( substr( $test, -3 ) == ".uk" ) {
  print "Don't forget our special offers for British customers";
} else {
  print "Welcome to our shop!";
}
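
As a further sketch, substr() can be combined with strpos() to pull apart an email address. The variable names here are illustrative only, and the fragment assumes the address contains a single @ character:

```php
<?php
$address = "matt@corrosive.co.uk";

// A negative start index counts back from the end of the string.
$suffix = substr($address, -3);        // ".uk"

// A start index of 0 with a length takes characters from the beginning.
$user = substr($address, 0, 4);        // "matt"

// Use strpos() to locate "@", then take everything after it.
$domain = "";
$at = strpos($address, "@");
if ($at !== false) {
    $domain = substr($address, $at + 1); // "corrosive.co.uk"
}

print "$user at $domain ($suffix)";    // prints "matt at corrosive.co.uk (.uk)"
```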

Tokenizing a String with strtok()

You can parse a string word by word using strtok(). strtok() initially requires two arguments: the string to be tokenized and the delimiters by which to split the string. The delimiter string can include as many characters as you want. strtok() returns the first token found, and after strtok() has been called for the first time, the source string is cached. For subsequent calls, you should pass strtok() only the delimiter string. The function returns the next token found every time it is called, returning false when the end of the string is reached. strtok() usually is called repeatedly within a loop. Listing 8.3 uses strtok() to tokenize a URL, splitting the host and path from the query string and further dividing the name/value pairs of the query string. Figure 8.4 shows the output from Listing 8.3.

Figure 8.4. Tokenizing a string.


Listing 8.3 Dividing a String into Tokens with strtok()
 1: <!DOCTYPE html PUBLIC
 2:   "-//W3C//DTD XHTML 1.0 Strict//EN"
 3:   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 4: <html>
 5: <head>
 6: <title>Listing 8.3 Dividing a string into
 7:     tokens with strtok()</title>
 8: </head>
 9: <body>
10: <div>
11: <?php
12: $test = "http://p24.corrosive.co.uk/tk.php";
13: $test .= "?id=353&sec=44&user=harry&context=php";
15: $delims = "?&";
16: $word = strtok( $test, $delims );
17: while ( is_string( $word ) ) {
18:  if ( $word ) {
19:    print "$word<br/>";
20:  }
21:  $word = strtok( $delims );
22: }
23: ?>
24: </div>
25: </body>
26: </html>

strtok() is something of a blunt instrument, and a few tricks are required to work with it. We first store the delimiters we want to work with in a variable, $delims on line 15. We call strtok() on line 16, passing it the URL we want to tokenize and the $delims string. We store the first result in $word. Within the conditional expression of the while loop on line 17, we test that $word is a string. If it isn't, we know that the end of the string has been reached and no further action is required.

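We test the return type rather than the value because a token can be a string that PHP treats as false. In versions of PHP before 4.1.0, two delimiters in a row caused strtok() to return an empty string when it reached the first of them; later versions skip empty tokens, but a token such as "0" still resolves to false. So, a more conventional test such as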

while ( $word ) {
   $word = strtok( $delims );
}

would end the loop early if $word were an empty string or the string "0", even if the end of the source string had not yet been reached.

Having established that $word contains a string, we can work with it. If $word does not contain an empty string, we print it to the browser on line 19. We must then call strtok() again on line 21 to repopulate the $word variable for the next test. Notice that we don't pass the source string to strtok() a second time. If we were to do this, the first word of the source string would be returned again and we would find ourselves in an infinite loop.
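
The same technique can be taken a step further to separate the names from the values in the query string. The following is a sketch rather than a robust URL parser: it changes the delimiter string between calls, the variable names $location and $pairs are our own, and it assumes every name is followed by = and a nonempty value:

```php
<?php
$test = "http://p24.corrosive.co.uk/tk.php";
$test .= "?id=353&sec=44&user=harry&context=php";

// First call: pass the source string; the token ends at the "?".
$location = strtok($test, "?");

$pairs = [];
// Later calls pass only a delimiter string, which may differ each time:
// read a name up to "=", then its value up to "&".
while (($name = strtok("=")) !== false) {
    $value = strtok("&");
    $pairs[$name] = ($value === false) ? "" : $value;
}

print "$location<br/>";
foreach ($pairs as $name => $value) {
    print "$name is $value<br/>";
}
```

Comparing the result with false using !== avoids the problem described above, where a token that resolves to false would end the loop too soon. For real URLs, PHP's built-in parse_url() and parse_str() functions are likely to be a safer choice than hand-written tokenizing.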
