Team LiB

Comparing Strings

Determining the relationship between two strings is not as immediately obvious a feat as performing the same operation on two numbers. The main problem with strings is one of context. If you examine a string based on its binary form, the two words "Marco" and "marco" will be completely different because the byte value of the character "M" is different from the value of "m". However, depending on your requirements, Marco and marco could be equivalent and should be treated as such.

The easiest way to compare two strings is to use the built-in PHP comparison operators. However, there are a few "gotchas" that you should be aware of. Consider, for example, the following expression:

echo (0 == '0');

Because one of the operands is an integer, the string "0" is converted to an integer value before the comparison is made, resulting in the output 1. Now, this may not look like much of a problem at first sight, but it can very easily become one when something like this happens:

echo (0 == 'Marco');

In PHP versions before 8.0, the string 'Marco' is converted to the integer value 0 when the expression is evaluated, so the result of the comparison is still true and the preceding code snippet outputs 1. (PHP 8.0 and later compare a number with a non-numeric string by converting the number to a string instead, so the same expression is false there.) There's a good chance that you will never want a silent conversion like this to happen in your code and, therefore, you should avoid the simple comparison operators when dealing with strings unless you really know what you're doing.

You should, instead, consider using the type-checking comparison operators (=== and !==), which check that the two operands being compared are of the same data type before actually comparing their values. For example, the expression:

(0 === 'Marco')

will return a value of false, which is probably what you were expecting in the first place. The same thing will happen for this statement:

(0 === '0')
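
To make the difference concrete, the following sketch runs the same pairs of values through both operators. The helper name compareBothWays() is made up for this illustration and is not part of PHP. The loose result for a non-numeric string such as 'Marco' depends on the PHP version, so it is deliberately left without an expected value.

```php
<?php
// Hypothetical helper (not a PHP built-in): reports how the loose (==)
// and strict (===) operators judge the same pair of values.
function compareBothWays($a, $b): array
{
    return [
        'loose'  => ($a == $b),   // may convert types before comparing
        'strict' => ($a === $b),  // true only if type and value both match
    ];
}

var_dump(compareBothWays(0, '0'));      // loose: true, strict: false
var_dump(compareBothWays('0', '0'));    // loose: true, strict: true
var_dump(compareBothWays(0, 'Marco'));  // strict: false; loose depends on the PHP version
```

When both operands are already strings, the two operators agree on plain text like this, which is why the surprises show up mainly when a number sneaks into the comparison.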

The most consistently accurate way of comparing strings, however, is to use the strcmp function:

int strcmp ($val1, $val2)

The result returned by strcmp() depends on the alphabetical relationship between the two strings. If $val1 and $val2 are identical, strcmp() will return 0. However, strcmp() performs a case-sensitive string comparison, so that, for example, "Marco" and "marco" will not be equal.

If the two values are not equal, strcmp() compares them byte by byte, using the binary value of each character rather than the collation rules of the current locale. If $val1 sorts before $val2, the result will be negative. Otherwise, it will be positive.

For example, I obtain the following results:

echo strcmp ('Apple', 'Banana');        // returns < 0
echo strcmp ('apple', 'Apple');        // returns > 0
echo strcmp ('1', 'test');                // returns < 0

As you can see, digits have a lower value than letters, and uppercase letters have a lower value than lowercase letters, which matches the byte values of these characters in ASCII. This ordering does not always agree with the alphabetical rules of a given language, particularly in those languages where accented letters or collections of letters are sorted as a single symbol (for example, ä sorted as ae in German phone-book ordering, or ch in Czech). If you need an ordering that follows the locale of the environment in which your script is running, PHP provides strcoll(), which takes the same parameters as strcmp().
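
The ordering just described can be seen by sorting a small list with strcmp() as the comparison callback for usort(). This is only a minimal sketch; sortWithStrcmp() is a hypothetical helper name chosen for the example, not a PHP function.

```php
<?php
// Hypothetical helper: returns a copy of the list sorted with strcmp().
function sortWithStrcmp(array $words): array
{
    // usort() only needs a negative, zero, or positive result from the
    // callback, which is exactly what strcmp() returns.
    usort($words, 'strcmp');
    return $words;
}

print_r(sortWithStrcmp(['test', 'apple', 'Banana', '1', 'Apple']));
// resulting order: 1, Apple, Banana, apple, test
```

Because only the sign of the result matters, code that relies on strcmp() should test for < 0, 0, or > 0 rather than for specific values such as -1 or 1.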

If you need to perform a comparison that is not case sensitive, PHP provides the strcasecmp function, which takes the same parameters as strcmp():

echo strcasecmp ('Marco', 'marco');    // returns 0
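
Since both functions return 0 when the strings match, writing if (strcasecmp($a, $b)) tests for a difference, not for equality. One way to avoid that slip is to wrap the call in a small helper that returns a boolean. The name equalsIgnoreCase() below is an assumption made for this sketch and does not exist in PHP.

```php
<?php
// Hypothetical helper: true when two strings match ignoring letter case.
function equalsIgnoreCase(string $a, string $b): bool
{
    // strcasecmp() returns 0 for a match, so compare against 0 explicitly.
    return strcasecmp($a, $b) === 0;
}

var_dump(equalsIgnoreCase('Marco', 'marco'));  // bool(true)
var_dump(equalsIgnoreCase('Marco', 'Marc'));   // bool(false)
```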
