Team LiB

Data Manipulation and Conversion

Dealing with Magic Quotes

A common problem when working with and displaying form data can be traced to what is usually considered a nicety in PHP: magic quotes. When you're working with data external to PHP (whether from a form submission or a database), PHP can automatically add an escape character (a backslash) before any characters that could cause problems, namely single quotes, double quotes, backslashes, and NUL bytes. For instance, a string containing a quote character (single or double) may cause a problem if it is displayed directly to the browser, as shown next:

NOTE

Magic quotes are enabled/disabled by the magic_quotes_gpc, magic_quotes_runtime, and magic_quotes_sybase PHP directives.


<?php $foo = '"this is my value"'; ?>
<INPUT TYPE="TEXT" NAME="myvalue" VALUE="<?php echo $foo; ?>">

When actually executed by PHP, the resulting HTML will contain an extra set of double quotes for the VALUE attribute:

<INPUT TYPE="TEXT" NAME="myvalue" VALUE=""this is my value"">

Unfortunately, when you're working with data submitted to the Web server (GET, POST, and so on) there is no way to turn the magic quotes feature on or off while the script is being executed. To make your script compatible with any configuration of PHP, you'll need to deal with both circumstances. To accomplish this, you'll need two new functions: addslashes() and stripslashes(). These two functions are used to add or remove slashes (when appropriate) from the provided string. The syntax for these functions is as follows:

addslashes($string)
stripslashes($string)

In both cases, $string represents the string to operate on, and each function returns the modified string. It is tempting to call stripslashes() every time you work with remote data, since when magic quotes are off there are usually no escape slashes to strip. However, stripslashes() will also remove any backslashes the user actually typed, so it is safer to strip slashes only when magic quotes are enabled. Determining when to add slashes to a string calls for the same care. If magic quotes are enabled, calling addslashes() will escape the automatically escaped string a second time (double escaping), which will undoubtedly lead to bugs in your script. Because of this, addslashes() should be used only when you are completely sure PHP has not already done this job for you. To determine the state of magic quotes at runtime, use the get_magic_quotes_gpc() or get_magic_quotes_runtime() functions.
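The following sketch shows what the two functions do to a sample value and why double escaping is a problem. The string O'Reilly is only an illustrative input; no magic quotes configuration is involved here, since addslashes() is called explicitly to mimic what magic quotes would do.

```php
<?php
// Sample value a user might type into a form field (illustrative only).
$input = "O'Reilly";

// addslashes() puts a backslash before ', ", \ and NUL bytes.
$once = addslashes($input);            // O\'Reilly

// Escaping data that is already escaped (as happens when magic quotes
// are on and addslashes() is called anyway) escapes the backslash too.
$twice = addslashes($once);            // O\\\'Reilly

// stripslashes() removes one level of escaping...
$restored = stripslashes($once);       // O'Reilly

// ...so a single call cannot undo double escaping.
$stillEscaped = stripslashes($twice);  // O\'Reilly

echo $once, "\n", $twice, "\n", $restored, "\n", $stillEscaped, "\n";
```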

NOTE

In our examples (because we are dealing primarily with form data in this chapter) I will be using only the get_magic_quotes_gpc() function. If you are working with data from a database (or any external sources other than form submissions), get_magic_quotes_runtime() should be used instead.


These two functions retrieve the active setting of their related PHP configuration directives. Each returns either the integer 1 (indicating magic quotes are enabled) or 0. The get_magic_quotes_gpc() function can be used to create our own custom my_addslashes() function (see Listing 5.1), which adds slashes only when magic quotes are disabled in your PHP configuration:

Listing 5.1. A Custom addslashes() function my_addslashes()
<?php
    function my_addslashes($string) {
        return (get_magic_quotes_gpc() == 1) ? $string : addslashes($string);
    }
?>

We now have an elegant way of dealing with magic quotes, regardless of how the particular copy of PHP running the script is configured. By using our custom my_addslashes() function instead of the internal version, we can be assured that our form data will always be escaped exactly once.
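The same idea works for stripping slashes. The sketch below assumes PHP 7 or later and adds a guard that Listing 5.1 does not need on older versions: magic quotes were removed in PHP 5.4, and get_magic_quotes_gpc() itself was removed in PHP 8.0, so the code checks that the function exists before calling it. Both magic_quotes_gpc_enabled() and my_stripslashes() are hypothetical helper names, not PHP built-ins.

```php
<?php
// Hypothetical helper (not part of PHP): reports whether magic_quotes_gpc
// is active. get_magic_quotes_gpc() was removed in PHP 8.0, so check that
// it exists first; on PHP 5.4 through 7.4 it always returns false.
function magic_quotes_gpc_enabled(): bool
{
    return function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
}

// Hypothetical counterpart to my_addslashes() from Listing 5.1:
// strip slashes only when PHP added them automatically.
function my_stripslashes(string $string): string
{
    return magic_quotes_gpc_enabled() ? stripslashes($string) : $string;
}

// Usage: read ?name=... from the query string (falls back to a sample value).
$name = my_stripslashes($_GET['name'] ?? "O'Reilly");
echo $name, "\n";
```

On a modern PHP installation my_stripslashes() simply returns its argument unchanged, which is exactly the behavior wanted when magic quotes are off.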

Data Conversion and Encoding

Often, especially when transferring data between PHP and an external source (such as an HTML form or a database), it is necessary to encode or convert the data to an appropriate format. This section covers the PHP functions available for these purposes. Unlike addslashes() and stripslashes(), discussed in the previous section, the following functions are not tied to any configuration directive and thus require no special care.

Encoding and Decoding Data for URLs

When sending data as part of a form or in a GET request to the server (that is, as part of the URL), it is often necessary to convert characters that have special meaning in an HTTP request (non-alphanumeric characters) into an acceptable format. In HTTP requests, this format is a % symbol followed by the two-digit hexadecimal value of the character's ASCII code. The one common exception is the space character, which in form-encoded data is represented by a +. For example, suppose you would like to pass the variable myvar, whose value is the string "/ value", to another PHP script. The following would not work:

http://myserver.com/myscript.php?myvar=/ value

To pass the value of myvar properly, you'll need to convert it to its encoded representation. Because the hexadecimal value of the / character is 0x2F and the space character is signified by +, the appropriate URL would be as follows:

http://myserver.com/myscript.php?myvar=%2F+value

Because manually converting each non-alphanumeric character would be an incredible hassle, PHP provides the urlencode() function, which converts all non-alphanumeric characters (except -, _, and ., which have no special meaning in a URL) into their encoded form. This function's syntax is as follows:

urlencode($string)

$string is the string to encode. Upon success, the urlencode() function will return the string in its encoded form. A sister function to urlencode(), rawurlencode(), does not convert the space character into a plus (+). Rather, it converts it into its hexadecimal value 0x20 (%20).
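Here is a minimal sketch that produces the encoded URL from the example above and shows how rawurlencode() differs on the space character. The server name and script are the same placeholder values used in the text.

```php
<?php
// The value from the example above: a slash, a space, and a word.
$value = "/ value";

$form = urlencode($value);     // %2F+value   (space becomes +)
$raw  = rawurlencode($value);  // %2F%20value (space becomes %20)

// Only the parameter value is encoded, never the whole URL.
$url = "http://myserver.com/myscript.php?myvar=" . urlencode($value);

echo $form, "\n", $raw, "\n", $url, "\n";
```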

When PHP receives parameters in an HTTP request (whether they come from GET, POST, or cookies), it automatically decodes them into their actual values. However, for situations where you need to decode such values manually, PHP also provides the urldecode() function. The syntax for urldecode() is as follows:

urldecode($enc_string)

$enc_string is the encoded string to decode. This function returns the decoded string. As was the case with urlencode(), there is a sister function, rawurldecode(), for strings in which spaces are represented by their hexadecimal value (%20); unlike urldecode(), it does not convert + into a space.
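The difference between the two decoders shows up only on the + character, as this short sketch illustrates. Keep in mind that values in $_GET, $_POST, and $_COOKIE have already been decoded by PHP, so running urldecode() on them a second time would corrupt any value that genuinely contains + or %.

```php
<?php
// urldecode() turns "+" into a space, matching urlencode().
$a = urldecode("%2F+value");       // "/ value"

// rawurldecode() leaves "+" alone and expects spaces as %20.
$b = rawurldecode("%2F+value");    // "/+value"
$c = rawurldecode("%2F%20value");  // "/ value"

echo "[$a] [$b] [$c]\n";
```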

Encoding and Decoding Binary Data

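Another function useful for encoding data, particularly binary data, is base64_encode(). Base64 represents arbitrary bytes using only 64 printable ASCII characters (A-Z, a-z, 0-9, +, and /, with = used for padding), so the result can safely pass through channels that expect plain text. The syntax for this function is as follows: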

base64_encode($data)

$data represents the data to encode. When executed, this function returns the data contained within the $data variable in base64 format.

In a similar fashion, PHP can decode base64 data back into its original state with the base64_decode() function. Its syntax mirrors that of its counterpart:

base64_decode($enc_string)

$enc_string is the base64 encoded string to decode. This function returns the original data that had been encoded.
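A minimal round-trip sketch follows. The three input bytes are arbitrary sample data chosen to include a NUL byte and a non-ASCII byte. The optional second argument to base64_decode() (strict mode) is used to show that input containing characters outside the base64 alphabet is rejected with false.

```php
<?php
// Three arbitrary bytes, including a NUL and a non-ASCII byte (illustrative).
$binary = "\x00\xFF\x10";

// Encode to printable ASCII...
$encoded = base64_encode($binary);   // AP8Q

// ...and decode back to the identical bytes.
$decoded = base64_decode($encoded);

// With true as the second (strict) argument, base64_decode() returns
// false when the input contains characters outside the base64 alphabet.
$invalid = base64_decode("@@@@", true);

echo $encoded, "\n";
var_dump($decoded === $binary, $invalid);
```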

Converting to HTML Entities

Although encoding data for transfer back and forth between an HTML form, a database, and so on is extremely useful, PHP also supports a few simpler (and very convenient) conversions. For instance, for argument's sake, assume you would like to display the following text in the browser:

<A HREF="example.php">This is an example HTML Tag</A>

Now, the trick here is to get this string to display to the client browser as it is seen in the example (not as a hyperlink). For purposes such as this, when displaying characters that usually hold a significance in HTML, there are HTML entities. These entities are special strings interpreted by the browser and rendered as a character. For instance, &lt; is the entity representation of the < character.

So, to display the preceding HTML code as text and have it not interpreted by the browser, it would have to resemble something like the following:

&lt;A HREF=&quot;example.php&quot;&gt;This is an example HTML Tag&lt;/A&gt;

Although it's not much different from URL encoding, attempting to manually convert these HTML entities soon becomes quite an annoying task. Luckily, PHP provides two functions to automate this conversion.

The first of these functions is htmlentities(). This function converts all applicable characters into their corresponding HTML entities. The syntax of this function is as follows:

htmlentities($string [, $quote_style [, $char_set]])

$string represents the string to convert, $quote_style is a flag determining how to treat quote characters (single and double), and $char_set is a string representing the character set to use in the conversion. The possible flags for the $quote_style parameter are shown in Table 5.1.

Table 5.1. htmlentities() Quote Style Flags

Flag            Description
ENT_COMPAT      Convert only double-quote characters (the default before PHP 8.1).
ENT_QUOTES      Convert both single- and double-quote characters (part of the default since PHP 8.1).
ENT_NOQUOTES    Leave all quote characters as is.
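To see the three flags side by side, the sketch below runs one sample string through htmlentities() with each of them. The flags are passed explicitly because the default quote handling changed in PHP 8.1. No doctype flag is given, so the HTML 4.01 rules apply and a single quote becomes &#039;.

```php
<?php
$s = "It's \"quoted\" & <b>bold</b>";

// Flags are passed explicitly because the default changed in PHP 8.1.
$compat   = htmlentities($s, ENT_COMPAT);   // only " is converted
$quotes   = htmlentities($s, ENT_QUOTES);   // " and ' are converted
$noquotes = htmlentities($s, ENT_NOQUOTES); // quotes left as is

echo $compat, "\n", $quotes, "\n", $noquotes, "\n";
```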


When executed, the htmlentities() function converts the characters in $string to their respective HTML entities (where one exists) and returns the result. For instance, consider the following snippet, which passes ENT_COMPAT explicitly so that single quotes are left alone regardless of the PHP version's default:

<?php echo htmlentities("<A HREF='foo'>\"Jack & Jill\"</A>", ENT_COMPAT); ?>

The output will be as follows:

&lt;A HREF='foo'&gt;&quot;Jack &amp; Jill&quot;&lt;/A&gt;

Although effective, htmlentities() often does more than necessary. Usually only a few select characters need to be converted to keep the browser from rendering text as HTML code. For these cases, PHP provides a watered-down version of htmlentities(), called htmlspecialchars(), which converts only &, <, >, the double quote (unless ENT_NOQUOTES is given), and the single quote (only when ENT_QUOTES is given). Its syntax is as follows:

htmlspecialchars($string [, $quote_style [, $char_set]])

As with htmlentities(), $string is the string to be translated, $quote_style is a flag that determines how quotes will be handled (refer back to Table 5.1 for the possible values), and $char_set represents the character set to use in the conversion.
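The sketch below contrasts the two functions on text containing an accented letter, and then applies htmlspecialchars() to the VALUE attribute problem from the magic quotes section. The é is written as its UTF-8 bytes so the result does not depend on how the source file is encoded, and UTF-8 is passed explicitly as the character set.

```php
<?php
// "café <b>" with é spelled out as its UTF-8 bytes.
$text = "caf\xC3\xA9 <b>";

$all     = htmlentities($text, ENT_QUOTES, 'UTF-8');     // caf&eacute; &lt;b&gt;
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8'); // é kept, <b> escaped

// The quoted value from the magic quotes example, now safe in an attribute.
$foo   = '"this is my value"';
$input = '<INPUT TYPE="TEXT" NAME="myvalue" VALUE="'
       . htmlspecialchars($foo, ENT_QUOTES, 'UTF-8') . '">';

echo $all, "\n", $special, "\n", $input, "\n";
```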

Serialization

Although it is used more often with databases than with forms, serialization of variables in PHP can prove extremely useful. What exactly is serialization? It is a process whereby a complex data structure such as an array or an object (which cannot be transmitted in a form or stored in a database directly) is converted into a string in a reversible way. Although you could write your own function to serialize a complex data structure, almost any PHP variable can be serialized with the built-in serialize() function. The syntax for this function is as follows:

serialize($input)

$input is the complex data structure to serialize. When executed, the serialize() function returns the string representation of the input data. For example, consider the following array:

<?php
     $a= array("foo" => "testing", 0 => 10, 1 => "mystring");
     echo serialize($a);
?>

This generates the following output:

a:3:{s:3:"foo";s:7:"testing";i:0;i:10;i:1;s:8:"mystring";}

Note that this string is by no means ready to be transmitted over HTTP (for example, as a hidden form element) or stored in a database, because it contains characters such as double quotes, braces, colons, and semicolons that are not safe to place unescaped in a URL or an SQL string. A number of different methods are available to overcome this. If the data is to be stored in a database, simply passing it through addslashes() will often do the trick. (The custom my_addslashes() function is not appropriate here: the serialized string is produced by your script rather than received from a request, so magic quotes never escaped it.) When you're dealing with HTTP, the urlencode() function (also discussed earlier) should be used instead.
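Here is a minimal sketch of both preparations, applied to the array from the serialization example. For the URL case, PHP decodes request parameters automatically on the receiving side, so urldecode() merely stands in for that step here. addslashes() is used because it is what the text describes; in production database code, the database extension's own escaping or prepared statements are generally the safer choice.

```php
<?php
$a = array("foo" => "testing", 0 => 10, 1 => "mystring");
$serialized = serialize($a);

// For a URL query parameter: encode before sending. PHP decodes request
// parameters automatically; urldecode() stands in for that step here.
$forHttp  = urlencode($serialized);
$received = urldecode($forHttp);

// For an SQL string literal: escape the double quotes with addslashes().
$forSql = addslashes($serialized);

echo $forHttp, "\n", $forSql, "\n";
```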

After it is serialized and encoded (if necessary), this string can be stored in a database, sent as a hidden element in an HTML form, or even written to a file for future use. To reconstruct the variable from its serialized representation, PHP offers the unserialize() function, which has a syntax similar to that of its counterpart:

unserialize($input_string)

$input_string represents the serialization string for the variable to reconstruct. If unserialize() encounters an object whose class has not been defined, PHP calls the function named by the unserialize_callback_func configuration directive (if one is set), giving your script a chance to load the class definition (see Chapter 7, "Using Templates and Content Management," for more information on dynamically loading class definitions). Upon success, the unserialize() function returns the reconstructed variable; it returns false if PHP was unable to reconstruct the serialized data. Since PHP 7.0, unserialize() also accepts an optional second parameter, an array of options such as allowed_classes.
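One subtlety is that serialize(false) produces "b:0;", so a return value of false from unserialize() does not always mean failure. The sketch below wraps unserialize() in a hypothetical helper, try_unserialize(), that tells the two cases apart. It assumes PHP 7.0 or later for the options array, and it sets allowed_classes to false, which is a sensible precaution when the string arrives from a client (for example, through a hidden form field), because it prevents arbitrary objects from being instantiated.

```php
<?php
// Hypothetical helper: returns [true, $value] on success and [false, null]
// when the data could not be reconstructed.
function try_unserialize(string $data): array
{
    // allowed_classes => false: any objects come back as
    // __PHP_Incomplete_Class instead of instances of real classes.
    // @ hides the notice/warning PHP emits for malformed input.
    $value = @unserialize($data, ['allowed_classes' => false]);

    // false is also a legitimate result when the input is "b:0;".
    if ($value === false && $data !== serialize(false)) {
        return [false, null];
    }
    return [true, $value];
}

$a = array("foo" => "testing", 0 => 10, 1 => "mystring");

list($ok, $copy) = try_unserialize(serialize($a));
var_dump($ok, $copy === $a);   // both true: the array was rebuilt exactly

list($ok, $value) = try_unserialize("a:3:{broken");
var_dump($ok);                 // false: the input was malformed
```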
