Team LiB

Form Data Integrity

In this section I'll discuss methods you can use to protect data passed in HTML forms. When you're working with forms, it is often necessary to pass data in hidden input tags. For instance, let's assume that a form you are working on requires the user to submit it back to the server within five minutes. Unless you are using sessions (discussed later in the book in Chapter 6, "Persistent Data Using Sessions and Cookies"), the only method available to you is to create a hidden form element containing the time at which the form was created (see Listing 5.2):

Listing 5.2. Time-Sensitive Form Example
<FORM ACTION="process.php" METHOD=GET>
<INPUT TYPE="hidden" NAME="time" VALUE="<?php echo time(); ?>">
Enter your message (5 minute time limit):<INPUT TYPE="text" NAME="mytext" VALUE="">
<INPUT TYPE="submit" Value="Send Data">
</FORM>

When this form is submitted, the script can enforce the limit by checking that the time hidden element is no more than 300 seconds (5 minutes) older than the current value returned by time(), and rejecting the submission otherwise:

if ($_GET['time'] + 300 < time()) {
     // More than 300 seconds have passed since the form was created.
     echo "You took too long!<BR>";
     exit;
}

The major flaw with this system is that there is no way to verify that the time element sent to the server was actually the same value that was originally sent when the form was created. When this form is submitted, in fact, the following is a sample URL that would be displayed in the user's browser:

http://somewhere.com/process.php?time=1037613504

This URL could be easily modified by the user to "turn back time" and make it look like the form was created two minutes later than it really was by adding 120 (60 * 2) seconds to the time URL parameter:

http://somewhere.com/process.php?time=1037613624
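
To make the arithmetic concrete, here is a minimal sketch of the attack. The helper name form_expired() and the fixed timestamps are assumptions made for illustration; they are not part of the original listing.

```php
<?php
// Hypothetical helper (not from the original listing): returns true when
// more than $limit seconds separate the form's creation time from $now.
function form_expired($created, $now, $limit = 300) {
    return ((int)$created + $limit) < $now;
}

$created = 1037613504;       // value originally placed in the hidden field
$now     = $created + 400;   // the user submits 400 seconds later

// Honest submission: 400 seconds is past the 300-second limit.
var_dump(form_expired($created, $now));         // bool(true)

// Tampered submission: the user adds 120 to the time parameter, so the
// form appears to be only 280 seconds old and is accepted.
var_dump(form_expired($created + 120, $now));   // bool(false)
```

Because the server has no record of the value it originally sent, it cannot tell the two submissions apart.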

In situations like this, data validation can prove most useful. In the text to come, I will demonstrate how PHP can be used to ensure that any hidden data is submitted exactly as it was created.

Securing Hidden Elements

The secret to data validation in this case is the MD5 algorithm. This algorithm is used to create a message digest (a sort of "digital fingerprint") of the data provided to it. As with the fingerprints found on a person, the digital fingerprint generated by the MD5 algorithm is treated as unique to the string that it represents. Although there is a slight chance (about 1 in 3.4 × 10^38, or 1 in 2^128) that two given strings will produce an identical fingerprint, for the purposes of this section it can be assumed that the fingerprint is unique. Not only does the MD5 algorithm create a fingerprint that is unique, but it is also predictable: for any given string, MD5 will generate the same fingerprint every time.

In PHP, using the MD5 algorithm is as simple as calling the md5() function. The syntax for this function is

md5($string)

$string represents the string to generate the fingerprint for. The md5() function returns a 32-character hexadecimal fingerprint based on the data provided in $string.
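
As a quick illustration of these properties, the following sketch calls md5() on a sample string; the input 'hello' is chosen only for the example.

```php
<?php
// md5() always yields the same digest for the same input.
$first  = md5('hello');
$second = md5('hello');

echo $first, "\n";                          // 5d41402abc4b2a76b9719d911017c592
var_dump($first === $second);               // bool(true)
var_dump(strlen($first));                   // int(32)

// Changing a single character produces a different fingerprint.
var_dump(md5('hello') === md5('hellp'));    // bool(false)
```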

So how will the md5() function help us ensure that our data remains unchanged between the creation of a form and when it is submitted? By creating MD5 fingerprint values for each hidden element in your document and then checking those fingerprint values when the form is submitted, you now can be confident the data submitted was actually valid.

When creating an MD5 fingerprint for these purposes, it is critical to remember that one of the major benefits of the algorithm can also be its downfall. Because the MD5 algorithm is completely predictable, simply hashing some combination of a form element's name ($name) and value ($value) could be hazardous. For instance, consider the following code snippet:

$fingerprint = md5($name.$value);

Although $fingerprint is indeed an MD5 fingerprint based on the passed values, a malicious (and fairly observant) user could figure out the string used to generate the fingerprint with relative ease, and then compute a matching fingerprint for any value he or she likes. For our MD5 fingerprint to be trustworthy, a value completely unknown to the outside user must be included:

$fingerprint = md5($name.$value.'mysecretword');

Using this method, the malicious user would have to not only decipher the way the string was created for the MD5 algorithm, but would have to know the additional value. For simplicity's sake, let's define a constant in PHP called PROTECTED_KEY using the PHP define statement to store our secret word:

define("PROTECTED_KEY", "mysecretword");

NOTE

When a constant is defined using the define statement, it behaves as any other PHP constant. This means that it is referenced by PROTECTED_KEY (no leading $ symbol) and can be accessed from anywhere in the script automatically, regardless of scope.
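
The following sketch shows both points at once: the constant is visible inside a function without any special declaration, and a user who guesses the unkeyed recipe still cannot reproduce the keyed fingerprint. The helper name fingerprint() and the sample values are assumptions for illustration only.

```php
<?php
define('PROTECTED_KEY', 'mysecretword');

// Hypothetical helper: PROTECTED_KEY is accessible here even though it
// was defined outside the function.
function fingerprint($name, $value) {
    return md5($name . $value . PROTECTED_KEY);
}

$name  = 'time';
$value = '1037613504';

// Fingerprint the server would issue along with the form.
$issued = fingerprint($name, $value);
var_dump($issued === fingerprint($name, $value));                  // bool(true)

// A user who alters the value and assumes the recipe is md5($name.$value)
// ends up with a fingerprint the server will not accept.
$forged_value   = '1037613624';
$attacker_guess = md5($name . $forged_value);
var_dump($attacker_guess === fingerprint($name, $forged_value));   // bool(false)
```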


The protect() Function

To facilitate the generation of MD5 fingerprints and form elements, in this section I will construct a helper function that generates the digital fingerprints for an HTML form. This function is called protect() and has the following syntax:

protect($name, $value, $secret)

$name represents the NAME attribute of a hidden HTML form element, $value represents the actual corresponding value of that element, and $secret represents a secret string used in fingerprint generation. This function, when executed, will return a string containing two individual hidden form elements: one containing the actual value and the other containing the MD5 fingerprint. The NAME attribute of the MD5 fingerprint will be defined by this function as <name>_checksum, where <name> represents the name of the actual value being passed to the form. This function is shown in Listing 5.3:

Listing 5.3. The protect() MD5 Form Fingerprint Generator
<?php

    define('PROTECTED_KEY', 'mysecretword');

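    function my_addslashes($string) {
        // get_magic_quotes_gpc() was removed in PHP 8, so test for it first.
        $quoted = function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
        return $quoted ? $string : addslashes($string);
    }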

    function protect($name, $value, $secret) {

        $tag = "";
        $seed = md5($name.$value.$secret);
        $html_name = $name."_checksum";
        $tag = "<INPUT TYPE='hidden' NAME='$name' VALUE='" .
               urlencode(my_addslashes($value))."'>\n";
        $tag .= "<INPUT TYPE='hidden' NAME='$html_name' VALUE='$seed'>\n";
        return $tag;

    }
?>

NOTE

Don't know what my_addslashes() or urlencode() are? The purpose behind these functions is discussed in previous sections of this chapter ("Dealing with Magic Quotes" and "Data Conversion and Encoding," respectively).
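
As a brief refresher, the sketch below shows the encoding that protect() applies to a value when magic quotes are off, together with the decoding that has to be applied before a fingerprint can be recomputed from the submitted data. The sample value is an assumption chosen because it contains a quote, a space, and an ampersand.

```php
<?php
$value = "O'Reilly & Co";

// Encoding applied to the value before it is written into the form.
$encoded = urlencode(addslashes($value));
echo $encoded, "\n";             // O%5C%27Reilly+%26+Co

// Reversing the two steps, in the opposite order, recovers the original.
$decoded = stripslashes(urldecode($encoded));
var_dump($decoded === $value);   // bool(true)
```

Note that the fingerprint is computed from the raw value, not the encoded one, which is why the decoding step matters when the form comes back.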


In practice, the protect() function would be used anytime a hidden form element is required:

<FORM ACTION="process.php" METHOD=GET>
<?php echo protect('time', time(), PROTECTED_KEY); ?>
Enter your message (5 minute time limit):
<INPUT TYPE="text" NAME="mytext" VALUE="">
<INPUT TYPE="submit" Value="Send Data">
</FORM>

When processed by PHP, the following is the actual HTML that is sent to the client browser:

<FORM ACTION="process.php" METHOD=GET>
<INPUT TYPE="hidden" NAME="time" VALUE="1037613504">
<INPUT TYPE="hidden" NAME="time_checksum"
       VALUE="3b6f5fa33bb4fb99e68cf1e3f5bf5478">
Enter your message (5 minute time limit):
<INPUT TYPE="text" NAME="mytext" VALUE="">
<INPUT TYPE="submit" Value="Send Data">
</FORM>

Now, by checking to ensure that the time hidden form element matches the MD5 fingerprint stored in time_checksum (with our secret string) the validity of the data can be ensured.

The validate() Function

After the form has been submitted, the fingerprint for each protected element must be confirmed for the data to be considered valid. To do this, we must construct the validate() function. This function has the following syntax:

validate($input, $secret)

$input represents a reference to the appropriate superglobal array ($_GET, $_POST, and so on) and $secret represents the secret string used to create the fingerprint (in this case, the string defined as PROTECTED_KEY). Unlike protect(), which represents a fairly simple function, the validate() function is considerably more complex for a number of reasons. First, there must be a number of different checks to account for all the ways a malicious user could attempt to manipulate the data, including (but not limited to) the following:

  • Modifying one or more of the protected values

  • Modifying one or more of the protected value fingerprints

  • Removing one or more of the protected values or fingerprints

To determine whether a user has removed or manipulated a protected value, the validate() function must know what values are supposed to be protected. To facilitate this, the validate() function looks for a hidden value (and its corresponding checksum) whose NAME attribute is protected_list. The value of this hidden form element is a serialized array listing the names of protected keys. If this parameter is not found, the validate() function should check all parameters with the following exceptions:

  • The name of the form element is submit.

  • The name of the form element ends in _checksum.

NOTE

If you are wondering why the validate() function ignores form elements named submit during validation, it is for circumstances where the form is being processed by the same script that displayed it. In these circumstances, often a hidden form element named "submit" will be included in the form to indicate to the script that it should process the form rather than display it.
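
The fallback rule can be sketched as a simple filter over the submitted keys. The sample $input array below is invented for illustration, and the pattern is anchored so that only an element named exactly submit is skipped, which is an assumption about the intended behavior based on the two exceptions listed above.

```php
<?php
// Sample submitted data (illustrative values only).
$input = array(
    'submit'        => '1',
    'time'          => '1037613504',
    'time_checksum' => '3b6f5fa33bb4fb99e68cf1e3f5bf5478',
    'username'      => 'John',
);

$protected = array();
foreach ($input as $key => $val) {
    // Skip the element named "submit" and anything ending in "_checksum".
    if (!preg_match('/^submit$|_checksum$/i', $key)) {
        $protected[] = $key;
    }
}

print_r($protected);   // lists "time" and "username"
```

Here username ends up in the list even though it has no fingerprint, so validation would fail; this is why an explicit list of protected fields is normally supplied.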


For most cases, you'll need to provide a list of fields that are considered protected. To do this, create an array containing a list of element names that are protected and serialize it; then protect that list itself using the previously discussed protect() function:

$protected = serialize(array('myvar1', 'myvar2', 'myvar3'));
echo protect('protected_list', $protected, PROTECTED_KEY);
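
To see what actually travels in the protected_list element, this short sketch prints the serialized string and confirms that unserialize() restores the original array.

```php
<?php
$names = array('myvar1', 'myvar2', 'myvar3');

// serialize() turns the array into a plain string that fits in a form field.
$protected = serialize($names);
echo $protected, "\n";
// a:3:{i:0;s:6:"myvar1";i:1;s:6:"myvar2";i:2;s:6:"myvar3";}

// unserialize() rebuilds the identical array on the receiving side.
var_dump(unserialize($protected) === $names);   // bool(true)
```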

For the sake of avoiding repetition and confusion during my explanation of the validate() function, Listing 5.4 displays this function in its entirety and will be heavily referenced as I explain how the function actually works:

Listing 5.4. The validate() Function
<?php

    function validate($input, $secret) {

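        if(!is_array($input)) {
            return false;
        }

        // Start with an empty list so the loop below is safe even when
        // no element qualifies for validation.
        $protected = array();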

        if(!isset($input['protected_list']) &&
           !isset($input['protected_list_checksum'])) {

            foreach($input as $key=>$val) {

                // Skip only the element named "submit" and names ending in "_checksum".
                if(!preg_match('/^submit$|_checksum$/i', $key)) {

                   $protected[] = $key;

                }

            }

        } else {

            if(!isset($input['protected_list']) ||
               !isset($input['protected_list_checksum'])) {

                return false;

            }

            // Use the $secret argument rather than the global constant.
            $checkval = 'protected_list' .
                        stripslashes(urldecode($input['protected_list'])) .
                        $secret;

            $checksum = md5($checkval);
            if($checksum !== $input['protected_list_checksum']) {
                return false;
            }

            $protected = unserialize(stripslashes(urldecode(
              $input['protected_list'])));

        }

        foreach($protected as $val) {


            if(isset($input[$val."_checksum"]) && isset($input[$val])) {

                $temp = urldecode($input[$val]);

                $checksum = md5($val.stripslashes($temp).$secret);

                if($checksum !== $input[$val."_checksum"]) {

                    return false;

                }

            } else {

                return false;

            }

        }

        return true;
    }
?>

When the validate() function is called, its first task is a very basic check: ensuring that the $input variable it was provided is actually an array. The next step is to determine which fields to validate. To do so, validate() first looks for a valid (with checksum) protected_list element in the $input array. If this element is found and validated against its MD5 fingerprint, the array is reconstructed using the unserialize() function. If the protected_list element is not provided in the form data, a simple regular expression is used to build the array dynamically, following the previously discussed rules. In either case, the $protected variable ends up holding a list of all the form elements in the $input array to validate.

With the $protected array now containing a list of the form elements that should be validated, the array is iterated through using a foreach statement. For each element, the validate() function first checks that both the element itself and its fingerprint value exist. Assuming both exist, an MD5 fingerprint is generated from the passed value and compared to the fingerprint provided in the form submission. If the fingerprints are identical, the element's validity is confirmed and the script moves on to the next element. If at any time a particular element fails to validate or does not exist, the validate() function returns a Boolean false, indicating this failure. Upon successful validation of all the required elements, the validate() function returns true.
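
The three attacks listed earlier can be tried directly against a stripped-down version of the same logic. The functions sign_field() and verify_fields() below are hypothetical, simplified stand-ins for protect() and validate(): they work on plain arrays and skip the HTML and URL encoding steps, so treat this as a sketch of the checking logic rather than a replacement for the listings.

```php
<?php
// Hypothetical stand-in for protect(): returns the value and its fingerprint.
function sign_field($name, $value, $secret) {
    return array(
        $name               => $value,
        $name . '_checksum' => md5($name . $value . $secret),
    );
}

// Hypothetical stand-in for validate(): checks every name in $names.
function verify_fields($input, $names, $secret) {
    foreach ($names as $name) {
        if (!isset($input[$name]) || !isset($input[$name . '_checksum'])) {
            return false;   // a value or its fingerprint was removed
        }
        if (md5($name . $input[$name] . $secret) !== $input[$name . '_checksum']) {
            return false;   // a value or its fingerprint was modified
        }
    }
    return true;
}

$secret = 'mysecretword';
$form   = sign_field('time', '1037613504', $secret);

// Untouched data validates.
var_dump(verify_fields($form, array('time'), $secret));       // bool(true)

// Attack 1: modify the protected value.
$tampered = $form;
$tampered['time'] = '1037613624';
var_dump(verify_fields($tampered, array('time'), $secret));   // bool(false)

// Attack 2: modify the fingerprint as well, without knowing the secret.
$bad_sum = $tampered;
$bad_sum['time_checksum'] = md5('time' . '1037613624');
var_dump(verify_fields($bad_sum, array('time'), $secret));    // bool(false)

// Attack 3: remove the fingerprint entirely.
$removed = $form;
unset($removed['time_checksum']);
var_dump(verify_fields($removed, array('time'), $secret));    // bool(false)
```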

Putting protect() and validate() into Action

Now that you understand both the theory and implementation of hidden element validation, let's put the complete script into action. Listing 5.5 creates a time-sensitive form that the user must submit within 5 minutes, using the protect() and validate() functions described in this section:

Listing 5.5. A Time-Sensitive Form Using protect() and validate()
<?php

    define('PROTECTED_KEY', 'mysecretword');
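    function my_addslashes($string) {
        // get_magic_quotes_gpc() was removed in PHP 8, so test for it first.
        $quoted = function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
        return $quoted ? $string : addslashes($string);
    }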

    function protect($name, $value, $secret) {

        $tag = "";
        $seed = md5($name.$value.$secret);
        $html_name = $name."_checksum";
        $tag = "<INPUT TYPE='hidden' NAME='$name' VALUE='" .
               urlencode(my_addslashes($value)) .
               "'>\n";
        $tag .= "<INPUT TYPE='hidden' NAME='$html_name' VALUE='$seed'>\n";
        return $tag;

    }


    function validate($input, $secret) {

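        if(!is_array($input)) {
            return false;
        }

        // Start with an empty list so the loop below is safe even when
        // no element qualifies for validation.
        $protected = array();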

        if(!isset($input['protected_list']) &&
           !isset($input['protected_list_checksum'])) {

            foreach($input as $key=>$val) {

                // Skip only the element named "submit" and names ending in "_checksum".
                if(!preg_match('/^submit$|_checksum$/i', $key)) {

                   $protected[] = $key;

                }

            }

        } else {

            if(!isset($input['protected_list']) ||
               !isset($input['protected_list_checksum'])) {

                return false;

            }

            // Use the $secret argument rather than the global constant.
            $checkval = 'protected_list' .
                        stripslashes(urldecode($input['protected_list'])) .
                        $secret;

            $checksum = md5($checkval);
            if($checksum !== $input['protected_list_checksum']) {
                return false;
            }

            $protected = unserialize(stripslashes(urldecode(
              $input['protected_list'])));

        }

        foreach($protected as $val) {


            if(isset($input[$val."_checksum"]) && isset($input[$val])) {

                $temp = urldecode($input[$val]);

                $checksum = md5($val.stripslashes($temp).$secret);

                if($checksum !== $input[$val."_checksum"]) {

                    return false;

                }

            } else {

                return false;

            }

        }

        return true;
    }

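    if(isset($_GET['submit'])) {
        // Pass the array normally; call-time pass-by-reference (&$_GET)
        // is a fatal error in PHP 5.4 and later.
        if(validate($_GET, PROTECTED_KEY)) {
            if($_GET['time']+300 > time()) {
                // Escape user-supplied text before echoing it back.
                echo "Thank you " . htmlspecialchars($_GET['username']) .
                     " for submitting this form on time!";
            } else {
                echo "Sorry, you took too long!";
            }
        } else {
            echo "Data was invalid!";
        }
    }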

    $protect_str = serialize(array('time'));
?>
<HTML><HEAD><TITLE>Validating Hidden elements example</TITLE></HEAD>
<BODY>
Please fill out the form below within 5 minutes:<BR>
<FORM ACTION="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" METHOD=GET>
<INPUT TYPE="hidden" NAME="submit" VALUE="1">
<?php echo protect('time', time(), PROTECTED_KEY); ?>
<?php echo protect('protected_list', $protect_str, PROTECTED_KEY); ?>
What is your name: <INPUT TYPE="text" NAME="username" SIZE=30>
<INPUT TYPE="submit" VALUE="Send">
</FORM>
</BODY>
</HTML>
