Basic input validation

By | October 21, 2010

In “My version of Waltzing Matilada” and “Why we have to squash ants” I demonstrated a number of trivially simple cross-site scripting (XSS) vulnerabilities on several web sites. Based on my statistically insignificant data points, site owners don’t care. The same is true for higher-profile web sites (I have reported quite a number of vulnerabilities to them, with similarly lackluster results). Let’s take a look at how easily those vulnerabilities creep into the code, and at how easily they can be prevented.

If this is as far as you read, then the take away is: filter ALL input data

PHP

PHP is a very popular language used by some of the most popular web sites, such as Yahoo! and Facebook.

<?php
if (empty($_GET['name'])) {
  echo "<form>";
  echo "What is your name? <input name=name type=text size=32>";
  echo "<input type=submit value=Register!>";
  echo "</form>";
} else {
  echo $_GET['name'] . " registered successfully";
}

You can try this on your own server by putting the above in a file called name.php and requesting http://YOUR_SERVER/name.php?name=<script>alert("hello world")</script>. The documentation for $_GET doesn’t mention anything about the potential danger; there are even a few examples like the above. Fortunately, there is a link to the filter extension. With filters it’s possible to create a safer version of the above code:

<?php
if (empty($_GET['name'])) {
  echo "<form>";
  echo "What is your name? <input name=name type=text size=32>";
  echo "<input type=submit value=Register!>";
  echo "</form>";
} else {
  // filter_input() takes a single input type; the form above submits via GET
  echo filter_input(INPUT_GET, 'name',
    FILTER_SANITIZE_SPECIAL_CHARS) . " registered successfully";
}

If you need untreated input data (FILTER_UNSAFE_RAW), then it’s good practice to name the variable in a way that removes any doubt about the safety of the data, e.g.

$unsafe_user = filter_var($_GET['name'], FILTER_UNSAFE_RAW);
$unsafe_age = $_GET['age'];

This helps not only the programmer writing the initial version, but also the programmers who come after.

I wonder how the security landscape on the internet would look today if PHP had put a filtered version of the input data in $_GET and $_POST and kept the raw version in $_UNSAFE_GET and $_UNSAFE_POST. I am fully aware it would not solve all problems, but it would take care of the majority of the trivial ones. In addition, if a developer were forced to use $_UNSAFE, it would hopefully make them think: “Why is this unsafe?”
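The same "safe by default, unsafe by name" idea can be sketched outside PHP. The following is a minimal illustration in Python 3, not an existing API: the `Request` class and its `get` and `unsafe_get` methods are hypothetical names invented for this sketch, and HTML escaping is assumed to be the right default only because the output ends up in an HTML page.

```python
import html


class Request:
    """Hypothetical request wrapper: escaped values by default, raw values only on request."""

    def __init__(self, params):
        self._params = dict(params)

    def get(self, key, default=""):
        # Default accessor: HTML-escape <, >, &, " and ' before returning.
        return html.escape(self._params.get(key, default), quote=True)

    def unsafe_get(self, key, default=""):
        # Raw accessor: the name forces the caller to acknowledge the risk.
        return self._params.get(key, default)


request = Request({"name": "<script>alert('hello world')</script>"})

# The filtered value can be echoed into an HTML page.
print(request.get("name"))

# The raw value keeps its warning label in the variable name as well.
unsafe_name = request.unsafe_get("name")
print(unsafe_name)
```

A developer who reaches for `unsafe_get` has to type the word "unsafe", which is the whole point of the design. Escaping for HTML does nothing for other sinks such as SQL or shell commands, so this covers the trivial reflected cases only.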

Python

Disclaimer: This is my first attempt at using Python.
On python.org I found a few examples on how to write a cgi script. I copy-pasted a few lines from that page to create the following cgi-script:

#! /usr/bin/python

import cgi
import sys
print "Content-Type: text/html"    # HTML is following
print                              # blank line, end of headers

form = cgi.FieldStorage()
if "name" not in form or "addr" not in form:
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    sys.exit()                     # a bare "exit" is a no-op; it must be called
print "<p>name:", form["name"].value
print "<p>addr:", form["addr"].value

I added the “#!” line and changed “return” to “exit”, saved it as name.py and dropped it in the cgi-bin directory on my server. I can now trivially exploit that script with http://your.machine/cgi-bin/name.py?name=<script>alert("hello world")</script>&addr=here and in return I get “hello world” in a little alert box. Oops.
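To see why the alert fires, it helps to look at the exact markup that comes back. Here is a minimal sketch in Python 3 (the CGI script above is Python 2): `render_unsafe` is a hypothetical helper that mimics the script's two final print lines, and `urllib.parse.parse_qs` stands in for `cgi.FieldStorage`.

```python
from urllib.parse import parse_qs


def render_unsafe(query_string):
    """Mimic the two final print lines of the CGI script: no escaping at all."""
    form = parse_qs(query_string)
    name = form["name"][0]
    addr = form["addr"][0]
    return "<p>name: " + name + "\n<p>addr: " + addr + "\n"


# The query string from the exploit URL.
query = 'name=<script>alert("hello world")</script>&addr=here'
page = render_unsafe(query)

# The script element arrives in the page untouched, so the browser runs it.
print(page)
```

The browser cannot tell the markup the script intended to emit from the markup the visitor supplied, which is why the input has to be encoded before it is written out.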

The documentation does have a section about security, but it only mentions os.popen and os.system. It basically says you have to be careful, but doesn’t offer any pointers on how to filter the data, and it says nothing about filtering in other situations. Since the documentation mentions cgi.escape, it would have been a nice touch to change the example code to:

#! /usr/bin/python

import cgi
import sys
print "Content-Type: text/html"    # HTML is following
print                              # blank line, end of headers

form = cgi.FieldStorage()
if "name" not in form or "addr" not in form:
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    sys.exit()                     # a bare "exit" is a no-op; it must be called
print "<p>name:", cgi.escape(form["name"].value)
print "<p>addr:", cgi.escape(form["addr"].value)

That would protect against things like http://your.machine/cgi-bin/name.py?name=<script>alert('hello world')</script>&addr=Hawaii. But you could still get into trouble. Let’s say your cgi-script was like this:

#! /usr/bin/python

import cgi
print "Content-Type: text/html"    # HTML is following
print                              # blank line, end of headers

form = cgi.FieldStorage()
if "name" not in form or "addr" not in form:
    print "<H1>Error</H1>"
    print "Please fill in the name and addr fields."
    print "<form method=get><input name=\"name\"",
    if "name" in form:
        print " value='"+cgi.escape(form["name"].value)+"'",
    print "><input name=\"addr\"",
    if "addr" in form:             # pre-fill the addr field from the addr parameter
        print " value='"+cgi.escape(form["addr"].value)+"'",
    print "><input type=submit value=click></form>"
else:
    print "<p>name:", cgi.escape(form["name"].value)
    print "<p>addr:", cgi.escape(form["addr"].value)

That is unfortunately exploitable with http://your.machine/cgi-bin/name.py?name=Enter+name'+onmouseover=alert('hello world') and the JavaScript fires when the user moves the cursor over the name field. The user is very likely to do just that, since the page shows a form whose name field is pre-filled with “Enter name”.

Using

cgi.escape(form["name"].value, True)

in combination with using double quotes (") around attribute values in the HTML elements will protect against that, because cgi.escape with its second argument set to True also encodes the double quote, but never the single quote.
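The effect of quote escaping is easy to check in isolation. Below is a minimal sketch in Python 3, where `html.escape` takes the place of `cgi.escape` (which was removed from the standard library in Python 3.8). The `payload` value and the `input_tag` helper are made up for illustration. One difference to keep in mind: with `quote=True`, `html.escape` encodes the single quote as well as the double quote.

```python
import html

# The decoded value of the name parameter from the exploit URL.
payload = "Enter name' onmouseover=alert('hello world')"


def input_tag(value):
    # Hypothetical helper: always wraps the attribute value in double quotes.
    return '<input name="name" value="' + value + '">'


# With quote=False only <, > and & are encoded, so the single quote survives.
element_only = html.escape(payload, quote=False)

# With quote=True both " and ' are encoded as well.
attribute_safe = html.escape(payload, quote=True)

# Single-quoted attribute plus unescaped quote: the attacker closes the value
# and injects an onmouseover attribute.
print("<input name=\"name\" value='" + element_only + "'>")

# Double-quoted attribute plus quote escaping: the payload stays inside the value.
print(input_tag(attribute_safe))
```

The first line printed contains a live `onmouseover` attribute; in the second the whole payload is inert text inside `value`.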

In PHP

 filter_input(INPUT_GET, 'name', FILTER_SANITIZE_SPECIAL_CHARS)

will encode all the dangerous characters (<, >, &, ' and "), and hence allow you to include filtered data in input value fields.

Node.js

Node.js is one of the most recent environments for writing web applications. I have not been able to find a solution for filtering input data in Node.js; please let me know in the comments what you are using. I hope this doesn’t mean a rise in new security vulnerabilities. Here is a Node.js example which is vulnerable to the same attacks as the PHP and Python examples:

var http = require('http');
var url = require('url');

http.createServer(function (request, response) {
  // parse(..., true) returns the query string as an already-decoded object
  var query = url.parse(request.url, true).query;
  response.writeHead(200, {'Content-Type': 'text/html'});
  if (query.name === undefined) {
    response.end("<form>What is your name? <input name=name type=text size=32>" +
      "<input type=submit value=Register!></form>");
  } else {
    // Still vulnerable on purpose: the name is echoed without any escaping
    response.end(query.name + " registered successfully");
  }
}).listen(8080);

The message is still the same: filter ALL input data

2 thoughts on “Basic input validation”

  1. Jakub Vrana

    PHP:

    You can get filtered data in $_GET, $_POST and $_COOKIE by setting the filter.default configuration directive.

    Access to unfiltered data is then possible only through the filter_input() function.

  2. paranoid Post author

    My point is that filtering should be on by default.

    I am sure programmers who need the unfiltered data will know how to access it. If you don’t know how to access the unfiltered data then you probably don’t know how to handle it safely either ;)
