Abstract:
Designing a database interface for web programming.

Created by Peter Kankowski
Last changed
Filed under Interfaces

Share on social sitesReddit Digg Delicious Buzz Facebook Twitter

PHP database library

This article is a case study of software interfaces. The previously discussed guidelines are applied to database access functions in PHP, which are too low-level and exposed to hacker attacks. The result is a small, practical database abstraction layer.

The library was developed in November 2007 and since used in several real-world projects, including a job search site and a corporate site (both in Russian).

Problems and solutions

There are several interfaces for DB access in PHP, including PDO, DBX, and vendor-specific interfaces (MySQL, PostgreSQL, Firebird/InterBase, Microsoft SQL Server, and SQLite). There are also database access layers in web application frameworks (Silverstripe, CodeIgniter, Drupal) and separate libraries (MDB2, ADOdb, DbSimple). However, all these interfaces suffer from one or more of the problems listed below.

Useful, high-level interface

When designing an interface, you should start from a list of the typical tasks that your caller will do. Most existing interfaces were designed from the other end: they just provide access to all features of the underlying database engine. The authors did not consider how to make the interface handier for the calling code.

Here are the things that a web programmer needs from the database engine:

  • Querying a single row
  • Querying a single value
  • Displaying a list or a table
  • Inserting or updating a row
  • Executing a result-less query

Let's discuss each of them in detail.

Querying a single row or a field

The article "Incredible machines" by John Doe in category "Games > Puzzles"

Consider a simple script for publishing articles online. If you want to display an article, you need to get a single row from the Articles table. This task requires 3-4 function calls when using MySQL interface:

   // Get the article from database
   $res = mysql_query("SELECT * FROM `articles` WHERE `id` = " .
                      mysql_real_escape_string($_GET['id']));
   $article = mysql_fetch_assoc($res);
   echo "<h2>$article[title]</h2>by $article[author]<br/>" .
        nl2br($article['text']);
   mysql_free_result($res);

The code would be shorter if there were a function for this purpose:

   $article = db()->get_row('SELECT * FROM "articles" WHERE "id" = ~0~',
                            $_GET['id']);
   echo "<h2>$article[title]</h2>by $article[author]<br/>" .
        nl2br($article['text']);

Another common task that deserves a function is retrieving a single value:

   // Get the total number of articles
   $count = db()->get_value('SELECT COUNT(*) FROM "articles"');
   echo ($count == 1) ? '1 article' : "$count articles";

The SQLite interface contains a similar function, but as of PHP 5.3, it does not work properly for retrieving a row.

Displaying a list or a table

A list of articles

To show a list of articles, you have to query the database, fetch all rows in a while loop, and free the result. Long, boring, and error-prone:

   $result = mysql_query("SELECT * FROM `articles` ORDER BY `date` DESC");
   echo '<table>';
   while ($a = mysql_fetch_array($result, MYSQL_ASSOC)) {
      echo '<tr><td>' . htmlspecialchars($a['title']) . '</td>' .
           '<td>' . htmlspecialchars($a['author']). '</td>' .
           '<td>' . htmlspecialchars($a['date'])  . '</td></tr>';
   }
   echo '</table>';
   mysql_free_result($result);

Other database engines can return all rows in a multidimensional array (see pg_fetch_all and sqlite_fetch_all). Popular frameworks pack the rows into an array of objects, which is a more fashionable, but essentially the same interface (see CodeIgniter as an example). The array takes a lot of memory if the dataset is large; it's also not very handy, because the final result should be a string, not an array.

A better interface would execute a callback function for each row and concatenate the returned strings:

echo '<table>' . db()->get_list(
     function($a) {return "<tr><td>$a[title]</td><td>$a[author]</td>".
                          '<td>' . format_date($a['date']) . '</td></tr>';},
     'SELECT * FROM "articles"') . '</table>';

If you need to print the rows immediately, you can avoid the cost of string concatenation:

echo '<table>';
db()->get_list(
     function($a) {echo "<tr><td>$a[title]</td><td>$a[author]</td>".
                        '<td>' . format_date($a['date']) . '</td></tr>';},
     'SELECT * FROM "articles"');
echo '</table>';

You can also use this function to enumerate all matching rows without printing anything:

// Send out newsletter to subscribers
$msg = 'Hello!';
db()->get_list(
   function($html_subscription, $subscription) use ($msg) {
      mail($subscription['email'], 'My newsletter', $msg);
   },
   'SELECT * FROM "subscriptions"');

Inserting or updating a row

Form handling typically involves adding a new row to the database or updating an existing one. A single function call is needed for that:

// Adding a new article or editing existing one
db()->add_update('articles', $_POST);

If $_POST['id'] is set, the row will be updated; if not, it will be added to the table.

Some database interfaces provide a similar function, but don't take into account that the $_POST array may include irrelevant items (e.g., an item for Submit button). Such items are ignored by add_update, so you don't have to delete them explicitly.

Executing a query without returning any result

Deleting an article

Besides retrieving or adding information, you often need to delete a row or to increase a counter. There is another function for this:

db()->exec('DELETE FROM "articles" WHERE "id" = ~0~', $_GET['id']);

Protection against SQL injection attacks

SQL injection attacks are the biggest problem of MySQL API in PHP. Here is a vulnerable equivalent of the previous code snippet:

mysql_query('DELETE FROM `articles` WHERE `id` =' . $_GET['id']);

If $_GET['id'] is a number, it works as expected:

http://www.example.com/page.php?id=123

However, a malicious user can pass any SQL expression instead of the ID:

http://www.example.com/page.php?id=123+OR+1=1

The query will be:

DELETE FROM `articles` WHERE `id` = 123 OR 1=1

So all articles in the table will be deleted, including those to which the user has no access through your web interface (e.g., the articles by other authors). By analyzing the database structure, an attacker can delete or modify tables, login as site admin without knowing the password, etc.

Early versions of PHP escaped all incoming data to protect your code against SQL injections (see magic quotes), but this technique turned out to be inconvenient and ineffective.

A better approach is using parameterized statements. In MySQLi and Firebird interfaces, the placeholder is a question mark:

// Firebird
ibase_query($db, 'DELETE FROM "articles" WHERE "id" = ?', $_GET['id']) or die("Query failed");

// MySQL improved interface
$stmt = $mysqli->prepare("DELETE FROM `articles` WHERE `id` = ?") or die("Prepare failed");
$stmt->bind_param('i', $_GET['id']) or die('Bind failed');
$stmt->execute() or die('Execute failed');
$stmt->close();

The code becomes hard to read if you have more than 2-3 parameters. PDO implements named placeholders to solve this problem (the syntax is even more verbose than MySQLi).

A reasonable compromise is using numbered placeholders: ~0~, ~1~, ~2~, etc., similar to $1, $2, $3 in regular expressions:

db()->exec('INSERT INTO "log"("message", "date", "ip") VALUES (~0~, NOW(), ~1~)',
           $message, $_SERVER['REMOTE_ADDR']);

In comparison with MySQLi and PDO code, this one is shorter and more readable.

Protection against XSS attacks

If you allow your users to exchange comments or publish forum posts, you should watch out for malicious HTML code. Not only can the attackers break HTML markup, they also can steal other users' authentication cookies through cross-site scripting. The simplest way to protect your site is escaping all HTML tags with htmlspecialchars, which the get_value, get_row, and get_list functions do:

$title = db()->get_value('SELECT "title" FROM "articles" WHERE "id" = ~0~', $_GET['id']);

If <SCRIPT>alert("XSS")</SCRIPT> was stored in the database, &lt;SCRIPT&gt;alert(&quot;XSS&quot;)&lt;/SCRIPT&gt; will be returned in $title.

When generating plain text messages such as emails, the HTML escaping is superfluous. You should use get_value_text and get_row_text functions, which return strings with bare < and >. The callback function of get_list receives plain-text strings in the second parameter:

// A list of purchased items
$list = db()->get_list(
    function($html_item, $item) {
        return sprintf("|%-30s %10d|\r\n", $item['name'], $item['price']);
    },
    'SELECT * FROM "items", "items_in_order" '.
    'WHERE "items"."id" = "items_in_order"."item" AND '.
          '"items_in_order"."order" = ~0~',
    $order_id);
mail($email, 'Your order', $list);

These functions are also useful if you need to allow few HTML tags (<b>, <i>, <a href>, etc.) in forum posts or comments. In this case, you should also add an HTML filter such as KSES.

Note that some of these filters are buggy and can be bypassed with a crafted JavaScript code. They also don't escape HTML tags, but remove them, so you can lose valuable information. For example, the visitors of smallcode blog lost C code in their comments because of KSES filtering.

Portable SQL code

There are some syntax differences between DBMSes; indexes, index hints, and other optimizations are not portable. However, in most cases it's possible to avoid the idiosyncrasies (such as `backticks` in MySQL) and write a portable ANSI SQL code.

Using an abstraction layer does not solve all portability problems, but still helps. This database library supports MySQL and SQLite. If you want to switch to another DBMS, you don't have to replace each mysql_ function call with its sqlite_ counterpart. Just change one constructor call in the library code and check that your SQL code works in SQLite.

Performance comparison

INSERT (40'000 times)SELECT (10'000 times)
This library12.3 sec15.3 sec
mysql_* functions10.1 sec9.4 sec

PHP is an interpreted language, and not a very fast one, so the library is 25-60% slower than the default MySQL interface.

Download the PHP database library (for PHP 5.3)

Credits

Thanks to Anton Leonov for his permission to publish the library's source code. It was nice to work with you, Anton!

Peter Kankowski
Peter Kankowski

About the author

Peter is the developer of Aba Search and Replace, a tool for replacing text in multiple files. He likes to program in C with a bit of C++, also in x86 assembly language, Python, and PHP.

Created by Peter Kankowski
Last changed

Leave your comment

Your name:


Comment: