Gonzalo Ayuso is a Web Architect with more than 10 years of experience in web development, specialized in Open Source technologies. Experienced in delivering scalable, secure, and high-performing web solutions to large-scale enterprise clients. Blogs at gonzalo123.com. Gonzalo is a DZone MVB and is not an employee of DZone.

5 Things You Should Check Now to Improve PHP Web Performance

07.11.2012

We all know how financially important it is for your app’s server architecture to handle peaks of load. This article discusses 5 tips for improving PHP Web performance.

Primarily, you need to understand the key actions that are necessary to enhance the efficiency of your server-side PHP code. But: Why do you need to take those actions? If your application is running smoothly right now, is it worth the effort? Some actions require big investments. However, there are a lot of free resources available that can help you apply some easy changes.

The most important thing is performance data collection. If you want to improve something, you need to measure and compare the situation before and after. But what should you measure? I find that speed and memory usage are generally the most important; for PHP, page load time is the key metric. There are other factors you could take into account, such as network latency and filesystem I/O, but problems there usually surface as slow pages or high memory usage, and those two we can measure easily.

 

Advice: You should be able to switch on/off your monitoring system as it may interfere with performance. You can slow your application down significantly if you flood the code with logs, but sometimes those logs may be the main decision point to take corrective actions.  Find a happy medium and be careful.

 

You can use this code snippet to measure memory usage in PHP:
$time = microtime(TRUE);
$mem = memory_get_usage();

// [the code you want to measure here]

print_r(array(
    'memory'  => (memory_get_usage() - $mem) / (1024 * 1024),
    'seconds' => microtime(TRUE) - $time
));

Cache like there's no tomorrow

This is not an original piece of advice. This advice probably appears in all performance checklists, which reflects how important it is. There are several tools to help you with this task, including the mythical Memcache or the new and powerful Varnish. Essentially, you must ask yourself if you really need to execute the PHP code over and over. If the information remains the same or maybe your user can afford to see one snapshot of the real status, caches can save you CPU cycles and give you extra speed. There are several types of caches. This example deals with a server-side cache.

function slowAndHeavyOperation() {
    sleep(1);
    return date('d/m/Y H:i:s');
}

$item1 = slowAndHeavyOperation();
echo $item1;

This code takes one second to run because the sleep call simulates a slow operation. Refactor it to:

$memcache = new Memcache;
$memcache->connect('localhost', 11211);

function slowAndHeavyOperation() {
    sleep(1);
    return date('d/m/Y H:i:s');
}

$item1 = $memcache->get('item');

if ($item1 === false) {
    $item1 = slowAndHeavyOperation();
    $memcache->set('item', $item1);
}

echo $item1;

Now the script will take one second the first time, but essentially no time on subsequent runs because you have cached the result of the function. As you can see, there is one trade-off: the function will now always return the same date instead of the current time. But Memcached allows you to set a TTL (Time To Live) on stored data, which lets you define a refresh policy for the cached values. Your results are no longer truly real-time, but the server saves a lot of resources, especially under heavy load and with a high number of concurrent users. See the Memcached documentation for additional information.
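For instance, the TTL can be passed as the fourth argument of Memcache::set. A minimal sketch of the refactored example with a refresh policy (the 60-second TTL is an arbitrary choice for illustration):

```php
<?php
// Sketch assuming a local Memcache server on the default port.
// The key name 'item' and the 60-second TTL are illustrative.
$memcache = new Memcache;
$memcache->connect('localhost', 11211);

$item = $memcache->get('item');

if ($item === false) {
    $item = slowAndHeavyOperation();
    // Third argument: flags (0 = no compression).
    // Fourth argument: TTL in seconds. After 60 seconds the key
    // expires and the next request rebuilds the cache.
    $memcache->set('item', $item, 0, 60);
}

echo $item;
```

With this policy the data is at most one minute stale, which is often an acceptable price for skipping the slow operation on every request.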

 

Advice: Keep in mind that Memcache does not persist the data. If you restart Memcache, you will lose all data. Your application must be able to rebuild the cache if it is empty. In other words, your application must work with or without Memcached. Do not rely on the existence of data, especially in cloud environments.

 

Memcached gives you a simple and powerful mechanism to create server-side caches. You can also create more advanced caches and cache different parts of your site with different TTLs. For example, you may want to cache your page header for two hours and your sidebar for ten minutes. In this case, you can use Varnish.

Varnish is a mix of cache and HTTP reverse proxy. Some people call these kinds of tools HTTP accelerators. Varnish is very flexible and customizable. Modern PHP frameworks, such as Symfony2, have integrated Varnish because of its popularity.
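Varnish decides what to cache mainly from the HTTP response headers, so a PHP script can opt into caching simply by emitting them. A minimal sketch (the concrete max-age values are arbitrary examples, and whether they are honored depends on your VCL configuration):

```php
<?php
// Sketch: ask a reverse proxy such as Varnish to cache this
// response for 10 minutes, while browsers revalidate sooner.
// s-maxage applies to shared caches (the proxy); max-age to browsers.
header('Cache-Control: public, max-age=60, s-maxage=600');

echo "Sidebar rendered at " . date('H:i:s');
```

This is how the per-fragment TTLs mentioned above (two hours for the header, ten minutes for the sidebar) can be expressed: each fragment sends its own Cache-Control header and the proxy does the rest.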

To review, caches can help us in three ways: first, with our CPU and memory requirements; second, with page load times; and third, as a consequence, with SEO. Google Analytics considers any web page load time over 1.5 seconds to be slow, and slow pages carry SEO penalties, so we cannot take this lightly.


Loops are evil

We habitually use loops. They are powerful programming tools, but they can frequently cause bottlenecks. One slow operation executed once is one problem; if that statement sits inside a loop, the problem is multiplied. So, are loops bad? No, of course not, but you need to assess your loops carefully, especially nested loops, to avoid possible problems.

Take the following code as an example:

<?php
// bad example
function expensiveOperation() {
    sleep(1);
    return "Hello";
}

for ($i = 0; $i < 100; $i++) {
    $value = expensiveOperation();
    echo $value;
}

This code works, but it obviously calls the same expensive function on every iteration.

<?php
// better example
function expensiveOperation() {
    sleep(1);
    return "Hello";
}

$value = expensiveOperation();

for ($i = 0; $i < 100; $i++) {
    echo $value;
}

In this code, you can detect the problem and easily refactor. However, real life might not be this simple.

To detect performance problems, consider the following:

 

●      Detect big loops (for, foreach, ...)

●      Do they iterate over a big amount of data?

●      Measure them.

●      Can you cache the operation inside the loop?

○      If yes, what are you waiting for?

○      If not, mark them as potentially dangerous and focus your inspections on them. Small performance problems in your code can be multiplied.
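When the operation inside a loop is deterministic for a given input, a simple in-process memoization is often enough to cache it. A sketch (the function name, the static-array cache, and the simulated delay are illustrative, not from the original article):

```php
<?php
// Sketch: memoize an expensive, deterministic operation so a loop
// only pays its cost once per distinct input.
function expensiveLookup($key) {
    static $cache = array();

    if (!isset($cache[$key])) {
        // Simulate a slow computation; in real code this might be
        // a database query or a remote call.
        usleep(1000);
        $cache[$key] = strtoupper($key);
    }

    return $cache[$key];
}

for ($i = 0; $i < 100; $i++) {
    // Only two distinct inputs, so the slow path runs at most twice.
    echo expensiveLookup($i % 2 === 0 ? 'even' : 'odd'), "\n";
}
```

The loop still runs 100 times, but the expensive work runs only once per distinct input.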

 

Basically, you must know clearly where your big loops are and why. It is difficult to memorize all the source code of your applications, but you must be aware of the potentially expensive loops. Yes, I know, this recommendation sounds like micro-optimization advice (like: cache the result of count()), but it isn't. Sometimes I need to refactor old scripts with performance problems, and I normally follow the same pattern: find the loops with the profiler and refactor the heaviest ones.

We have a good friend to help us with this job: profiling tools. Xdebug and Zend Debugger allow us to create profiling reports. If we choose Xdebug, we can also use Webgrind, a web front-end for Xdebug. Those reports can help us detect bottlenecks. Remember, a bottleneck is a problem, but a bottleneck iterated 10,000 times is 10,000x bigger. It seems obvious, but people tend to forget.
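As an illustration, Xdebug's profiler can be enabled on demand so it does not slow down every request (the settings below assume Xdebug 2.x; the option names changed in Xdebug 3):

```ini
; php.ini -- enable Xdebug's profiler only when a trigger
; (e.g. the XDEBUG_PROFILE GET/POST variable) is present,
; so normal requests pay no profiling cost.
xdebug.profiler_enable_trigger = 1
xdebug.profiler_output_dir = /tmp/xdebug
```

The resulting cachegrind files can then be opened in Webgrind or KCachegrind to find the heaviest loops.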

Queues are your friend

Do we really need to perform every task inside the user request? Sometimes it's necessary, but not always. Imagine, for example, that you need to send an email to a user when he/she submits an action. You can send this mail with a simple PHP script, but the operation can take one second. If you wait until the end of the script, you ensure that when the user sees the message "email sent", the email really has been delivered. But is that necessary? You can queue the action and free that second from the user request. The email will be sent later, and the user doesn't need to wait for it. If the application is small, you can afford the wait, but as it scales this becomes a serious problem.

The amazing tool Gearman is a framework that allows you to create queues and parallel processing. Read the documentation for more information. The main idea behind Gearman is simple. Instead of executing your actions inside your scripts, you can define “Workers” that the main script will call.

The following is an example of Gearman in action:

Imagine a simple script to add a watermark to one image:

<?php
$filename = "/path/to/img.jpg";
if (realpath(__FILE__) == realpath($filename)) {
    exit();
}
$stringSize = 3;
$footerSize = ($stringSize==1) ? 12 : 15;
$footer = date('d/m/Y H:i:s');

list($width, $height, $image_type) = getimagesize($filename);
$im = imagecreatefromjpeg($filename);
imagefilledrectangle (
        $im,
        0,
        $height,
        $width,
        $height - $footerSize, imagecolorallocate($im, 49, 49, 156));

imagestring($im,
        $stringSize,
        $width-(imagefontwidth($stringSize)*strlen($footer)) - 2,
        $height-$footerSize,
        $footer,
        imagecolorallocate($im, 255, 255, 255));

header( 'Content-Type: image/jpeg' );
imagejpeg($im);

Now, instead of doing it online, you can create a Worker:

<?php
$gmw = new GearmanWorker();
$gmw->addServer();
$gmw->addFunction("watermark", function($job) {

    $workload = $job->workload();

    list($filename, $footer) = json_decode($workload, true);

    $stringSize = 3;
    $footerSize = 15;
    list($width, $height, $image_type) = getimagesize($filename);

    $im = imagecreatefromjpeg($filename);

    imagefilledrectangle(
            $im,
            0,
            $height,
            $width,
            $height - $footerSize, imagecolorallocate($im, 49, 49, 156));

    imagestring($im,
            $stringSize,
            $width - (imagefontwidth($stringSize) * strlen($footer)) - 2,
            $height - $footerSize,
            $footer,
            imagecolorallocate($im, 255, 255, 255));

    // capture the binary image data instead of sending it to stdout
    ob_start();
    ob_implicit_flush(0);
    imagejpeg($im);
    $img = ob_get_contents();
    ob_end_clean();

    return $img;
});
while (1) {
    $gmw->work();
}

And now the Gearman client in the main script:

<?php
$filename = "/path/to/img.jpg";
$footer = date('d/m/Y H:i:s');

$gmclient = new GearmanClient();
$gmclient->addServer();

$handle = $gmclient->do("watermark", json_encode(array($filename, $footer)));

if ($gmclient->returnCode() != GEARMAN_SUCCESS) {
    echo "Oops, something went wrong";
} else {
    header('Content-Type: image/jpeg');
    echo $handle;
}

The coolest thing about Gearman is that you can start as many Workers as you need, on the same host or on another one. The client application stays the same, which lets you scale out your applications according to your needs. Imagine that your mailing application works fine but your user base suddenly grows because of a great market opportunity. Your web server can handle the load, but the mailing service cannot keep up. Instead of upgrading your whole server, you can set up new Gearman nodes on a new host or even in the cloud. Simple, isn't it?
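For fire-and-forget tasks such as mailing, the client does not even need to wait for the result: GearmanClient::doBackground queues the job and returns immediately. A sketch (the task name sendEmail and its payload are hypothetical; it assumes a matching worker is registered, analogous to the watermark worker above):

```php
<?php
// Sketch: queue an email instead of sending it inside the request.
// Assumes a worker registered the hypothetical 'sendEmail' function.
$gmclient = new GearmanClient();
$gmclient->addServer();

$payload = json_encode(array(
    'to'      => 'user@example.com',
    'subject' => 'Welcome!',
));

// Returns a job handle immediately; a worker sends the mail later.
$jobHandle = $gmclient->doBackground('sendEmail', $payload);

if ($gmclient->returnCode() != GEARMAN_SUCCESS) {
    error_log('Could not queue the email job');
}

echo "email sent"; // the user does not wait for actual delivery
```

This is the difference between GearmanClient::do (synchronous, as in the watermark example) and doBackground (asynchronous): the watermark must come back in the response, but the email can happen later.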

Now a short list of possible uses of Gearman:

 

●      Massive mailing systems

●      PDF generation

●      Image processing

●      Logs

 

Gearman is widely used within Web applications. For example, sites such as Grooveshark and Instagram use Gearman intensively. When you share a photo to Twitter or Facebook, Instagram uses a Gearman task queue to perform the task; they have about 200 Python Workers. That is another cool thing about Gearman: it is language agnostic. You can use a Python client with PHP Workers, Java Workers, a C client, Perl, Ruby, and so on.

If you have more specific needs, you can also check out ZeroMQ, which is a messaging library that allows you to design powerful communications systems.

Beware of Database Access

This is probably the main source of performance problems. If you like betting, you could bet that a site's performance problem is due to database access without even inspecting the code, and most likely you would be right. Database connections are expensive operations, especially with languages such as PHP, mainly because of the lack of connection pooling.

Moreover, the difference between a simple query using an index or not may be unbelievably big. Because I’m talking about differences here, it means that we need to measure. Remember the introduction: “You need to measure everything”? If you don’t measure, how would you know that you have improved the process?

The most important advice here is to check your database indexes. SQL queries using wrong indexes can significantly slow down an application's performance.

 

Advice: Checking on database indexes cannot be done only once. You must take into account that as your data grows, indexing may change.
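You can check whether a query actually uses an index with EXPLAIN. A sketch with PDO, reusing the connection string from the examples below (the table and column names are placeholders for illustration):

```php
<?php
// Sketch: ask PostgreSQL for the query plan of a suspect query.
// Connection parameters, table and column names are placeholders.
$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $dbh->query('EXPLAIN SELECT * FROM test.tbl1 WHERE field1 = 1');

foreach ($stmt as $row) {
    // Look for "Index Scan" vs. "Seq Scan" in the plan output.
    echo $row[0], "\n";
}
```

Re-running EXPLAIN periodically as the tables grow catches the cases where the planner stops using an index you were counting on.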

 

Another important tip is the usage of prepared statements. Why? The answer is simple. Let me show you one example:

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$dbh->beginTransaction();
foreach (range(1, 5000, 1) as $i) {
    $field1 = $i;
    $stmt = $dbh->prepare("UPDATE test.tbl1 set field1='{$field1}' where id=1");
    $stmt->execute();
}
$dbh->commit();

And another one:

$dbh = new PDO('pgsql:dbname=pg1;host=localhost', 'user', 'password');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$dbh->beginTransaction();
$stmt = $dbh->prepare('UPDATE test.tbl1 set field1=:F1 where id=1');
foreach (range(1, 5000, 1) as $i) {
    $stmt->execute(array('F1' => $i));
}
$dbh->commit();

Both work. The first sends the SQL update as a string and executes it 5,000 times: the database must parse and compile each update before executing it. The second compiles the statement once and executes it 5,000 times with different parameters. Prepared statements have another great benefit, preventing SQL injection, but when talking about performance, the saved parsing alone is worth taking into account.

Death By Traffic

What happens if your application is suddenly serving thousands of concurrent users? Will your server be able to handle it? It’s not easy to answer this question at a glance. If you need to check it, you have two possible ways to do so.

One is to test with 1,000 or more users in your development environment. Since you don't have that many people at hand, you need tools to automate this kind of operation. There are several; the open-source Apache ab (ApacheBench) can create connections to your server and load test simple pages.

Right now I’m using the free version of Load Tester from Web Performance, Inc. It can automate test cases and unlike apache ab, it generates load from your network or a cloud system, such as Amazon’s EC2.  The free version can generate up to 1,000,000 concurrent users.

To run a test with Apache ab, you can use http://www.google.com/ as the test subject and run the following command:

ab -n 100 -c 10 http://www.google.com/

This command will create 100 connections to the server with a concurrency level of 10 simultaneous connections. Let's examine the output:

gonzalo@desktop:~$ ab -n 100 -c 10 http://www.google.com/


This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.google.com (be patient).....done

Server Software:        gws
Server Hostname:        www.google.com
Server Port:            80

Document Path:          /
Document Length:        218 bytes

Concurrency Level:      10
Time taken for tests:   2.222 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Non-2xx responses:      100
Total transferred:      98200 bytes
HTML transferred:       21800 bytes
Requests per second:    45.01 [#/sec] (mean)
Time per request:       222.174 [ms] (mean)
Time per request:       22.217 [ms] (mean, across all concurrent requests)
Transfer rate:          43.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       82  101   8.9    103     121
Processing:    90  117   8.5    118     144
Waiting:       90  117   8.5    118     144
Total:        171  218  13.2    218     266

Percentage of the requests served within a certain time (ms)
  50%    218
  66%    223
  75%    227
  80%    229
  90%    233
  95%    239
  98%    251
  99%    266
 100%    266 (longest request)

There are several very interesting results here. Look at Requests per second, the Transfer rate, and the Time taken for tests. If you don't want this raw output, you can save the outcome to a CSV file with:

ab -n 100 -c 10 -e test.csv  http://www.google.com/ 

Don’t let your application die from success if you need to scale or work in high-performance situations.

Summary

If you want to improve your Web performance, you need to answer these questions:

 

●      How many database connections do I have in my application?

●      How much time does each select statement spend?

●      How many select statements do I have?

●      Are they inside loops?

●      Do I really need them? Can I cache them at least with a TTL?

●      Is it really necessary to perform my transactions (Inserts, Updates) online inside the user request?

●      Is it possible to queue them?

●      Does my server support big load conditions and a high number of concurrent users?

●      How much CPU does the application use per request?

●      How much memory does the application use per request?

 

As you can see, there are a lot of questions that you must answer. Maybe you started reading this post looking for the perfect solution. Sorry, but there are no silver bullets. You must answer those questions depending on your needs and take the corresponding actions according to your application. There are different tools at your disposal which I have listed above, but there are plenty more out there and plenty being created each day.

Extra Credit: Front End

This article discusses backend development (in other words, PHP code). We, as developers, understand the difference between Frontend (JavaScript, CSS, HTML, ...) and Backend (PHP, databases, ...), but the user doesn't. The user only perceives the time between their click and the browser's response. It is important to know that. Here, Firebug or Chrome's Developer Tools are our friends.

Imagine this simple script:

<?php

// our amazing application

?>
<html>
 <head>
  <script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
  <script type="text/javascript">
    $(document).ready(function() {
      console.log("Hello");
    });
  </script>
 </head>
 <body>
   <img src="http://placekitten.com/200/300" alt="img">
   <img src="http://placekitten.com/100/100" alt="img">
   <img src="http://placekitten.com/200/200" alt="img">
 </body>
</html>



As you can see, the entire amount of time is not simply the application running the PHP script. We need to add the time that the browser takes to load and render all external resources, images, stylesheets, JavaScript, etc.

You can optimize your Backend's performance by 90%, but you must realize that the Backend time may be only 10% of the whole request time.

Published at DZone with permission of Gonzalo Ayuso, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)