I have been having a problem at work. My homepage is now getting 100,000 hits a month. Unfortunately it is a big page, about 43 KB plus auxiliary files. But the worst part is the 10 different calls to the database under 5 different modules, each with its own connection to the DB. This racks up some serious MySQL disk I/O along with logs and all that other stuff. Needless to say, I have been encountering some minor performance issues.
The simple solution is to cache the homepage.
I want to be able to regenerate the homepage whenever I want. I want the homepage to regenerate itself periodically, especially right at the start of a new day, since much of the content on the homepage is date specific. I don’t want to do any extra work to process this cache.
For my implementation, I am taking advantage of the fact that Apache will search for an index page: index.html, index.php, default.htm etc. I let index.html be the rendered, static copy of my homepage, with index.php being the dynamic version. The big trick is that every time index.php is called, it writes out to index.html. Now it’s just a matter of getting index.php called at the appropriate times.
In index.php, I am using PHP’s output buffering to capture the content of the homepage. I then write that buffer, unchanged, out to index.html and to the web browser.
At the top of my homepage you would see the following:
/* Output Buffer
* We are doing something a bit odd here. The home page is the most hit page on the site. There are a LOT of calls on the home page to database, etc.
*
* In an attempt to make the server load go down, this index.php file will write out an index.html file as a static file
* This takes advantage of the fact that apache is looking for index.html before index.php
* running index.php will create index.html
* removing index.html will cause index.php to run and create index.html
*
* A cronjob removes index.html hourly
* Admin tools that modify data that could be displayed on the homepage should delete index.html
*
* You can manually force an update to index.html by loading www.r-world.com/index.php (a clever way is to have the webmaster bookmark index.php as his start page.)
*
*/
function callback($buffer)
{
    // Write this page out to index.html.
    $indexbufferfilename = 'index.html';
    $indexbuffererror = '';
    // Do not exit on failure: the error still has to be mailed
    // and the page still has to reach the browser.
    $handle = fopen($indexbufferfilename, 'w');
    if ($handle === false) {
        $indexbuffererror .= "Cannot open file ($indexbufferfilename)\n";
    } else {
        if (fwrite($handle, $buffer) === false) {
            $indexbuffererror .= "Cannot write to file ($indexbufferfilename)\n";
        }
        fclose($handle);
    }
    if (!empty($indexbuffererror)) {
        mail('[email protected]', "Error in creating homepage", $indexbuffererror);
    }
    // Send the page to the browser.
    return $buffer;
}
ob_start("callback");
After that comes the content of my homepage. The page then ends with a call to ob_end_flush(); as the very last line of code.
This takes care of creating the homepage cache.
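One gap worth noting: the callback writes straight into index.html, so a visitor who arrives mid-write could be served a partial page. A minimal sketch of one way around that, which is not part of the setup described here, is to write to a temporary file and rename it over index.html. The function name write_cache_atomically is hypothetical, and the sketch assumes PHP 7+ and a filesystem where rename() within one directory is atomic (as on POSIX systems).

```php
<?php
// Hypothetical helper, not part of the original setup: replace the cache
// file in one step so readers never see a half-written page.
function write_cache_atomically(string $target, string $contents): bool
{
    // Create the temporary file in the same directory as the target so
    // that rename() stays on one filesystem.
    $tmp = tempnam(dirname($target), 'idx');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $contents) === false) {
        @unlink($tmp);
        return false;
    }
    // tempnam() creates the file with mode 0600; the web server must be
    // able to read the finished cache file.
    chmod($tmp, 0644);
    if (!rename($tmp, $target)) {
        @unlink($tmp);
        return false;
    }
    return true;
}
```

Inside the callback, the fopen/fwrite/fclose sequence could then be swapped for a single call such as write_cache_atomically('index.html', $buffer), with a false return value feeding the same error mail.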
To have the cache created automatically, I added a few lines of mod_rewrite to my .htaccess file. They check whether index.html exists and has something in it; if not, the request is handed to an alternate file, my index.php. I found the idea in the Apache 1.3 URL Rewriting Guide.
My .htaccess file looks like:
DirectoryIndex index.html index.php
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-s
RewriteRule ^index\.html$ index.php [L]
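For anyone unsure what the -s test in the RewriteCond does, it is true when the path is a regular file with a size greater than zero, and the leading ! negates it. A rough PHP equivalent, purely as an illustration (the function name cache_is_usable is hypothetical and nothing in the setup depends on it):

```php
<?php
// Hypothetical helper mirroring mod_rewrite's -s test: true when $path
// is a regular file with a size greater than zero.
function cache_is_usable(string $path): bool
{
    // Drop PHP's cached stat data so a just-deleted file is noticed.
    clearstatcache(true, $path);
    return is_file($path) && filesize($path) > 0;
}
```

The rewrite rule sends the request to index.php in exactly the cases where this function would return false: index.html is missing, or it exists but is empty.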
I have several administration tools that edit data displayed on the homepage. Each of them now includes a file that simply deletes index.html. Now, whenever I modify data that may be displayed on the homepage, the change shows up automatically because the homepage is re-created on the next request. I also have a cronjob that runs this file at the top of every hour.
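The delete file itself is not shown here, so the following is only a sketch of what such an include might look like. The function name clear_homepage_cache and the paths in the comments are assumptions for illustration, and the type declarations assume PHP 7+.

```php
<?php
// Hypothetical cache-clearing include. An admin tool would include this
// file and then call clear_homepage_cache() after saving its changes.
// A crontab line along the lines of
//   0 * * * * php /path/to/clear_homepage_cache.php
// (hypothetical path, with a call to the function added at the bottom
// of the script) could run the same code at the top of every hour.
function clear_homepage_cache(string $cacheFile): bool
{
    // Drop cached stat data so an earlier delete in this request is seen.
    clearstatcache(true, $cacheFile);
    if (!file_exists($cacheFile)) {
        // Already gone: the next request will regenerate it anyway.
        return true;
    }
    return unlink($cacheFile);
}
```

Returning true when the file is already missing keeps the admin tools quiet if two of them clear the cache back to back.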
As for how well it works: my average number of MySQL connections per second fell from 32 to 26, a reduction of roughly 19% in MySQL calls. The CPU usage of MySQL fell a couple of percentage points on the server, as several of the SQL statements for the homepage are UNIONs of 7 SELECTs that aren’t exactly trivial. The system load average also dropped about 33%, as MySQL and Apache aren’t hitting the drive nearly as much anymore.
The best part is that the PHP processing time has been taken out of the load time for the homepage. Only one poor sod per hour, plus the first person to request the page after an update, has to wait for the PHP processing (which everyone used to do anyhow) and the write to disk. I also set my bookmark and my boss’s to load the index.php version of the page, since whenever one of us is looking at the site we are likely verifying a change, and this forces an update.
Clever, eh?