Image saving and processing for high traffic sites

I think everyone has seen code like this:
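A hypothetical stand-in for the kind of code meant here, written as plain PHP so it runs outside CakePHP. The `ThumbHelper` class, its `resize()` method and the injected `$resizer` callable are assumptions made for illustration, not the API of a real helper:

```php
<?php
// Hypothetical stand-in for a view helper that creates thumbnails on demand.
class ThumbHelper
{
    private $cacheDir;
    private $resizer;

    // $resizer is any callable(string $source, int $width, int $height): string
    // that returns the resized image data (an image library would sit behind it).
    public function __construct($cacheDir, callable $resizer)
    {
        $this->cacheDir = rtrim($cacheDir, '/');
        $this->resizer = $resizer;
    }

    public function resize($image, $width, $height)
    {
        $target = sprintf('%s/%dx%d_%s', $this->cacheDir, $width, $height, basename($image));

        // Check-then-create: every request that arrives before the first
        // write has finished sees "no file yet" and resizes the image again.
        if (!file_exists($target)) {
            $data = call_user_func($this->resizer, $image, $width, $height);
            file_put_contents($target, $data, LOCK_EX);
        }

        return $target;
    }
}
```

In a view, something like `$thumb->resize($image, 100, 100)` would then run on every page load.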

What’s wrong with this? Let’s assume you have an image library behind the Thumb helper. It will process the image supplied in $image. If the thumbnail does not already exist at the destination, the helper opens a file there and starts processing the image. Once processing is complete, it writes the result to that file, which stays locked for as long as the processing takes.

The issue is a race condition: another user may load the same page before the initial resize and write have completed. The helper does exactly as it is told and attempts to create the thumbnail again, because it doesn’t exist yet. On a very busy site, many requests end up resizing the same image at the same time, which can quickly bring a server to its knees.

This is not as much of a problem when you have just a few visitors per day, or per hour. However, if you have lots of visitors and your recently uploaded article appears on your homepage, you can be sure this thumbnail is going to take a lot of hits and the described scenario is very likely to happen.

So, when should you create thumbnails?

You usually know the dimensions required for every image on your website, so you can collect them as a list of sizes in an array. I personally prefer to keep this list, grouped by model, in a separate config file at APP/Config/thumb_sizes.php.
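A sketch of what such a config file might contain. The `ThumbSizes` key, the model names and the size labels are made up for illustration. The `$config` variable follows the convention CakePHP 2.x uses for PHP config files loaded through `Configure::load()`, which is an assumption about your setup:

```php
<?php
// APP/Config/thumb_sizes.php
// Hypothetical layout: model name => size label => dimensions in pixels.
$config = array(
    'ThumbSizes' => array(
        'User' => array(
            'small'  => array('width' => 50,  'height' => 50),
            'medium' => array('width' => 150, 'height' => 150),
        ),
        'Article' => array(
            'teaser' => array('width' => 300, 'height' => 200),
            'full'   => array('width' => 800, 'height' => 600),
        ),
    ),
);
```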

Assuming that you follow the MVC pattern, you’ll validate the uploaded form data in the model. Once the data is validated, the model performs any processing in its beforeSave() or afterSave() methods, which makes them an ideal place for image processing. You would begin by reading your configured sizes.


Then run a foreach over the image sizes and create each thumbnail once the image has been uploaded successfully.
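The loop might look roughly like this, written as a plain function rather than a model callback so it can run on its own. `createThumbnails()`, the layout of the size array and the `$resize` callable (standing in for whatever image library you use) are assumptions for illustration:

```php
<?php
/**
 * Create one thumbnail per configured size right after the upload.
 *
 * @param string   $source Path of the uploaded original (assumed to have an extension).
 * @param array    $sizes  Size label => array('width' => int, 'height' => int).
 * @param callable $resize callable(string $source, string $target, int $w, int $h): bool
 * @return array Size label => path of the created thumbnail.
 */
function createThumbnails($source, array $sizes, callable $resize)
{
    $info = pathinfo($source);
    $created = array();

    foreach ($sizes as $label => $size) {
        // e.g. /uploads/photo.jpg + "small" => /uploads/photo.small.jpg
        $target = sprintf(
            '%s/%s.%s.%s',
            $info['dirname'],
            $info['filename'],
            $label,
            $info['extension']
        );
        if (!call_user_func($resize, $source, $target, $size['width'], $size['height'])) {
            throw new RuntimeException("Could not create thumbnail: $target");
        }
        $created[$label] = $target;
    }

    return $created;
}
```

Called from afterSave() with the sizes read from the config, this makes sure every thumbnail exists before the first visitor requests it.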

How would you store or name your images?

Exactly how you store the images is up to you, but a deeply nested folder structure such as /images/users/6k/51/d3/ is recommended. It avoids file system limits on the number of entries per directory and the slowdowns that come with very large directories. See my very old article about the media view and file storage for details.
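One possible way to derive such a path, as a minimal sketch. It hashes the record id with md5 and uses the first characters as directory names, which is an assumption and not necessarily the scheme behind the example path above; `nestedImagePath()` is a hypothetical name:

```php
<?php
/**
 * Build a nested directory path from a record id.
 * Hashing the id spreads the files evenly across the directories.
 */
function nestedImagePath($base, $id, $levels = 3)
{
    $hash = md5((string)$id);
    // Take the first $levels pairs of hex characters, one pair per directory level.
    $parts = str_split(substr($hash, 0, $levels * 2), 2);

    return rtrim($base, '/') . '/' . implode('/', $parts) . '/';
}
```

With two hex characters per level, each directory holds at most 256 subdirectories, and the same id always maps to the same path, so nothing extra has to be stored.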

To avoid having to store all the thumbnail paths, it’s recommended to name the files following a fixed scheme that encodes how each version was produced.

Basically you store all the processing you’ve done in the filename string. This will also allow you to delete a certain type of processed image later without much effort.
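A minimal sketch of such a scheme and of how one type of processed image could later be found for deletion. The pattern `<name>.<operation><width>x<height>.<ext>` and both function names are hypothetical, chosen only for illustration:

```php
<?php
/**
 * Hypothetical naming scheme: <name>.<operation><width>x<height>.<ext>,
 * e.g. "avatar.crop100x100.jpg". Assumes $filename has an extension.
 */
function versionedFilename($filename, $operation, $width, $height)
{
    $info = pathinfo($filename);

    return sprintf(
        '%s.%s%dx%d.%s',
        $info['filename'],
        $operation,
        $width,
        $height,
        $info['extension']
    );
}

/**
 * Find all files in $dir that were produced with one specific operation and size.
 */
function findVersions($dir, $operation, $width, $height)
{
    $pattern = sprintf('%s/*.%s%dx%d.*', rtrim($dir, '/'), $operation, $width, $height);
    $matches = glob($pattern);

    return $matches === false ? array() : $matches;
}
```

Removing every 100x100 crop in a directory then comes down to calling unlink() on each path that `findVersions($dir, 'crop', 100, 100)` returns.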

To get some additional performance you could store images that do not have to be secured inside the webroot, such as:

You should hash the whole filename using md5 or sha1 rather than storing it in the readable form shown above! The readable filenames were chosen only to make it easier to understand how the different versions of an image are stored.

If you now think you need to protect user avatars: if the goal is to keep them from appearing in search engines, a proper entry in your robots.txt should solve this problem for you.

Of course, this is not a suitable mechanism for protecting sensitive and personal information such as medical images or data. If you need to protect images, you’ll have to go all the way through the CakePHP dispatcher, check the permissions for the requested image, and send it to the browser using the good old Media view. And do not store them in the webroot!
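As a sketch, the hashing could be as small as this. `hashedFilename()` is a hypothetical name, and keeping the extension (so the web server can still pick the right content type) is a choice made here, not something the text prescribes:

```php
<?php
/**
 * Turn a readable version name into an opaque one.
 * $readable is a name that encodes the processing,
 * e.g. "avatar.crop100x100.jpg" (a hypothetical scheme).
 */
function hashedFilename($readable)
{
    $extension = pathinfo($readable, PATHINFO_EXTENSION);

    // sha1() is deterministic, so the stored name can always be recomputed
    // from the original name and the processing parameters.
    return sha1($readable) . ($extension !== '' ? '.' . $extension : '');
}
```

Because the hash can be rebuilt from the same input at any time, there is still no need to store the thumbnail paths.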

How do you generate new images?

Simple answer: Write a shell for that task.

Depending on the number of images and the changes you want to apply, it might take a long time to generate all the new images. It’s strongly recommended to run this process at a very low priority so that more important tasks, like serving your site, are not affected by it. Alternatively, you can limit the process to no more than X% of the processing power. You can use the tools of your OS to achieve this.
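Besides OS tools such as `nice`, the priority can also be lowered from inside the script. The sketch below uses PHP’s `proc_nice()`, which is not available in every PHP build. `lowerPriority()`, `regenerate()` and the two callables are hypothetical names for illustration, not part of a CakePHP shell:

```php
<?php
/**
 * Lower the scheduling priority of the current process if the platform allows it.
 * proc_nice() adds $increment to the current niceness. Returns true on success.
 */
function lowerPriority($increment = 19)
{
    if (!function_exists('proc_nice')) {
        return false; // not available in this PHP build
    }

    return @proc_nice($increment);
}

/**
 * Generate missing versions for a list of originals at low priority.
 *
 * @param array    $sources   Paths of the original images.
 * @param callable $targetFor callable(string $source): string, path of the new version.
 * @param callable $generate  callable(string $source, string $target): void
 * @return int Number of images that were generated.
 */
function regenerate(array $sources, callable $targetFor, callable $generate)
{
    lowerPriority();
    $count = 0;

    foreach ($sources as $source) {
        $target = call_user_func($targetFor, $source);
        if (file_exists($target)) {
            continue; // already done, so the job can be restarted safely
        }
        call_user_func($generate, $source, $target);
        $count++;
    }

    return $count;
}
```

Skipping targets that already exist means an interrupted run can simply be started again.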

Another solution is to push them off to a worker process on a separate machine, enabling a larger, flexible and scalable solution for your web application. This is something we’ll talk about in another article.

What if I have to generate images on the fly?

Well, if you really have to, you should at least secure them. People can do funny things with links that carry the image width and height as parameters.

A tiny script that increments the width and height in one-pixel steps will make the script on the server generate a large number of images within a very short time. This will either lock up your server or fill your storage device over time.

The solution is, same as for form submissions, to secure them with a hash. This requires a helper that will generate a hash with a salt and attach it to the image url.

The controller that processes the request needs to check the params, rebuild the hash from the params (excluding the passed hash), and compare the hash built in the controller against the hash passed in the URL. If they don’t match, somebody has modified the URL.
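Both sides can be sketched in plain PHP as follows. This version uses an HMAC (`hash_hmac()`) instead of a plain salted hash and compares with `hash_equals()`, which requires PHP 5.6 or later. The function names and the `hash` parameter name are assumptions for illustration, not the plugin’s API:

```php
<?php
/**
 * Compute a signature over the image parameters.
 * Sorting makes the result independent of the parameter order.
 */
function signImageParams(array $params, $salt)
{
    unset($params['hash']); // the signature itself is never part of the input
    ksort($params);

    return hash_hmac('sha256', http_build_query($params), $salt);
}

// Helper side: append the signature to the image URL.
function signedImageUrl($baseUrl, array $params, $salt)
{
    $params['hash'] = signImageParams($params, $salt);

    return $baseUrl . '?' . http_build_query($params);
}

// Controller side: rebuild the signature and compare it in constant time.
function isValidImageRequest(array $query, $salt)
{
    if (!isset($query['hash']) || !is_string($query['hash'])) {
        return false;
    }

    return hash_equals(signImageParams($query, $salt), $query['hash']);
}
```

A request whose width or height was changed by hand no longer matches its signature, so the controller can reject it before any image processing starts.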

The Imagine plugin – everything in a plugin

While phpThumb is highly successful and widely used, it has its own issues and quirks. I was looking for a powerful modern alternative and found Imagine. Imagine is a PHP 5.3+ image processing library that is well tested and has a nice interface.

I’ve written a CakePHP plugin for Imagine that will do the processing for you and it also contains a helper that will build secured urls for on the fly thumbnail generation. The actual image processing happens in the model by the attached Imagine behavior.

It’s fresh out of the oven and ready to use! Feel free to provide feedback on the plugin, fork it on GitHub and contribute through pull requests. If you are using this in your project, drop me a line and let me know!