Posted: Sep 15, 2011
By: Dhwanit | 0 comments
Category: Apps

Tags: amazon aws, cloud


Drupal in the Amazon AWS Cloud

Recently, we built and delivered a user-voting web application, hosted on the Amazon Web Services (AWS) infrastructure, that scales up or down on demand. In the process we learned a great deal about our choice of CMS, and about the joys of developing for the cloud!


Drupal is our CMS of choice. All the websites we’ve developed over the last couple of years have been built with Drupal. While Drupal provides a robust base for developing advanced web applications with extended custom functionality, it still relies on the LAMP (Linux, Apache, MySQL, PHP) stack and requires a locally available file system to store user-uploaded files. Out of the box, Drupal is inherently cloud-averse.

We had to find a way of making Drupal cloud-friendly, since the web application’s fundamental requirement was to run on scalable Amazon Web Services.

Read more about how we did it:

Application architecture for the Cloud

There were certain assumptions already in place before we started work on architecting the application:

  • The application will use Amazon Web Services: the Drupal database will be hosted on Amazon RDS, and the web servers will run on one or more Amazon EC2 instances.
  • The website will rely on Amazon Elastic Load Balancer (ELB) to drive traffic to the individual instances based on their health (processing capability or CPU utilization).
  • All compute-intensive processing will run on a separate EC2 instance, asynchronously from the web application. Inter-server messaging will be done through Amazon SQS.
  • The load balancer will auto scale-up or scale-down EC2 instances based on triggered alarms programmed into the auto-scaling group.

With these assumptions, work on the application began. However, it was only mid-way through the project that we discovered the first major bottleneck: where would we store the files? One assumption we had, not listed above, was that one master Amazon Elastic Block Store (EBS) disk would be available across all instances… sort of like a Network Attached Storage (NAS) volume mounted on all running EC2 instances with read/write capability. Boy, were we wrong on that one!

It soon became clear that a “locally available” file system for Drupal to manage the website’s files wasn’t going to make the cut, since an EBS volume can only be mounted on a single EC2 instance at a time. Enter Amazon Simple Storage Service (Amazon S3) to the rescue!

Storage architecture for the Cloud

Drupal normally receives files uploaded by website users and stores them in the default location, /sites/default/files: a directory on the web server’s hard disk to which Drupal has read/write permissions.

This behavior had to change so that Drupal never stores files on the local hard disk (or a mounted file system). Beyond the primary problem of user-uploaded files needing to live off the local server, there was a secondary issue: files generated by Drupal in the course of normal website operation also had to be kept off the local server.

Local versus S3 Storage

These generated files included the imagefield thumbnails Drupal creates when users upload images; the resized imagecache derivatives of specific dimensions, dictated by the website’s look and feel, that are generated the first time a visitor fetches them; and the aggregated CSS and JavaScript files produced for optimization.

The solution was to modify both Drupal core and the various contributed modules to work in the cloud environment:

  • Aggregation: We modified Drupal’s internal CSS and JavaScript aggregation routines to store the aggregated files directly in the static S3 bucket instead of the local file system.
  • Imagefield thumb: We modified the imagefield thumbnail creation routines so that the created imagefield thumb is moved from the local file system into the public S3 bucket.
  • Imagecache: Image caches are generated when a user tries to fetch a non-existent image from the supplied caching URL on the web page (/sites/default/files/imagecache/<cache_name>). We modified the imagecache module so that the moment it generates the imagecache, our code moves the imagecached file into the public S3 bucket. This works like a charm: the current request returns the image from the local file system in its response, but future requests go directly to S3 instead of to an EC2 instance behind the load balancer (a simplified sketch of this S3 push follows the list).
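
The site itself was built in PHP/Drupal, so the real changes lived inside Drupal’s file-handling and imagecache code; the snippet below is only a minimal Python/boto sketch of the underlying idea: push a freshly generated file into the public S3 bucket and hand back its public URL. The bucket name, key and file paths are hypothetical placeholders.

```python
# Minimal sketch (Python + boto) of pushing a generated file to a public
# S3 bucket; the production site did this from within Drupal's PHP code.
# Bucket name and paths below are hypothetical placeholders.
import boto

def push_to_public_bucket(local_path, s3_key_name,
                          bucket_name='example-public-bucket'):
    """Upload a locally generated file (e.g. an imagecache derivative)
    to S3, make it world-readable, and return its public URL."""
    conn = boto.connect_s3()                    # AWS keys from env/boto config
    bucket = conn.get_bucket(bucket_name)
    key = bucket.new_key(s3_key_name)           # e.g. 'imagecache/thumb/foo.jpg'
    key.set_contents_from_filename(local_path, policy='public-read')
    return 'https://%s.s3.amazonaws.com/%s' % (bucket_name, s3_key_name)

# Example: after the imagecache derivative is generated locally, hand it
# off to S3 so future requests bypass the EC2 instances entirely.
# push_to_public_bucket('/var/www/sites/default/files/imagecache/thumb/foo.jpg',
#                       'imagecache/thumb/foo.jpg')
```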

For all this to work, a separate table was created in the database to maintain the list of Drupal files that had an associated copy in the application’s various S3 buckets. We also developed overriding theme routines that pushed out S3 URLs on the web pages: these routines checked the new table and returned S3 URLs for files already created and stored in S3, or fell back to Drupal’s default URLs if they weren’t found in S3.
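
As a rough illustration of that lookup logic (the production version was a Drupal/PHP theme override backed by a database table), here is a hedged Python sketch: check the file-to-S3 mapping table and return the S3 URL when an entry exists, otherwise fall back to the default local URL. The table, column and bucket host names are assumptions for illustration only.

```python
# Sketch of the URL-rewriting lookup, using sqlite3 as a stand-in for the
# database table that mapped local Drupal files to their S3 copies.
# Table/column names and the bucket host are illustrative assumptions.
import sqlite3

PUBLIC_BUCKET_HOST = 'https://example-public-bucket.s3.amazonaws.com'
LOCAL_FILES_BASE = '/sites/default/files'

def file_url(db, relative_path):
    """Return the S3 URL for a file if it has been pushed to S3,
    otherwise return the normal Drupal (local) URL."""
    row = db.execute(
        'SELECT s3_key FROM s3_files WHERE filepath = ?',
        (relative_path,)).fetchone()
    if row:
        return '%s/%s' % (PUBLIC_BUCKET_HOST, row[0])
    return '%s/%s' % (LOCAL_FILES_BASE, relative_path)

# Tiny usage example with an in-memory table.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE s3_files (filepath TEXT PRIMARY KEY, s3_key TEXT)')
db.execute("INSERT INTO s3_files VALUES "
           "('imagecache/thumb/foo.jpg', 'imagecache/thumb/foo.jpg')")
print(file_url(db, 'imagecache/thumb/foo.jpg'))   # -> S3 URL
print(file_url(db, 'not-yet-pushed.png'))         # -> local Drupal URL
```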

Compute intensive media processing

Some of the files uploaded by website users weren’t just images. The major component of the website was to allow music artists to upload original songs as MP3 files. The website then had to process this media to generate several files of varying durations and quality. Since media processing is compute-intensive and time-consuming, it had to be done asynchronously rather than as part of incoming web requests (the producer and worker sides of this flow are sketched after the list below):

  • An artist would upload an MP3 file, which would straightaway be stored on S3.
  • Once consent for publishing has been provided, the web application would send out a request to a separate media-processing EC2 instance via Amazon SQS.
  • If the media-processing EC2 instance wasn’t running, the application would “wake it up.”
  • The media-processing EC2 instance would read from this pre-processing SQS queue and perform all the necessary compute- and memory-intensive tasks, such as adding effects or resizing the track.
  • On success (or failure), the media-processing EC2 instance would write out an appropriate message to a post-processing SQS queue.
  • The web application would periodically (on cron run) check the post-processing SQS queue and appropriately update the database with the information returned from the media-processing EC2 instance.
  • When the media-processing EC2 instance didn’t have any pre-processing messages within a certain period of time, it would shut itself down.
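
The production code for this lived in the Drupal application (PHP); the sketch below uses Python and boto purely to illustrate the producer side of the flow: post a job to the pre-processing queue and start the media-processing instance if it happens to be stopped. The queue name and instance ID are hypothetical.

```python
# Sketch of the producer side: enqueue a media-processing job on SQS and
# "wake up" the media-processing EC2 instance if it is stopped.
# Queue name and instance ID are hypothetical placeholders.
import json
import boto
from boto.sqs.message import Message

MEDIA_INSTANCE_ID = 'i-0example'
PRE_QUEUE = 'pre-processing'

def enqueue_media_job(track_s3_key):
    sqs = boto.connect_sqs()
    queue = sqs.create_queue(PRE_QUEUE)        # returns the existing queue if present
    msg = Message()
    msg.set_body(json.dumps({'s3_key': track_s3_key, 'action': 'process'}))
    queue.write(msg)

def wake_media_instance():
    ec2 = boto.connect_ec2()
    reservations = ec2.get_all_instances(instance_ids=[MEDIA_INSTANCE_ID])
    instance = reservations[0].instances[0]
    if instance.state == 'stopped':            # only EBS-backed instances can be restarted
        ec2.start_instances([MEDIA_INSTANCE_ID])

# After the artist consents to publishing:
# enqueue_media_job('uploads/artist42/song.mp3')
# wake_media_instance()
```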

Yes, there were error checks involved, and every operation was retried multiple times before the application or the media-processing instance finally gave up.
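
Correspondingly, here is a hedged sketch of the worker running on the media-processing instance: poll the pre-processing queue, retry the processing step a few times, report success or failure on the post-processing queue, and shut the instance down after a period with no work. The queue names, the process_track helper and the timings are illustrative assumptions, not the production values.

```python
# Sketch of the worker loop on the media-processing instance (Python + boto).
# Queue names, the process_track() helper and the timings are illustrative.
import json
import subprocess
import time

import boto
from boto.sqs.message import Message

IDLE_SHUTDOWN_SECONDS = 15 * 60   # shut down after 15 idle minutes
MAX_ATTEMPTS = 3                  # retries before giving up on a job

def process_track(job):
    """Placeholder for the compute-heavy work (effects, resizing, etc.)."""
    return {'s3_key': job['s3_key'], 'status': 'done'}

def worker_loop():
    sqs = boto.connect_sqs()
    pre_q = sqs.create_queue('pre-processing')
    post_q = sqs.create_queue('post-processing')
    last_work = time.time()

    while time.time() - last_work < IDLE_SHUTDOWN_SECONDS:
        messages = pre_q.get_messages(num_messages=1, visibility_timeout=600)
        if not messages:
            time.sleep(30)
            continue
        msg = messages[0]
        job = json.loads(msg.get_body())
        result = None
        for attempt in range(MAX_ATTEMPTS):     # retry before giving up
            try:
                result = process_track(job)
                break
            except Exception:
                time.sleep(10)
        if result is None:
            result = {'s3_key': job['s3_key'], 'status': 'failed'}
        reply = Message()
        reply.set_body(json.dumps(result))
        post_q.write(reply)                     # the web app picks this up on cron
        pre_q.delete_message(msg)
        last_work = time.time()

    subprocess.call(['shutdown', '-h', 'now'])  # instance stops itself when idle

if __name__ == '__main__':
    worker_loop()
```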

Load Balancing and Auto Scaling

The final piece of the jigsaw was to ensure that the system scaled out horizontally as demand went up, and automatically scaled back down by shutting off unused or lightly used instances when CPU utilization was low.

Amazon AWS Cloud Based Application Architecture

Amazon makes it very simple to achieve all this by using a load balancer that routes user traffic to one or more application EC2 instances. To determine which instance to route traffic to, the load balancer analyzes the health-check metrics reported by the EC2 instances behind it.
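
As a non-authoritative sketch of that piece, the boto snippet below creates a load balancer and attaches an HTTP health check so traffic only goes to instances that respond. The load-balancer name, availability zones and the /health path are assumptions for illustration.

```python
# Sketch of creating the load balancer and its health check with boto.
# Name, zones and the /health path are illustrative assumptions.
import boto
from boto.ec2.elb import HealthCheck

elb = boto.connect_elb()

health_check = HealthCheck(
    interval=30,               # probe every 30 seconds
    target='HTTP:80/health',   # instances must answer 200 OK on this URL
    healthy_threshold=2,
    unhealthy_threshold=5,
)

lb = elb.create_load_balancer(
    name='drupal-web-lb',
    zones=['us-east-1a', 'us-east-1b'],
    listeners=[(80, 80, 'http')],
)
lb.configure_health_check(health_check)
print('Load balancer DNS name:', lb.dns_name)
```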

We created a launch configuration with a custom-built AMI for this web application. A script then creates an auto-scaling group, programs the scaling policies, and sets up the trigger alarms that execute them. This is where a dedicated systems administrator can be very helpful, monitoring traffic and making the necessary changes to the scaling policies.
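
The scripted setup looked roughly like the boto sketch below: a launch configuration, an auto-scaling group behind the load balancer, a scale-up policy, and a CloudWatch alarm that triggers it. The AMI ID, names, group sizes and the 70% CPU threshold are illustrative assumptions rather than the production values.

```python
# Sketch of the auto-scaling setup with boto: launch configuration,
# auto-scaling group, a scale-up policy and its CloudWatch trigger alarm.
# AMI ID, names, sizes and thresholds are illustrative assumptions.
import boto
from boto.ec2.autoscale import LaunchConfiguration, AutoScalingGroup, ScalingPolicy
from boto.ec2.cloudwatch import MetricAlarm

autoscale = boto.connect_autoscale()
cloudwatch = boto.connect_cloudwatch()

# Launch configuration built from the custom web-application AMI.
lc = LaunchConfiguration(name='drupal-web-lc',
                         image_id='ami-00000000',
                         instance_type='m1.small',
                         security_groups=['web'])
autoscale.create_launch_configuration(lc)

# Auto-scaling group registered behind the load balancer.
group = AutoScalingGroup(group_name='drupal-web-asg',
                         load_balancers=['drupal-web-lb'],
                         availability_zones=['us-east-1a', 'us-east-1b'],
                         launch_config=lc,
                         min_size=1, max_size=4,
                         connection=autoscale)
autoscale.create_auto_scaling_group(group)

# Scale up by one instance at a time, with a cooldown between adjustments.
scale_up = ScalingPolicy(name='scale-up', adjustment_type='ChangeInCapacity',
                         as_name='drupal-web-asg',
                         scaling_adjustment=1, cooldown=300)
autoscale.create_scaling_policy(scale_up)
policy_arn = autoscale.get_all_policies(as_group='drupal-web-asg',
                                        policy_names=['scale-up'])[0].policy_arn

# Trigger alarm: average CPU above 70% over two 5-minute periods fires the policy.
alarm = MetricAlarm(name='drupal-web-cpu-high',
                    namespace='AWS/EC2', metric='CPUUtilization',
                    statistic='Average', comparison='>', threshold=70,
                    period=300, evaluation_periods=2,
                    alarm_actions=[policy_arn],
                    dimensions={'AutoScalingGroupName': 'drupal-web-asg'})
cloudwatch.create_alarm(alarm)
```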

Optimization and Testing

Once the beta site was up, a combination of ApacheBench and Blitz.io tests was run. ApacheBench simulated simultaneous access by 100 signed-in users, with good results. While the ApacheBench signed-in-user test was running, multiple Blitz.io rushes were performed in parallel.

To boost the results further, a separate EC2 instance running the memcached daemon was also set up. This not only reduced the load on the RDS instance but, since a lot of the data returned to the visitor came straight from memory, also made the site noticeably faster. It turned out that a micro EC2 instance running memcached was sufficient for serving close to 6 million requests a day! CPU utilization for the micro instance never went beyond 20% even at full load, while network I/O showed a high level of activity, transferring hundreds of megabytes of data directly out of the memory cache.
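
A quick way to sanity-check that requests are actually being served from the cache is to inspect the daemon’s statistics. The snippet below uses the python-memcached client against a hypothetical cache host; the site itself talked to memcached through Drupal’s memcache module.

```python
# Quick check of the memcached instance's hit ratio using python-memcached.
# The host/port are hypothetical; Drupal itself used the memcache module.
import memcache

mc = memcache.Client(['10.0.0.5:11211'])

for server, stats in mc.get_stats():
    hits = int(stats.get('get_hits', 0))
    misses = int(stats.get('get_misses', 0))
    total = hits + misses
    ratio = (100.0 * hits / total) if total else 0.0
    print('%s: %d gets, %.1f%% served from memory' % (server, total, ratio))
```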

In the end, the website loaded and returned pages to the visitor in just under one second from anywhere in the world.

Blitz.io Rush results

End notes

While there are many more optimizations and additions that could be made, cost was definitely a factor in deciding what got left out of the initial deployment.

The Amazon CloudFront content delivery network (CDN) was given a pass, since S3 by itself was very fast (as evidenced by the Blitz.io testing from different locations worldwide). Similarly, an RDS read replica could have been instantiated in a different availability zone to make the website’s data more robust, but in the end daily backups of the master RDS instance were sufficient, with its CPU utilization never exceeding 50% during testing.

All in all, this was a great website to build, and we were thrilled that we could use Drupal and modify it to suit the cloud!

