Heavy Traffic: 750,000 hits in 3 days

If I seem a little distracted …

Two weeks ago, I started a small blog called App Rejections. Here’s the traffic graph for the first two weeks. The tallest peak, over towards the right, is 25,000 unique visitors hitting the site:

[Image: traffic graph, 3 days]

It could have been worse; being a die-hard pessimist when it comes to server management, I’d left the webserver “throttled” to a fraction of max connections. The jumps in traffic when I selectively removed that throttling suggest that overall perhaps 10% of visitors couldn’t get through (as high as 35% at peak times).

I *believe* there will be no performance problems now. I should be able to take those peaks again without the server dying – I’ve put some simple cheats in place, like a pre-generated static front page that I manually cache. I’m also hoping to double the RAM in that server, though (it’s only got 1GB right now!).

What happened?

Well, apart from the obvious (lots of news sites picked up the site, and a few big-hitters on Twitter too, the biggest being someone with almost 1 million followers. Awesome! But also … Argh! Webserver death!).

Actually, what happened was my own laziness combined with a cluster**** between Apache2 and PHP5.

Basically … PHP is a fine language with a sometimes-dubious runtime and a terrible set of libraries, much of which remained incompatible with Apache2 for a very long time. Allegedly, some core parts of PHP’s sphere of libs are *still* only compatible with Apache1 (or the backwards-compatible part of Apache2 – i.e. none of the performance boost). (really? is this still true? hard to believe)

And while I *should* have been running Apache2 in modern mode, ditto PHP5, the last time I checked PHP and my PHP libs, they weren’t yet compatible with Apache2. There are workarounds, but they were enough effort (and this is just a small server) that I hadn’t bothered. In particular, I probably should have set up FastCGI to work around the PHP incompatibilities. But it’s been a LONG time since I’ve used FastCGI…

Running Debian, I’d left this server with the standard PHP setup: Apache2 in backwards-compatibility mode, and PHP in a default config. This server does relatively little webserving; I figured it wouldn’t matter much. Especially with the “pessimistic” Apache2 settings.

Unfortunately … Apache2 in compatibility mode reserves RAM per HTTP connection, and never releases it, even after it’s no longer needed. Even with my concurrent connections throttled to a low, sensible number (20 in this case), *some PHP process somewhere* was taking advantage of the oodles of RAM, and sucking up all the RAM in my server.

For no reason that I can tell. Which is worrying.

I’ve now put PHP back down to a hard limit of a few tens of megabytes, and – so far as I can tell – no part of any PHP app is failing. I’ve also set a 100-requests limit per process, which will force Apache2 to recycle *all* RAM every twenty minutes or so at high load.

I’ve de-throttled the concurrency, so the webserver can still use all available RAM if it really wants to, but this time only under high traffic – not “merely because some crappy PHP script somewhere is getting greedy”. There are no complaints in the error logs. I removed Cacti from this server a few months ago (there were bugs in the Debian graphs that I got fed up with), so it’s not that.

WordPress? Probably…

What I didn’t try, but should have

For some reason, I don’t have any PHP accelerator on that server. There was a reason, but I can’t remember what it was.

If I had, then I believe I would still have had the same problems, mainly because the real problem was PHP using up too much RAM. But *maybe* things would have run a lot better – I don’t know, because I didn’t try.

The unfortunate pain of PHP and RAM

The thing that I find annoying in this (apart from the long slowness of getting full Apache2 compatibility) is the number of standard PHP apps that “demand” hundreds of megabytes of RAM each … per request. If you google the PHP error message for “script ran out of memory” (I can’t remember the exact text) for any particular webapp, you’ll find hundreds of hits from developers saying “edit php.ini to increase that limit to 200MB” or similar.

At 200MB per request, your webserver with 2GB of RAM will thrash (and soon crash) the moment ten people try to view a webpage or download anything – CSS, JPG, GIF, etc – at the same time.

Ten people. Right. And that’s assuming you don’t have a database anywhere, or any other services running.
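
The arithmetic behind that “ten people” figure can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name is made up for the example, and the 512 MB set aside for a database is a hypothetical number, not a measurement from any real server.

```python
def max_concurrent_requests(total_ram_mb, per_request_mb, reserved_mb=0):
    """How many requests fit in RAM at once if each may use per_request_mb.

    reserved_mb is RAM set aside for everything else on the box
    (database, OS, other daemons); hypothetical, tune it for your server.
    """
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    # Integer division: a partially-fitting request means swapping.
    return usable_mb // per_request_mb


# 2GB server, 200MB allowed per request, nothing else running.
print(max_concurrent_requests(2048, 200))

# Same server, with a (hypothetical) 512MB kept back for a database.
print(max_concurrent_requests(2048, 200, reserved_mb=512))
```

The first call gives 10 and the second gives 7, so anything else running on the box only makes the ceiling lower.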

And the sad truth is that there’s nothing you can do about it (unless you have a *lot* of time to devote to rewriting the PHP apps yourself, or engaging in various convoluted sysadmin tricks like having two independent copies of the webserver – one for the app, one for the rest of the world).

A lot of modern CMS’s seem to “recommend/require” 70-120MB. This is insane. What are they *doing* with all that RAM? More to the point … WHY?

In the end, it’s probably an example of my number 1 reason to avoid PHP:

99% of PHP programmers either have no idea what they’re doing … OR … so utterly over-engineer every PHP app that it does far more than what PHP is meant for *and optimized for*

Sometimes I wonder whether as much as 30% of all PHP programmers secretly wish they were writing J2EE apps instead, but won’t admit it to themselves.

The kind of things that a CMS is doing to soak up 100MB per process make perfect sense inside a Java VM container – and, actually, will work fine without killing the server. They make no sense at all in a PHP project. You *need* a powerful VM to be doing that kind of thing.

Notes to anyone with similar problems

  1. If performance gets really bad, use the very old technique of:
    1. restart the webserver (this kicks off all the current users, *and* cleans out RAM)
    2. reload the main pages of the site in your browser, ASAP
    3. save those pages to disk
    4. edit apache’s site config, and add an AliasMatch line for each saved file (saving the front page of the site is always a good start, that’s what most incoming traffic is probably pointing to – use “AliasMatch ^/$ /var/www/path-to-saved-front-page.html”)
    5. restart the webserver
    6. try reloading the affected pages, while the server is under load, to check you didn’t get the AliasMatch rule wrong or something equally stupid
  2. Run “top”, and see how much RAM each apache2 process is using
  3. Edit apache2.conf/httpd.conf, and make sure that MaxClients is less than ((RAM in this server) / (answer from top above))
  4. Edit php.ini, find the line about max RAM usage (memory_limit), and laugh at the “recommended” default of 128MB. You must be joking. Change it to something sensible – and then redo all previous steps above
  5. Run “ab” to rapidly loadtest your server from the commandline and check that the above things are working. Especially useful for forcing lots of traffic so that you can see all the processes jumping to the top of the list in “top”
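
Steps 2 and 3 above boil down to a single division. Here is a minimal sketch of it in Python, assuming you have read the per-process memory figures off “top” by hand. The function name, the sample numbers and the reserved_mb headroom are all hypothetical, chosen only to illustrate the calculation.

```python
def max_clients_ceiling(total_ram_mb, process_rss_mb, reserved_mb=0):
    """Upper bound for MaxClients: usable RAM / per-process RAM.

    process_rss_mb: memory of each apache2 process as seen in "top", in MB.
    reserved_mb: RAM kept back for the OS, database, etc. (an assumption;
    the steps above do not specify a figure).
    """
    if not process_rss_mb:
        raise ValueError("need at least one apache2 process sample")
    # Be pessimistic: size for the fattest process, not the average.
    worst_case_mb = max(process_rss_mb)
    usable_mb = total_ram_mb - reserved_mb
    return max(usable_mb // worst_case_mb, 0)


# Hypothetical readings from "top" on a 1GB server, in MB.
samples = [22, 35, 41, 38]

print(max_clients_ceiling(1024, samples))
print(max_clients_ceiling(1024, samples, reserved_mb=256))
```

With these made-up samples the ceiling is 24 with no headroom and 18 with 256MB held back. Sizing for the largest process rather than the average is a deliberately pessimistic choice; set MaxClients somewhere below whatever number comes out, then confirm it under load with “ab” as in step 5.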

But, really, if you don’t already know all that, you should probably be finding yourself a qualified sysadmin who does … or doing a lot more detailed learning instead of relying on some throwaway comments I’ve put on my blog! :)

5 thoughts on “Heavy Traffic: 750,000 hits in 3 days”

  1. adam Post author

    Sadly, SuperCache current version is a trojan / malware – it requires you to make your wordpress folders world-writable, which is insane and unnecessary.

    The workaround instructions no longer work, so … I’m not going to bother reading the source and working out what idiotic thing the author is doing, and I’m certainly not removing all file access permissions.

  2. Matthew Weigel

    You briefly touch on FastCGI, but having run up against Apache-related performance problems too many times, as well as security problems with mod_php, and so on… my best advice is to try it, particularly with a faster web server such as lighttpd or nginx. Among other things, you can:
    – remove the web server itself from the performance (particularly memory utilization) equation, leaving you with (in most small cases) PHP + database;
    – prioritize and specialize PHP instances for different web applications (small number of low-memory instances for reporting tools, more and more bloated instances for the main application);
    – load balance PHP instances across multiple machines (this isn’t a noticeable benefit most of the time, though, because you probably don’t need another server, much less want to buy it, but it could drastically simplify using e.g. EC2 for scaling on demand);
    – and secure PHP instances from each other, the web server, and the system as a whole via individual chroots.

    It’s not particularly well-suited for a hosted environment, so it’s not typically considered for anyone who starts with a hosted web site and transitions to running their own servers, but it’s an approach that has served me very well even running very old hardware.

  3. adam Post author

    @Matthew

    So … part of the reason it’s been so long since I touched FastCGI is that I’ve been using a different approach to solve all the above problems :). I wonder if you’ve tried this too?

    I’ve been using cache-centric webservers, so you have three things going on:
    1. frontend of webserver is actually a cache not an httpd (usually Squid)
    2. mid-tier is multiple webservers. Where you describe using FastCGI to separate-out different use-cases to different daemons, I’ve been using an intelligent front-cache to perform the same kind of routing
    3. back-end uses memcached to cache slow resources (e.g. database access)

    I used to use thttpd and lighttpd for serving static files, but in the end just moved to putting everything in caches, and trusting that the cache was fast enough: it made the systems slightly less effort to administer.

    (I got fed up of debugging multiple unique webserver codebases in parallel)

  4. Matthew Weigel

    Well, memcached on the back-end works just as well for what I describe, which really leaves the differences in the front and middle tiers. I think lighttpd (including mod_cache) is going to be massively simpler to configure in the kind of situation we’re talking about than squid + Apache and is going to involve less duplication of functionality between layers.

    Lighttpd at the front is going to be lighter than squid, support similar (but different) caching mechanisms, and will let you extract the extra weight that Apache adds in the middle tier (never mind not needing to speak HTTP in the middle tier, Apache introduces more heavy processes). In my experience, there aren’t that many features you still might need Apache for, so you don’t have to have multiple web servers running either.
