Multithreaded Apache In Small VPS

August 9th, 2009

My best-performing small VPS setup was with lighttpd and FastCGI PHP, but I got tired of trying to make rewrites work in lighttpd and switched to a two-process prefork Apache with mod_php and Squid as a web accelerator. That worked pretty well, but not as fast as lighttpd and FastCGI. What I really want is a multithreaded Apache and a FastCGI PHP that will fit in my small, cheap VPS.

I had tried Apache’s worker MPM and FastCGI before, but at the time both Apache and the PHP FastCGI process bloated and took up all my RAM despite my settings. Recently I decided to try again and was able to find out how to make it work.

Under Linux, by default each thread is assigned 8MB of stack memory, so an Apache process with 25 threads would try to take up 25*8=200MB of RAM! Plus the size of the Apache parent process, plus anything else that runs on my VPS. Not going to work in my small VPS. However, each thread doesn’t really need that much RAM. In fact, 128k is working fine for me so far. Apache 2.2 has a new directive, ThreadStackSize, for the worker MPM. I set mine to “ThreadStackSize 131072”, and now I can have two Apache processes with 25 threads each taking up about 25MB worth of privvmpages. Another way to accomplish this is to add “ulimit -s 128” to the Apache startup scripts. For Apache 2.0 you have to do it this way. Since I am using Apache 2.2 I didn’t have to use ulimit, but when I was testing the effects of changing the stack size I used this script, which worked as a temporary measure:

#!/bin/sh

ulimit -s 128
/usr/sbin/invoke-rc.d apache2 restart
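
The stack arithmetic above is easy to check. This is a minimal sketch, assuming the stack size is given in kilobytes the way “ulimit -s” reports it; the function name stack_total_mb is made up for illustration and is not part of Apache or any tool mentioned here.

```shell
#!/bin/sh
# Worst-case stack reservation for one Apache worker process.
# $1 = threads per process, $2 = per-thread stack size in KB
stack_total_mb() {
    echo $(( $1 * $2 / 1024 ))
}

stack_total_mb 25 8192   # default 8MB stacks: prints 200
stack_total_mb 25 128    # 128k stacks: prints 3 (integer division)
```

The stacks are only part of the footprint, which is why two processes still add up to more than a few MB in practice.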

My 25 MB RAM usage above is without mod_php, though. Google searches lead to conflicting information about whether PHP is thread safe, so I want to use FastCGI. My problem with Apache FastCGI before was that it spawned several times as many PHP processes as I thought I had told it to. I was using mod_fcgid and pointing it to the same FastCGI PHP wrapper script that I had used for lighttpd. But that script set PHP to launch child processes, and I have since learned that mod_fcgid does not multiplex and therefore will not use the child processes. Instead it launches as many processes as it sees fit, and my configuration had each of those launching 4 children. No wonder my RAM got chewed up so quickly. So now I am letting mod_fcgid call /usr/bin/php-cgi directly:

<IfModule mod_fcgid.c>
        AddHandler fcgid-script .fcgi .php
        # Where to look for the php.ini file?
        DefaultInitEnv PHPRC        "/etc/php5/cgi"
        # Maximum requests a process should handle before it is terminated
        MaxRequestsPerProcess       1000
        # Maximum number of PHP processes
        MaxProcessCount             4
        # Maximum seconds to wait for a response from a php-cgi process
        IPCCommTimeout              120
        # Number of seconds of idle time before a php-cgi process is terminated
        IdleTimeout                 120
        # Run php-cgi directly for .php files (no wrapper script)
        FCGIWrapper /usr/bin/php-cgi .php
</IfModule>

Unfortunately, since each PHP process is launched separately, any opcode cache such as eAccelerator or APC will not be shared across processes, and each process uses up another X MB of RAM for its own cache. So if I’m using APC with the default 30 MB cache and have 4 FastCGI PHP processes going, my APC caches are taking up 120 MB all by themselves! At the moment this is exactly what I’m doing, because I’ve moved up from a 256 MB VPS to a 390 MB VPS, and my total memory usage seems to be hovering near 256 MB when all processes are running. However, when PHP processes aren’t needed, mod_fcgid will kill them off to save memory, so most of the time I’m using much less RAM. I will see if I can set up the FastCGI processes like I did with lighttpd and then connect to them from Apache. I think it’s doable, but I haven’t tried yet.
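
To see how much memory the php-cgi processes actually hold at a given moment, something like the following can help. It is a rough sketch: sum_rss_mb is a hypothetical helper name, and the sample numbers are invented. Resident set size counts shared pages once per process, so the total tends to overstate real usage.

```shell
#!/bin/sh
# Sum a column of resident set sizes (KB, one per line on stdin)
# and print the total in whole MB.
sum_rss_mb() {
    awk '{ kb += $1 } END { printf "%d\n", kb / 1024 }'
}

# Four hypothetical php-cgi processes of 40 MB each:
printf '40960\n40960\n40960\n40960\n' | sum_rss_mb   # prints 160
```

On a live Linux box with procps, the input could come from “ps -C php-cgi -o rss= | sum_rss_mb”.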

I like using the worker MPM and FastCGI better than using the prefork MPM, mod_php and Squid. First of all, the log files are a lot easier to parse. Apache (as I have it configured) can handle 50 concurrent static requests including 4 concurrent PHP requests, and I was able to enable KeepAlives again so my sites feel more responsive. With prefork Apache and Squid I could see a row of GIF smileys load up left-to-right when posting a reply in one of my forums. Now it happens so fast I can’t see it anymore. If I can get the PHP cache (previously eAccelerator, but I just switched to APC) to share itself across all my PHP processes then the overall RAM usage will be much lower, and I’ll be able to have more PHP processes.

And I like the new setup better than lighttpd because I can use the Apache rewrite rules provided by software programs like Drupal and SMF rather than try to translate them into lighttpd rewrites.

Using Squid To Cache Apt Updates For Debian And Ubuntu

July 5th, 2009

I run several Debian-based Linux machines and virtual machines at home and periodically install or reinstall one to test something. They all need updates—and mostly the same updates—so I wanted to cache the updates locally rather than download them several times when I upgrade.

There is an apt-proxy package, and although I can’t recall the problems with it I remember deciding it was not going to work well for me. I could rsync the entire package archive, but that’s just wasteful. I finally decided on setting up a Squid proxy dedicated (by intent, not by access controls) to caching deb packages from Debian and Ubuntu archives, and rpms and such if I should use other distros.

So I set up Squid and looked through the configuration options. Squid is by default set up to be most efficient at getting cache hits. I wanted to be sure it doesn’t expire the seldom-accessed large deb files to make room for tiny files, so I changed the cache replacement policy to LFUDA to optimize byte hit rate. I also increased the maximum object size to 100 megabytes from the default 4096 kilobytes. In a typical Squid cache the larger files aren’t cached because web surfers don’t request them as often as smaller files. My cache’s purpose, however, is to save these large files locally for updating several machines.

Now I needed to make my machines use the proxy for apt. For that I just added a code snippet to each /etc/apt/apt.conf, or in my case I just slipped this file named jimproxy into /etc/apt/apt.conf.d/ :

Acquire {
        Retries "0";
        HTTP {
                Proxy "http://address-or-URL-of-squid-proxy.example.tld:3128/";
        };
};

Now when I run apt or aptitude or any manager that uses apt, they will use my Squid proxy to obtain the distribution packages.
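
With several machines to set up, generating that file from a script saves some typing. This is only a sketch: apt_proxy_conf is a made-up helper name, and the proxy URL is the same placeholder used in the snippet above.

```shell
#!/bin/sh
# Print an apt.conf.d snippet that points apt at the given HTTP proxy.
# $1 = proxy URL, e.g. http://address-or-URL-of-squid-proxy.example.tld:3128/
apt_proxy_conf() {
    cat <<EOF
Acquire {
        Retries "0";
        HTTP {
                Proxy "$1";
        };
};
EOF
}

apt_proxy_conf "http://address-or-URL-of-squid-proxy.example.tld:3128/"
```

Redirecting the output to /etc/apt/apt.conf.d/jimproxy (as root) gives the same file as the hand-written one.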

This worked quite well, but I recently noticed some problems. The issue appeared to be that there were missing deb files from the archives, but what really was happening was that new Packages.bz2 lists were on the archives while my Squid cache was serving older lists it had cached. Those lists named some older packages which were no longer there. So my “apt-get update” would read an old package list and then “apt-get -u upgrade” wouldn’t find those older packages. So I needed to tell Squid to be sure to check for new package lists. To do that I added the “refresh-ims” option to the refresh pattern, which makes Squid check with the origin server when a client sends an If-Modified-Since request. Voilà, it works properly now.

Squid.conf lines before:

# maximum_object_size 4096 KB
# cache_replacement_policy lru
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320

Squid.conf lines after:

maximum_object_size 100 MB
cache_replacement_policy heap LFUDA
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320 refresh-ims

I turned on refresh-ims for everything, but I probably would have been fine with turning it on for just the frequently-changing files as shown in the following code. But in my case I don’t think turning it on for all files will adversely affect things.

maximum_object_size 100 MB
cache_replacement_policy heap LFUDA
refresh_pattern ^ftp:          1440    20%     10080
refresh_pattern ^gopher:       1440    0%      1440
refresh_pattern Packages\.bz2$ 0       20%     4320 refresh-ims
refresh_pattern Sources\.bz2$  0       20%     4320 refresh-ims
refresh_pattern Release\.gpg$  0       20%     4320 refresh-ims
refresh_pattern Release$       0       20%     4320 refresh-ims
refresh_pattern .              0       20%     4320
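
One way to check what the proxy is doing with a given file is to look at the X-Cache response header that Squid adds. The helper below is a sketch under that assumption; cache_status is a hypothetical name, and the sample header is invented.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and print the first word of
# Squid's X-Cache header (HIT or MISS), or UNKNOWN if it is absent.
cache_status() {
    tr -d '\r' | awk '
        tolower($1) == "x-cache:" { print toupper($2); found = 1; exit }
        END { if (!found) print "UNKNOWN" }'
}

printf 'HTTP/1.0 200 OK\r\nX-Cache: MISS from squid.example.tld\r\n\r\n' | cache_status
```

Against a real proxy the headers could come from curl, for example “curl -sI -x http://address-or-URL-of-squid-proxy.example.tld:3128/ URL-of-a-Packages.bz2-file | cache_status”. The header only says whether Squid answered from its cache; the result codes in Squid’s access.log give more detail.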

2008 Update on Running Drupal on a Small VPS

November 19th, 2008

For a year or two I was successfully running Drupal with lighttpd and fastcgi. Lighttpd is very efficient, and having 4 fastcgi PHP processes let me limit the memory php used while keeping my small sites responsive enough. But redirects and URL rewriting are done differently, and it was tricky at times to get it to work the way I wanted with Drupal. I eventually got tired of wrestling with rewrites and redirects with every new non-Drupal php-based app I wanted to try, so I started thinking about how to make Apache work for a small site.

Now I am running Apache2/mod_php with a Squid front-end cache. If you’re not familiar with Squid, in brief it is a web proxy that caches web requests. It is often used to speed up clients but can be reversed to cache requests on the server end in web accelerator mode. Apache uses more RAM than lighttpd/fastcgi, and Squid uses RAM for itself and the cache, so I had to cut back to 2 Apache processes.

That may sound like too little, but here’s why it works well: the Apache processes aren’t stuck delivering a response to a slow remote client. The local Squid cache accepts the response from Apache and then delivers it to the client, freeing up the Apache process to handle the next request. 2 processes can handle my traffic because they can quickly deliver their payload to the cache and move on, and Squid can handle the delivery over the network to the client.

Of course sometimes the processes get hung up on slow MySQL queries or slow PHP queries, so occasionally I had delays. In particular I got rid of my RSS requests, because the Apache process had to wait while requesting RSS feeds from other sites, and sometimes that got slow or even timed out leaving just one Apache process handling requests, and if it stumbled on something then I’d have client requests waiting in line not getting served. So I got rid of my news feeds. Note that I am talking about my web site pulling feeds from other sites; of course I can offer RSS feeds for my Drupal blog with no such PHP delays. I also strive to keep my MySQL running smoothly, but everyone should do that, anyway.

The downside of using Squid is logging. Since Squid is my front-end web server I have Apache listen on 127.0.0.1:80, and Squid accesses it there. So my Apache log files show all requests coming from 127.0.0.1, and many static page or image requests don’t come through because Squid has them cached. However I configured Squid to log in Apache log format and just use those logs instead.

Of course dynamic content from Drupal has the nocache header, so Squid isn’t caching the dynamic content for future requests, but it still frees up Apache while delivering it to the client. It does cache the static files like images, style sheets and javascript files, so the Apache threads mostly focus on dynamic content only.

Another way I keep memory usage down is with eaccelerator. It caches PHP scripts so they don’t have to recompile every time they’re run. This can save memory in addition to processor time. After changing Drupal or any of my scripts I usually delete the cache and click around my sites to force all the php to run so eaccelerator will cache it. Then I restart my php processes (Apache2 in the case of Apache/mod-php or the fastcgi server if using fastcgi) to lower their memory usage. After that the cached scripts should run and the PHP processes shouldn’t bloat as much. Note that every time PHP is updated eaccelerator must be recompiled. In older PHP versions it would crash if you didn’t, but now it just silently (except for a log entry) fails to cache your scripts if you forget to recompile after a PHP update.

With lighttpd/fastcgi I was able to run 4 PHP processes (memory_limit from 8MB – 16MB), lighttpd, MySQL and Exim (my mail daemon) in a 256mb VPS with good speed. With Apache2/mod_php I am running 2 Apache2/mod_php processes, Squid (8 MB cache memory), MySQL and Exim in a 256mb VPS. Having only two processes forces me to watch for slow requests like a hawk, but Squid takes care of slow clients. I still ran into memory problems occasionally, but now I have a 384mb VPS and haven’t had a privvm failure yet.

Logitech Mouseware 9.76 Crashes Windows XP Service Pack 2

July 18th, 2007

A client was getting blue screen errors with STOP 0x0000007E. This usually points to a driver or hardware problem, and given that the machines were fine until upgrading to WinXP SP2 I decided it must be a driver.

Through trial and error I found that Logitech Mouseware 9.76 was installed on the PCs and was causing sporadic blue screens when a USB mouse was connected. Just uninstall Mouseware, let Windows XP install the native driver and all is well.

UPDATE: It has come to my attention that this problem may be related to DeviceLock USB security program installed on my client’s computers. It’s possible that Mouseware and SP2 alone won’t cause the blue screen crashes but that Mouseware and DeviceLock are conflicting.

VPS and Sneaky CPU Problems

October 16th, 2006

While trying to find the perfect balance to make my sites run well out of my small VPS I noticed that my CPU usage was spiking. Unlike memory issues the Virtuozzo Power Panel didn’t issue a QoS alert for CPU overages. Apparently when you use too much CPU you just don’t get cycles for a while. Due to the spiky nature of CPU usage you don’t see the problem unless you catch it near the top of a spike.

I realized that my backup processes were helping to spike the CPU. I have a cron job to do a mysql dump, and I have a remote machine regularly ssh/rsync in to copy files off. ssh and rsync use quite a bit of CPU. I should’ve realized that would happen, but “duhhhh”. I changed all my nonessential scripts to “nice” the commands. rsync was a bit tricky to get “nice”d, but I found the answers via Googling. What’s interesting is that Virtuozzo doesn’t seem to count the nice’d processes against the VPS’s CPU usage. They used spare host cycles apparently, and there are tons of spare host cycles. I think I actually sped my backups up with “nice”.
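
For illustration, here is roughly what nice’d backup commands can look like. This is a sketch, not the exact scripts described above: the helper names, the niceness value of 19 and the mysqldump options are assumptions. The rsync trick relies on its --rsync-path option, which lets the calling side say how rsync is started on the other end.

```shell
#!/bin/sh
# Run any command at the lowest CPU priority.
run_low_priority() {
    nice -n 19 "$@"
}

# Hypothetical backup helpers; paths and hosts are placeholders.
dump_databases() {   # $1 = output file
    run_low_priority mysqldump --all-databases > "$1"
}

pull_files() {       # $1 = remote source (user@host:/path/), $2 = local dir
    # --rsync-path makes the remote side run rsync under nice as well
    run_low_priority rsync -az -e ssh --rsync-path="nice -n 19 rsync" "$1" "$2"
}

run_low_priority echo "running niced"
```

dump_databases would go in the VPS’s cron job, while pull_files would run on the remote backup machine so that the rsync process on the VPS is the one started under nice.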

It’s tempting to try to run services nice’d, because I seem to get more “spare” cpu cycles than I get normal ones, but I have a feeling that will cause one problem or another down the line.

Drupal and Small VPSes: Resource Issues

October 13th, 2006

I never did upgrade my VPS RAM. Part of it is laziness, but part of it is that I keep thinking my web server doesn’t do enough and isn’t busy enough for 256mb to not be enough.

I’m using mod_php for drupal and CivicSpace on several sites. I’m running Apache 2 with the prefork MPM. The problem with this setup and limited resources is that the running Apache processes bloat to handle the biggest PHP script they’ve run. To counter that I reduced the number of Apache processes. Per earlier blogs, I also deleted unused drupal modules and all my sites work fine under a PHP memory limit of 12mb. Between those two things I’ve kept my memory issues at bay.

However, running only 4 Apache processes is causing problems, too. If a PHP script is slow to complete due to a busy server or MySQL slowness, then that process can’t handle any more requests.

Using the Apache worker MPM would relieve both the sustained memory bloat issues (memory can be released upon completing the PHP script) and the concurrent connection issues (no problem to make a new thread to handle a new request), but then you have all that PHP and thread-safety stuff to worry about.

I started looking into another solution that I’m going to try: FastCGI. With FastCGI you take mod_php out of the web server and run a persistent PHP (or other language) interpreter. The web server passes requests to the persistent interpreter. In PHP’s case, the php-cgi program will spawn multiple child processes to handle requests. I got this working on my home server and it works fine. Now the web server (I also switched to lighttpd, but Apache can do FastCGI, too) can handle tons of requests with little resource usage and pass off the PHP scripts to the persistent php-cgi group. Sure, I can still overload my php-cgi group, but at least I can keep servicing small requests while PHP is jammed up. And I’m not servicing small requests with fat processes. But the biggie is now I can separately manage my web server resources and my PHP resources for better fine tuning.
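
A persistent php-cgi group of this kind can be started with a small wrapper like the one below. Treat it as a sketch: the bind address, child count and request limit are assumed values, start_php_fcgi is a made-up name, and the php-cgi path may differ on your system. PHP_FCGI_CHILDREN and PHP_FCGI_MAX_REQUESTS are the environment variables php-cgi reads for the number of children and the number of requests each child serves before being replaced.

```shell
#!/bin/sh
# Path to the FastCGI-enabled PHP binary; override for testing.
PHP_CGI=${PHP_CGI:-/usr/bin/php-cgi}

# Start a persistent php-cgi master bound to a TCP address.
# $1 = bind address, $2 = number of children, $3 = requests per child
start_php_fcgi() {
    exec env PHP_FCGI_CHILDREN="$2" PHP_FCGI_MAX_REQUESTS="$3" \
        "$PHP_CGI" -b "$1"
}

# Example (not run here): start_php_fcgi 127.0.0.1:9000 4 1000
```

The web server is then pointed at the same address, so the number of PHP processes is set here rather than by the web server.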

Portable Media

May 9th, 2006

A somewhat unrelated note on floppies and flash drives. I say this due to recent and ongoing flabbergasting experiences at work.

If you have important documents, you should have them backed up somewhere. This does *not* mean keep your only copy on a floppy, CD-RW or flash drive. If you need a working copy on portable media, fine, but *frequently* copy it to a backup location. If your office, say, has a server that the local admin backs up daily to tape, freakin use it! If you have a USB hard drive for backup, that would be a good place.

Floppies: I can’t believe people still use these for important documents, but I swear somebody asked me for help with a floppy drive last week. And in the past 5 years I’ve had 4 instances of people coming to me in a panic because their only copy of a critical document was on a floppy that quit working. (They had been updating the file on the floppy for years.) Floppies *will* physically wear out, especially if you’re updating an Excel or Word file and re-saving it frequently.

CD-RWs: Sometimes these just quit working. I won’t explain how or why, just understand that there are slight incompatibilities with these things, and one day they’ll quit working in one of your PCs. They may or may not continue to work in other PCs. Be aware of this if you keep a working copy on CD-RW, make frequent backups and be prepared to start another CD-RW or reformat the existing one and restore from backup…it will be necessary sooner or later.

Flash: I haven’t had a problem with these yet, but I’m sure I eventually will. I don’t think most people realize that flash drives can physically wear out…theoretically after a few hundred thousand writes. Modern flash devices have tricks to balance writes and wear and tear across their memory range rather than “burning a groove” in one spot. The point is they’re not infallible. If you have a working copy on flash, keep it backed up elsewhere. As I say, I haven’t had a problem yet, but I see a bunch of users carrying around flash drives, and I have no reason to think they’re treating them differently than the floppies or CD-RWs.

Funny Javascript News Fader Experience

April 22nd, 2006

Over at the Early Retirement Forum, for which I assist in maintaining the forum software, I’ve been adding RSS icons and “Add to [My Yahoo!, Google, Bloglines, My MSN, etc.]” icons and working on getting a sensible RSS link and info system laid out. I have a news fader/ticker item that lists some of the icons and points to more RSS feed info. After collecting a number of icons in the images directory I decided to make an rss subfolder and move the images there so as not to clutter up the site owner’s images directory with RSS icons.

After moving the images I updated all my image links and made a mental note to check the log for 404 errors in the next day or two. The next day I checked, and one IP was creating three 404 errors every 30 seconds looking for the old image locations! Yikes! I tracked which user it was, and it’s a user that I know uses RSS but is not destructive. But there were no RSS feed requests associated with the image requests, so my first conclusion that he was requesting RSS feeds too often was not correct. I made symlinks for the images in their original locations to stop the 404 errors and hopefully get the user agent, IE6 on WinXP, to cache the icons and quit requesting them.
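
Spotting a client like that in the access log takes only a little awk. This sketch assumes the log is in Apache’s combined format, where the client address is field 1, the request path is field 7 and the status code is field 9; top_404 is a made-up helper name and the sample lines are invented.

```shell
#!/bin/sh
# Read combined-format access log lines on stdin and print
# "count client path" for every 404, busiest first.
top_404() {
    awk '$9 == 404 { n[$1 " " $7]++ } END { for (k in n) print n[k], k }' |
        sort -rn
}

top_404 <<'EOF'
192.0.2.10 - - [22/Apr/2006:10:00:00 -0500] "GET /images/rss.gif HTTP/1.1" 404 209 "http://forum.example/index.php" "Mozilla/4.0"
192.0.2.10 - - [22/Apr/2006:10:00:30 -0500] "GET /images/rss.gif HTTP/1.1" 404 209 "http://forum.example/index.php" "Mozilla/4.0"
192.0.2.77 - - [22/Apr/2006:10:00:31 -0500] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/4.0"
EOF
```

For the sample input this prints “2 192.0.2.10 /images/rss.gif”.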

I figured it out after checking the referer fields: The user must’ve left his browser open to the forum index before I made the image location changes. So his browser is staying open unattended all day while the news fader rotates through the messages and links to the images’ old addresses every 30 seconds or so. That’s why the images are being requested while he’s not hitting page views and not showing up in Who’s Online.

Normally this wouldn’t be a problem because the browser will cache the images and not reload them. But every now and then it will check to see if the image changed, but in this case it found the image missing, so after that it re-checked every time the image was requested, which was every time the image came up in the news fader rotation.

Now that I’ve symlinked the images back to their original location the user’s browser should cache the images and quit requesting them every time they come up on the fader. Next time he gets on the forum or closes the browser the issue will fix itself and I can take away the symlinks.

I love figuring stuff out. Note how some of the information seemed relevant at first but wasn’t: the icons were about RSS, and the IP belonged to a user known to use RSS, but it turns out that had nothing to do with the issue. On the other hand I guess we’re lucky it wasn’t an anonymous user who left his browser open or I might’ve assumed he was misbehaving somehow and banned his IP telling him he’s making too many requests when from his point of view he did nothing of the sort.

Shared and Concurrent Drupal Sites with Symlinking

April 22nd, 2006

I’m currently running Drupal 4.6.6, Drupal 4.7 RC3 and CivicSpace 0.8.3. I prefer to have one set of source files, but I want each site (Apache vhost) to have its own root folder.

So, with some symlink magic, here’s what I do. Each version of Drupal has its own directory under my server’s web root. I have the following directories and symlinks:

  • drupal-4.6.6
  • drupal-4.7.0-rc3
  • civicspace-0.8.3
  • drupal-4.6 -> drupal-4.6.6
  • drupal-4.7 -> drupal-4.7.0-rc3
  • civicspace -> civicspace-0.8.3

Each vhost has its own DocumentRoot with the following symlinks, substituting drupal-4.6 or civicspace for drupal-4.7 as appropriate:

  • cron.php -> ../drupal-4.7/cron.php
  • database -> ../drupal-4.7/database
  • includes -> ../drupal-4.7/includes
  • index.php -> ../drupal-4.7/index.php
  • misc -> ../drupal-4.7/misc
  • modules -> ../drupal-4.7/modules
  • scripts -> ../drupal-4.7/scripts
  • sites -> ../drupal-4.7/sites
  • themes -> ../drupal-4.7/themes
  • update.php -> ../drupal-4.7/update.php
  • xmlrpc.php -> ../drupal-4.7/xmlrpc.php

Each vhost gets its own files/ directory, robots.txt, .htaccess and favicon.ico files.
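
Creating those eleven links by hand for every vhost gets tedious, so a small loop helps. This is a sketch: link_vhost is a made-up name, and it assumes each DocumentRoot sits next to the drupal-4.7 style directories so that the relative “../” targets resolve.

```shell
#!/bin/sh
# Create the shared-source symlinks inside one vhost's DocumentRoot.
# $1 = DocumentRoot directory, $2 = series link (drupal-4.6, drupal-4.7, civicspace)
link_vhost() {
    for name in cron.php database includes index.php misc modules \
                scripts sites themes update.php xmlrpc.php; do
        # -n replaces an existing link instead of descending into its target
        ln -sfn "../$2/$name" "$1/$name"
    done
}

# Example (not run here): link_vhost /path/to/webroot/site1 drupal-4.7
```

Running it again with a different series name repoints every link in that vhost.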

Now, when the next version (e.g. 4.7.1) is released I can unpack the files, copy the sites folder from the previous version, and then update the version symlink (drupal-4.7 -> drupal-4.7.1) and run the update.php script.
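
The version switch itself comes down to one ln call. A minimal sketch, with switch_version as a hypothetical helper name and GNU ln assumed for the -n flag:

```shell
#!/bin/sh
# Repoint a series symlink (e.g. drupal-4.7) at a newly unpacked release.
# -n keeps ln from descending into the directory the old link points to.
switch_version() {   # $1 = link name, $2 = release directory
    ln -sfn "$2" "$1"
}

# Example (not run here), from the web root:
#   cp -a drupal-4.7.0-rc3/sites/. drupal-4.7.1/sites/
#   switch_version drupal-4.7 drupal-4.7.1
```

Every vhost that links through drupal-4.7 picks up the new release at once, so update.php should be run for each of those sites afterwards.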

If I want to change a particular site from Drupal 4.6 or CivicSpace 0.8.3 to Drupal 4.7 I update the symlinks in the DocumentRoot to point to the 4.7 series, copy the appropriate site config file over and run the update.php script.

This is working well so far, but in my recent “Memory Hogging” blog I mention that having too many modules installed (even if not activated) makes the admin/modules screen take up tons of RAM. My sites have different module needs, so I think I’m going to give each vhost its own modules directory and then symlink the modules under that. This way I’ll have only one module version per drupal version in my filesystem and be able to remove modules I know I don’t need from individual sites.

Memory Hogging

April 22nd, 2006

I’ve been hitting my VPS’s (virtual private server) 256mb RAM limit since installing CivicSpace, and I keep adjusting the php memory limit to try to avoid having forked processes fail, but then CivicSpace will fail on certain admin pages.

I now understand that when looking at the module activate/deactivate page it loads *every* module installed, even if it’s not activated. (For other pages it only loads activated modules.) Since CivicSpace includes so many modules, this almost guarantees I’m going to run out of php memory if I have the limit set at 20mb or 16mb, and that’s about the same range where I bump my head on the server privvmpages limit given my current configuration.

I don’t think I’ve had this problem with Drupal yet, but I probably will as I keep adding modules.

As a temporary fix I’m going to remove modules that aren’t used and aren’t likely to be used. For a more permanent fix I’m going to pay more to get a higher RAM configuration. Instead of just picking 384mb or 512mb I’d like to look into how I want Apache, MySQL and PHP tuned and figure out how much RAM should be dedicated to each. I might also set up my own test box and run stress tests on Drupal with various configurations. Then I’ll know my RAM target.