Posts Tagged ‘squid’

Using Squid To Cache Apt Updates For Debian And Ubuntu

Sunday, July 5th, 2009

I run several Debian-based Linux machines and virtual machines at home and periodically install or reinstall one to test something. They all need updates—and mostly the same updates—so I wanted to cache the updates locally rather than download them several times when I upgrade.

There is an apt-proxy package, and although I can’t recall the problems with it, I remember deciding it was not going to work well for me. I could rsync the entire package archive, but that’s just wasteful. I finally decided on setting up a Squid proxy dedicated (by intent, not by access controls) to caching deb packages from the Debian and Ubuntu archives, plus RPMs and the like if I ever use other distros.

So I set up Squid and looked through the configuration options. By default Squid is tuned to maximize the number of cache hits rather than the number of bytes served from the cache. I wanted to be sure it doesn’t expire the seldom-accessed large deb files to make room for tiny files, so I changed the cache replacement policy to LFUDA to optimize byte hit rate. I also increased the maximum object size to 100 megabytes from the default 4096 kilobytes. In a typical Squid cache the larger files aren’t cached because web surfers request them less often than smaller files; my cache’s purpose, however, is to save these large files locally for updating several machines.
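To illustrate why the replacement policy matters here, the following is a minimal Python sketch comparing a byte-limited LRU cache with a simplified LFUDA cache. It is not Squid's implementation (Squid keeps its objects in a heap); the class names, the `replay` helper, the file names and the sizes are all hypothetical. The LFUDA key is modeled as the object's hit count plus a cache "age" that is raised to the key of the most recently evicted object.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-limited cache that evicts the least recently used object."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.objects = OrderedDict()  # name -> size, least recently used first
        self.hit_bytes = 0

    def request(self, name, size):
        if name in self.objects:
            self.objects.move_to_end(name)  # mark as most recently used
            self.hit_bytes += size
            return
        # Miss: evict from the least recently used end until the object fits.
        while self.objects and self.used + size > self.capacity:
            _, evicted_size = self.objects.popitem(last=False)
            self.used -= evicted_size
        if size <= self.capacity:
            self.objects[name] = size
            self.used += size


class LFUDACache:
    """Byte-limited cache using a simplified LFU with dynamic aging."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.age = 0       # key of the most recently evicted object
        self.objects = {}  # name -> [key, hits, size]
        self.hit_bytes = 0

    def request(self, name, size):
        if name in self.objects:
            entry = self.objects[name]
            entry[1] += 1                   # one more hit
            entry[0] = entry[1] + self.age  # key = hits + current cache age
            self.hit_bytes += size
            return
        # Miss: evict the object with the smallest key until the object fits.
        while self.objects and self.used + size > self.capacity:
            victim = min(self.objects, key=lambda n: self.objects[n][0])
            self.age = self.objects[victim][0]
            self.used -= self.objects[victim][2]
            del self.objects[victim]
        if size <= self.capacity:
            self.objects[name] = [1 + self.age, 1, size]
            self.used += size


def replay(cache):
    """Hypothetical trace: one large deb fetched twice, 50 tiny files, the deb again."""
    trace = [("big.deb", 60), ("big.deb", 60)]
    trace += [(f"tiny{i}", 1) for i in range(50)]
    trace += [("big.deb", 60)]
    for name, size in trace:
        cache.request(name, size)
    return cache.hit_bytes


lru_hit_bytes = replay(LRUCache(100))      # 60
lfuda_hit_bytes = replay(LFUDACache(100))  # 120
```

In this toy trace the LRU cache throws out the large file as soon as the tiny files fill the cache, so the final request for it is a miss. The LFUDA cache evicts the once-requested tiny files first and serves the large file from cache again. The aging term means an object that is never requested again still gets evicted eventually.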

Now I needed to make my machines use the proxy for apt. For that I just added a code snippet to each /etc/apt/apt.conf, or in my case I just dropped this file, named jimproxy, into /etc/apt/apt.conf.d/:

Acquire {
        Retries "0";
        HTTP {
                Proxy "http://address-or-URL-of-squid-proxy.example.tld:3128/";
        };
};

Now when I run apt-get, aptitude or any other package manager built on apt, it uses my Squid proxy to obtain the distribution packages.

This worked quite well, but I recently noticed some problems. It looked as if deb files were missing from the archives, but what was really happening was that the archives had new Packages.bz2 lists while my Squid cache was still serving the older lists it had cached. Those lists named older packages that were no longer on the archives, so “apt-get update” would read an old package list and then “apt-get -u upgrade” would fail to find those packages. I needed to tell Squid to check for new package lists, so I added the “refresh-ims” option to the refresh_pattern line. Voilà, it works properly now.
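A plausible reason for the stale lists is the freshness heuristic behind the catch-all rule `refresh_pattern . 0 20% 4320`. The Python sketch below models that heuristic as I understand it from the Squid documentation. It is a simplification that ignores Expires and Cache-Control headers, and the function name `is_fresh` and the example timings are hypothetical.

```python
def is_fresh(age_min, lm_age_min, min_min, percent, max_min):
    """Simplified model of the refresh_pattern heuristic (all times in minutes).

    age_min:    time since the cache fetched the object
    lm_age_min: time between the object's Last-Modified stamp and that fetch
    min_min, percent, max_min: the three refresh_pattern fields
    """
    if age_min > max_min:
        return False  # older than the configured maximum: stale
    if lm_age_min > 0:
        # Last-modified factor: fresh for `percent` of the object's age at fetch time.
        return age_min < lm_age_min * percent / 100.0
    return age_min <= min_min  # no usable Last-Modified: fall back to the minimum


# Hypothetical case: Packages.bz2 was last modified 2 days (2880 minutes)
# before the proxy fetched it, and the rule is "0 20% 4320".
six_hours_later = is_fresh(360, 2880, 0, 20, 4320)   # True: cached copy is served
ten_hours_later = is_fresh(600, 2880, 0, 20, 4320)   # False: Squid revalidates
```

Under these assumptions the cached list counts as fresh for 576 minutes (20% of 2880), even if the archive published a newer list in the meantime. With refresh-ims set, Squid contacts the origin server whenever a client sends an If-Modified-Since request instead of answering from its own copy, which closes that window.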

Squid.conf lines before:

# maximum_object_size 4096 KB
# cache_replacement_policy lru
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320

Squid.conf lines after:

maximum_object_size 100 MB
cache_replacement_policy heap LFUDA
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .               0       20%     4320 refresh-ims

I turned on refresh-ims for everything, though I probably would have been fine turning it on for just the frequently changing files, as shown in the following code. In my case I don’t think turning it on for all files will adversely affect things.

maximum_object_size 100 MB
cache_replacement_policy heap LFUDA
refresh_pattern ^ftp:          1440    20%     10080
refresh_pattern ^gopher:       1440    0%      1440
refresh_pattern Packages\.bz2$ 0       20%     4320 refresh-ims
refresh_pattern Sources\.bz2$  0       20%     4320 refresh-ims
refresh_pattern Release\.gpg$  0       20%     4320 refresh-ims
refresh_pattern Release$       0       20%     4320 refresh-ims
refresh_pattern .              0       20%     4320

2008 Update on Running Drupal on a Small VPS

Wednesday, November 19th, 2008

For a year or two I was successfully running Drupal with lighttpd and fastcgi. Lighttpd is very efficient, and having 4 fastcgi PHP processes let me limit the memory PHP used while keeping my small sites responsive enough. But lighttpd handles redirects and URL rewriting differently from Apache, and it was tricky at times to get them to work the way I wanted with Drupal. I eventually got tired of wrestling with rewrites and redirects with every new non-Drupal PHP-based app I wanted to try, so I started thinking about how to make Apache work for a small site.

Now I am running Apache2/mod_php with a Squid front-end cache. If you’re not familiar with Squid, in brief it is a web proxy that caches web requests. It is often used to speed up clients but can be reversed to cache requests on the server end in web accelerator mode. Apache uses more RAM than lighttpd/fastcgi, and Squid uses RAM for itself and the cache, so I had to cut back to 2 Apache processes.

That may sound like too few, but here’s why it works well: the Apache processes aren’t stuck delivering a response to a slow remote client, because Squid accepts the response locally and then delivers it to the client, freeing up the Apache process to handle the next request. Two processes can handle my traffic because they can quickly deliver their payload to the cache and move on, while Squid handles the delivery over the network to the client.
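A back-of-the-envelope calculation shows how much difference this makes. The Python sketch below uses made-up timings (50 ms to generate a page, 2 s to trickle it to a slow client, 5 ms to hand it to a proxy on the same host); none of these numbers are measurements from my server, and the function name is hypothetical.

```python
def requests_per_second(workers, generate_s, deliver_s):
    """Upper bound on throughput when each worker is busy for generate + deliver."""
    return workers / (generate_s + deliver_s)


# Hypothetical timings, in seconds.
GENERATE = 0.050         # PHP and MySQL build the page
DELIVER_TO_CLIENT = 2.0  # sending the response straight to a slow remote client
DELIVER_TO_PROXY = 0.005 # sending the response to Squid over loopback

direct = requests_per_second(2, GENERATE, DELIVER_TO_CLIENT)
behind_proxy = requests_per_second(2, GENERATE, DELIVER_TO_PROXY)
```

With these assumed numbers, two processes serving slow clients directly top out at roughly 1 request per second, while the same two processes behind the proxy could handle roughly 36, because the slow delivery no longer ties up an Apache process.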

Of course sometimes the processes get hung up on slow MySQL queries or slow PHP scripts, so occasionally I had delays. In particular, pulling RSS feeds from other sites was a problem: the Apache process had to wait on the remote site, and sometimes that got slow or even timed out, leaving just one Apache process handling requests. If that one stumbled on something, client requests waited in line without being served. So I got rid of my news feeds. Note that I am talking about my web site pulling feeds from other sites; I can still offer RSS feeds for my Drupal blog with no such PHP delays. I also strive to keep MySQL running smoothly, but everyone should do that anyway.

The downside of using Squid is logging. Since Squid is my front-end web server, I have Apache listen on 127.0.0.1:80, and Squid accesses it there. So my Apache log files show all requests coming from 127.0.0.1, and many requests for static pages or images never reach Apache because Squid has them cached. However, I configured Squid to log in Apache log format, and I just use those logs instead.

Of course dynamic content from Drupal has the no-cache header, so Squid isn’t caching the dynamic content for future requests, but it still frees up Apache while delivering it to the client. It does cache static files like images, style sheets and JavaScript files, so the Apache processes mostly focus on dynamic content only.

Another way I keep memory usage down is with eAccelerator. It caches compiled PHP scripts so they don’t have to be recompiled every time they’re run. This can save memory in addition to processor time. After changing Drupal or any of my scripts I usually delete the cache and click around my sites to force all the PHP to run so eAccelerator will cache it. Then I restart my PHP processes (Apache2 in the case of Apache/mod_php, or the fastcgi server if using fastcgi) to lower their memory usage. After that the cached scripts should run and the PHP processes shouldn’t bloat as much. Note that every time PHP is updated, eAccelerator must be recompiled. In older PHP versions it would crash if you didn’t, but now it just silently (except for a log entry) fails to cache your scripts if you forget to recompile after a PHP update.

With lighttpd/fastcgi I was able to run 4 PHP processes (memory_limit between 8 MB and 16 MB), lighttpd, MySQL and Exim (my mail daemon) in a 256 MB VPS with good speed. With Apache2/mod_php I am running 2 Apache2/mod_php processes, Squid (8 MB cache memory), MySQL and Exim in a 256 MB VPS. Having only two processes forces me to watch for slow requests like a hawk, but Squid takes care of slow clients. I still ran into memory problems occasionally, but now I have a 384 MB VPS and haven’t had a privvm failure yet.