I was reading Jay Kuri's article about CGI alternatives the other day, and it got me thinking. How much memory do these various modules for simple (or advanced) web serving use?
After looking through Mark Stosberg's article on startup penalties I was even more bewildered. It was hard to track the actual cost of each module, because the perl interpreter's own footprint was included as well (and that we cannot do anything about).
I wrote a small script called perlbloat.pl to check how each of the mentioned modules comes out. It uses the GTop module, which is Gnome's cross-platform way of measuring things such as memory usage.
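The perlbloat.pl script itself isn't reproduced here, but the idea behind it is simple: sample the process's memory, load a module, and sample again. Here is a minimal sketch of that approach — not the real perlbloat.pl — which assumes a Linux system and reads /proc/self/statm instead of using GTop, so its absolute numbers will differ from those below:

```perl
#!/usr/bin/perl
# Minimal perlbloat-style sketch: report how much resident memory each
# module named on the command line adds when loaded. Assumes Linux;
# reads /proc/self/statm instead of GTop, so numbers differ from GTop's.
use strict;
use warnings;
use POSIX qw(sysconf _SC_PAGESIZE);

my $page = sysconf(_SC_PAGESIZE) || 4096;

# Resident set size of the current process, in bytes.
sub resident_bytes {
    open my $fh, '<', '/proc/self/statm' or die "no /proc/self/statm: $!";
    my (undef, $resident) = split ' ', scalar <$fh>;
    return $resident * $page;
}

my $start = resident_bytes();
for my $module (@ARGV) {
    my $before = resident_bytes();
    eval "require $module; 1" or die "could not load $module: $@";
    printf "%s added %.1fk\n", $module, (resident_bytes() - $before) / 1024;
}
printf "%s added %.1fM in total\n", "@ARGV",
    (resident_bytes() - $start) / (1024 * 1024) if @ARGV;
```

Numbers from /proc are page-granular and include more than just compiled code, so treat the output as a rough comparison tool, not an exact accounting.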
The results below came from this command:
$ for name in $(echo CGI HTTP::Engine FCGI::Engine Catalyst CGI::Application Squatting Continuity Mojo Mojolicious Titanium HTML::Mason CGI::Simple); do perlbloat.pl $name; done
|Module|Memory used (bytes)|
|Continuity|1 163 264|
|HTTP::Engine|2 072 576|
|Mojo|2 719 744|
|HTML::Mason|2 916 352|
|Mojolicious|3 526 656|
|Titanium|3 559 424|
|FCGI::Engine|10 280 960|
|Catalyst|11 046 912|
Version numbers are as follows (running on perl 5.10.0 on Ubuntu 8.10):
What is interesting to notice here is that CGI::Application actually comes out with a lower footprint than CGI::Simple. Considering CGI::Application has a somewhat bigger API, this is surprising.
It is of course no surprise that Catalyst is the most memory-hungry module of them all. What does seem surprising is that FCGI::Engine eats so much. It would be nice to know why.
Considering these numbers, I would like to hear good reasons for using Catalyst in a high-performance environment. To me it seems like the application servers will take a thrashing because of the increased memory usage of each process compared to e.g. CGI::Application. Even Titanium, which is pretty feature-packed, comes out at roughly a third of the memory.
Interestingly, if you consider the typical deployment scenario for a Catalyst-based app, you get these numbers:
$ perlbloat.pl Moose DBIx::Class Catalyst
Moose added 4.8M
DBIx::Class added 392k
Catalyst added 5.7M
Moose DBIx::Class Catalyst added 10.9M in total
If you consider a similar app based on HTTP::Engine you will have this overhead:
$ perlbloat.pl Moose DBIx::Class HTTP::Engine
Moose added 4.8M
DBIx::Class added 396k
HTTP::Engine added 1.4M
Moose DBIx::Class HTTP::Engine added 6.7M in total
If you turn the loading order around a little bit you get this:
$ perlbloat.pl DBIx::Class Moose HTTP::Engine
DBIx::Class added 528k
Moose added 4.7M
HTTP::Engine added 1.4M
DBIx::Class Moose HTTP::Engine added 6.7M in total
What you can see from this last dump is that Moose and DBIx::Class share some code (about 132k), but that is mostly irrelevant when you consider the cost of the rest.
Another package that is getting wildly popular these days is MooseX::Declare (0.22 tested), and as you can see, it has a pretty large footprint as well:
$ perlbloat.pl MooseX::Declare
MooseX::Declare added 10.3M
If you separate Moose and MooseX::Declare, you can see that MooseX::Declare adds plenty on its own (it's not only Moose that costs):
$ perlbloat.pl Moose MooseX::Declare
Moose added 4.8M
MooseX::Declare added 5.4M
Moose MooseX::Declare added 10.2M in total
If you have something to say about the numbers I've collected here, I would love to hear it. Feel free to post comments.
I usually find that you tend to run somewhere between 1.5x and 2x the number of cores in application processes per server - more if you spend a lot of time blocked waiting on the database for most requests.
The upshot of this is that on the average web node, an extra 10MB per process costs you somewhere between 60MB (1.5x, 4 cores) and 320MB (2x, 16 cores). And then there's copy-on-write, which means you're probably losing less than that in terms of actual memory usage.
Most real production web nodes I see have either 4 or 8GB of RAM these days, since it simply doesn't cost enough to be worth bothering with less.
So does a couple hundred meg of RAM actually matter in terms of your overall hardware requirements if it reduces your application's development time significantly?
CGI::Application and many of its plugins use 'lazy loading', so startup costs are deferred until objects are actually used. If your script had used $self->query, which instantiates CGI.pm by default, CGI::Application's memory usage would go way up. Add a template module, some plugins, and it would go up even more. And realistically, you would do just that over the lifetime of a real-world application. So how much does it matter in the long run? If you want features, you have to pay, either now or later.
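The lazy-loading pattern described above is easy to sketch. This is not CGI::Application's actual code — the class name is made up, and core module Data::Dumper stands in for CGI.pm so the example is self-contained:

```perl
package My::App;    # hypothetical class illustrating lazy loading
use strict;
use warnings;

sub new { bless {}, shift }

# The heavy module is only loaded, and the object only built, on the
# first call; an app that never uses this accessor never pays for it.
sub dumper {
    my $self = shift;
    return $self->{dumper} ||= do {
        require Data::Dumper;          # cost deferred until here
        Data::Dumper->new([]);
    };
}

package main;
my $app = My::App->new;
print exists $INC{'Data/Dumper.pm'} ? "loaded\n" : "not loaded\n";
$app->dumper;    # first use triggers the require
print exists $INC{'Data/Dumper.pm'} ? "loaded\n" : "not loaded\n";
```

Checking %INC before and after the first accessor call shows the module really isn't compiled until it is needed.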
Comment from egor:
Memory is cheap.
Development time is valuable.
@Matt: I'm currently in a situation at work where my application is memory-bound. That's why I'm investigating the memory footprint of CPAN modules (since we're using a lot of them). If I can choose a similar module with a lower footprint but equal performance, it's a win in the end (higher concurrency per server).
@jaldhar: I wasn't aware that CGI::Application used lazy loading. And yes, the cost will rise as you start using features. But that's actually a very good point. Why should you take the penalty of code you're not using? I would very much like it if more modules used this approach so that memory would better be preserved.
@robin: You said "Why should you take the penalty of code you're not using? I would very much like it if more modules used this approach so that memory would better be preserved."
You're unfortunately completely disregarding the effects of copy-on-write here. For most long-running servers, you'll be better off if as much code as possible is loaded before the fork. Modules that lazy-load by default and do not respect e.g. the "prefork" pragma often have much worse memory (and maybe even CPU) usage because of this: they load their code only after the fork, which means every running fork of your application server not only has to parse the modules anew, it also always uses freshly created, unshared memory.
To have optimal memory usage, you actually want everything you use loaded before the fork. Unfortunately, most CPAN modules do not yet use the prefork pragma to achieve this.
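The preload-before-fork pattern can be sketched with a plain fork; the prefork pragma automates the bookkeeping, but the underlying idea is just this (with core module Data::Dumper standing in for a heavy dependency):

```perl
#!/usr/bin/perl
# Sketch of the preload-before-fork pattern: code loaded in the parent
# before fork() is shared with children via copy-on-write, while code
# required after the fork is parsed and stored separately in each child.
use strict;
use warnings;

# Load the heavy modules ONCE, in the parent, before forking.
require Data::Dumper;    # stand-in for Moose, DBIx::Class, etc.

my @pids;
for my $n (1 .. 2) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: the module is already compiled, and its pages stay
        # shared with the parent until either side writes to them.
        exit(exists $INC{'Data/Dumper.pm'} ? 0 : 1);
    }
    push @pids, $pid;
}
for my $pid (@pids) {
    waitpid $pid, 0;
    die "child did not inherit the preloaded module" if $? != 0;
}
print "both children inherited the preloaded module\n";
```

Every require moved above the fork is paid for once instead of once per worker, which is exactly why late lazy loading hurts in a preforking server.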
@Markus: I like what the original CGI.pm does: it offers a :compile option so that you can preload if you want to and make use of copy-on-write. That way the user of the module has the option of choosing based on their intended usage and deployment scenario.
Let's take Moose. It's a really good package, and more and more modules use it (which is very nice). But it has broad applications, which means the writers of Moose cannot automatically assume what environment you're going to use it in or what your primary concerns are. In some cases it might be startup time, in others CPU usage, and in a third memory consumption. Wouldn't it be better if the user could specify their focus, so that whenever the module (in this case Moose) had to make a CPU/memory tradeoff, it would follow the user's preference? That way the module could cater equally well to single-instance shell scripts, long-running servers, and highly forked/threaded environments. This would of course increase the complexity of the module, but for some usage patterns it could be a huge win.
But lazy loading combined with a preload/compile option (or proper use of the prefork pragma) would be a good start. Maybe more visibility needs to be put on this subject.
@everyone: In my opinion you should always make an educated choice of when to cater for increased cpu usage or increased memory usage (or IO for that matter). If the user (in this case a developer) can choose which one to focus on things will be better for all of us in the end. Just make the default the one that the majority of users will probably want.
@egor: Memory is not cheap on embedded platforms (like cellphones, STBs and routers). Neither is CPU or IO. Which is why developers that work on those platforms need to be very talented. Wouldn't it be nice if perl actually was an option for them? I don't see a reason why we (as a community) should exclude embedded platforms. There is, after all, a lot of money to be made in that arena.
@Markus: "To have optimal memory usage, you actually want to have everything you use loaded before the fork."
Well, yes, but "everything you use" is kind of a key point (and the one I read Robin as having been making), isn't it?
For example, a lot of web frameworks carry their own mini web servers along with them. If I'm going to be running my application under apache, then the framework's built-in server will never be used, so it should never be loaded, either before or after apache forks.
I agree completely that everything you're going to use should be loaded before forking, but it would be nice if there were also a way to arrange things so that features you're not going to use don't get loaded at all.
Hey, neat! Thanks for this.
I'm not going to repeat the "memory is cheap" argument. Twitter requiring insane amounts of horsepower to do very little (and still failing from resource starvation left and right) makes it clear that that argument has limits.
CGI of course hides most of its code behind AUTOLOAD, so it's actually more of a pig than this measurement suggests.
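For anyone unfamiliar with the trick, the AUTOLOAD pattern looks roughly like this — a toy illustration, not CGI.pm's actual code:

```perl
package Lazy;    # toy illustration of the AUTOLOAD trick, not CGI.pm's code
use strict;
use warnings;
our $AUTOLOAD;

# No methods exist up front; each one is generated on first call, so
# methods that are never used cost neither compile time nor memory.
sub AUTOLOAD {
    (my $name = $AUTOLOAD) =~ s/.*:://;
    return if $name eq 'DESTROY';
    no strict 'refs';
    # Install a real sub so later calls bypass AUTOLOAD entirely.
    *{"Lazy::$name"} = sub { "ran $name" };
    goto &{"Lazy::$name"};
}

package main;
print Lazy->can('greet') ? "defined\n" : "not defined\n";   # not defined yet
print Lazy::greet(), "\n";                                   # generated here
print Lazy->can('greet') ? "defined\n" : "not defined\n";   # now defined
```

The upside is exactly what's measured above: a bare load looks cheap. The downside, as discussed in the copy-on-write comments, is that the real cost shows up later, after the fork.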
Thank you for including Continuity. I haven't taken the time to really pimp it (or polish it), so I'm always surprised when it gets a mention. On that note... of pimping it... unlike CGI and mod_perl based servers, all users share a single Continuity instance, which adds another dimension. Resource leaks hurt more on one hand, but on the other, the price of that overhead is only paid once. And of course there are other dimensions... being able to do crazy in-process debugging tricks and inspect users' data and application state. But that's tangential to this. Not that other systems don't have their own unique sets of perks.
Enlightening, interesting, and a good conversation piece. Cheers!
I did some memory surveys of my.opera.com back in the day, and GTop wasn't very reliable then. I don't remember how I discovered it, but I found that I had to check Linux's smaps (/proc/*/smaps) to get any reasonable results. I hacked up a Munin plugin to monitor it on an Apache system.
Beyond that, I think it is very important how the application scales with more users. If it is capable of sharing memory between processes, you can scale sublinearly, i.e. if you double the number of users you don't need to double the hardware resources. If you can do that well, I'd say development time counts for much more. Remember, 1 GB of DDR2 RAM goes for around NOK 70 these days...
BTW, I've been working in the Java world lately, and found myself very happy if the app consumes less than half a gig of RAM, and sublinearity is not at all realistic... ;-)