Flickr: PHP4 vs PHP5 drop in CPU comparison

andr · on May 8, 2009

Both Flickr and PHP5 appeared in 2004. For one of the biggest PHP sites in the world, I'd imagine they'd upgrade sooner.

amethyst · on May 8, 2009

PHP5 had a very unfortunate and slow pickup from the developer community, mainly stemming from a wave of backwards-incompatibilities in PHP5, some of which were in the new object model, the rest of which were changes to the way some core functions took their parameters. There are still some applications that are not yet compatible with PHP5, even five years later and after PHP4 has already been end-of-lifed.

It's much like the Python 2.x -> 3.0 migration, in that it needs a lot of attention from developers to make sure applications are ready. The difference is that with PHP, you have scores of uneducated shared hosting clients who don't know anything about their application other than it works in PHP4, but not in PHP5. Hence the reason that it took years before PHP5 outnumbered PHP4 on the webserver...

encoderer · on May 8, 2009

OK, I'll bite... I'm a co-founder of a software consultancy and we've actively marketed PHP4->PHP5 upgrades after we did a couple large, successful projects that just happened to include upgrading from 4 to 5.

We wrote a Compiler that does 95% of the work for us. Took a few weeks to write and it's made our 4->5 upgrade jobs a very profitable part of the business.

To the best of my knowledge, there were no changes to function parameter order. It's nearly all object-model stuff.

Some might find this interesting: version 1.0 of the Compiler introduced bugs into the compiled (5.0) version of the codebase in cases where the original dev relied on the Assign-By-Copy behavior of v4.

In most cases by-ref behavior is what was wanted, and many probably didn't notice the distinction. But there is a situation where the dev may have taken advantage of the fact that he was getting a copy, e.g.:

    function x($o) {
        $o->foo = 'bar';
        $o->save();
        unset($o);
    }

    $y = new o();
    $y->foo = 'baz';
    x($y);
    $y->save();

This code in PHP4 works as the developer intended. In PHP5 it throws an error on the $y->save() line because the object was passed by reference to x() and it was garbage collected inside x.
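For anyone who wants to try the snippet on a current PHP, here is a minimal sketch. The class `O` and its `$saved` log are hypothetical stand-ins (the original class isn't shown), and `save()` only records the value instead of writing anywhere. As far as I can tell, on PHP 5 and later `unset($o)` just drops the function's local variable, so the second `save()` still runs; the difference that shows up is that the caller's object now carries the value set inside `x()`. Passing `clone $y` is one way to get the PHP4-style copy back.

```php
<?php
// Hypothetical stand-in for the class "o" in the snippet above.
class O
{
    public $foo;
    public $saved = array();

    public function save()
    {
        // Record what would have been written to storage.
        $this->saved[] = $this->foo;
    }
}

function x($o)
{
    $o->foo = 'bar';
    $o->save();
    unset($o); // removes only this local variable, not the caller's object
}

// PHP 5+ behavior: $y and $o refer to the same object.
$y = new O();
$y->foo = 'baz';
x($y);
$y->save();
var_dump($y->foo);   // string(3) "bar"
var_dump($y->saved); // two entries, both "bar"

// PHP4-style behavior, recovered with an explicit (shallow) copy.
$z = new O();
$z->foo = 'baz';
x(clone $z);
$z->save();
var_dump($z->foo);   // string(3) "baz"
var_dump($z->saved); // one entry, "baz"
```

So the thing to look for when porting is less a crash and more a silent change in which value gets saved.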

It's rare, but it exists, and it was a little tough to fix because we're running a single-pass Compiler.

The nice thing is that it's also able to introduce optimizations in the code, as well as use a Static Analysis approach to detecting probable bugs and security holes (for example, unfiltered use of $_REQUEST values, or assigning to $this) and it fixes them as well.

The end result is an app that always runs faster, sometimes significantly so.

IMO, the real reason 4->5 was a slow process is that its biggest selling point was a "real-ish" object model and most PHP developers are Morts and just don't need to care about that.

amethyst · on May 8, 2009

> To the best of my knowledge, there were no changes to function parameter order. It's nearly all object-model stuff.

Now that I think about it, I'm wanting to say the parameter-order hoopla was centered around one of the point releases to 5.x, but I could be wrong.

Your compiler is an interesting topic though, and it seems similar to the "upgrade" scripts available for the Python 3 jump. Do you make the compiler publicly available, or no? Would be really nifty to see what all you handle in it.

chops · on May 8, 2009

I'd venture a guess that "no, it's not publicly available", primarily since he mentioned that it's a very profitable part of his company.

mattyb · on May 8, 2009

Can anyone shed some light on the difficulties involved in upgrading a codebase of that size?

paulhammond · on May 9, 2009

(I work at flickr)

The problem is finding all the places that the language has changed and all the places you relied on the old behavior. The documented stuff was easy to find; it was the undocumented (or unintended?) changes that were hard. For example, strtotime($null) returns the current time ("now") in PHP4 but returns null in PHP5.

QA and automated testing will only get you so far in this process. Nobody had thought to test how we display the "Date Taken" on photos from before December 14th 1901 until a member pointed out they were displaying wrong (caused by another change to strtotime() behavior).
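The cutoff date in that example happens to sit right at the lower limit of a signed 32-bit Unix timestamp. Whether that is the actual root cause isn't stated here, so treat the following as a sketch under that assumption; it only prints the two ends of the 32-bit range.

```php
<?php
// Smallest and largest values a signed 32-bit timestamp can hold.
$min32 = -2147483647 - 1; // -(2^31), written this way so it stays an integer
$max32 = 2147483647;      // 2^31 - 1

// gmdate() formats a timestamp in UTC, independent of the server's timezone.
$earliest = gmdate('Y-m-d H:i:s', $min32);
$latest   = gmdate('Y-m-d H:i:s', $max32);

echo $earliest, " UTC\n"; // 1901-12-13 20:45:52 UTC
echo $latest, " UTC\n";   // 2038-01-19 03:14:07 UTC
```

Any date outside that window cannot be represented as a 32-bit timestamp, which makes very old "Date Taken" values a useful test case.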

At some point you have to upgrade a subset of your servers, watch the error logs, fix the bugs as they come up and be prepared to roll back quickly. It wasn't a fun experience...

axod · on May 8, 2009

Cool, and forgive me if I'm wrong here, but:

CPU usage is a silly stat to measure, unless it's more than 100%. Granted, this will allow PHP5 to scale more before it gets to 100%, so that's a good thing. Surely the optimum case is to have all your machines pegged at 100% CPU, using them to their maximum capacity.

It'd be useful to compare page latency across PHP4->PHP5 or some metric the user can see.

neilc · on May 8, 2009

> Surely the optimum case is to have all your machines pegged at 100% CPU

True, but given the typical difference between average load and peak load, you wouldn't want to have your systems at 100% most of the time anyway. Of course, what you really ought to do is scale down during non-peak periods and scale up when traffic increases so that your utilization is always high, but before EC2/etc. that wasn't feasible for most people to do.

mbrubeck · on May 8, 2009

An online retailer I worked for usually aimed for 66% peak utilization, so any one of three datacenters could fail without affecting availability.
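The arithmetic behind a figure like that can be sketched as follows. It assumes equally sized sites and traffic that redistributes evenly across the survivors; both function names are made up for illustration.

```php
<?php
// Highest peak utilization per site that still leaves room to absorb
// the load of $failuresTolerated failed sites.
function maxSafePeakUtilization($sites, $failuresTolerated = 1)
{
    return ($sites - $failuresTolerated) / $sites;
}

// Utilization on each surviving site once $failed sites drop out.
function utilizationAfterFailure($peakUtilization, $sites, $failed = 1)
{
    return $peakUtilization * $sites / ($sites - $failed);
}

printf("%.1f%%\n", 100 * maxSafePeakUtilization(3));        // 66.7%
printf("%.1f%%\n", 100 * utilizationAfterFailure(0.66, 3)); // 99.0%
```

With three sites at 66% peak, losing one pushes the other two to about 99%, which is why the target can't go much higher.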

emmett · on May 9, 2009

Definitely not. If your boxes are pegged at 100%, you have no spare capacity!

The real question is, are you CPU constrained at peak capacity? And in the case of dynamic web apps, the answer is almost always yes. So CPU improvements are a huge deal. Double your CPU efficiency, double your server efficiency.

axod · on May 9, 2009

Yes, I agree - in terms of extra capacity. But for your current users, they may see no difference.

zackattack · on May 8, 2009

This is a very insightful comment.

What are the biggest server-side bottlenecks with respect to user experience, and what practical steps can be taken to remedy them?

whacked_new · on May 8, 2009

Funny, the caption of the picture is "Webserver CPU...anyone guess what the drop is from? (Flickr Staff, don't answer :) )"

Is this supposed to be obvious? Otherwise, how did you know?

brlewis · on May 8, 2009

Look at the other 2 lines of the caption.

neilc · on May 8, 2009

The line that says "Answer is: PHP5" was presumably added later.

brlewis · on May 8, 2009

It was there when this news.yc item had 0 comments. Also notice in the flickr comments the answer was given 6 weeks ago.

neilc · on May 8, 2009

Yeah, obviously. The OP's point is that attributing the drop in CPU usage to PHP5 (which was the answer to the original Flickr poster's question) is non-obvious (or it is a plausible candidate among a bunch of possibilities).

brlewis · on May 8, 2009

If he wrote "how might one guess" instead of "how did you know" I'd say you were right. Occam's razor suggests it was a question directed at vaskel as to how he chose to title this item.