.::t3rmin4t0r::. : php
< May 2008
SuMoTuWeThFrSa
     1 2 3
4 5 6 7 8 910
11121314151617
18192021222324
25262728293031

Mon, 26 May 2008:

APC 3.0.19 Released !

That annoying file descriptor leak that snuck into 3.0.17 has finally been laid to rest. A few double free issues were fixed as I spent quite a long time staring at the same code, till enlightenment hit me like a clue bat. Along with that, there are a bunch of quickfixes for 5.3 quirks. I'm not happy with those, but this is the 3_0 stable branch and by the time 5.3 is popular enough the HEAD should be taking care of those problems. The build is broken in VC++ in this release, but excluding apc_pool.c from the build should work.

Expect more changes... as soon as I get back home.

--
Delay always breeds danger and to protract a great design is often to ruin it.
              -- Miguel De Cervantes

posted at: 04:27 | path: /php | permalink | Tags: , ,

Tue, 06 May 2008:

Running Infinite Loops quickly - Part II

There's a certain cultural bankruptcy which shows itself in sequels. It indicates, that you're reduced to imitating yourself. But this isn't that kind of a sequel. No, not the kind where there are T Rexes in the city, trying to make a living drawing cartoons or Arnie switching from ammo boxes to ballots. This is the kind which gives a New Hope.

Yesterday, I had an outpouring of hate against the linux capability model. But the problem turned out to be that setuid resets all the capabilites. In hindsight that makes a lot of sense, but didn't even strike until the kernel people (y! has those too) got involved (and I didn't RTFM).

Enter Prctl: The solution was to use the prctl() call with PR_SET_KEEPCAPS to ensure that the capabilities are not discarded when the effective user-id of a process is changed. But, even then, only the CAP_PERMITTED flags are retained and the CAP_EFFECTIVE are masked to zeros.

So, with the prctl call and another cap_set_proc to reset CAP_EFFECTIVE, it was on a roll. Here's the patch on top of unnice.c.

 #include <sys/resource.h>
+#include <sys/prctl.h>;
@@ -26,12 +27,14 @@

    if(!fork())
    {
+       prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);

        /* child */
        if(setuid(nobody_uid) < 0)
        {
            perror("setuid");
        }
+       cap_set_proc(lcap);

        if(setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0) - 1) < 0)

Thus concludes this adventure and hope that this blog entry serves as warning of things to come. Watch this space for more Tales! Of! INTEREST!.

--
Only great masters of style can succeed in being obtuse.

posted at: 18:34 | path: /php | permalink | Tags: , ,

Mon, 05 May 2008:

Running Infinite Loops quickly?

Running infinte loops is a tricky challenge. What happens to a process when a programmer writes an infinite loop, should be familiar to all. But the challenge is to not let that affect the *other* processes. There seemed to be a perfect solution to it - setrlimit().

The function lets you set soft and hard limits on CPU, so that if a process does exceed the soft limit CPU usage, a SIGXCPU is raised. The process can catch the signal and do something sensible. Basically, all that was required was for the process to call setpriority and let the linux process scheduler slow it down to a trickle.

But a process can lower its priority, but not raise it - if it is a non-privileged process. But linux capabilities allows you to grant CAP_SYS_NICE to the process which essentially lets a non-privileged process muck around with priority - down and up.

To begin with /proc/sys/kernel/cap-bound is unbelievably confusing to use. It is a 32 wide bit-mask on which the 23rd bit apparently seems to be the CAP_SYS_NICE value. After much mucking around, I came to the conclusion that "-257" would be 0xFFFFFEFF which only disables CAP_SETPCAP. But even then the setpriority call kept failing. Here's my test code.

cap_t lcap;
const unsigned cap_size = 1;
cap_value_t cap_list[] = {CAP_SYS_NICE};

lcap=cap_get_proc();
cap_set_flag(lcap, CAP_EFFECTIVE, cap_size, cap_list, CAP_SET);
cap_set_flag(lcap, CAP_PERMITTED, cap_size, cap_list, CAP_SET);

cap_set_proc(lcap);

if(setuid(nobody_uid) < 0) 
	perror("setuid");

if(setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0) - 1) < 0) 
	perror("setpriority");

Here's a link to the test case in a more compileable condition. Build it with gcc -lcap and run with sudo to test it. Right now, my ubuntu (2.6.22) errors out with this message.

bash$ gcc -lcap -o unnice unnice.c
bash$ sudo ./unnice 
0: =ep cap_setpcap-ep
setpriority: Permission denied

The core issue has to do with apache child-process lifetimes. The only recourse for me is to kill the errant process after the bad infinite loop and have the parent process spawn a new process with a normal priority. But which means blowing off nearly all the local process cache, causing memory churn and more than that, the annoyance of a documented feature not working.

This story currently has no ending, but if any kernel hackers are reading this and should happen to know an answer, please email gopalv shift+2 php noshift+> net. And thus we prepare for a sequel (hopefully).

--
I use technology in order to hate it more properly.
                -- Nam June Paik

posted at: 22:03 | path: /php | permalink | Tags: , ,

Wed, 26 Mar 2008:

APC 3.0.17: Undo, Undo, Undo

In response to CVE-2008-1488, APC 3.0.17 has just been pushed out with the requisite security fixes. But in the process of producing a php4 compatible release, a significant amount of code has been reverted in the merge into an APC_3_0 branch for future bugfixes.

I've spent a couple of hours unmerging my "bye bye php4" cleanups with the help of Kompare. And my sanity is simply due to the fact that I can "cvs diff -u | kompare -" to look at the resulting huge patch. But it is not unpossible that the new code merged from HEAD has regressions, so you could also apply the unofficial patch onto 3.0.16.

--
I took your advice and did my own thing. Now I've got to undo it.

posted at: 22:03 | path: /php | permalink | Tags: , ,

Sun, 30 Dec 2007:

APC 3.0.16: Santa Claus Edition

I got a nice little present for Christmas.

It had +4 lines, a huge mail explaining why and made me feel happy & stupid at the same time (there's some correlation, I think).

The patch fixes a one-off error in the APC shm allocator (read my mail for a shorter paraphrasing) and has triggered the 3.0.16 release. Now, APC should be stable even when running under cache full/heavy load conditions. And I've been barking up the wrong tree of race conditions for months & months. But the important thing is that this is fixed now.

Merry XMas and a happy New Year! [ citation needed ]

--
Patience is a minor form of despair, disguised as virtue.
                -- Ambrose Bierce

posted at: 06:27 | path: /php | permalink | Tags: , ,

Fri, 19 Oct 2007:

APC 3.0.15 Released

APC 3.0.15 has been released - read the release announcement. Not too many changes since 3.0.14, but there's a reason it took this long to make so few changes.

To begin with, I've just been lazy. Just kidding! This release was actually delayed to make sure this could be the very last PHP4 release of APC. And with the amount of major changes coming in, the next release is definitely going to be 3.1.0 rather than a 3.0.16 release. Now I can start working on making some fairly big changes.

Bye Bye PHP4, it was a nice ride while it lasted.

--
While most peoples' opinions change, the conviction of their correctness never does.

posted at: 03:11 | path: /php | permalink | Tags: , ,

Mon, 24 Sep 2007:

APC Autofilter: The Real Story

PHP programmers don't really understand PHP.

They know how to use PHP - but they hardly know how it works, mainly because it Just Works most of the time. But such wilful ignorance (otherwise known as abstraction) often runs them aground on some issues when their code meets the stupidity that is APC. Bear with me while I explain how something very simple about PHP - how includes work.

Every single include that you do in PHP is evaluated at runtime. This is necessary so that you could technically write an include inside an if condition or a while loop and have it behave as you would expect. But executing PHP in Zend is actually a two step process - compile *and* execute, of which APC gets to handle only the first.

Compilation: Compiling a php file gives a single opcode stream, a list of functions & yet another list of classes. The includes in that file are only processed when you actually execute the code compiled. To simplify things a bit, take a look at how the following code would be executed.

<?php
return;

include_once "a.php";
?>

The PHP compiler does generate an instruction to include file "a.php", but since the engine never executed it, no error is thrown for the absence of a.php. Having understood how includes work, classes & OOP face a unique problem during compilation.

<?php
include_once "parent.php";

class Child extends Parent
{
}
?>

Even though the class Child is created at compile time, its parent class is not available in the class table until the include instruction is actually executed & the parent.php compiled up. So, php generates a runtime class declaration which is an actual pair of opcodes.

ZEND_FETCH_CLASS              :1, 'Parent'
ZEND_DECLARE_INHERITED_CLASS  null, '<mangled>', 'child

But what if the class parent was already in the class table when the file was being compiled? Like the following index.php

include_once "parent.php";
include_once "child.php";

$a = new Child();

Since obviously the parent class is already compiled & ready, Zend does something intelligent by removing the two instructions and replacing them by NOPs. That makes for fewer opcodes and therefore faster execution.

Here's the kicker of the problem. Which of these versions should APC cache? Obviously, the dynamically inherited version is valid for both cases - but APC caches whatever it encounters initially. The static version is obviously incompatible in a dynamic scenario.

So whenever APC detects that it has cached a static version, but this case actually requires a dynamic version, it decides to not cache that file *at* all from that point onwards. That's what the APC autofiltering does.

Now, you ask - how could it appear in perfectly normal code?

Assume child1 and child2 inherits from parent class. And here is how the first hit on index.php looks like from an inclusion perspective. Now, it is obvious that the child2 in this case is actually compiled with the faster static inheritance (marked in orange) while child1 suffers the performance hit of not having Parent available till execution time.

Then we have a profile.php which only requires the child2 class. But while executing this file, APC fetches the copy of child2.php which was in cache - which is the statically inherited one.

As you could've guessed, the cached version is not usuable for this case - and APC drops it out of cache. And for all requests henceforth, even for the index.php case, APC actually ignores the cached version and insists on compiling the file with Zend. If you enable apc.report_autofilter, this information will be printed out into the server error log.

Part of the culprit here is the conditional inclusion using include_once. With mere includes, you get an error whenever parent.php is included multiple times - but that can be annoying too. Where include_once/require_once can be debugged with Inclued, userspace hacks like the rinclude_once or !class_exists() checks make it really hard for me to figure out what's going wrong.

So, if you write One File per Class PHP and use such methods of inclusion, be prepared to sacrifice a certain amount of performance by doing so.

--
Doubt is not a pleasant condition, but certainty is absurd.
              -- Voltaire

posted at: 13:55 | path: /php | permalink | Tags: , ,

Fri, 07 Sep 2007:

Inclued 0.3: Classes

After procrastinating for nearly two weeks with the code nearly done, I've managed to find the energy (and some caramel coffee) required to fix it up for the public to use - and here it is. In the process, I also threw out all the ZendEngine2 hacks and started to use zend_user_opcode_set_handler, which should let people use this with the faster CGOTO vm core, though I would advise against using that just yet.

The new & improved inclued can dump out class inheritance dependencies (though not the interfaces, as of now). This gives a slightly bigger picture view of what files depend on what other files and provide a tree of the classes clustered into their own files. For example, this is the graph pulled out from the relatively minimal PEAR::HTML_QuickForm2 library.

The usage is as before, the gengraph.php script now has a -t option which will accept either "classes" or "includes". At the very least, it should help people documenting OOP php code. Next up are interface implementations, the data is already in the dumped files, but not output in any human readable format.

--
It is a very sad thing that nowadays there is so little useless information.
        -- Oscar Wilde

posted at: 06:07 | path: /php | permalink | Tags: , ,

Sat, 11 Aug 2007:

APC Yak Shaving

Yak Shaving: So you start out with that simple problem. But half-way through fixing it, it explodes into this whole exercise in pointless dependencies. It is a rather recent wordification (never heard of that word ? it's a perfectly cromulent word). But considering the fate of the "pre-shaved yaks" guy, who ended up saying "It's a band.", I'd say it is not quite popular enough ... yet.

Now. before I start onto the real topic - let me first say that the next release of APC will be the last release compatible with PHP 4.x. Now, what is wrong with just letting the #ifdefs stay ? That's where this snippet of code comes into play.

<?php 
  apc_store("a", array(new stdclass()));
  print_r(apc_fetch("a"));
?>

It doesn't work. Now, the problem is very simple - the original patch by Marcus only checks for objects in a very shallow way. It will detect & serialize objects which are passed to apc_store - but the check does not extend deeper into the recursive copy functions.

Symmetry: But the zval* copy functions were written to be beautifully symmetric. A copy into cache is nearly the same as copying out of it. And when I say "nearly", I actually mean that until the *_copy_for_execution() optimisations were thrown in, they were actually symmetric - in & out. But objects don't play nicely with that - because they are much more than just data.

In & Out: Objects require assymmetric caching. Storing into cache is a serialize operation, while retrieving from storage is a deserialize. This ensures that they end up with the right kind of pointers, class object initialization and that the resources they hold in their opaque boxes are properly handled. The objects have to implement their appropriate magic persistance methods.

And thus begins the Yak Shaving. I need to rewrite most of the cache copy-in and copy-out functions to handle the basic assymetry. But consider this, most of the code in there has been limited for months because of the fact that I cannot optimize on PHP data structures without breaking the symmetry.

A couple of years ago, I sat through a full hour talk by Rusty Russell about talloc(). Built on top of the trusty old malloc() calls, it simplifies memory management a lot for Samba4. So bear with me as I take a brain dump of my idea - for my very intelligent reader to poke holes in (gopalv shift+2 php net).

APC's allocation strategy is a little brain dead. To allocate 4 bytes of data, it actually requires 24 bytes of space. But much more than the space wastage, I'm more concerned about the number of lock() calls required to cache a single php file - a hello world program takes about 22 lock operations (11 locks, 11 unlocks). Yes, that's actually 22 syscalls just to cache echo "hello world";.

I've previously tried to fix it with partitioned locks. The problem with that was actually cleaning up the locks, because the extension code would have to have special cases for every SAPI - because of some bugs in PHP 5.x. So, the "if you don't succeed, destroy all evidence" principle made me throw out that idea. But the cache-copy, zend-copy separation should help me revive another approach to this.

Pools: So, now that I'm officially b0rking up APC, I could as well slap on a new pool allocator, right on top of sma_allocate - ala, talloc(). The allocation speed would skyrocket, because the in-pool allocs are sequential and do not have any fragmentation issues due to blocks in the middle being free'd. As much as allocates are important, the real advantage of this would be that I could basically speed up cache expunges by a magnitude or more. The 22 syscall cache expunge for hello world would be reduced to a potential pair of syscalls - because it would be a single free of the entire pool space.

Right now the pool is actually built up to be of the following structure.

struct apc_pool_t {
	int capacity;
	int avail;
	void *head;
	apc_pool_t *overflow;
	unsigned char data[0];
};

I've yet to run this through an x86_64 build, but an even multiple of int/void* should align data area right into a wordsize. And I think nearly every pool should be around 4k (i.e 4096 - sizeof(apc_pool_t)) for opcode cache and 1k for data cache. I might make the latter a runtime tuneable, just to pad the APC manual up into an entire book (just in case someone asks me to write one .. *heh*).

None of this is included in APC 3.0.15, which will exit out of the gates as soon I'm sure I'm happy with its stability. The new code will probably be an APC 3.1 release, marking the end of php4 compat & opening up the door for php6 compat.

A two line bug report which exploded into nearly two thousand lines of C code - that's just classic yak shaving.

--
10 If it ain't broke, break it;
20 Fix it.
30 Goto 10

posted at: 09:27 | path: /php | permalink | Tags: , ,

Wed, 08 Aug 2007:

Inclued 0.2

Finally, I got bored enough to update my inclued extension (as promised at OSCON). The extension now comes with a nearly completely non-intrusive data dumping mode. The new inclued.dumpdir can be used to dump the inclued data onto a temporary file without ever modifying any of your php scripts. Also included is some php code to transform the dump data into graphviz formatted .dot files.

Pick up your free & complementary copy of the source code on your way out. And stay clued-in about your includes.

--
This quote intentionally not included.

posted at: 04:27 | path: /php | permalink | Tags: , ,

Mon, 09 Jul 2007:

Schwartzian Transforms in PHP: Stable Sorting

Recently, one of the php lists I'm on was asked how to implement a stable sort using php's sort functions. But since all of php's sort functions eventually seem to land up in zend_qsort, the default sort is not stable. The query on list had this simple example which illustrates the problem clearly.

bash$ php -r  '$a= array(1,1,1); asort($a); print_r($a);'

Array
(
    [2] => 1
    [1] => 1
    [0] => 1
)

The basic problem here is to produce a stable sort which still operates in quicksort O(n*lg(n)) time. Essentially, falling back onto a bubble sort is ... well ... giving up :)

Schwartzian transform: The programming idiom, named after Randal L. Schwartz, is a carry-over of the decorate-sort-undecorate lisp memoization into perl. The real problem here however was putting it into a php syntactic form which was clean and as similar to the original as possible. For example, here's how the python version of the code would look like (despite the fact that the current python sort is stable).

a = [1,1,1]
b = zip(a, range(len(a)))    # decorate
b.sort()                     # sort
a = map(lambda x : x[0], b)  # undecorate

array_walk() magic: Coming from a world of constant iterators, I had read the array_walk documentation and sort of read the "Users may not change array itself ..." as boilerplate. But as it turns out, the callback function is allowed to change the current value *in place* and for that purpose it gets a reference to the value. With that in mind, array_walk becomes a faux in-place map/transform function.

$a = array(1,1,1);

function dec(&$v, $k) { $v = array($v, $k);}
function undec(&$v, $k) { $v = $v[0]; }

array_walk($a, dec);   // decorate
asort($a);             // sort
array_walk($a, undec); // undecorate

And there you have it, a lispism made famous by perl, implemented nearly exactly in php.

--
Whoever knows he is deep, strives for clarity;
Whoever would like to appear deep to the crowd, strives for obscurity.
                        -- Nietzsche

posted at: 08:06 | path: /php | permalink | Tags: ,

Sat, 26 May 2007:

APC at Facebook

Brian Shire has put up his slides (835k PDF) of his php|tek talk.

Quite interesting procedures followed to prevent the very obvious cache slam issues by firewalling the apache while restarting it, as well as the priming sub-system they use. Also the cross-server (aka site-vars) seem like a good idea as well - a basic curl POST request moving around json data could potentially serve as a half-reliable cross-server config propogator.

Seeing my code used makes happy ... very happy, indeed.

--
I'm willing to make the mistakes if someone else is willing to learn from them.

posted at: 22:27 | path: /php | permalink | Tags: , ,

Mon, 21 May 2007:

Inclued gives you a Clue !

Previously while talking about inclusion checks I had included a few helpful digraphs of php includes. Those were drawn with the help of a gdb macro and a bit of sed/awk. But that becomes a real hassle to actually use very quickly while inspecting fairly large php applications.

The solution to repeatable include graphs from php came from the way the include_once override hack was implemented. By overriding the INCLUDE_OR_EVAL opcode in Zend, I could insert something which could potentially log all the various calls being made to it.

That is exactly what inclued does. The extension provides a single function which provides a copy of the collected data, aptly named inclued_get_data().

<?php 
  include("x.php");
  print_r(inclued_get_data());

The above peice of code returns an array with too much information.

Array
(
  [includes] => Array
    (
      [0] => Array
        (
          [operation] => include
          [op_type] => 2
          [filename] => x.php
          [opened_path] => /tmp/x.php
          [fromfile] => /tmp/z.php
          [fromline] => 2
        )
    )
)

Overriding the opcode implies that more information can be collected, like whether the include parameter was a constant (i.e APC can optimize it to a full path), the function the include was in (__autoload *cough*) and the stream wrapper (phar: for instance). The information also includes info about whether the include_once was ignored because of duplication.

Only Data: The extension however does not make any judgements on the code. I have resisted the temptation to put an error log saying "If eval() is the answer, you're almost certainly asking the wrong question" . But there are interesting ways to massage this data into something useful. For example, I just put the following lines into the Wordpress index.php.

$fp = fopen("/tmp/wp.json", "w");
fwrite($fp, json_encode(inclued_get_data());

Loading the JSON data and pushing out a graphviz layout was almost too easy. Here's a clipped section of how my graph ended up looking.

The graph shows about 40-odd includes being hit for a single request, but isn't messy at all. Hopefully, someone finds an actual peice of sphagetti includes which shows off the beauty of this ext (remember, this shows include_once misses & hits).

Hope this helps people debug "Where is this include coming from ?" question I've run into so many times :)

--
It was the kind of mental picture you tried to forget. Unsuccessfully.
            -- Terry Pratchett, "The Light Fantastic"

posted at: 19:43 | path: /php | permalink | Tags: , ,

Thu, 10 May 2007:

How to Dismantle an APC Bomb

APC user cache is cool. It provides an easy way to cache data, in the convenient form of hashes & more hashes within them, to share the data across processes. But it does lend itself to some abuses of shared memory which will leave your pager batteries dead, disk full of cores and your users unhappy. Eventually, the blame trickles down to splatter onto APC land. Maybe people using it didn't quite understand how the system works - but this blog entry is me washing my hands clean of this particular eff-up.

apc_fetch/_store: The source of the problem is how apc_fetch and apc_store are used in combination. For example, take a look at countries.inc from php.net. The simplified version of code looks like this.

if(!($data = apc_fetch('data'))) {
  $data = array( .... );
  apc_store('data', $data);
}

There is something slightly wrong with the above code, but the window for the race condition is too small to be even relevant. But all that changes the moment a TTL is associated with the user cache entry. For a system under heavy load, all hell breaks loose when a user cache entry expires due to TTL. Let me explain that with some pretty pictures and furious hand-waving.

Each of the horizontal lines represent an individual apache process. The whole chain reaction is kicked off by a cache entry disappearing off the user cache. Every single process which hits the apc_fetch line (as above), now falls back into the corresponding apc_store. The apc_store operation does not lock the cache while copying data into shared memory. So all processes are actually allowed to proceed with the copy into shared memory (yellow block) in parallel. The actual insertion into the cache, however is locked. The lock is hit nearly simultaneously by all processes and sort of cascades into blocking the next process waiting on the lock.

Lock, lock, b0rk !: The cascade effect of waiting on the same lock eventually results in one process locking for so long that it hits the PHP execution timeout Or the user get bored and just presses disconnect from the browser. In apache prefork land, these generate a SIGPROF or SIGPIPE respectively. If for some reason that happens to be while code inside the locked section is being executed, apache might kill PHP before the corresponding unlock is called. And that's when it all goes south into a lockup.

So, when I ran into this for the first time, I did what every engineer should - damage limitation. The signals were by-passed by installing dummy signal handlers and deferring the signals while in locked sections. Somebody needs to rewrite that completely clean-room, before it is going to show up in the pecl CVS. The corresponding cache slam in the opcode cache is controlled by checking for cache busy and falling back to zend_compile - but the user cache has no such fallbacks.

I wish that was the only thing that was wrong with this. But I was slightly misleading when I said the copy into shared memory was parallel. The shared memory allocator still has locks and the actual allocation looks somewhat like this for 3 processes.

The allocations are interleaved both in time and in actual layout in memory (the bottom bar). So adjacent blocks actually belong to different processes, which is not exactly a very bad thing in particular. But as the previous picture illustrates, every single apc_store() call removes the previous entry and free's the space it occupies. Assuming there are only three processes, the free operation happens as follows.

The process results in very heavy fragmentation, due to the large amount of overlap between the shared memory copy (apc_cache_store_zval) across processes. The problem neatly dove-tails into the previous one as the allocate & deallocate cost/time increases with fragmentation. Sooner or later, your apache, php and everything just gives up and starts dying.

There are a couple of solutions to this. Since APC 3.0.12, there is a new function apc_add which reduces the window for race conditions - after the first successful insertion of the entry, the execution time of the locked section is significantly reduced. But this still does not fix the allocation churn that happens before entering the locked section. The only safe solution is to never call an apc_store() from a user request context. A cron-job which hits a local URL to refresh cache data out-of-band is perfectly safe from such race conditions and memory churn associated.

But who's going to do all that ?

--
In the Beginning there was nothing, which exploded.
                -- Terry Pratchett, Lords and Ladies

posted at: 13:27 | path: /php | permalink | Tags: , ,

Thu, 26 Apr 2007:

include_once: "Mostly Harmless"

For any php performance freak, include_once has been a pain in the neck for a long time. In a previous post I had talked about how the common workarounds affect something like APC. With the release of APC 3.0.14, there is a decent workaround which doesn't require any changes to the php code. But first, let me drag out another almost-workaround to the whole include_once problem.

rinclude_once: So we create a new function for the purpose. Let me just call it rinclude_once and use that everywhere. The function takes in the filenames, pushes it into a hash before including it.

function rinclude_once($file) 
{
  global $rinclude_files;
  
  if($file{0} != '/') return include_once($file);
  
  if(isset($rinclude_files[$file])) return;

  $rinclude_files[$file] = true;

  include($file);
}

This bit of code works - but only for absolute filenames. I put it through its paces without apc against include_once. Surprisingly, without APC, this code is slower than the php engine's include_once checks. That somewhat makes sense because the extra include for the rinc.php makes a slight dent into the compile and execution time, overshadowing the cycles wasted in the include_once syscall land.

Excluding the slight blip in performance, both the bits of code are nearly neck-to-neck. The real cost of include_once is only evident when you throw in APC. For every file which was included, include_once opens the file before checking for multiple inclusions. The extra system call shows up in the graph below. But the rinclude_once does not work at all for relative path includes (the second pair of bars) and therefore trails badly in this race for performance.

include_once_override: The solution to this problem is freakish. APC meddles with the brains of the Zend interpreter and inserts its own version of ZEND_INCLUDE_OR_EVAL opcode handler. The new handler does not indulge in the gratuitous fopen idiocy present in the default handler (but it does fopen once) and checks for the filename in EG(included_files) hash before doing a normal include() (thank pollita for that). And it should come as no surprise that the C module outperforms the php land equivalent.

But all of these only work for absolute paths as the pathetic numbers on the relative path includes shows. That is where the path canonicalization kicks in. In APC's nostat mode, the filenames have to be absolute paths for the mode to be useful. But rather than force everyone to modify their include lines, APC rewrites the constant strings into the corresponding path names, after lookup. Essentially it converts all relative path includes, such as those from the pear paths, into absolute path includes. This works well with the stat=0 mode because file modifications are ignored after caching.

At last, we see both the relative and absolute includes touching the same level of performance - because they are no different from each other in opcode land. But as you can clearly see, that does not improve the performance of the relative include for the rinclude_once because it does dynamic includes. The opcode cache cannot determine what the value of $file will be for the include_once($file); line and cannot optimise that. The performance actually takes a dive because the relative path name passed in has to be full resolved for every request.

But having said all this, the same benchmark with plain includes is faster than any of these. I think there is a fair bit of optimisation left in this beast, but what is needed for that is the rest of the world to disappear while I code. Deep hack is hard to achieve when you're a ....

--
You can't second-guess ineffability, I always say.
              -- Good Omens

posted at: 02:27 | path: /php | permalink | Tags: , ,

Thu, 05 Apr 2007:

APC 3.0.14 Released

APC 3.0.14 (code named "A bigger boy made me do it, sir") went out a couple of days ago - read the release announcements. The major things in the release is a fair bit of performance improvements for those don't use threads. Also I've figured out a quick way to limit memory fragmentation when APC user cache (apc_fetch/_store) is heavily used - the new fraglimit fixes should solve all the small fragment issues with 3.0.13. And following my recent obsession with drawing pretty graphs for everything, here's how the old version looks compared to the latest code (requests per second for an include_once benchmark).

To get to such levels of performance, the code has some configuration parameters that can be set. The apc.localcache creates a process (yes, not thread) specific lockless cache which is basically a layered shadow cache ontop of the same shm data. The apc.include_override_once is also now usable because of the appropriate checks put in to reduce the overhead of include_once. And now, when you enable apc.stat there's a bit of code which pre-computes the path of the included file so that it can be effective for includes with relative paths or from include_path dirs.

The release is hopefully stable enough to provide someone with enough ramp-up time to get started, if I stop working full-time on APC. I've spent a fair bit of time stabilizing basic functionality and have kept most of these optimisations optional, to be able to look at other work for a while.

--
Periods of productive stability, interrupted by bursts of test-bed change is much less disruptive than constant ripples of change.
              -- Fred Brooks Jr, "The Mythical Man Month"

posted at: 04:11 | path: /php | permalink | Tags: , ,