Life Codecs @

Ruminations. Reflections. Refractions. Code.

Aug 11, 2009 - general philosophy software dev

On speaking out of one’s posterior…

Warning: Colourful language ahead. (My colours are way duller than most though, so your mileage may vary.)

Every now and then I have episodes of deep reflection on languages and semantics, and not just programming languages either. A common phrase for one speaking junk or bullshit is to ‘speak out of one’s @$$/arse/{insert other posterior synonym} (henceforth aliased to the less-accurate-but-will-do term $POSTERIOR in the interest of the DRY principle)’, or ‘did you just pull that out of your $POSTERIOR’, and so on. In my ever so humble view, these phrases and their variations should be used rather carefully and I am not simply looking at it from the viewpoint of manners and aesthetics either. Let’s consider a few comparisons:

The excretory organs, including parts involved in the aforementioned $POSTERIOR, expel toxins and unused junk out of the body, ensuring normal functioning of the digestive system, and in fact the body as a whole, all things considered – you are what you eat and all that. In many cases, when one speaks out of one’s $POSTERIOR, it is often a trait that is repeated, because one is still evolving, as we all are, or perhaps has chosen not to evolve – also a choice made by many. Neither good nor bad; it just is, no judgement (no, really). The point is that more often than not, this wannabe-$POSTERIOR produce is not expelled for good; rather, its source is often more like a bottomless pit (no pun intended.. well maybe just a little).

An astute reader (like yours truly, who just thought of this, teehee) will also bring up that even in the case of the true $POSTERIOR, it can be a bottomless pit – for one keeps eating and recycling, more so if the intake is … excessive – but the crucial invariant here is that output is always less than or equal to input (in fact equal is quite unlikely I think?) for true $POSTERIOR. Contrast this with the case of speech or ideas ejected from wannabe-$POSTERIOR: even without additional intake (i.e. no new incoming less-than-valuable ideas to process), the junky output is sometimes reduced, often remains constant, but usually increases. On the rare occasion, it is eliminated. Quite a different invariant, yes?

Hence these phrases make use of flawed comparisons, i.e. wannabe-$POSTERIOR <> true-$POSTERIOR, they are not even all that similar.

These phrases in fact do a disservice to the true $POSTERIOR. They give $POSTERIOR a bad name. The $POSTERIOR works in all earnest supporting life. It is a Divine gift (have you considered life without it?). The bullshit output via the wannabe-$POSTERIOR, on the other hand, quite simply, does not necessarily do the same.

I shall however submit that the outputs (wannabe-output vs true-output) share many more traits, and are worthy of comparison. But let us not discredit true $POSTERIOR unnecessarily.

Please consider the ideas put forward in this post the next time you decide to use phrases involving $POSTERIOR.

Thank you. I wish you, and your $POSTERIOR, fragrances of heavenly descent.

PS. I have also tagged this as software_dev, for I think they kinda explain invariants and DRY rather nicely.

Jul 29, 2009 - software dev

PHP & the APC Bytecode Cache

It is not often that I write about performance – okay more like never, and more so about PHP performance! But this is a must share methinks. On a personal project, I have been forced to use PHP for various reasons. Being rooted very much in the Java space, I went on a hunt for reusable stacks. I finally settled on:

  • Kohana for the MVC framework.
  • Doctrine for the ORM layer, since Kohana’s ORM annoyed me no end. Having worked with Hibernate in Java land, I was offended that Kohana called their implementation ORM – apologies folks, I am very biased here; the MVC bit, module system, cascading file layout, and hook system of Kohana are otherwise neat.

In the interest of ease of hosting and the existence of a vast knowledgebase from PHP’s standpoint, I settled on MySQL for the database. And of course, I use Linux, Debian in particular.

Anyway, Kohana tries to be extremely shared-hosting friendly, so it does not require a long-running FastCGI process, for example (not sure if there’s a recommended FCGI set-up); everything is loaded dynamically as needed, with no preloading of files in memory. Basically, on every request it loads up a bunch of PHP files and any startup hooks, routes the request to your controller, and your controller in turn loads up one or more classes to get its job done – DB queries, etc., you name it. For those of you in Java space, think of it as loading your servlet context at each request (yes, you heard me right) – somewhat of an exaggeration of course, but with enough stuff to load, it may very well be a suitable analogy, and in some ways worse – the source file is interpreted each time, no bytecodes cached by default (yet!). Even with this overhead, it is quite fast; I am very impressed.

But then I hooked in Doctrine using the integration module kindly provided here (had to upgrade the Doctrine version internally, but the Kohana hook points did not require changes, cool stuff). Now, Doctrine is a full-featured ORM, so it does have its overheads, it also features DQL – the Doctrine Query Language (inspired by HQL), which means for every DQL statement, it first parses it to the target SQL before execution. This caused each request to become a memory hog, from around 1-3 MiB per request prior to Doctrine usage (as output by the Kohana renderer, which in turn uses PHP’s builtin memory_get_usage function), it was now in the 8-12 MiB range per request – consistently!

The Doctrine documentation (v1.1 used here) has a rather decent section on performance, there were 3 main infrastructure-y (as opposed to application code) recommendations:

  1. Use a bytecode cache
  2. Use the doctrine query cache with an appropriate driver
  3. To minimise I/O due to multiple file inclusions, compile the Doctrine framework with a provided compile() function to get one large merged file encapsulating most (all?) of the Doctrine framework

Taking the above one at a time:

  1. There are a few bytecode caches around, but one that caught my virtual fancy was the Alternative PHP Cache (APC) – mainly because it seemed to be the easiest to install (on Debian, aptitude install php-apc) and most well-integrated – I feel that bytecode caching should be a default feature included with the runtime, and the buzz on the net seems to indicate that APC is more or less marked for inclusion by default into future PHP installs (but I could very well have misinterpreted the buzz :P). APC uses shared memory (shm) segments to cache PHP bytecodes, which while requiring some dedicated memory (duh), also makes it blazing fast. It does not, at least by default – have yet to explore – cache on disk (contrast this to Python .pyc files). I have not tuned any APC parameters, in my default install (which checks for source file changes to determine re-caching), it has literally brought the memory usage back down to 1-5 MiBs per request, with an average of 2.5 MiB, the higher end of the scale seems to occur when I load object graphs rather than associative arrays (another Doctrine recommendation, prefer array hydration over object hydration if you do not need the business logic on the objects). As my friend Tom would say, I am a happy camper! The app feels snappy!

    So yeah – PHP bytecode caches can make a HUGE difference! And the default settings for APC, such as checking source file timestamps, are perfect for development; I’ll have to check if tuning this setting makes any significant difference. A couple of downsides I can think of to the shm approach (again, I haven’t checked the tuning params to see whether APC can cache to disk, etc.):

    • If you disable the source file checking, you’re potentially going to have to restart the process that allocated the shared memory if using a persistent process. I’ve only used FastCGI, so I am not sure how it affects mod_php with Apache
    • Also not sure how using dedicated shm for a process goes with shared hosting providers!

  2. Prior to using the bytecode cache, I had enabled the Doctrine query cache, using the SQLite driver – so an SQLite database (direct file-access-based DB, no dedicated server process) is used as a cache, and this actually increased my memory usage and response times :P. Dang. Essentially what the query cache does is prevent the re-parsing from DQL to SQL each time; it uses the DQL as a key into the cache (or so I think that’s how it should work!) – effectively the query cache is a cache of prepared statements. However, Doctrine also comes with an APC driver for the query cache (another reason to use APC!), so once I had APC enabled, I replaced the query cache SQLite driver with the APC driver. Not bad, it saves a further 0.3 to 0.5 MiBs per request!

  3. Unfortunately, the compiled (merged to be more precise) Doctrine PHP file actually did not help me at all: it increased memory usage to about 7-8 MiBs after bytecode caching, and before that it would’ve easily spiked to the 20 MiB range! Another thing I noticed was that I had to run compile() several times to create Doctrine.compiled.php (that was later included in lieu of just Doctrine.php) since it kept running out of memory. I had to increase the memory limit for a script from about 30MiB to around 100MiB for compile() to successfully complete and produce the merged file. Considering the number of files to merge, and that it probably did this naively by loading all or most of them in memory and writing it out as a whole (a guess here), it is not surprising I suppose. The file produced is too large I think – in effect we killed lazy loading of classes by forcing a big read. And yes, I made sure I wasn’t reading BOTH Doctrine.php and Doctrine.compiled.php (it would not work anyway, we get class redeclaration errors!). Hmm, wonder if the recommendation was made for a set-up where the Doctrine init stuff was maintained in memory across requests.
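For the curious, here is a minimal sketch of the bytecode-cache idea from the first item: compile a source file once, keep the result in memory, and only recompile when the file's modification time changes. It is written in Python purely for illustration and is not how APC itself is implemented; the `CompiledCodeCache` class, its `check_mtime` flag, and the `compile_count` counter are all hypothetical names made up for this example.

```python
import os


class CompiledCodeCache:
    """Toy in-memory cache of compiled code objects, keyed by source path.

    Illustrative only: the class and its attributes are assumptions for
    this sketch, not part of APC or of any real bytecode cache.
    """

    def __init__(self, check_mtime=True):
        # check_mtime=True mimics "check the source file for changes".
        self.check_mtime = check_mtime
        self._entries = {}       # path -> (mtime in ns, code object)
        self.compile_count = 0   # how many times we actually compiled

    def load(self, path):
        entry = self._entries.get(path)

        # With checking disabled, a cached entry is served blindly,
        # even if the file on disk has changed since.
        if entry is not None and not self.check_mtime:
            return entry[1]

        mtime = os.stat(path).st_mtime_ns
        if entry is not None and entry[0] == mtime:
            return entry[1]      # cache hit, source unchanged

        # Cache miss or stale entry: read, compile, and remember.
        with open(path, "r", encoding="utf-8") as handle:
            source = handle.read()
        code = compile(source, path, "exec")
        self.compile_count += 1
        self._entries[path] = (mtime, code)
        return code
```

Calling `load()` repeatedly on an unchanged file compiles it only once. With `check_mtime=False`, the cache keeps serving the old code after an edit until the process holding the cache is restarted, which is the downside noted in the first sub-bullet above.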

To summarise:

  • The APC PHP bytecode cache kicks the proverbial @$$ so hard it hurts. Install it!
  • Doctrine’s query cache is neat, but so far only with APC – using SQLite may in fact be detrimental (it is after all yet another file I/O operation, opening a connection, etc. as opposed to an in-memory APC call).
  • The merged Doctrine PHP file actually made things worse!

Phew, hope that was useful!

Update Sat 2009-08-01:

Brain dump: I was thinking some more about the compiled Doctrine file… logically it should be faster, because APC would put it into memory and then simply check one PHP file for timestamp updates to decide whether the cache should be invalidated for it. So why is it slower, and why does it use more memory? Actually, some extra memory usage seems justified, just not that much more. I haven’t confirmed, but I wonder if the bytecodes could not fit into the cache… and therefore had to be read from the file each time… and since there’s no lazy loading (the whole file is interpreted at once – the whole Doctrine framework!), it’s hungrier? Hmm, will confirm when I am inclined to.

May 21, 2009 - software dev

Hitler’s attempt at Agile Development

My friend Tom – who does not have a web presence unfortunately (hint hint, Tom) – sent this video link. It is one of the funniest things I’ve seen in a while, top-grade geek humour :-). Note the last sentence in the video. Classic!

Disclaimer: In case this gives the wrong impression – I am a fan of Agile methods; the emphasis on getting quality, well-tested code up and running is a breath of fresh air. It takes a certain amount of discipline and culture though – the latter being more crucial in my view, and not always available. Humour like this really brings that much-spoken-about Real World™ to light :P.

May 3, 2009 - arts

“Visions”, an album by Ade ISHS

Disclaimer: I know Ade personally. Though I can’t handle crappy/mediocre music, from friends or foes!

“Visions” is Ade’s first solo album. Ade’s an independent musician based in Melbourne, Australia. The album is a collection of 9 beautifully composed piano pieces. My personal favourites are “Birth of Love”, “Sky” (especially the second half of the composition), and “Rain 1”. The pieces are pleasant, relaxing, ambient but non-intrusive, and at times very inspiring (for one thing it digs out my wish to learn a musical instrument every now and then!). On average, each piece runs for about 7 minutes, and it is easy to sense the effort and love that have gone into the album’s creation. Note that although this is his first solo, Ade is by no means new to the music scene, having been involved in various events previously.

I highly recommend “Visions” if you like instrumental music. You can find out more about Ade and “Visions” at Ade’s website. In the spirit of fair use, trust, and freedom of platform choice (yay!) – Ade’s made sure the music is DRM-free.

Finally, only peripheral to this – but nevertheless quite interesting to geekoids and musicians alike – is that Ade’s Computer Science PhD is on Music Information Retrieval – so yeah, he likes his music :-).

Apr 27, 2009 - poetry

Bad Browser Haiku

Yes, I have nothing better to do. Enjoy the following haiku, which is potentially a pseudo-ku (not to be confused with Sudoku – gosh I am funny :P!), though I tried to stick to the rules and spirit of it.

Fox on fire, why, how.
Script gone rogue, Face... book, so cold.
Kill... the tab. Okay.

Yes, poetry is like code, and vice versa ;-).