
Test Driven Development in Real World Apps

Posted on 10/9/06 by Felix Geisendörfer

In this post I want to talk about my experiences with test driven development and share my opinions and attitude on programming in general. It's kind of a behind-the-scenes look at my current thoughts on and approaches to coding. But let's start with TDD:

So far I've had tons of fun trying to create PHP code using test driven development. It has made my code a lot more structured and consistent, because writing the tests shows you interface problems right away. The process of writing the test forces you to think a lot more about *how* you want to use some class or piece of code, vs. your traditional "how can I get task x done" thinking.

Another benefit is that once your code can do the task you want it to do, you are a lot more likely to refactor and improve it. I used to write (complicated) code that I wouldn't touch any more after a while, just because I was afraid it would break something without me noticing. When using test driven development I feel a lot more like improving things, just because I know my test case will notify me instantly if something breaks and tell me what exactly went wrong. Great stuff, I love it!

Now I already talked about the fact that I'm working on an image library called Kaizhi. It started out with my need to resize & crop images for a web page I'm working on. I had written and rewritten image manipulation code over and over again in the past, and I finally decided this would be the last time I do this. I'm simply one of those people who enjoy the process of writing code a lot more than the actual results. This often causes me to create little libraries for things I only partially need, which I often release on this blog afterwards. You could argue that this makes me a bad pragmatic programmer, but I feel really uncomfortable if a deadline seems to get in the way of code quality, and I would much rather spend more time on a piece of code in order to do it "the right way" instead of "the quick & dirty way". Now ideally those two meet in the middle, but usually it's complicated to get this balance right.

But back to the image library I'm working on. For the reasons mentioned above I decided to try to write a highly reusable, lightweight, general purpose image manipulation library. After consulting Wikipedia's list of artists, I decided to name it Kaizhi after Gu Kaizhi, an ancient Chinese painter who wrote three books about painting theory. I've not used TDD for the entire library, but when I started to work on the GifFile class I knew it wouldn't be doable without good testing. Well, actually that's a lie. My first attempt at this class was made without writing any tests and resulted in very poor code, and in the end I wasn't able to continue with it. So that's the real reason I started to use TDD for it ; ).

But back to the class' origin again. My initial reason for writing a binary GIF encoder/decoder was that the GD library for PHP can create GIF files, but unfortunately no animated ones. But since the GIF documentation can be easily found on the web (87a, 89a), I thought it would be a fun challenge to convert several GIF files created by the GD library into one animated file.

The problem with using TDD on this was that I didn't know the GIF format very well, so there was a good chance I would write faulty tests. To avoid this, I decided to pick 3 random GIF files from my computer, take them apart manually and save the expected decoding information into some YAML files. Those would look like this example:

        version: 87a
        width:  300
        height: 300
            global_color_table:      true
            resolution:              7
            sort:                    0
            global_color_table_size: 256
        background_color_index: 0
        pixel_aspect_ratio:     0
        - type: image_descriptor
            seperator: "0x2c"
            left:      0
            top:     0
            width:     300
            height:    300
                local_color_table:      0
                interlace:              0
                sort:                   0
                reserved:               0
                local_color_table_size: 0
        - type: image_data
        - type: extension
            label: 254
            comment_data: "#x36felix was here with a hex editor, and had a lot of fun"
          info: comment

So while I added more and more decoding code to my class, I completed the expected YAML results for my selected samples. I also wrote a useful little function that would automatically run asserts, comparing the results of the class with the expectations from the YAML files. This way I could add new information to my YAML samples that would cause my tests to fail until I completed the functionality inside the class. This form of TDD was very easy to work with, since I only had to change my test case a couple of times to improve the YAML/results comparison function, while I got constant feedback on what caused my tests to fail. And as if I had been working on an ideal case to demonstrate the advantages of TDD, on more than one occasion tweaking one function broke another, which might have gone unnoticed without the tests. So this was great. And the best thing was that I was writing very little testing code and lots of real application code.
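A minimal sketch of what such a comparison helper could look like. This is not the function from the post: the name compareExpectation() is made up, and plain PHP arrays stand in for the parsed YAML sample and for the decoder output, so that the example runs without a YAML parser.

```php
<?php
/**
 * Recursively compares expected values against the decoder output.
 * Returns a list of readable mismatch messages; an empty list means
 * every expectation was met. (Hypothetical helper, for illustration.)
 */
function compareExpectation(array $expected, array $actual, string $path = ''): array
{
    $failures = [];
    foreach ($expected as $key => $value) {
        $where = $path === '' ? (string) $key : $path . '.' . $key;
        if (!array_key_exists($key, $actual)) {
            $failures[] = "$where: missing from decoder output";
        } elseif (is_array($value)) {
            if (!is_array($actual[$key])) {
                $failures[] = "$where: expected a nested structure";
            } else {
                // Walk into nested blocks and collect their failures too.
                $failures = array_merge(
                    $failures,
                    compareExpectation($value, $actual[$key], $where)
                );
            }
        } elseif ($value !== $actual[$key]) {
            $failures[] = sprintf(
                '%s: expected %s, got %s',
                $where,
                var_export($value, true),
                var_export($actual[$key], true)
            );
        }
    }
    return $failures;
}

// Stand-in for one parsed YAML sample (assumed shape).
$expected = ['version' => '87a', 'width' => 300, 'height' => 300];
// Stand-in for what a decoder returned (deliberately wrong height).
$decoded = ['version' => '87a', 'width' => 300, 'height' => 200, 'pixel_aspect_ratio' => 0];

foreach (compareExpectation($expected, $decoded) as $failure) {
    echo $failure, PHP_EOL;
}
```

Only the keys present in the expectation are checked. Adding a key to the sample therefore makes the run fail until the decoder produces it, which is the workflow described above.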

But I still had one big concern: what if my sample data was wrong and my code therefore just satisfied the tests instead of doing the real task of decoding a GIF file? That's when I had a really good idea. It basically came straight from my childhood. When I was young I loved to take things apart to figure out how they worked, and then put them back together and hope I hadn't broken them. Now when working with files you can do the exact same thing. You can simply write an encoder for the format you work with, and if you are able to produce the *exact* same file by decoding and then encoding it again, you know you've written some decent code. So I did that, and after I was done I simply added ~50 more GIF files to my tests and checked whether I could take all of them apart and put them back together correctly. That revealed a couple more minor issues that the sample files couldn't highlight, and after fixing those it really only was a 5 minute task to take several GD GIF files and convert them into an animated GIF.
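The "take apart & put back together" check can be sketched on a tiny piece of the format. The example below is not the GifFile class from the post. It only handles the first 13 bytes of a GIF (the header and the logical screen descriptor), and the function names are invented for illustration.

```php
<?php
// Decode the 6-byte header and the 7-byte logical screen descriptor.
function decodeGifScreen(string $bytes): array
{
    if (strlen($bytes) < 13) {
        throw new InvalidArgumentException('Need at least 13 bytes');
    }
    $raw = unpack(
        'a3signature/a3version/vwidth/vheight/Cpacked/Cbackground_color_index/Cpixel_aspect_ratio',
        substr($bytes, 0, 13)
    );
    $packed = $raw['packed'];
    return [
        'signature'               => $raw['signature'],
        'version'                 => $raw['version'],
        'width'                   => $raw['width'],
        'height'                  => $raw['height'],
        'global_color_table'      => (bool) ($packed & 0x80), // bit 7
        'resolution'              => ($packed >> 4) & 0x07,   // bits 4-6
        'sort'                    => ($packed >> 3) & 0x01,   // bit 3
        'global_color_table_size' => 2 << ($packed & 0x07),   // 2^(n+1)
        'background_color_index'  => $raw['background_color_index'],
        'pixel_aspect_ratio'      => $raw['pixel_aspect_ratio'],
    ];
}

// Encode the same structure back into its 13 bytes.
function encodeGifScreen(array $info): string
{
    // Recover the 3-bit size field n from the table size 2^(n+1).
    $sizeBits = 0;
    while ((2 << $sizeBits) < $info['global_color_table_size']) {
        $sizeBits++;
    }
    $packed = ($info['global_color_table'] ? 0x80 : 0)
        | ($info['resolution'] << 4)
        | ($info['sort'] << 3)
        | $sizeBits;
    return pack(
        'a3a3vvCCC',
        $info['signature'],
        $info['version'],
        $info['width'],
        $info['height'],
        $packed,
        $info['background_color_index'],
        $info['pixel_aspect_ratio']
    );
}

// A hand-built 300x300 GIF87a screen descriptor, matching the sample above.
$original = 'GIF87a' . pack('vvCCC', 300, 300, 0xF7, 0, 0);
$rebuilt  = encodeGifScreen(decodeGifScreen($original));

echo $rebuilt === $original ? 'round trip ok' : 'round trip FAILED', PHP_EOL;
```

The byte-for-byte comparison is the whole point: a decoder that misreads a field and an encoder that writes it back the same wrong way would still pass, but any information the decoder drops shows up immediately.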

But ok, I promised to talk about Real World Apps, and I know most of you guys don't code image libraries all day long, and this "take apart & put back together" technique doesn't work quite that well with MySQL or HTTP post data. So if you are looking for a very detailed tutorial on that, stay with me until I complete my CakePHP test suite called CakeTaster. What this post does offer in terms of real world apps is a little insight into how you can apply TDD. TDD does not mean you have to test every function of the class you work with (that's extensive unit testing). TDD just means that your tests define the requirements your application (classes) needs to meet. By thinking of TDD like that, you can get the best out of testing without spending a horrible amount of time on it.

--Felix Geisendörfer aka the_undefined


Update to the RSS feed parser Model

Posted on 6/9/06 by Felix Geisendörfer

A couple of days ago I got contacted by James Archer of Forty Media, who pointed out a little issue with the RSS Model I developed a while ago. Even though it works well for parsing blog feeds like the one WordPress puts out, it had difficulties with podcast feeds or other feeds that use self-closing nodes with a notation like <node ... />.

This was because I forgot to consider this node type in XML.

While trying to fix this issue I noticed that some feeds use line breaks inside their tag attributes. For example something like this:

<node attr1="valueC"

So I ended up remodelling my regex that matches the elements inside the <channel> and <item> elements of an RSS feed. What I came up with looks sort of nightmarish if you aren't pretty familiar with regex, but it has a lot of advantages over the original one. So for those interested in regex, here comes a little comparison:

The old Regex looked like this:

It had two big weak points:

  • It assumes the . character never has to match new lines, which is wrong
  • It does not match <node ... />-style nodes

The unoptimized version of the new Regex looks like this:
/\<(.+)( .*)?\>(.*)\<\/\\1\>|\<(.+)( .*)?\/\>/sU

Now while this version fixes all the issues of the regex above, it still has one problem: it's slow. For a normal sized RSS feed this regex causes the parsing to take ~1 second. Not too bad for one feed, but on my Cake News site I use 10 feeds right now, so this gets me to 10 seconds of feed parsing.

The problem is the /s modifier that turns on line break matching for the dot (.) character. Now in my regex I only need this behavior on attributes and node values, but not for the node names. So I came up with this little optimization:
/\<(.+)( [^\x00]*)?\>([^\x00]*)\<\/\\1\>|\<(.+)( [^\x00]*)?\/\>/U

Here I removed the /s and replaced the dots where I needed line break matching with the character class [^\x00], which basically means "match any character but 0x00". Since I doubt there is an RSS feed out there containing 0x00 (binary), this should be a safe thing to do. It speeds up the parsing by a factor of about 4, which is nice.
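To see that the two versions find the same nodes, here is a small sketch that runs both against a made-up feed fragment. The patterns are copied from above and placed in single-quoted PHP strings, which is presumably why they contain \\1. The sample XML and the nodeNames() helper are assumptions for illustration and not part of the RSS model.

```php
<?php
// Version with the /s modifier (dot matches line breaks everywhere).
$withDotAll = '/\<(.+)( .*)?\>(.*)\<\/\\1\>|\<(.+)( .*)?\/\>/sU';
// Optimized version: [^\x00] only where line breaks must be matched.
$withNulClass = '/\<(.+)( [^\x00]*)?\>([^\x00]*)\<\/\\1\>|\<(.+)( [^\x00]*)?\/\>/U';

// Made-up fragment with a self-closing node whose attributes span two lines.
$sample = "<title>Episode 1</title>\n"
    . "<enclosure url=\"ep1.mp3\"\n"
    . "           type=\"audio/mpeg\" />\n"
    . "<link>http://example.com/</link>";

/**
 * Returns the node names a pattern finds, in document order.
 * (Hypothetical helper, only used for this comparison.)
 */
function nodeNames(string $pattern, string $xml): array
{
    preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER);
    $names = [];
    foreach ($matches as $m) {
        // Group 1 holds the name of <node>...</node> matches,
        // group 4 the name of <node ... /> matches.
        $names[] = $m[1] !== '' ? $m[1] : $m[4];
    }
    return $names;
}

echo implode(', ', nodeNames($withDotAll, $sample)), PHP_EOL;
echo implode(', ', nodeNames($withNulClass, $sample)), PHP_EOL;
```

Both lines should list the same nodes. The sketch says nothing about speed; to compare that you would have to time both patterns against real feeds.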

Get the new Version

So if you stayed awake while reading through my regex explanation, here comes the reward in the form of a link to the new version of the RSS model.

-- Felix Geisendörfer aka the_undefined


Hacking a commercial airport WLAN

Posted on 30/8/06 by Felix Geisendörfer

Welcome, visitors. Read this follow-up post if you care about the story of this article.

Update 06:20pm: My luggage just arrived - I'm happy ; ).

Yesterday I left Atlanta, GA after having spent 6 weeks of my summer there to visit the host family I stayed with the year before as a foreign exchange student. The flight back wasn't all that great: it was delayed by 4 hours, I missed my connecting flight, had a long wait at the Düsseldorf airport, and when I finally got back to Dresden my 2 big suitcases were missing - and still are. But oh well ... they'll show up, eventually.

Meanwhile I want to share a little hack I did while I was waiting at the Atlanta airport. As most airports do these days, they have a wireless network there. Unfortunately, they try to make you pay $7 for 24h, no matter how long you are actually on there. Since I didn't want to get ripped off, I started playing around with the network. Using LiveHTTPHeaders for Firefox, I was able to see that they were redirecting me to their portal via a 302 whenever I tried to access a public site. So the first thing I tried was to deactivate redirects in about:config, hoping they would send me the site I wanted after their redirection header. This might sound stupid, but check out the post on cakebaker talking about it if you are unfamiliar with the problem. Anyway, it didn't help. I wouldn't see any page at all, and got a Firefox error message instead. So back to the beginning.

I continued to try a couple of other things, like checking if they had forgotten some ports like 21 (ftp) or 110 (pop3). But no, all of them were properly blocked. After a lot of unsuccessful attempts, I had some intuition telling me to check how they handle pictures. Without any hope of success I typed into my browser's address bar, and to my big surprise I saw the page you see when you follow the link right now. The next thing I typed in was: but that didn't work. But I went on, and found that URLs like worked like a charm. I found that I could easily visit sites like slashdot, google, or even this weblog when adding a ?.jpg at the end of the URL. The next logical step was to automate that. I downloaded greasemonkey.xpi?.jpg (*g*) and wrote a 4 line js script that would add ?.jpg to every link in a document. That way I was able to browse most sites without a hassle. Unfortunately, I didn't get to explore this vulnerability much more, because I had to board the airplane, where I waited another 3 hours due to a mechanical failure - without WLAN : /.

So, anyway, wish me good luck with getting my luggage back, and if you are ever stuck at an airport with commercial rip-off WLAN only, you might want to give this little method a try ; ).

--Felix Geisendörfer aka the_undefined

Update: Read this follow up post if you care about the story of this article.


Agility? Divide and Conquer? What?

Posted on 17/8/06 by Felix Geisendörfer

Ok, right now I'm deep inside the process of exploring some of the agile coding techniques out there (including test driven development). But what I am slowly coming to realize is that I'm still trying to pinpoint the concept of agility itself.

One thing I've learned so far: it's most likely not testing, not automation, not source control, not wikis, not any of the stuff I've just finished reading about in "Practices of an Agile Developer". Don't get me wrong, I enjoyed reading this book a lot, I think the tips in it are awesome and I'll try to make use of them where I see the need. But one thing that I think should be emphasized more is the non-technical aspect of agility.

Let me try to show what I mean by talking about living in a house (or apartment):

You just finished your dinner and you want to get right back to that interesting book/movie/... you were involved with before starting to eat. You take your plate and put it into the sink (the place where you put the plates from last night, and the night before). On your way back to the living room you notice the dirty carpet in the hallway and you hear yourself saying: "uhm, guess I should clean that up - one of these days". Back in the living room you see a pile of magazines, empty cups and old (are they?) phone call notes on the table -> "Hm, guess I should clean those up as well, ...". But back to the interesting book/movie/... you wanted to enjoy! ... - All of a sudden it's 1:30am. You got distracted by some other things (this cool new tv show, a friend calling on the phone, etc.) and you decide it's time to go to bed. The next morning you wake up, the sink is still full of unwashed dishes, the carpet is dirty and the living room is a mess, but do you feel like cleaning them up, now, at 8 am in the morning? You probably don't. This will most likely go on for a couple more days (weeks?) until you decide to do the one big clean-up session that you tried to push off as long as possible. After you are done with it, you will feel a lot better for the next 1-2 days, until everyday life slowly starts to turn your place into a mess again.

Alright, if this story does not sound a bit familiar to you, consider yourself lucky and go back into your IDE, this post is not for you. But if you are like me, always talking about the desk you want to clean up in the next couple of days, that class that needs documentation and those other things that never seem to really get done - welcome to the club.

When you think about it, you'll probably agree that you could have a clean house all the time (and not a messy one 90% of the time) if you constantly washed the dishes, cleaned the rooms, and took out the trash. But realistically, you are not always going to do it. You don't always feel like commenting all your code, refactoring crappy interfaces and answering the emails in your filled-up inbox.

So what to do? To be quite honest, I don't claim to have the ultimate answer to this one. But how about this:

"Divide and Conquer" -- (most likely) Julius Caesar

It's one of my favourite quotes of all time because it can essentially be applied to *any* kind of problem. The bigger and more complicated an issue is, the more likely it's made up of smaller chunks you could attack one at a time. The house is dirty? Next time you see those dishes - do them. Next time you see the carpet - clean it. Oh and before you watch this movie tonight, clean up the table in the living room. Don't try to do it all at once, it's not agile. Sure, getting a big issue solved all at once is a very satisfying feeling, but unfortunately it's not big enough to motivate you early enough the next time it comes up. The same goes for code, projects, etc. You are behind the deadline of a project, but instead of sending out the email to the client, you try to finish at least this one promised feature to show him as an excuse before getting the email out? Don't. Send the email, let the client know what's going on. And if you still manage to complete one or more of the features, he'll be just as happy to hear it. Got this huge undocumented piece of code? Go in and comment 3-4 functions right now, and do it again over the next couple of days. Try to find a rhythm in it.

Ok, enough hypocritical ranting - back to reality. The question is still, how do you motivate yourself to be more of a Google octopus instead of a Microsoft whale? What techniques work, which ones don't?

I'm trying my luck with progressive task lists that have little progress bars you can fill up as a task comes closer to completion. I set my watch to alarm me when I'm over time on a certain thing I was working on. I write posts on my blog about agility ; ). And it does seem to have some kind of positive impact, I think. But I'm more than interested to hear about your ideas / opinions / experiences in this area. So in case you are one of those people I told to stop reading after the house story, what's your secret?

Oh and before I forget, a very nice article in this category that I read about on A List Apart a while ago: The Four-Day Week Challenge

--Felix Geisendörfer aka the_undefined


The ultimate CakePHP bootstrap technique

Posted on 15/8/06 by Felix Geisendörfer

Ok, I shouldn't use such a buzz-headline, but I was very happy today when I discovered a new way to bootstrap CakePHP without having to render a page. This is especially useful when you try to embed CakePHP in existing PHP apps (Drupal, WordPress, ..., ?), or when you try to write unit tests for highly coupled classes such as Controllers or Models.

The idea is very simple, and I felt sort of stupid that I didn't see it earlier. When you look in your app/webroot/index.php file there are the following lines at the end:

if (isset($_GET['url']) && $_GET['url'] === 'favicon.ico') {
   // favicon request: no dispatching happens
} else {
   $Dispatcher=new Dispatcher();
   $Dispatcher->dispatch($url);
}

Their only purpose is to initialize an instance of the Dispatcher class and to request a Controller action for our $url. But they also check whether or not the $_GET['url'] variable contains the value "favicon.ico", in which case *no dispatching* happens. Now this gives us an easy way to include the webroot/index.php file without automatically rendering a page:

$_GET['url'] = 'favicon.ico';

Now we can easily create instances of single controllers, models, etc. or use the Dispatcher to render a page. The only thing we might need to do is call loadController(null) to load the AppController (or others) and call loadModels() to load our model files.
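Put together, the whole trick could look roughly like the sketch below. The path in $indexPath is a placeholder that you would have to adjust to your own installation, and the is_file() guard is only there so that the snippet does not fail when no CakePHP app sits at that location.

```php
<?php
// Placeholder path (assumption): point this at your own app/webroot/index.php.
$indexPath = __DIR__ . '/app/webroot/index.php';

// Pretend the request is for the favicon, so index.php skips the Dispatcher.
$_GET['url'] = 'favicon.ico';

if (is_file($indexPath)) {
    // Loads the CakePHP core without rendering a page.
    require_once $indexPath;

    // As described above, these calls may be needed before creating
    // controllers or models by hand:
    // loadController(null);
    // loadModels();
}
```

The include happens at the top level of the script, not inside a function, so that whatever index.php defines ends up in the global scope, just as it would during a normal request.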

--Felix Geisendörfer aka the_undefined
