debuggable

 
Contact Us
 
50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58

Should I refactor this piece of code?

Posted on 9/11/06 by Felix Geisendörfer

Most of you will probably be familiar with the concept of refactoring. It is the process of modifying an exisiting piece of code in order to improve it's readibility or structure while preserving it's original meaning or behavior. Automated testing makes refactoring a lot easier, as you can see if the changes you make to the code break it's original functionality. Anyway, this post was actually going to be about a little Helper function I wrote and felt like sharing and not about refactoring. But as it's the case with most of my code, the first step before blogging about it, is to refactor it for readability. But often I also tend to refactor code on here to be more generic / reusable then it needs to be for my actual application to make it more attractive to a broader audience. This was also the case with the little Helper function that can be used to convert a given amount of bytes into a more meaningful string using the appropriate units. The end result was that I decided to use the resulting code as an example for refactoring instead.

First of all, consider the function I initially thought about posting:

class UnitsHelper extends Helper
{    
    function bytesToString($bytes, $round = 1)
    {
        if ($bytes==0)
            return '0 bytes';
        elseif ($bytes==1)
            return '1 byte';
   
        $units = array('bytes' => pow(2, 0),
                       'kb'    => pow(2, 10),
                       'mb'    => pow(2, 20),
                       'gb'    => pow(2, 30),
                       'tb'    => pow(2, 40),
                       'pb'    => pow(2, 50),
                       'eb'    => pow(2, 60),
                       'zb'    => pow(2, 70),
                       'yb'    => pow(2, 80));
       
        $lastUnit = 'bytes';
        foreach ($units as $unitName => $unitFactor)
        {
            if ($bytes  >= $unitFactor)
                $lastUnit = $unitName;
            else
                return round($bytes/$units[$lastUnit], $round).' '.$lastUnit;
        }
       
    }
}

Now even so I expect most of us to deal within units much smaller then 1 TB on the web, I added the units that come afterwards for completeness. But that's not really important. What's interesting is is that this function above serves a well defined purpose and there is no strong reason for not leaving it as is. However, when posting about it on here (so I thought), it should be more generic as it's part of an object called UnitsHelper. What happened was a ridiculous little refactoring session turning this innocent little function into being part of an automagic unitToString function that works like the Model::findBy<Field>() stuff in CakePHP. Only that in this case it's UnitsHelper::<unit>ToString(). Before explaining any further, take a look at the result:

class UnitsHelper extends Helper
{
    function __call__($method, $param)
    {
        if (preg_match('/^(.+)toString$/i', $method, $match))
        {
            array_unshift($param, $match[1]);
            return call_user_func_array(array(&$this, 'unitToString'), $param);
        }
    }
       
    function unitToString($unit, $quanitity, $round = 1)
    {
        if ($quanitity==0)
            return '0 '.$unit;
        elseif ($quanitity==1)
            return '1 '.Inflector::singularize($unit);
   
        $unitFct = $unit.'Units';
        if (!is_callable(array(&$this, $unitFct)))
            return $quanitity.' '.$unit;
       
        $units = $this->{$unitFct}();
        foreach ($units as $unitName => $unitFactor)
        {
            if ($quanitity  >= $unitFactor)
                $lastUnit = $unitName;
            else
                return round($quanitity/$units[$lastUnit], $round).' '.$lastUnit;
        }
    }
   
    function bytesUnits()
    {
        return array('bytes'        => pow(2, 0),
                     'kb'           => pow(2, 10),
                     'mb'           => pow(2, 20),
                     'gb'           => pow(2, 30),
                     'tb'           => pow(2, 40),
                     'pb'           => pow(2, 50),
                     'eb'           => pow(2, 60),
                     'zb'           => pow(2, 70),
                     'yb'           => pow(2, 80));
    }
}

First of all: Note that this code will only work in CakePHP 1.2 where Helper extends Overloadable and therefor makes it easy to implement automagic functions like the one used here.

Other then that this is good example to answer the question about when to refactor. What I did in this case was to refactor the function, so it would be easy to use it's core functionality for other (linear) unit systems as well. So if I was to expect that I will deal with lot's of units in the metric system in a project, this little refactoring would become very useful as the amount of used units grows. However, in the little experimental project I'm using this Helper in (a code review assistant) there probably won't ever be another unit then bytes.

So what does this mean? Well I think it's obvious: The principle of YAGNI (You Arent Gonna Need It) says that in my case I just wasted a lot of time. However, in the scope of somebody elses project who might reads this post, refactoring this Helper was very benifital. In general I'm in big favor of 37 Signals philosophy of not trying to be everything to everybody, and sticking with what you need. So just because there are a lot of benifits to staying DRY (Don't Repeat Yourself), this doesn't mean it should become your #1 priority. If you are unsure if you are going to need something, don't code it. If you are not sure if refactoring some piece of code will make the world a better place, don't do it. Try to get a feel of where the critical parts of your application are located and focus on making them the most beautiful code you can imagine (or afford in terms of time investment ^^). But don't try to be as insane and refactor the little bytesToString function of mine if all you need is a bytesToString function ... ; ).

Now even if so this post didn't answer all your questions about refactoring, it at least provides you with (one) solution to a common problem ; ).

--Felix Geisendörfer aka the_undefined

 

Windows XP Apache PHP output problem

Posted on 3/11/06 by Felix Geisendörfer

... those were (some of) the google keywords I used to search for a solution to the Heisenbug that recently killed my productivity. Big thanks to meek, who has found the bug in Windows XP (no SP) that caused the problem:

Q317949: Socket Sharing Creates Data Loss When Listen and Accept Occur on Different Processes

I'm doing this post because I hope that Google will index it, and the next person hitting the issue will find my blog this way, so he doesn't have to go through the same issues that had to deal with.

So in case your system has the following SYMPTOMS, please have a look at the steps below that helped me fixing the issue.

Data loss may occur if all of the following conditions exist:

  • A bind of a listen socket is completed in a parent process.
  • An accept is completed in a child process.
  • Fast sends are happening.

Or with other words: Files served via php's output functions like fread, fpassthrough, readfile, imagejpg and so on will not completly be recieved by the browser causing incomplete & often disturbed display of the data.

The solution is quite simple, one can either update to the latest Service pack, or if that is problematic (because the SP is incompatible with one's hardware like in my case), one can simply replace the \Windows\system32\drivers\afd.sys with a newer version. So if you want to use the second option, here is what you have to do:

  1. Backup your existing \Windows\system32\drivers\afd.sys. In case something goes wrong you sure want to be able to fix it.
  2. Disable Windows automatic System restore functionality. Go to Start->Run->[type in 'control', then hit enter]->System. Select the System Recovery tab and check the box to deactive the functionality. Then hit Ok.
  3. Download a SP2 version of the afd.sys. (Thanks to my friend Robert H. for sending me his copy!)
  4. Copy this file to \Windows\system32\drivers and \Windows\system32\dllcache and overwrite the existing versions of the file. If Windows starts to bug you to insert the Windows XP CD, ignore it as you usually do with useless error messages ; ).
  5. Restart your System
  6. Check that the afd.sys in \Windows\system32\drivers is the one you downloaded and has not been replaced with the old one (you can use modified date, file size, md5 hash, version, or whatever suits you best to do this).
  7. Check if the problem still exists. If no, donate all your money to "Felix Geisendörfer" via Paypal ; ). (Just kidding)

Oh boy, this has been a true nightmare and I really want to thank all the people who have thrown in their suggestions in the Google Group post and the blog post I made. Now I can finally go back to being productive!

And for those of you wondering why I'm on Windows and don't switch to some *Nix system like all the cool kids: I like Windows. Don't get me wrong, there is an endless list of things I truly hate about it, and an even more endless list of reasons why I don't like Microsoft, but in reality it's the OS that runs all the Apps I love, that I know the best and that has the best GUI (to me). There are a lot of things I like about Mac OS X & Linux, and I will eventually start using one of them as a second OS at some point, but there are just too many drawbacks for me right now. Robert Nyman has a nice post highlighting some things about OS X that suck and I got quite a couple others in my head as well. But again, the truth simply is that me using Windows is a marriage of convenience. Nothing more and nothing less ; ).

--Felix Geisendörfer aka the_undefined

 

Baking in the big kitchen

Posted on 27/10/06 by Felix Geisendörfer

I just wanted to make a quick announcment that I'm officially a contributor of the Cake Software Foundation / CakePHP from now on. My first task is to work on some of the tickets for the Bakery and then add some enhancments and new features to it. After that I'm sure there will be more things for me to do, but I'll talk about those when I know more about them ; ).

 

Modeling relationships in CakePHP (faking Rails' ThroughAssociation)

Posted on 26/10/06 by Felix Geisendörfer

Ok, this has been something I found very delightful when I first heard about it (see this speech by DHH and slides). But somehow I didn't really manage to blog about so far, which means it's about time to do this right now. The topic can be simply discribed as: How to implement ThroughAssociations (as they are known in Ruby on Rails) in CakePHP. The idea behind it is pretty simple: Often you have two Models that are associated with each other where setting the associations themself via hasAndBelongsToMany isn't quite enough for what you try to to. Let's say you have Users and Groups and you want to not only store what Users belong to what groups, as you would normally do with a join table and a hasAndBelongsToMany relationship, but you would also like to track when a User became a Member of a certain group and maybe even add a note about this. If you try to add additional fields to your Users or Group tables you'll certainly run into issues very soon, because this information simply does not belong into any of those Models. Same goes for creating the relationship (i.e. adding a User to a Group) with the Controllers. There is no way to do this in a CRUDful way. You'll have to add odball functions like UsersController::add_to_group() or in other words create messy code.

I don't like to break the CRUD pattern, so when I first heard about the concept of modeling relationship I instantly fell in love with it. Basically what this is all about, is to create a Model and a Controller for your relationships. So in our Users<-HABTM->Groups example we would simply create an additonal Memberships Model. Our new relationship would be: User->hasMany->Groups through Membership. And Group->hasMany->Users through Membership. Now unfortunatly CakePHP doesn't support the through association, but there is a simple workaround to this. Instead of using the through operator we do this:

User->hasMany->Membership
Group->hasMany->Membership
Membership->belongsTo->User
Membership->belongsTo->Group

Because of CakePHP's recursive associations, we achieve a similar effect as the trough operator would have in Rails. Now the cool thing is that you can add additional fields to your memberships table like 'id', 'created', 'updated', 'note' and others.

In case you want to see it working, I setup a little scaffolding demo here:
http://demos.thinkingphp.org/relationship_modeling/

I'll probably have to take it down due to Spam soon, but meanwhile feel free to browse around and to perform non-destructive operations. I didn't add any validation, nor a check for forbidding double Memberships, but this would be easy.

In case you are interested in the code, here it comes:

class User extends AppModel
{
    var $name = "User";    
    var $hasMany = array('Membership');
}
class Group extends AppModel
{
    var $name = "Group";
    var $hasMany = array('Membership');
}
class Membership extends AppModel
{
    var $name = "Membership";
    var $belongsTo = array('User', 'Group');
}
class UsersController extends AppController
{
    var $name = "Users";
    var $scaffold;
}
class GroupsController extends AppController
{
    var $name = "Groups";
    var $scaffold;    
}
class MembershipsController extends AppController
{
    var $name = "Memberships";
    var $scaffold;
}

And not to forget the SQL Dump:
[sql]CREATE TABLE `groups` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(11) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `memberships` (
`id` int(11) NOT NULL auto_increment,
`user_id` int(11) default NULL,
`group_id` int(11) default NULL,
`created` datetime default NULL,
`updated` datetime default NULL,
`note` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `users` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;[/sql]

Alright, I hope this is similiar enough to the ThroughAssociation in Rails to scale, in case there are some RoR folks listening, let me know if I missed something.

-- Felix Geisendörfer aka the_undefined

 

Using if statements to express dependencies

Posted on 21/10/06 by Felix Geisendörfer

When I was hanging out with a friend of mine last night (a php programmer as well), we were discussing some of the coding both of us are doing right now. At some point I told him that in my current script (the FTP/SVN deployment thing), I started to use some if statements to express dependencies. At first he didn't know what I ment, and to be honest I can't remember using this technic in a project before either, even so I think I've known about it for a while. I went on explaining it to him and after a couple of minutes we both realized it's probably one of those little coding tricks you might learn after a decade of coding and something that is used way to seldom in peoples code. For that reason I think it's definetly worth a post on here, even so some people might already know about it.

But let me start with another little story. I had a very excellent math teacher ones, and one of my favourite quotes from him is "Mathematics is nothing but concentrated lazyness". Already being a coder when I first heard him say this, I always wondered what he would think about programming under this aspect ; ). Anyway, even so Mathematicians and Programmers are lazy folks, there is one species to top them both easily - the compiler (or interpreter in our case). Compilers are written with no other goal then making them as lazy (err optimized) as possible. Everything that doesn't need to be executed should not execute, saving cpu cycles where possible. Now usually you only notice this in the execution speed of your application, and it doesn't have a huge affect on the way we write our code. Ok we might have heard of certain assignment technics to be faster then others, and that it can't hurt to unset() uneeded variables sometimes. But here comes one thing that is very easy to understand, but almost never used.

Let's think for a moment now we would be the PHP interpreter. We would run the code, and sometimes we would hit expressions in if statements. Our only interest is to decide whether the expression will evaluate to true or to false. So, being very lazy err optimized we would stop evaluating the expression if it becomes clear that it can't become true anymore. Confused? Let me show a little example:

if (false && funcA())

In this if statement there is no way for (false && [x]) to become true. So why should we even bother evaluating [x] (funcA)? Well we shouldn't. And that's how the php interpreter acts. In the example above, funcA() will never be executed, because there is no logical point in doing so.

Now while this can lead to unexpected behavior and might turn into a nightmare of debugging, you can also use it to express dependencies. Because just like a human being, the php interpreter reads expressions from the left to the right. So if we have several chained expressions, and all of them are connected via a logic AND (&&), then the compiler will stop his evaluation process as soon as one of the expressions returns false.

A (bad) example would be that we have three functions called funcA, funcB and funcC. Each of them will return either true or false, and we only want to execute funcB if funcA has returned true and only execute funcC if funcB has returned true as well. So we could also say funcB depends on funcA and funcC depends on funcB. Using the technique from above we can avoid writing several nested if statements and put all of those functions in one simple expression:

header('Content-type: text/plain');

if (funcA() && funcB() && funcC())
    echo "Success\n";
else
    echo "Failure\n";
   
function funcA()
{
    echo "funcA\n";
    return true;
}

function funcB()
{
    echo "funcB\n";
    return true;
}

function funcC()
{
    echo "funcC\n";
    return true;
}

So in order for you to be able to play around with this, I added some echo's to each function so you can change funcA and funcB's return value to false and see that the dependend functions will not get executed anymore.

Now of course, this is not the Übersolution to all your coding problems, and it complicates some things like error handling. But just to show you a practical application from my current deployment script efforts, here comes a simple real world example:

if ($recursive==true && !$this->__ftpUploadDir($Ftp, $localPath.DS.$localFolder, $remoteFolder, $recursive, $bruteForce))
    $success = false;

This 2 lines of code contain quite some logic. The first question is if the $recursive parameter is set to true, if not skip the next command. But if the parameter is true, then make the recursive function call, and only if this call fails, switch our $success variable to false.

But I would not always recommend this technic. For example one part of my script looks like this:

if (!$this->__svnExport($Svn, $tmpFolder))
    return false;
   
if (!$this->__ftpConnect($Ftp))
    return false;

if (!$this->__ftpUploadDir($Ftp, $tmpFolder, $remotePath))
    return false;

Of course I could put all of those function calls into one single if statement. But would that really make my code more readable? I don't think so. There are cases when this technic will make your code more beautiful and easier to maintain, but there will also be many where you shouldn't bother to abuse the compilers lazyness ; ).

Oh and one thing I already mentioned was that this technic can complicate error handling. Because if you run several functions in a chain like this, and one fails, there is no easy way to figure out which one did. The best solution to this problem, is to use the powers of OOP. Because if all functions are executed on the same object, you can simply create an lastError variable for that object, and check it's value if the total expression evaluated to false.

Alright I hope this was useful to some folks out there, I certainly consider it to be a nice addition to my coding habits.

Little Update:
I just put this technic to another nice real world use, this time without an if statement:

$success = $DeployScript->beforeFilter($deployAction) && call_user_func_array(array(&$DeployScript, $deployAction), $params) && $DeployScript->afterFilter($deployAction);

if ($success==true)
    $DeployScript->log($deployClass.'::'.$deployAction.'() was successful.', 'success');
else
    $DeployScript->log('There was an error while executing '.$deployClass.'::'.$deployAction.'()!', 'failure');

-- Felix Geisendörfer aka the_undefined

 
50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58