
What is refactoring?

Today I feel like writing about one of the most important topics in PHP development: refactoring. Make sure you check the other articles in the Refactoring Category. They contain examples of refactorings and the like.

A definition

Refactoring is the process of changing a software system in such a way that it does not alter its external behavior, but changes its internal structure. The goal is to improve the internal structure and to clean the code - in order to minimize the chances of introducing bugs later. Summing it up, when you refactor, you improve the design of code after it has been written.

- "Refactoring" by Martin Fowler

Improving design in later stages

It's the same with most modern software development teams. The design comes before the coding, but as more people work on the system, deadlines loom and the general motivation to work on the project sinks, design gives way more and more to hacking. The system's adherence to the pre-defined design gradually fades.

Refactoring to the rescue. With refactoring, the opposite of this practice, you can take a bad design and rework it into well-designed code. Each of the steps you undertake is very simple. It might be as simple as moving one method from one class to another or renaming a class's attribute. Yet the cumulative effect of these small changes can improve your overall design dramatically. They will make your code easier to understand, more reusable and - most of the time - a lot shorter, too!
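To make one such small step concrete, here is a minimal sketch of an "extract function" refactoring. All function names and the bulk discount rule are hypothetical and exist only for this illustration; the point is that the result stays the same while the structure improves.

```php
<?php
// Before: one function mixes the price calculation with the formatting.
function invoice_line_before($quantity, $unitPrice)
{
    $total = $quantity * $unitPrice;
    if ($quantity >= 10) {
        $total = $total * 0.9; // hypothetical bulk discount
    }
    return 'Total: ' . number_format($total, 2);
}

// After: the calculation lives in its own, reusable function.
function line_total($quantity, $unitPrice)
{
    $total = $quantity * $unitPrice;
    return $quantity >= 10 ? $total * 0.9 : $total;
}

function invoice_line_after($quantity, $unitPrice)
{
    return 'Total: ' . number_format(line_total($quantity, $unitPrice), 2);
}

// Same input, same output: the external behavior has not changed.
var_dump(invoice_line_before(12, 5) === invoice_line_after(12, 5));
```

A check like the last line is a tiny stand-in for the unit tests that should cover code before you refactor it.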

When should you refactor?

Often! Never let the system's design overwhelm you. When you feel something is not right, put your refactoring hat on and redo it. It will make your teammates count on you. Just imagine what happens if you do not refactor the code. As more and more features are added, the system becomes more and more complex and you need more and more time to think yourself into it every day. With refactoring that is not so. Refactoring ensures your code reads like a book.

About switching your hats: that's essential. Never refactor one portion of the system while you are adding features to another. That is the road to hell: you introduce bugs and - worse - you have no idea which code is causing them. Ensure that the code you want to refactor is covered by unit tests so you do not break anything.

When should you start optimizing?

Posted on 21/4/07 by Tim Koschützki

Premature optimization is the root of all evil! That is also true for PHP applications!

Some people say that it is better to defer tuning until after the coding is complete. This advice only makes sense if your programming team's coding is of a high quality to begin with, and you already have a good feel of the performance parameters of your application. Otherwise you are exposing yourselves to the risk of having to rewrite substantial portions of your code after testing.

My advice is that before you design a software application, you should do some basic benchmarks on the hardware and software to get a feel for the maximum performance you might be able to achieve. Then as you design and code the application, keep the desired performance parameters in mind, because at every step of the way there will be tradeoffs between performance, availability, security and flexibility.

Also choose good test data. If your database is expected to hold 100,000 records, avoid testing with only a 100-record database; you will regret it. This once happened to one of the programmers in my company: we did not detect the slow code until much later, which wasted a lot of time because we had to rewrite a lot of code that worked but did not scale.

Make sure you use profilers to measure the bottlenecks of your application.
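If no profiler is at hand, a rough first measurement can be taken with PHP's built-in microtime(). The following is only a sketch: time_call() is a hypothetical helper made up for this example, and a real profiler will give a far more detailed picture.

```php
<?php
// Hypothetical helper: runs $callback once and returns the elapsed seconds.
function time_call(callable $callback)
{
    $start = microtime(true); // current time as a float, in seconds
    $callback();
    return microtime(true) - $start;
}

// Measure an arbitrary piece of work, here a simple summing loop.
$elapsed = time_call(function () {
    $sum = 0;
    for ($i = 0; $i < 100000; $i++) {
        $sum += $i;
    }
});

printf("Loop took %.6f seconds\n", $elapsed);
```

Run the measurement several times and with realistic data sizes, since a single run on a small data set says little about how the code scales.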

Optimising for-loops

Posted on 19/4/07 by Tim Koschützki

For-loops and how not to use them

The following tip is another one that could greatly increase your script's performance. The thing is quite simple, so let's look at a code example:

$arr = range(1, 1000);

for ($i = 0; $i < count($arr); $i++) {
    echo $arr[$i] . "\n";
}

This code is perfectly straightforward. It creates an array with the values ranging from 1 to 1000, keys from 0 to 999. Now the for-loop iterates over the array and echoes the array's values.

The problem with this code is that the loop condition is evaluated before every iteration, so the count() function is called again each time. PHP arrays keep track of their own size, so count() does not walk through the array, but the repeated function call still adds overhead. For an array with only 1000 values this is not a significant performance issue. Imagine, however, an array with 100 000 values! It could slow down your application noticeably!

Common sense to the rescue

The solution is pretty simple. Just calculate the length of the array before the for-loop:

$arr = range(1, 1000);
$length = count($arr);

for ($i = 0; $i < $length; $i++) {
    echo $arr[$i] . "\n";
}
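To see how often the loop condition really runs, here is a small sketch that counts the calls. The $countedCount closure is a hypothetical wrapper around count(), introduced only for this illustration. The second loop also shows a compact variant that calculates the length in the for-loop's init expression.

```php
<?php
$calls = 0;

// Hypothetical wrapper around count() that records how often it is called.
$countedCount = function (array $arr) use (&$calls) {
    $calls++;
    return count($arr);
};

$arr = range(1, 1000);

// The condition calls the wrapper before every iteration,
// plus one final time when the check fails and the loop ends.
for ($i = 0; $i < $countedCount($arr); $i++) {
}
$callsInside = $calls;

$calls = 0;

// The length is calculated once, in the init expression of the loop.
for ($i = 0, $length = $countedCount($arr); $i < $length; $i++) {
}
$callsOutside = $calls;

echo "inside: $callsInside, outside: $callsOutside\n"; // inside: 1001, outside: 1
```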

What is the actual performance difference?

Let's try to measure the actual difference it makes. For this we assign the array's values in the two loops to a variable called $b instead of echoing them - not to cheat the script, but to keep the output simple, so you do not have to scroll your browser window down to the time difference we are after. :) Also, to show a real difference, let's use an array of 100 000 values:

// First run: count() is called in the loop condition on every iteration.
$start = microtime(true);
$arr = range(1, 100000);

for ($i = 0; $i < count($arr); $i++) {
    $b = $arr[$i] . "\n";
}
$end = microtime(true);
echo ($end - $start) . "\n";

// Second run: the length is calculated once, before the loop.
$start = microtime(true);
$arr = range(1, 100000);
$length = count($arr);

for ($i = 0; $i < $length; $i++) {
    $b = $arr[$i] . "\n";
}
$end = microtime(true);
echo ($end - $start) . "\n";

The script's output speaks for itself:

0.139598846436
0.0997688770294

The first part of the script, with count() inside the for-loop condition, runs about 0.04 seconds slower. Now imagine using the wrong approach in a large application with many loops and more complex calculations within the loops.

Calculating the length of an array outside the loop is common sense and a PHP coding best practice.

Happy coding all!

Most probable first

Posted on 18/4/07 by Tim Koschützki

What does the "Most probable first" principle mean?

When you are dealing with control structures in PHP, you should abide by the "Most probable first" principle. It means that the branch of your control structure that is most likely to be taken should come first, directly after the if. Here is a code example:

$r = rand(1, 3);

if ($r < 3) {
    $a = 1;
} else {
    $b = 2;
}

The script generates a random integer between 1 and 3 (inclusive). The control structure checks whether the value is 1 or 2. It is more probable that the value is 1 or 2 than that it is 3, so you should check for that first.

Why is this beneficial for the script's performance?

PHP compiles the whole script to opcodes, but at run time it only executes the branch whose condition is true. In the example above, $b is never assigned when $r is 1 or 2, so no value is ever stored for it. In longer if/elseif chains the order matters more: conditions are evaluated from top to bottom, so putting the most likely case first means fewer conditions are checked on average.

This does not yield any noticeable performance gain in small applications. However, think of a large application with a couple hundred variables and control structures. You should make it a habit to follow the "Most probable first" principle, so you use it all the time.
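The effect of the ordering can be sketched by counting how many conditions an if/elseif chain evaluates. The action names, their distribution (90 views, 8 edits, 2 deletes) and the $is helper are assumptions made up for this example.

```php
<?php
// Hypothetical workload: 'view' is by far the most common action.
$actions = array_merge(
    array_fill(0, 90, 'view'),
    array_fill(0, 8, 'edit'),
    array_fill(0, 2, 'delete')
);

$evaluated = 0;

// Hypothetical helper that counts every comparison the chain performs.
$is = function ($action, $expected) use (&$evaluated) {
    $evaluated++;
    return $action === $expected;
};

// Most probable case first.
foreach ($actions as $action) {
    if ($is($action, 'view')) {
        // handle view
    } elseif ($is($action, 'edit')) {
        // handle edit
    } elseif ($is($action, 'delete')) {
        // handle delete
    }
}
$probableFirst = $evaluated; // 90*1 + 8*2 + 2*3 = 112

// Most probable case last.
$evaluated = 0;
foreach ($actions as $action) {
    if ($is($action, 'delete')) {
        // handle delete
    } elseif ($is($action, 'edit')) {
        // handle edit
    } elseif ($is($action, 'view')) {
        // handle view
    }
}
$probableLast = $evaluated; // 90*3 + 8*2 + 2*1 = 288

echo "probable first: $probableFirst, probable last: $probableLast\n";
```

The branches do the same work either way; only the number of evaluated conditions changes, which is why the gain is small unless the conditions themselves are expensive.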

New fix for array junkies: Set::merge assembles yummy arrays

Posted on 5/4/07 by Felix Geisendörfer

Hi folks,

long time - no post, as always - I suck. I intend to make up for it with a screencast on unit testing in the next few days, but meanwhile I want to talk about my favourite data type in PHP again: arrays. For those of you just tuning in, I wrote a while ago about how Cake 1.2’s Set class eats nested arrays for breakfast, and if you haven't read that post yet, go ahead and do it now ; ). Today's post features a brand new Set function called merge that was a side product of me working on a cool new Cake class. If you've done a lot of array work in the past, you've probably come across situations where you wanted to merge two arrays into a new one. Usually that's a no-brainer in PHP by simply using the array_merge function (or the CakePHP wrapper 'am'):

$a = array(
    'user' => 'jim',
    'pass' => 'secret',
    'friends' => array('bob', 'tom', 'paul')
);
$b = array(
    'pass' => 'new-password',
    'last_login' => 'today'
);
debug(array_merge($a, $b));

/* Output:

Array
(
    [user] => jim
    [pass] => new-password
    [friends] => Array
        (
            [0] => bob
            [1] => tom
            [2] => paul
        )

    [last_login] => today
)
*/

In 90% or more of all cases, this is the usual way to merge two (or more) arrays into a new one. However, sometimes array_merge is not going to cut it. That's mostly because it is not recursive, so merging nested arrays can lead to unexpected results:

$a = array(
    'User' => array(
        'name' => 'jim',
        'pass' => 'secret'
    )
);
$b = array(
    'User' => array(
        'pass' => 'new-pw',
        'last_login' => 'new-pass'
    )
);
debug(array_merge($a, $b));

/* Output:

Array
(
    [User] => Array
        (
            [pass] => new-pw
            [last_login] => new-pass
        )

)
*/

This is a little counter-intuitive, at least to me. I'd expect only the User.pass key to be overwritten and the User.last_login one to be added. But instead array_merge just overwrites the entire 'User' key with $b's value for it. Now wait, isn't there a function called array_merge_recursive for this, some of you might object? Well, of course there is. However, to me its behavior is even more counter-intuitive than that of array_merge:

$a = array(
    'user' => array(
        'name' => 'jim',
        'pass' => 'secret'
    )
);
$b = array(
    'user' => array(
        'pass' => 'new-pw',
        'last_login' => 'new-pass'
    )
);
debug(array_merge_recursive($a, $b));

/* Output:

Array
(
    [user] => Array
        (
            [name] => jim
            [pass] => Array
                (
                    [0] => secret
                    [1] => new-pw
                )

            [last_login] => new-pass
        )

)
*/

Now this time all 3 user fields show up in the new array, which is good. If one looks at user.pass, however, one will notice that instead of overwriting $a's value with the one of $b, array_merge_recursive has taken both of them and thrown 'em into an indexed array. To me, that's really not what I want most of the time. My main need is to have complex array structures that hold default values (conventions) that I can then overwrite easily on demand (configuration) when calling a function.

Introducing Set::merge

So here comes Set::merge which works like one would expect array_merge_recursive to work before actually trying it out:

$a = array(
    'user' => array(
        'name' => 'jim',
        'pass' => 'secret'
    )
);
$b = array(
    'user' => array(
        'pass' => 'new-pw',
        'last_login' => 'new-pass'
    )
);
debug(Set::merge($a, $b));

/* Output:

Array
(
    [user] => Array
        (
            [name] => jim
            [pass] => new-pw
            [last_login] => new-pass
        )

)
*/

Another important thing to know about the behavior is how it deals with numerically indexed array items:

$a = array(
    'Users' => array(
        'jim', 'bob',
        'count' => 2
    )
);
$b = array(
    'Users' => array(
        'lisa', 'tina',
        'count' => 4
    )
);
debug(Set::merge($a, $b));

/* Output:

Array
(
    [Users] => Array
        (
            [0] => jim
            [1] => bob
            [count] => 4
            [2] => lisa
            [3] => tina
        )

)
*/

I could provide you a lot more examples but it basically comes down to the following Set::merge behavior:

Set::merge loops through all items of all arrays provided to it, and if it ...

... hits an array value: Acts recursively and merges its value over the previous values for the current key
... hits an integer key: Recognizes it as a numerically indexed key and appends the current value to the end ($a[]) of the array built so far
... didn't do the above: Overwrites the value of previous arrays for the current key with the new one.

Oh and the function will also typecast non-array parameters into arrays for you.
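For readers without a Cake 1.2 checkout at hand, the rules above can be sketched in plain PHP. Note that merge_sketch() is a hypothetical stand-in written for this post's description, not the actual Set::merge source. It assumes PHP 5.6 or later for the variadic parameter, and it only recurses when the previous value for the key is an array as well.

```php
<?php
// Hypothetical stand-in that mirrors the rules described above.
// It is NOT the CakePHP implementation of Set::merge.
function merge_sketch(...$arrays)
{
    // Start from the first parameter, typecast to an array.
    $result = (array) array_shift($arrays);

    foreach ($arrays as $array) {
        // Non-array parameters are typecast into arrays as well.
        foreach ((array) $array as $key => $value) {
            if (is_array($value) && isset($result[$key]) && is_array($result[$key])) {
                // Array value: merge recursively over the previous value.
                $result[$key] = merge_sketch($result[$key], $value);
            } elseif (is_int($key)) {
                // Integer key: append to the end of the result.
                $result[] = $value;
            } else {
                // Anything else: overwrite the previous value for this key.
                $result[$key] = $value;
            }
        }
    }

    return $result;
}

$a = array('user' => array('name' => 'jim', 'pass' => 'secret'));
$b = array('user' => array('pass' => 'new-pw', 'last_login' => 'new-pass'));

$merged = merge_sketch($a, $b);
print_r($merged);
```

Used this way, $a plays the role of the defaults (conventions) and $b the role of the overrides (configuration).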

So if all of this still leaves you unsure about the way this function works, you should probably check out the Unit Test Case for Set::merge. It should show you pretty much every imaginable aspect of how the function can be used (including passing other Set class instances to it).

Alright, I hope you, my fellow array junkies, are going to get a little kick out of this one. Other than that, stay tuned for the promised unit testing screencast in case you are interested in getting started with the light side of the force as far as programming goes ; ).

-- Felix Geisendörfer aka the_undefined
