Why shouldn’t IT people look different?

October 19th, 2008

quick post: I think I need to reply to Computer Weekly’s blog entry - The IT Professional’s Survival Guide.  I think it’s heading in the right direction but got sent off at a terrible tangent at some point…
Firstly, the most important thing is not to “flatter“, or understand “where you fit in” it’s simply to learn how to communicate properly.  Many IT staff are proud of their logical outlook on life and rejoice in “simple” answers.

For example, years ago I was working with support engineers, a “request for change” ticket came in saying “can you help me organise my files better?” to which the reply was “no”.  That’s neither helpful, nor useful (even if it may be correct in some cases), and it’s not answering what the user wants and needs.  People remember good service and those that try and help them (and some evolutionary biologists claim that we “like” to help others - even if the computer business is a good model of the reverse!).

What the engineer should have done was simply ask for more specifics, it takes 10 minutes to do so, and even if you can’t help you can point them in the right direction (”sorry, I think you need to speak to Ted about this - he’s the head of Business Process re-engineering”).  It costs nothing and you’ll have a better relationship with the non-IT staff.

Secondly, “Dress like your peers” - I disagree almost as strongly.  As a young man (with hair) I was often sent to customer sites and heard the users say “thank god, he looks like a freak: he must be good” (not every time, but often).  Non-IT staff expect computer people to be nerdy and geeky, and often the nerdier and geekier they look, the better they must be.  In contrast, even IT staff have probably been concerned when a beautifully suited and presented individual comes in as a star “consultant” - it just doesn’t fit the mold.

Also, we’ve all seen engineers who wear a suit and yet look worse than if they’re in a bin-bag.

So, am I advocating looking terrible all day and wearing T-shirts with expletives on them?  No, really no.  But, dress to match your style and finesse and remember it’s not London geek fashion week - nobody will love you for a T-Shirt that says “all users are idiots” regardless of how much Geek kudos it grants you.

lastly, understand the business you’re in (regardless of your position).  When you’re in a marketing department’s IT team then ensure you understand what they’re marketing, how they do things and, at the least, the outline business processes.  That way when a user contacts you then you ask sensible questions related to what they do, they’ll respect you for it, and you’ll achieve more.

You’ll be amazed, even if you’re not a programmer (then you have no excuse for knowing what they do and you should hang your head in shame and make for the door with your head bowed) and you’re doing server support then the setup of the machines is still related to the business itself.  Even for a support engineer there are places where you can help that business run (or stop that business running better) by your actions: there’s no point you pushing windows OS if none of the software runs on it, but if the users need a document store with some workflow then you can ask if it makes sense to include Windows Sharepoint or IBM Portal or similar as part of the standard build, or alternately if the machines never do any high transaction work/high IO then why is the business paying out for RAID1+0 local disk arrays when it could be using a (relatively) low latency SAN solution and sharing disk between all the machines.

In summary, IT does have a place in businesses, but arrogance, laziness and ignorance of IT staff doesn’t.  Do yourself a favour, find a nice big mirror and take a long hard critical look at yourself, it’ll be not just good for your current job/contract/assignment, but it’ll help your work ethic (and your job satisfaction) more in the longer run too.

Bringing Software Projects Under Control

September 12th, 2008

One of the jobs that Code Wizards often gets is to help a software project that’s in trouble, and there’s a pretty standard list of things that we do each time to establish and baseline and see what we can do to help.

Identify the movers and shakers - who’s the head developer?  who handles the builds?  who developed the list of controls and processes (note: these don’t have to be documented, they can be anecdotal: “oh, Bob always does the builds then runs through a quick test before deploying it himself”)

  1. How long has the project been running?  How long has it been in crisis?
  2. How experienced are the people on the project?  Have they always worked at this location or a lot of places?  Have they got formal training?
  3. How are requirements handled?
  4. What’s the technical infrastructure like?  Is it all based around a windows SMB share where everybody deposits their code? Is there a bug ticketing system?  Are bugs handled by an Excel spreadsheet?
  5. What’s the subject of the Project?  From a personal perspective if you were asked to give a timescale for delivery (independent of what you see at the project) would it match those that the project has got or has atleast asked the business for?
  6. Is there a process for testing (at the Unit, Functional and Integration testing levels)?   How is User Acceptance Testing (UAT) handled?  How are user bugs and requests for change documented, triaged and executed?

By this point you should have a good idea of what the project is all about, what it’s trying to achieve, what’s in place and who’s in place.  So it’s time to put it all together.

At this point your opinion won’t be perfect and it’s important to remember that, you’ve seen a small slice of the project and you’ve got a flavour and opinions but few definitive facts.  I’ve seen bolshy and over-confident engineers examine projects and immediately announce who should be fired and promoted, and provide “concrete” steps that will “guarantee success”.  Ignore them, they’re the snake-oil salesmen of the 21st century and they’re often dangerous (you can often identify these people as their CV/Resumes read like a Hagiography and they’ve never worked on a project that’s been canned).

First of all attack the basics; no requirements or documentation? then insist that some are put together regardless of whether you’re fed some line about this being “extreme” and/or “very Agile”.  Input document packs should represent the basics: a mission statement (what you want to achieve), a general outline of strategy (how you’re going to achieve the mission), and an outline of technical aspects (what technologies are in use, why they were chosen and what’s forbidden).

Then look at configuration management and software management.  In this day and age there is no excuse for not using source control and a ticketing system, and I mean that, there are NO reasons not to have those things in place.  You’ll probably find that various people (including lazy developers) resent this idea, force it through anyway - i’ve never seen a project that runs better without source control and ticketing systems.

If there isn’t a specific requirement for source control then pick something that’ll work and is free (free is always easier to get into businesses than paid for software in our experience).  If most of the source code is text then you can’t go wrong with Subversion, it’s got a good UI for Windows (TortoiseSVN) it’s very similar to CVS and it’ll integrate directly into a software management tool called “TRAC“.  If you’re working in Visual Studio then go for Team System - it’s pretty good and it integrates into Visual Studio which will make the developers job easier.
TRAC should be an easy sell to the client, it’s free, integrates with Subversion (aka SVN) and reads the changelog, it’s open and extensible and includes a wiki that you can use for documentation.  OK it doesn’t do everything, but it does most things, and it does it well and doesn’t encumber the developers.  But I’ll blog more about TRAC in future.

Next establish basic processes and document the original and your revised versions (nobody else is going to do this for you - so put some headphones on and get your head down and get a basic set documented).  This gives you a way you can show people why they’re different and what the benefits are (if you’re in TRAC then do this in a nice diagram and put the new processes on the first page of the wiki).

Then start creating tickets for functional implementation and bugs and start showing how you can assign them and have them worked on.

Lastly, present what you’ve done to a mixed audience of the business owners of the project, the project and programme managers, testers, developers and analysts, and get ready to be flamed, heckled, insulted and generally disliked.

The truth is that few of us like being criticised, especially when it’s from an outsider so it’s not going to be an easy ride.  You’ll probably find acceptance comes from the business lead (who probably hired you to look at their project), then the programme manager, project manager, testers, team leads and managers, then the developers.  It’s a fair chance that the developer leads (aka “Corncobs” according to Anti-Patterns) will never accept the changes unless you present the ideas as being partly their idea, so do this if it makes your life easier.

Before disengaging you’ll need to help them run the new processes and tools and ensure that everybody is keeping to the requirements of their roles (are analysts generating requirements now? are the team leads ensuring that developers are always checking in with defined ticket numbers?  are testers entering and tracking their bug tickets?  are managers handling bug/function ticket triage and assignment?).

You’re probably thinking, ok those are good points, but why does it make sense to use an outside party to do this?  Well the reason is simple: you need somebody without affiliation to anybody in the project, you need somebody who doesn’t mind being disliked and asking difficult questions, and lastly you’re better off if that person can come in and do the work then leave - so then the team can act like it was all their idea anyway and collectively bond while moaning about the person who came in about the work.

Do teams ever welcome you, acknowledge the things that are wrong and help you remedy them?  Yes, of course but it’s few and far between and those are the managers, leads, developers, testers, and analysts that you should think of the next time you need to build a team to develop software as they’re worth their weight in gold.
Good Luck!

TIP: Getting VIM to open files in Unicode (UTF8) format by default

August 11th, 2008

Whenever i re-install VIM (aka Vi IMproved) i get the same issue: if i open a file encoded as utf-8 then vim will open it and show control characters in some places.

To fix this simply find your vimrc file (on windows you’ll find it in c:\program files\vim\ - it’s called _vimrc), open it (using vim :) ) and add the line

set encoding=utf-8

Save it and future sessions of vim will now correctly notice the formatting and pick it up accordingly.

Web Services - Picking the right one for the right job

July 24th, 2008

Picking your web service standard is an important decision and shouldn’t be taken lightly, and development teams shouldn’t be allowed to make the decision on their own.

Roughly there’s 3 “standards” to choose from; SOAP (Simple Object Access Protocol), XML-RPC, and “REST” (”REpresentational State Transfer”).  It’s worth touching briefly on what each is before continuing;

  • SOAP is very heavily pushed by IBM and Microsoft and has standard tooling within various IDEs and frameworks.  This means for a lot of developers this is the only Web Service standard that they consider.  It allows the Web Service to handle complex (and “rich” datatypes) both as parameters and for returning, and has a standard called WSDL that allows Web Services to be defined in a manner which is language agnostic (for those of you old enough; yes, this is very much like IDL and CORBA’s IDL) and is downloaded so the script can adapt things itself.  Best of all this means that you can both pass objects and retrieve them without having to realise how the information is transferred, which for programmers is a massive boon.  It works by encoding everything as XML and passing the XML as an encoded object to via HTTP GET, then the return values are passed back as XML, the SOAP framework does all the hard work of translating to and from XML on both sides.
  • XML-RPC seemed like a reasonable competitor to SOAP, but seems to have diminished over time. It too allows “rich” and complex data to be passed to and returned from the Web Service.
  • REST is the outsider, although there are toolkits and frameworks to help it relies on simplicity of parameter passing, although any data format can be passed back.  This means it’s hard to pass complex data, and what it returns needs to be interpreted (rather than being magically converted into objects/structs that are usable within the calling language).  It also frequently requires custom coding to create the web service, and most of that work will be manual.

So, you’re probably thinking, why would I go for anything but SOAP?  It seems the best, most widely used and easy to develop web service technology, so surely it’s the only choice? The answer is unfortunately not obvious..

The answer is related to who you’re creating the Web Service for (your “target audience”) and what you want them to do with your data.

If you’re on an intranet and you need .NET and Java code bases to cross communicate then SOAP is an excellent way to go, development is quick, it’s easily secured and toolkits are in place for both platforms to allow you to create and consume (and debug) Web Services very easily.

In addition, if you’re an internet based app and you’re working with medium and large enterprises, then SOAP could again be a good fit, especially if you’re looking at clients that are based heavily around .NET and Java.

If, however you’ve got a wide development language base within your intranet (using scripting languages such as python, php, perl, ruby etc.) then it may not be a good fit.  Scripting languages will need to execute more code before they can make the calls (and the calls may still have minor issues when they communicate with Web Services - despite the popular press SOAP still has issues and incompatibilities when things are getting complex) and you’ll need to start including additional libraries to support it.

REST really shines when you have either; a varied user base (you can’t check which languages consumers of the service will use), and/or some of the users will be small businesses or even single developers creating mash-ups.  This is because it’s incredibly simple - parameters are either encoded as part of the path or on the query string and it returns whatever is most useful.  If it makes sense to return CSV information from a REST Web Service then that’s just fine, JSON? yep fine again.

Because REST is based around a standard HTTP GET or POST call it’s easy to implement, even if you don’t have an easy way to open up a GET or POST call (for instance using cURL from just about any language - including shelling out to it if needed from UNIX shell scripts or MS-DOS bat files) you can custom code it - it’s trival even by creating a socket yourself and sending a GET HTTP request on port 80 (although i seriously doubt you’ll do that).
The other time you should seriously consider REST above the other choices is when you want to use the Web Service for more than backend server usage - for instance when you want javascript based AJAX to use the same information or you’re creating javascript based mashups.  Ever tried building a SOAP client usable from javascript?  it’s non-trivial and you’ll be fixing bugs from here to eternity.
So, let’s take a look at some code and see why SOAP is so attractive to developers and why REST is more difficult but more accessible.  For the examples we’ll create a fake music based web service that allows users to find tracks that have been created by an artist.

For a SOAP example (in PHP)

/* create an instance of the web service proxy that you'll call on to - pass it the link to the WSDL file that tells the SOAP client what the web service needs as parameters and what it returns */

$musicfinder = SoapClient('http://someexample.com/webservices/musicfinder.wsdl')

/* use the proxy object and call the "findTracks" method and pass the first argument (in this case the artist) */

$tracks = $musicfinder->findTracks('madonna');

/* iterate and do whatever you need with them */

foreach($tracks as $track) {

    print("TRACK IS : " . $track->trackName());

}

REST example (in PHP) version 1 - we’ll assume it returns XML, and we’ll use the verbose (non-wrappered) version of curl

function CallWebService($url) {

    $ch = curl_init($url); $

    response = curl_exec($ch);

    //optionally you can check the response and see what the HTTP Code returned was

    $response_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    $curl_close($ch);

    return $response;

}

//build the string to call it $url = "http://someexample.com/webservices/rest/findtracks/" .$artistname;

$resultsxml = CallWebService($url);

//load the dom - skipped here and set up an iteration etc.

REST example (in PHP) version 2 - we’ll assume it returns XML, and we’ll use the file_get_contents cheat method

function CallWebService($url) {

    return file_get_contents($url);  //may be disabled on some ISPs

}

//build the string to call it

$url = "http://someexample.com/webservices/rest/findtracks/" .$artistname;

$resultsxml = CallWebService($url); //load the dom - skipped here and set up an iteration etc.

REST example (in PHP) version 3 - we’ll assume it returns CSV, and we’ll use the file_get_contents cheat method again

function CallWebService($url) {

    return file_get_contents($url);  //may be disabled on some ISPs

}

//build the string to call it

$url = "http://someexample.com/webservices/rest/findtracks?artistname=" .$artistname;

$resultcsv = CallWebService($url);

foreach(split("\n", $resultcsv) as $line){

    print("TRACK: " . $line[ COLUMN_CONTAINING_TRACK_NAME ]);

}

You can see it’s easy to use SOAP, but the REST call is very clear in the server logs, and it’s easy to construct the call and to process the call on the server side (read the query string parameters, do something and return something).

I know of one major company that has provided SOAP interfaces for it’s Web Services, without realising it’s core audience is single developers and small development houses.. almost all using scripting langauges very heavily (and most of them have poor programming skills), as a consequence it’s barely being used.  The company keeps enhancing it’s interfaces to give more capabilties (I can only presume they think the issue is that they’re not providing enough functionality) but the user base just can’t use them.
There’s a reason why Google, Amazon, eBay etc. all provide REST interfaces - hopefully now you can get an idea why.
PS: If you’re trying hard to use PHP with SOAP then i’ll recommend this “Dirty Secrets” document with inside scoop with regards to using PHP with SOAP.   Also, i’d ignore anything but the core SOAP library - they’re poorly maintained (although i’d love to be proved wrong), however sometimes it’s difficult to get the extension installed.  In that case you’ll just have to try other libraries until it works.

Architects need to properly Communicate with Developers: Scale-Up vs. Scale-Out

July 2nd, 2008

This is a problem that unfortunately we so often see.

Architects are employed to ensure that software projects have a reasonable base; solutions are well thought out, robust, and follow industry best practices; and provide a more detailed but higher-level view of application spread (and inter-operation) than a senior developer, project or programme manager could affect.

However, there’s a gap, the architects produce plans and ideas and don’t communicate them effectively to the developers, so new problems arise. Often it then gets worse because the architects blame the developers (and especially the lead developers) for not following the plan, while the developers blame the architects for poorly thought out plans and non-committal/non-understandable language. Out of frustration, I thought we could cover one of those major communication problems today: Scale-Up vs. Scale-Out.

Any developer who’s been in the industry has heard these terms a few times (probably by some people who have used them inter-changeably), however they mean very different things and have different design decisions and costs associated with them.

a. Scale-Up -> To add additional hardware (processing power, memory, faster/bigger disk) to a solution to enable it to scale further

b. Scale-Out -> To use additional hardware nodes (web servers/database servers/application servers) to an existing solution to enable scaling by spreading the workload further. (Note: this also includes strategies where duplicated hardware is put in place and the solution is segmented down functional/data lines so that each set of hardware runs independently of the other). Grid computing is an example of Scale-Out pushed to the maximum.
Scale-Up is usually the first port of call for badly performing applications, but there’s always a high point of what can be done. Even if you started on a single CPU Windows 2003 server and then ported up to a SUN Fire E25K you’ve still hit a ceiling. However, for most projects this makes sense as the user/work loading will never be exponentially larger than it started. So long as there’s sufficient upgrade path (and the application can handle it) then Scale-Up will often be the best in cost-benefit analysis, primarily because one only upgrades (or moves on to new hardware) if it’s strictly necessary.
True Scale-Out requires designing from the start as it’s hard to push a Scale-Out strategy on to existing solutions. If the service is known to be massive from day one then segment your workload like Google/Ebay and co and have multiple servers that all do a minor part of the workload, and other servers that collate that and have responses back to the user. In that way you can always add more hardware to do individual tasks without a major problem. However, for smaller projects this isn’t worth the cost or the stress.

Some architects like to say they’ve got the ultimate scaling strategy: a mix of Scale-Out and Scale-Up, and to an extent some do. However, having your webserver, app server and database server on a single machine, then scaling out to three machines, before beefing up each of those machines further isn’t true Scale-Out (it’s just a modification of the Scale-Up strategy really). For those solutions you have to be extremely careful about issues like marshalling (inefficient data passing between tiers won’t really hurt when you’re on the same machine, however if you have to pass that data across a network, serializing and de-serializing as you go, then that can cause major issues.

Hope that helps explain Scale-Up (aka ScaleUp) and Scale-Out (aka ScaleOut). As ever i’m happy to hear from you with questions etc.

“Open Command Line Here” windows explorer option for Vista

June 20th, 2008

For those of us who have upgraded from XP to Vista and need to run a command line (aka cmd.exe or “open command window here”) you may be a little confused.  Even searching on microsoft.com doesn’t appear to help and the Windows XP Powertoy “Open Command Line Here” isn’t usable for Vista.

Well it’s built into the OS from scratch (with an important caveat).  Simply hold down shift when getting a context menu in the RIGHT HAND panel in windows explorer (that’s correct, you read it properly, it doesn’t work in the left panel!) and hey presto it’s now an option.

If you need to go further and launch the command line as an elevated privilege (assuming you haven’t turned UAC off of course) then Microsoft Technet does have the answer, however it’s not going to be easy and it assumes that you want all of the old sysinternals tools installed (but that’s not a bad thing IMHO anyway).

MySQL - Looking for Table Locks and Killing them if needs be

May 1st, 2008

Quick tip, for MySQL you’ll occasionally get problems where some processes lock a table (for MyISAM tables atleast) and you need to find out what the rogue query is (or just look at what’s being executed on your server at any time).

If so use the command “SHOW PROCESSLIST;” - this will show you the process id vs. the information for that process (what the query/update is and the current status).

Where you see a LOCK at the top and lots of lower entries that are locked on the same entity then you can always KILL that process using the command “KILL process-id;” (e.g. to kill process 1234 then you’d execute “KILL 1234;”)

Hope that helps people

Why adding an index to a DB table isn’t always a good idea

April 29th, 2008

When we’re on site we talk to a lot of people who are struggling with databases and how they work. I always try and help by explaining the basic of how a RDBMS works and how indexes are generally used, however there’s one thing that always requires extra discussion, as it’s contrary to commonsense, why removing indexes can sometimes increase database performance.

In this blog there’s not really room for a full discussion of Database (aka “DB” from now on) problems, however new users of databases (or rather new users of production databases with a reasonable workload) often do the following;

a) Start logging queries that don’t have indexes (or run tuning analysers like those in Microsoft SQL Server’s Management Studio)

b) Add indexes with gusto; each table starts to have 3 or 4 indexes (I won’t use the term indices today…)

c) Notice new and horrible issues, some inserts and updates take longer than before, and some selects take longer than they did before the indexes were added.

Of course some people then repeat (b) and (c) again…

Two universal truths need to be stated;

1) A database optimiser won’t magically concatenate multiple index data together and use the composite index for a super-quick lookup. Instead you should estimate that a DB will only select a single index to use, and that’ll be based off the statistics of the index “selectivity” (aka “granularity” in some quarters).

2) Because we have more intelligence than machines (and we know the business domain) we can do things that the machine optimiser can’t. This does mean that in some instances it’s better to not use an index for a select statement, or even (in some difficult spots) tell the query engine to use a specific index for it’s lookup.

I should also add another “Universal Truth” that’s not quite true (it’s kind of the Newtonian Physics of the DB Tuning world - it’s not quite true but it’s near)

3) Adding an Index has a cost for each insert and update (and when those indexes need to be rebuilt). Yes, it’s not good to have superfluous Indexes but this is often over-stated to a point where people are scared to add indexes, even when they’re sure it’s a good idea.

Right, first we’ll attack UT (Universal Truth) 1;

Universal Truth 1

“A database optimiser won’t magically concatenate multiple index data together and use the composite index for a super-quick lookup”

Databases are created to handle data, and they do it well. They maintain this lead over the competition by being ruthlessly efficient and sticking to sensible rules.

A database will typically (there are exceptions but they’re rare) use a single index to look up data in a table. If there isn’t one then you’ll get a table scan (usually bad bad news if there are many rows - more on that later).

When a DB uses an index it tries to select the best route/index to cut down the number of records it then has to read from the main table and finish off it’s selection (of course if it doesn’t need to select any further from the table that’s great.. however unless the index has all the necessary return columns from the query then it’ll still need to read that table row).

So let’s create a sample table;

CREATE TABLE `propagandr`.`animal` (
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`animalname` VARCHAR(45) NOT NULL,
`animaltypeid` INTEGER UNSIGNED NOT NULL,
`creationusername` VARCHAR(45) NOT NULL,
`creationdate` VARCHAR(45) NOT NULL,
PRIMARY KEY(`id`)
)
/* note: this is an example - if you want to add a few thousand content lines then go for it and you can try the queries out */

Now this has no user defined indexes (it has one - notice the primary key is ‘id’ - that’s an indexed column as that’s the way it’s laid out on the disk within MySQL), so if I say “SELECT id, animalname FROM animal WHERE creationusername = ‘boris’” then we know it’ll do a table scan.

So, let’s add an index for the animaltypeid (as that links to another table)

ALTER TABLE animal ADD INDEX `Index_animaltypeid`(`animaltypeid`);

We then re-run the query, and no surprise it doesn’t get used. So we’ll add an index for the creationusername…

ALTER TABLE animal ADD INDEX `Index_creationusername`(`creationusername`);

Re-running the query we notice it’s now using the index.. winner!

So, let’s change the query to be more “realistic” - “SELECT id, animalname FROM animal WHERE animaltypeid = 3 AND creationusername = ‘boris’”. This should use both indexes right? erm. no. It’ll select the ONE index that it believes is the most selective (so if there is only two people who create records then each row has a 50% chance of being allotted to a single user - not that selective, however there might be 500 animal types).

Let’s assume we really need this query to be lightening fast; we could add a further index to include both columns

ALTER TABLE animal ADD INDEX `Index_animaltypeid_vs_creationusername `(`animaltypeid`, `creationusername`);

Now, i’ve used the business domain specific discussion earlier to establish that animaltypeid is more selective, so i’ve put that first - after all a query just based on animaltypeid can still use that index. But, what for creationusername? Well that’s largely useless and in many cases it makes more sense to do a table scan (if that doesn’t sound right then think hard about what the DB will do - it’ll look at the index, note the selectivity is very poor AND that it needs to go to the table to get the data anyway… so it’ll do a single serial read through the table rather than use the index, compile the animal id’s that match and then look them up and return them).

Hope that helps with part 1… Now on to part 2 (which is far weirder for most folk)

Universal Truth 2

“Because we have more intelligence than machines (and we know the business domain) we can do things that the machine optimiser can’t”

In the above example we used our knowledge of the data to guide which way we should build a key. We sometimes hav to do the same with respect to table data.

Let’s imagine that the above Animal table is constantly being updated (like > 10 per second), so we can’t key the data too heavily, but a query we often run is asking whether we’ve had a tiger added in the last 5 minutes (this is an example bear with me! - other examples could be a wiki update log, a money audit log etc.)

Here we can use some business domain knowledge to help us out. We know that tigers (animaltypeid 3 btw) are added pretty regularly, and that they’re a regular but largely insignificant number of entries (10 entries per sec * 60 secs in a minute * 60 minutes per hour - that’s a fair few non-tigers coming through).

Now we’ve already got an index that addresses the animaltypeid, so we figure that’s a good candidate, so we write the following SQL (if you’re not familiar with the LIMIT keyword it just does what TOP does in SQL Server);

“SELECT * FROM Animal WHERE animaltypeid = 3 /* that’s a tiger remember */ AND creationdate > DATE_ADD ( NOW(), INTERVAL -1 HOUR) LIMIT 1;”

(note: the limit 1 just returns the first one we find as we only want to know IF there was a tiger added in the last hour)

Now, if you analyse the DB you’ll see it uses the animaltypeid index, and you may assume this is good, however look at the timings and work out what it’s doing.

Using that index means it’s gone to the index, retrieved all the ids for animals that are of animaltypeid 3, then it goes to the animal table and scans those rows and if they pass the criteria it copies it out for returning to the caller. Assuming we’ve got a thousand tigers in the DB (and a 100,000 entries) then that’s a lot of Disk IO for not much.

So, we re-write it slightly to only use the table (i’ll skip the index hint that tells the DB not to use that index here but you can add it if you need it, but in a circumstance like this we’d re-write the index so it went creationusername then animaltypeid so this query wouldn’t trigger it but callers using both those criteria still get the benefits)

“SELECT * FROM animal WHERE animaltypeid = 3 and creationdate > DATE_ADD( NOW(), INTERVAL -1 HOUR) ORDER BY id DESC LIMIT 1;”

Here we using a few things, the fact that we know the table is inserted into in order (so we add the order by id DESC to force the DB to look backwards from the current page - we could have said “ORDER BY creationdate DESC” but we know that the id rises as the creationdate does, and as MySQL orders tables by the primary key it makes sense to use that), and also we know that it won’t use the animaltypeid index (as we’ve given an index hint or we’ve removed the index).

Now, this query uses no indexes for the lookup, and table scans are always bad, right? Well, you’ll find in these instances this runs very quickly as the answer is likely to be on the first page of data that’s parsed, and because it’s updated so frequently it’s also a good likelihood that it’s in memory.

So, what about if we just added an index that allows us to do that lookup quicker? Let’s define an index that does that

ALTER TABLE animal ADD INDEX `Index_animaltypeid_vs_creationdate `(`animaltypeid`, `creationdate`);

Now, that seems an ideal situation: the query now uses that index and everybody’s happy? Well not really, that index is going to be appended to all the time and that means the index will be less than optimal, plus we have to write an additional index entry for every insert that we do (or updates if we change the creationdate for some reason??). This of course is part of Universal Truth 3 :)

Adding an index is min-maxing, you need to profile the application performance before and after and get representative results back, otherwise you may as well be applying stuff in the dark.

Hopefully the above has helped some of you. If you’ve any comments then get in touch and let us know.

MySQL - how to create a Stored Procedure

April 29th, 2008

Creating StoredProcs still feels a little like magic at the moment, but it’s incredibly simple really.

Stored Procs (aka “SPs” or “SProcs”) live within the database and allow a number of activities to be chained together. They also represent an abstraction method so you can help de-couple DB and application code.

However, MySQL didn’t have Stored Procs until a year or two ago and because of this they’re not widespread.

The first thing to remember with MySQL Stored Procs is that the mysql engine will get confused if you write the SP and try and apply it using MySQL Query Browser or just using the mysql command line app; it’ll notice semicolons within your SP and try and apply those and it won’t work. Thus you need to change the delimiter before you define the Stored Proc. We do this by making it // instead of ; (a semicolon) using the “delimiter” command, we then swap it back to using semicolons at the end.
Here’s a standard bit of text that you can re-use as a template.

delimiter //
DROP PROCEDURE IF EXISTS xxx//
CREATE PROCEDURE xxx (IN param1 VARCHAR(30), OUT param2 VARCHAR(40))
BEGIN
DECLARE localvar VARCHAR(5);
SET localvar = 'helo';
SET param2 = (SELECT MD5(CONCAT(localvar, param1)) );
/* return the data here using a select statement (incase it's not easy for the caller to use the out param) */
SELECT param2 as md5hashedvarchar;
END;
//
delimiter ;

A well recommended agency for Medical workers in New Zealand and Australia

March 29th, 2008

Extremely off topic here but I’ve been asked to point out to UK readers (that are medically trained) that one of our old colleagues has run off to New Zealand (NZ) using an agency over in NZ that specialises in placing foreign nationals.

So, if you’re interested take a look, it may be a good way to go to the southern hemisphere and enjoy a year or two.

http://antipodeanmedical.com/