Why adding an index to a DB table isn’t always a good idea

When we’re on site we talk to a lot of people who are struggling with databases and how they work. I always try and help by explaining the basic of how a RDBMS works and how indexes are generally used, however there’s one thing that always requires extra discussion, as it’s contrary to commonsense, why removing indexes can sometimes increase database performance.

In this blog there’s not really room for a full discussion of Database (aka “DB” from now on) problems, however new users of databases (or rather new users of production databases with a reasonable workload) often do the following;

a) Start logging queries that don’t have indexes (or run tuning analysers like those in Microsoft SQL Server’s Management Studio)

b) Add indexes with gusto; each table starts to have 3 or 4 indexes (I won’t use the term indices today…)

c) Notice new and horrible issues, some inserts and updates take longer than before, and some selects take longer than they did before the indexes were added.

Of course some people then repeat (b) and (c) again…

Two universal truths need to be stated;

1) A database optimiser won’t magically concatenate multiple index data together and use the composite index for a super-quick lookup. Instead you should estimate that a DB will only select a single index to use, and that’ll be based off the statistics of the index “selectivity” (aka “granularity” in some quarters).

2) Because we have more intelligence than machines (and we know the business domain) we can do things that the machine optimiser can’t. This does mean that in some instances it’s better to not use an index for a select statement, or even (in some difficult spots) tell the query engine to use a specific index for it’s lookup.

I should also add another “Universal Truth” that’s not quite true (it’s kind of the Newtonian Physics of the DB Tuning world - it’s not quite true but it’s near)

3) Adding an Index has a cost for each insert and update (and when those indexes need to be rebuilt). Yes, it’s not good to have superfluous Indexes but this is often over-stated to a point where people are scared to add indexes, even when they’re sure it’s a good idea.

Right, first we’ll attack UT (Universal Truth) 1;

Universal Truth 1

“A database optimiser won’t magically concatenate multiple index data together and use the composite index for a super-quick lookup”

Databases are created to handle data, and they do it well. They maintain this lead over the competition by being ruthlessly efficient and sticking to sensible rules.

A database will typically (there are exceptions but they’re rare) use a single index to look up data in a table. If there isn’t one then you’ll get a table scan (usually bad bad news if there are many rows - more on that later).

When a DB uses an index it tries to select the best route/index to cut down the number of records it then has to read from the main table and finish off it’s selection (of course if it doesn’t need to select any further from the table that’s great.. however unless the index has all the necessary return columns from the query then it’ll still need to read that table row).

So let’s create a sample table;

CREATE TABLE `propagandr`.`animal` (
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`animalname` VARCHAR(45) NOT NULL,
`animaltypeid` INTEGER UNSIGNED NOT NULL,
`creationusername` VARCHAR(45) NOT NULL,
`creationdate` VARCHAR(45) NOT NULL,
PRIMARY KEY(`id`)
)
/* note: this is an example - if you want to add a few thousand content lines then go for it and you can try the queries out */

Now this has no user defined indexes (it has one - notice the primary key is ‘id’ - that’s an indexed column as that’s the way it’s laid out on the disk within MySQL), so if I say “SELECT id, animalname FROM animal WHERE creationusername = ‘boris’” then we know it’ll do a table scan.

So, let’s add an index for the animaltypeid (as that links to another table)

ALTER TABLE animal ADD INDEX `Index_animaltypeid`(`animaltypeid`);

We then re-run the query, and no surprise it doesn’t get used. So we’ll add an index for the creationusername…

ALTER TABLE animal ADD INDEX `Index_creationusername`(`creationusername`);

Re-running the query we notice it’s now using the index.. winner!

So, let’s change the query to be more “realistic” - “SELECT id, animalname FROM animal WHERE animaltypeid = 3 AND creationusername = ‘boris’”. This should use both indexes right? erm. no. It’ll select the ONE index that it believes is the most selective (so if there is only two people who create records then each row has a 50% chance of being allotted to a single user - not that selective, however there might be 500 animal types).

Let’s assume we really need this query to be lightening fast; we could add a further index to include both columns

ALTER TABLE animal ADD INDEX `Index_animaltypeid_vs_creationusername `(`animaltypeid`, `creationusername`);

Now, i’ve used the business domain specific discussion earlier to establish that animaltypeid is more selective, so i’ve put that first - after all a query just based on animaltypeid can still use that index. But, what for creationusername? Well that’s largely useless and in many cases it makes more sense to do a table scan (if that doesn’t sound right then think hard about what the DB will do - it’ll look at the index, note the selectivity is very poor AND that it needs to go to the table to get the data anyway… so it’ll do a single serial read through the table rather than use the index, compile the animal id’s that match and then look them up and return them).

Hope that helps with part 1… Now on to part 2 (which is far weirder for most folk)

Universal Truth 2

“Because we have more intelligence than machines (and we know the business domain) we can do things that the machine optimiser can’t”

In the above example we used our knowledge of the data to guide which way we should build a key. We sometimes hav to do the same with respect to table data.

Let’s imagine that the above Animal table is constantly being updated (like > 10 per second), so we can’t key the data too heavily, but a query we often run is asking whether we’ve had a tiger added in the last 5 minutes (this is an example bear with me! - other examples could be a wiki update log, a money audit log etc.)

Here we can use some business domain knowledge to help us out. We know that tigers (animaltypeid 3 btw) are added pretty regularly, and that they’re a regular but largely insignificant number of entries (10 entries per sec * 60 secs in a minute * 60 minutes per hour - that’s a fair few non-tigers coming through).

Now we’ve already got an index that addresses the animaltypeid, so we figure that’s a good candidate, so we write the following SQL (if you’re not familiar with the LIMIT keyword it just does what TOP does in SQL Server);

“SELECT * FROM Animal WHERE animaltypeid = 3 /* that’s a tiger remember */ AND creationdate > DATE_ADD ( NOW(), INTERVAL -1 HOUR) LIMIT 1;”

(note: the limit 1 just returns the first one we find as we only want to know IF there was a tiger added in the last hour)

Now, if you analyse the DB you’ll see it uses the animaltypeid index, and you may assume this is good, however look at the timings and work out what it’s doing.

Using that index means it’s gone to the index, retrieved all the ids for animals that are of animaltypeid 3, then it goes to the animal table and scans those rows and if they pass the criteria it copies it out for returning to the caller. Assuming we’ve got a thousand tigers in the DB (and a 100,000 entries) then that’s a lot of Disk IO for not much.

So, we re-write it slightly to only use the table (i’ll skip the index hint that tells the DB not to use that index here but you can add it if you need it, but in a circumstance like this we’d re-write the index so it went creationusername then animaltypeid so this query wouldn’t trigger it but callers using both those criteria still get the benefits)

“SELECT * FROM animal WHERE animaltypeid = 3 and creationdate > DATE_ADD( NOW(), INTERVAL -1 HOUR) ORDER BY id DESC LIMIT 1;”

Here we using a few things, the fact that we know the table is inserted into in order (so we add the order by id DESC to force the DB to look backwards from the current page - we could have said “ORDER BY creationdate DESC” but we know that the id rises as the creationdate does, and as MySQL orders tables by the primary key it makes sense to use that), and also we know that it won’t use the animaltypeid index (as we’ve given an index hint or we’ve removed the index).

Now, this query uses no indexes for the lookup, and table scans are always bad, right? Well, you’ll find in these instances this runs very quickly as the answer is likely to be on the first page of data that’s parsed, and because it’s updated so frequently it’s also a good likelihood that it’s in memory.

So, what about if we just added an index that allows us to do that lookup quicker? Let’s define an index that does that

ALTER TABLE animal ADD INDEX `Index_animaltypeid_vs_creationdate `(`animaltypeid`, `creationdate`);

Now, that seems an ideal situation: the query now uses that index and everybody’s happy? Well not really, that index is going to be appended to all the time and that means the index will be less than optimal, plus we have to write an additional index entry for every insert that we do (or updates if we change the creationdate for some reason??). This of course is part of Universal Truth 3 :)

Adding an index is min-maxing, you need to profile the application performance before and after and get representative results back, otherwise you may as well be applying stuff in the dark.

Hopefully the above has helped some of you. If you’ve any comments then get in touch and let us know.

58 Responses to “Why adding an index to a DB table isn’t always a good idea”

  1. SergeyNikolaev Says:

    Interesting site, but much advertisments on him. Shall read as subscription, rss.

  2. electromozzo Says:

    Your blog was of course amazing again even though it was terrifying for me to read what happened to you.You turned what could have been a very bad thing into something funny and positive. Do take care. Also your live radio news broadcast was also amazing.I love you xoxox

  3. MishaPowerauto Says:

    If you are wondering how you can help with this or future events, please contact us . Also, you can contact other
    blogs.codewizards.co.uk - cool!!!!

  4. Zashkaser Says:

    Sorry but I don’t share most of these ideas.

  5. LenaShopogolik Says:

    I’m glad that after surfing the web for uch a long time I have found out this information. I’m really lucky.

  6. Sdanektir Says:

    I’m glad that after surfing the web for uch a long time I have found out this information. I’m really lucky.

  7. Vivalkakira Says:

    Ack, my comment didn’t come out the way I’d liked it to.

  8. VitalikGromovss Says:

    If you are wondering how you can help with this or future events, please contact us . Also, you can contact other

  9. dimagromovfoto Says:

    Hey, if I were currently immersed in writing a book explaining set theory, it would have sounded like that to me too, I’m sure.

  10. cartoon Says:

    ohhohoho~~ it’s very bombardning best.

  11. dadies Says:

    nice job men:)

  12. bdsm Says:

    Great site. Keep doing

  13. indian Says:

    It is the coolest site, keep so

  14. femdom Says:

    Thx

  15. feetblog Says:

    thank you soooooo much for this

  16. doctorbiml Says:

    Altogether interesting site. Rely on it determination always be alive!

  17. shemales Says:

    AWESOME! Great job.

  18. teen Says:

    thx!. This one is great!!!

  19. blondinkaya Says:

    Depressing but I don’t dispensation most of these ideas.

  20. schoolgirl Says:

    Thank you very much!
    Regards!!

  21. movies Says:

    thanks a lot

  22. hotmom Says:

    Thanks so much! I’ve looked everywhere for this! You’re amazing!

  23. yapapanyatt Says:

    Excellent review! Mint takes Ubuntu and makes it work! After your review, I plan to use it.

  24. dadboy Says:

    Thank you very much, bookmarked this website! thanks!!!!

  25. twink Says:

    Thanx man =)

  26. loputreyu Says:

    Спасибо за статью!

  27. mature Says:

    Totally Awsome!

  28. roowlsabCroto Says:

    WOW! Good thing you kept your wits enough to fight off the bugger. Glad you’re ok. The worse thing to happen to me on that street was when I encountered a pack of dogs late one night. I kept a safe distance behind them until I got home. Take care.

  29. roowlsabCroto Says:

    Hi, Congratulations to the site owner for this marvelous work you’ve done. It has lots of useful and interesting data.

  30. japan Says:

    This site is the BEST!!!
    Thanks a million!

  31. spanking Says:

    Thanks again!

  32. golYterW Says:

    Все в жизни бывает впервые, мне комп купили месяц назад - очень хочу начать зарабатывать в интернете , если я могу вам помочь в развитии сайта пока даже фактически бесплатно , то свистните плиз - сам верстку изучаю уже месяц , до php добрался - тоже начинающий совсем , если что пишите на marusev321 злой пес list.ru

  33. Kolyewe Says:

    И вот долгожданный топик от автора, спасибо как всегда на высоте!

  34. Pharma852 Says:

    Very nice site! cheap viagra

  35. Pharma74 Says:

    Very nice site! cheap cialis http://apxoiey.com/aoxaqx/4.html

  36. Pharmf115 Says:

    Very nice site!

  37. lokyrte Says:

    Привет! ))) вот отлично замучено, то, что хостинг номальный у вас, и сайт не лежит, а то у многих подобных сайтов это проблема, фиг знает из за чего , спасибо вам за ваше существование!!! Надеюсь так и будете продолжать нас радовать!

  38. hjeryew Says:

    Все бы было так хорошо, если бы не было так грустно, это так к слову… Эх, научил бы кто нить меня сайты делать - может тоже не хуже вашего бы получилось Отличный у вас проектик, не соскучишься, спасибо

  39. Polytrew Says:

    Очень забавные мысли, хорошо рассказано, все просто разложено по полкам :) Да уж… Тут как в поговорке: А ларчик просто открывался.:)

  40. hotyuew Says:

    Всем надо в отпуск, отпускные можете получить у меня , за сайт отдельное спасибо так держать други :) добавьте кстати свой сайт в каталог яндекса, мне кажется должны принять

  41. hotyuewr Says:

    Не совсем в тему, но вопрос такой возник автору. А почему вы именно этот движок выбрали для своего блога? Имхо, слаед бы лучше был. Я вот ваш блог постоянно читаю :)

  42. hjkrteio Says:

    Уметь слушать тишину - значит быть способным услышать бесконечность, как говорил Моцарт, на вашем сайте тишину не послушаешь, но много интересного при этом подчерпнуть можно - за что и спасибо большое, удачного вебмастеринга :)

  43. homemastergoga Says:

    Hey deeply critical blog!! Handcuffs .. Beautiful .. Amazing .. I determination bookmark your blog and take the feeds also…
    blogs.codewizards.co.uk - go to my favorites!!!

  44. roowlsabCroto Says:

    Hey very nice blog!! Man .. Beautiful .. Amazing .. I will bookmark your blog and take the feeds also…

  45. Hoilortewyu Says:

    Привет! ))) вот отлично замучено, то, что хостинг номальный у вас, и сайт не лежит, а то у многих подобных сайтов это проблема, фиг знает из за чего , спасибо вам за ваше существование!!! Надеюсь так и будете продолжать нас радовать!

  46. hjkrteioyu Says:

    анекдот в тему:
    Поделился друг:Если у России две беды, то у Украины к первым двум добавляется еще и третья беда: хитрожопые идиоты…

  47. мишкa Says:

    Читал про это уже на каком то другом сайте, но у вас гораздо интересней написано ;)

  48. BabanQurban Says:

    И как это автору не жалко столько времени на написание статей тратить, мы конечно Вам очень благодарны, но вот я лично на такой альтруизм не способен :)

  49. Hloe bussines Says:

    Hopefully the above has helped some of you. If you’ve any comments then get in touch and let us know.

  50. Marry Horses Says:

    Two universal truths need to be stated;

  51. hloe for nature Says:

    (in some difficult spots) tell the query engine to use a specific index for it’s lookup.

  52. Cookingindahouse Says:

    Because we have more intelligence than machines (and we know the business domain)

  53. Joe fact Says:

    however unless the index has all the necessary return columns from the query then it’ll still need to read that table row).

  54. Recipesofcook Says:

    When we’re on site we talk to a lot of people who are struggling with databases

  55. Joe SEO Says:

    or even (in some difficult spots) tell the query engine to use a specific index for it’s lookup.

  56. Joe build Says:

    users of production databases with a reasonable workload) often do the following;

  57. Balashka Says:

    Информация занятная. Спасибо. Хотелось бы только апдейтов почаще

  58. roowlsabCroto Says:

    Thanks for writing, I very much liked

Leave a Reply