Why adding an index to a DB table isn’t always a good idea
When we’re on site we talk to a lot of people who are struggling with databases and how they work. I always try and help by explaining the basic of how a RDBMS works and how indexes are generally used, however there’s one thing that always requires extra discussion, as it’s contrary to commonsense, why removing indexes can sometimes increase database performance.
In this blog there’s not really room for a full discussion of Database (aka “DB” from now on) problems, however new users of databases (or rather new users of production databases with a reasonable workload) often do the following;
a) Start logging queries that don’t have indexes (or run tuning analysers like those in Microsoft SQL Server’s Management Studio)
b) Add indexes with gusto; each table starts to have 3 or 4 indexes (I won’t use the term indices today…)
c) Notice new and horrible issues, some inserts and updates take longer than before, and some selects take longer than they did before the indexes were added.
Of course some people then repeat (b) and (c) again…
Two universal truths need to be stated;
1) A database optimiser won’t magically concatenate multiple index data together and use the composite index for a super-quick lookup. Instead you should estimate that a DB will only select a single index to use, and that’ll be based off the statistics of the index “selectivity” (aka “granularity” in some quarters).
2) Because we have more intelligence than machines (and we know the business domain) we can do things that the machine optimiser can’t. This does mean that in some instances it’s better to not use an index for a select statement, or even (in some difficult spots) tell the query engine to use a specific index for it’s lookup.
I should also add another “Universal Truth” that’s not quite true (it’s kind of the Newtonian Physics of the DB Tuning world - it’s not quite true but it’s near)
3) Adding an Index has a cost for each insert and update (and when those indexes need to be rebuilt). Yes, it’s not good to have superfluous Indexes but this is often over-stated to a point where people are scared to add indexes, even when they’re sure it’s a good idea.
Right, first we’ll attack UT (Universal Truth) 1;
Universal Truth 1
“A database optimiser won’t magically concatenate multiple index data together and use the composite index for a super-quick lookup”
Databases are created to handle data, and they do it well. They maintain this lead over the competition by being ruthlessly efficient and sticking to sensible rules.
A database will typically (there are exceptions but they’re rare) use a single index to look up data in a table. If there isn’t one then you’ll get a table scan (usually bad bad news if there are many rows - more on that later).
When a DB uses an index it tries to select the best route/index to cut down the number of records it then has to read from the main table and finish off it’s selection (of course if it doesn’t need to select any further from the table that’s great.. however unless the index has all the necessary return columns from the query then it’ll still need to read that table row).
So let’s create a sample table;
CREATE TABLE `propagandr`.`animal` (
`id` INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
`animalname` VARCHAR(45) NOT NULL,
`animaltypeid` INTEGER UNSIGNED NOT NULL,
`creationusername` VARCHAR(45) NOT NULL,
`creationdate` VARCHAR(45) NOT NULL,
PRIMARY KEY(`id`)
)
/* note: this is an example - if you want to add a few thousand content lines then go for it and you can try the queries out */
Now this has no user defined indexes (it has one - notice the primary key is ‘id’ - that’s an indexed column as that’s the way it’s laid out on the disk within MySQL), so if I say “SELECT id, animalname FROM animal WHERE creationusername = ‘boris’” then we know it’ll do a table scan.
So, let’s add an index for the animaltypeid (as that links to another table)
ALTER TABLE animal ADD INDEX `Index_animaltypeid`(`animaltypeid`);
We then re-run the query, and no surprise it doesn’t get used. So we’ll add an index for the creationusername…
ALTER TABLE animal ADD INDEX `Index_creationusername`(`creationusername`);
Re-running the query we notice it’s now using the index.. winner!
So, let’s change the query to be more “realistic” - “SELECT id, animalname FROM animal WHERE animaltypeid = 3 AND creationusername = ‘boris’”. This should use both indexes right? erm. no. It’ll select the ONE index that it believes is the most selective (so if there is only two people who create records then each row has a 50% chance of being allotted to a single user - not that selective, however there might be 500 animal types).
Let’s assume we really need this query to be lightening fast; we could add a further index to include both columns
ALTER TABLE animal ADD INDEX `Index_animaltypeid_vs_creationusername `(`animaltypeid`, `creationusername`);
Now, i’ve used the business domain specific discussion earlier to establish that animaltypeid is more selective, so i’ve put that first - after all a query just based on animaltypeid can still use that index. But, what for creationusername? Well that’s largely useless and in many cases it makes more sense to do a table scan (if that doesn’t sound right then think hard about what the DB will do - it’ll look at the index, note the selectivity is very poor AND that it needs to go to the table to get the data anyway… so it’ll do a single serial read through the table rather than use the index, compile the animal id’s that match and then look them up and return them).
Hope that helps with part 1… Now on to part 2 (which is far weirder for most folk)
Universal Truth 2
“Because we have more intelligence than machines (and we know the business domain) we can do things that the machine optimiser can’t”
In the above example we used our knowledge of the data to guide which way we should build a key. We sometimes hav to do the same with respect to table data.
Let’s imagine that the above Animal table is constantly being updated (like > 10 per second), so we can’t key the data too heavily, but a query we often run is asking whether we’ve had a tiger added in the last 5 minutes (this is an example bear with me! - other examples could be a wiki update log, a money audit log etc.)
Here we can use some business domain knowledge to help us out. We know that tigers (animaltypeid 3 btw) are added pretty regularly, and that they’re a regular but largely insignificant number of entries (10 entries per sec * 60 secs in a minute * 60 minutes per hour - that’s a fair few non-tigers coming through).
Now we’ve already got an index that addresses the animaltypeid, so we figure that’s a good candidate, so we write the following SQL (if you’re not familiar with the LIMIT keyword it just does what TOP does in SQL Server);
“SELECT * FROM Animal WHERE animaltypeid = 3 /* that’s a tiger remember */ AND creationdate > DATE_ADD ( NOW(), INTERVAL -1 HOUR) LIMIT 1;”
(note: the limit 1 just returns the first one we find as we only want to know IF there was a tiger added in the last hour)
Now, if you analyse the DB you’ll see it uses the animaltypeid index, and you may assume this is good, however look at the timings and work out what it’s doing.
Using that index means it’s gone to the index, retrieved all the ids for animals that are of animaltypeid 3, then it goes to the animal table and scans those rows and if they pass the criteria it copies it out for returning to the caller. Assuming we’ve got a thousand tigers in the DB (and a 100,000 entries) then that’s a lot of Disk IO for not much.
So, we re-write it slightly to only use the table (i’ll skip the index hint that tells the DB not to use that index here but you can add it if you need it, but in a circumstance like this we’d re-write the index so it went creationusername then animaltypeid so this query wouldn’t trigger it but callers using both those criteria still get the benefits)
“SELECT * FROM animal WHERE animaltypeid = 3 and creationdate > DATE_ADD( NOW(), INTERVAL -1 HOUR) ORDER BY id DESC LIMIT 1;”
Here we using a few things, the fact that we know the table is inserted into in order (so we add the order by id DESC to force the DB to look backwards from the current page - we could have said “ORDER BY creationdate DESC” but we know that the id rises as the creationdate does, and as MySQL orders tables by the primary key it makes sense to use that), and also we know that it won’t use the animaltypeid index (as we’ve given an index hint or we’ve removed the index).
Now, this query uses no indexes for the lookup, and table scans are always bad, right? Well, you’ll find in these instances this runs very quickly as the answer is likely to be on the first page of data that’s parsed, and because it’s updated so frequently it’s also a good likelihood that it’s in memory.
So, what about if we just added an index that allows us to do that lookup quicker? Let’s define an index that does that
ALTER TABLE animal ADD INDEX `Index_animaltypeid_vs_creationdate `(`animaltypeid`, `creationdate`);
Now, that seems an ideal situation: the query now uses that index and everybody’s happy? Well not really, that index is going to be appended to all the time and that means the index will be less than optimal, plus we have to write an additional index entry for every insert that we do (or updates if we change the creationdate for some reason??). This of course is part of Universal Truth 3
Adding an index is min-maxing, you need to profile the application performance before and after and get representative results back, otherwise you may as well be applying stuff in the dark.
Hopefully the above has helped some of you. If you’ve any comments then get in touch and let us know.
July 30th, 2009 at 3:26 pm
Interesting site, but much advertisments on him. Shall read as subscription, rss.
July 31st, 2009 at 2:20 pm
Your blog was of course amazing again even though it was terrifying for me to read what happened to you.You turned what could have been a very bad thing into something funny and positive. Do take care. Also your live radio news broadcast was also amazing.I love you xoxox
August 1st, 2009 at 7:38 am
If you are wondering how you can help with this or future events, please contact us . Also, you can contact other
blogs.codewizards.co.uk - cool!!!!
August 6th, 2009 at 1:04 am
Sorry but I don’t share most of these ideas.
August 6th, 2009 at 11:05 am
I’m glad that after surfing the web for uch a long time I have found out this information. I’m really lucky.
August 6th, 2009 at 8:24 pm
I’m glad that after surfing the web for uch a long time I have found out this information. I’m really lucky.
August 7th, 2009 at 3:42 pm
Ack, my comment didn’t come out the way I’d liked it to.
August 8th, 2009 at 7:21 pm
If you are wondering how you can help with this or future events, please contact us . Also, you can contact other
August 15th, 2009 at 7:02 pm
Hey, if I were currently immersed in writing a book explaining set theory, it would have sounded like that to me too, I’m sure.
September 17th, 2009 at 2:29 pm
ohhohoho~~ it’s very bombardning best.
September 18th, 2009 at 5:01 pm
nice job men:)
September 19th, 2009 at 7:43 pm
Great site. Keep doing
September 20th, 2009 at 4:21 am
It is the coolest site, keep so
September 21st, 2009 at 4:53 pm
Thx
September 23rd, 2009 at 2:16 pm
thank you soooooo much for this
September 24th, 2009 at 11:22 pm
Altogether interesting site. Rely on it determination always be alive!
September 25th, 2009 at 11:51 am
AWESOME! Great job.
September 26th, 2009 at 3:21 am
thx!. This one is great!!!
September 26th, 2009 at 5:53 am
Depressing but I don’t dispensation most of these ideas.
September 26th, 2009 at 12:18 pm
Thank you very much!
Regards!!
September 27th, 2009 at 3:22 am
thanks a lot
September 27th, 2009 at 3:36 pm
Thanks so much! I’ve looked everywhere for this! You’re amazing!
September 27th, 2009 at 5:37 pm
Excellent review! Mint takes Ubuntu and makes it work! After your review, I plan to use it.
September 28th, 2009 at 12:26 pm
Thank you very much, bookmarked this website! thanks!!!!
September 29th, 2009 at 5:56 am
Thanx man =)
September 29th, 2009 at 6:38 am
Спасибо за статью!
September 30th, 2009 at 3:14 am
Totally Awsome!
September 30th, 2009 at 4:02 am
WOW! Good thing you kept your wits enough to fight off the bugger. Glad you’re ok. The worse thing to happen to me on that street was when I encountered a pack of dogs late one night. I kept a safe distance behind them until I got home. Take care.
October 1st, 2009 at 3:17 am
Hi, Congratulations to the site owner for this marvelous work you’ve done. It has lots of useful and interesting data.
October 1st, 2009 at 12:01 pm
This site is the BEST!!!
Thanks a million!
October 2nd, 2009 at 3:18 pm
Thanks again!
October 3rd, 2009 at 5:49 am
Все в жизни бывает впервые, мне комп купили месяц назад - очень хочу начать зарабатывать в интернете , если я могу вам помочь в развитии сайта пока даже фактически бесплатно , то свистните плиз - сам верстку изучаю уже месяц , до php добрался - тоже начинающий совсем , если что пишите на marusev321 злой пес list.ru
October 4th, 2009 at 11:13 pm
И вот долгожданный топик от автора, спасибо как всегда на высоте!
October 7th, 2009 at 3:16 am
Very nice site! cheap viagra
October 7th, 2009 at 3:16 am
Very nice site! cheap cialis http://apxoiey.com/aoxaqx/4.html
October 7th, 2009 at 3:16 am
Very nice site!
October 13th, 2009 at 1:29 am
Привет! ))) вот отлично замучено, то, что хостинг номальный у вас, и сайт не лежит, а то у многих подобных сайтов это проблема, фиг знает из за чего , спасибо вам за ваше существование!!! Надеюсь так и будете продолжать нас радовать!
October 15th, 2009 at 10:06 pm
Все бы было так хорошо, если бы не было так грустно, это так к слову… Эх, научил бы кто нить меня сайты делать - может тоже не хуже вашего бы получилось Отличный у вас проектик, не соскучишься, спасибо
October 19th, 2009 at 2:16 pm
Очень забавные мысли, хорошо рассказано, все просто разложено по полкам
Да уж… Тут как в поговорке: А ларчик просто открывался.:)
October 23rd, 2009 at 1:14 am
Всем надо в отпуск, отпускные можете получить у меня , за сайт отдельное спасибо так держать други
добавьте кстати свой сайт в каталог яндекса, мне кажется должны принять
October 26th, 2009 at 1:00 pm
Не совсем в тему, но вопрос такой возник автору. А почему вы именно этот движок выбрали для своего блога? Имхо, слаед бы лучше был. Я вот ваш блог постоянно читаю
October 28th, 2009 at 12:05 pm
Уметь слушать тишину - значит быть способным услышать бесконечность, как говорил Моцарт, на вашем сайте тишину не послушаешь, но много интересного при этом подчерпнуть можно - за что и спасибо большое, удачного вебмастеринга
October 31st, 2009 at 7:44 am
Hey deeply critical blog!! Handcuffs .. Beautiful .. Amazing .. I determination bookmark your blog and take the feeds also…
blogs.codewizards.co.uk - go to my favorites!!!
November 2nd, 2009 at 3:02 am
Hey very nice blog!! Man .. Beautiful .. Amazing .. I will bookmark your blog and take the feeds also…
November 9th, 2009 at 1:32 am
Привет! ))) вот отлично замучено, то, что хостинг номальный у вас, и сайт не лежит, а то у многих подобных сайтов это проблема, фиг знает из за чего , спасибо вам за ваше существование!!! Надеюсь так и будете продолжать нас радовать!
December 20th, 2009 at 5:01 pm
анекдот в тему:
Поделился друг:Если у России две беды, то у Украины к первым двум добавляется еще и третья беда: хитрожопые идиоты…
January 5th, 2010 at 8:24 pm
Читал про это уже на каком то другом сайте, но у вас гораздо интересней написано
January 10th, 2010 at 5:12 pm
И как это автору не жалко столько времени на написание статей тратить, мы конечно Вам очень благодарны, но вот я лично на такой альтруизм не способен
January 27th, 2010 at 10:18 am
Hopefully the above has helped some of you. If you’ve any comments then get in touch and let us know.
January 27th, 2010 at 10:18 am
Two universal truths need to be stated;
January 27th, 2010 at 10:19 am
(in some difficult spots) tell the query engine to use a specific index for it’s lookup.
January 27th, 2010 at 10:19 am
Because we have more intelligence than machines (and we know the business domain)
January 27th, 2010 at 10:20 am
however unless the index has all the necessary return columns from the query then it’ll still need to read that table row).
January 27th, 2010 at 10:20 am
When we’re on site we talk to a lot of people who are struggling with databases
January 27th, 2010 at 10:20 am
or even (in some difficult spots) tell the query engine to use a specific index for it’s lookup.
January 27th, 2010 at 10:21 am
users of production databases with a reasonable workload) often do the following;
February 10th, 2010 at 12:19 am
Информация занятная. Спасибо. Хотелось бы только апдейтов почаще
March 8th, 2010 at 1:18 am
Thanks for writing, I very much liked