Index for multiple columns in ActiveRecord

You are comparing a composite index with a set of independent indices. They are just different.

Think of it this way: a compound index gives you rapid look-up of the first field in a nested set of fields followed by rapid look-up of the second field within ONLY the records already selected by the first field, followed by rapid look-up of the third field – again, only within the records selected by the previous two indices.

Lets take an example. Your database engine will take no more than 20 steps to locate a unique value within 1,000,000 records (if memory serves) if you are using an index. This is true whether you are using a composite or and independent index – but ONLY for the first field (“species” in your example although I’d think you’d want Family, Species, and then Common Name).

Now, let’s say that there are 100,000 matching records for this first field value. If you have only single indices, then any lookup within these records will take 100,000 steps: one for each record retrieved by the first index. This is because the second index will not be used (in most databases – this is a bit of a simplification) and a brute force match must be used.

If you have a composite index then your search is much faster because your second field search will have an index within the first set of values. In this case you’ll need no more than 17 steps to get to your first matching value on field 2 within the 100,000 matches on field 1 (log base 2 of 100,000).

So: steps needed to find a unique record out of a database of 1,000,000 records using a composite index on 3 nested fields where the first retrieves 100,000 and the second retrieves 10,000 = 20 + 17 + 14 = 51 steps.

Steps needed under the same conditions with just independent indices = 20 + 100,000 + 10,000 = 110,020 steps.

Big difference, eh?

Now, don’t go nuts putting composite indices everywhere. First, they are expensive on inserts and updates. Second, they are only brought to bear if you are truly searching across nested data (for another example, I use them when pulling data for logins for a client over a given date range). Also, they are not worth it if you are working with relatively small data sets.

Finally, check your database documentation. Databases have grown extremely sophisticated in the ability to deploy indices these days and the Database 101 scenario I described above may not hold for some (although I always develop as if it does just so I know what I am getting).

Leave a Comment