Does it make sense to use an index that will have a low cardinality?

An index can help even on low cardinality fields if:

  1. When one of possible values is very infrequent compared to the other values and you search for it.

    For instance, there are very few color blind women, so this query:

    SELECT  *
    FROM    color_blind_people
    WHERE   gender="F"
    

    would most probably benefit from an index on gender.

  2. When the values tend to be grouped in the table order:

    SELECT  *
    FROM    records_from_2008
    WHERE   year = 2010
    LIMIT 1
    

    Though there are only 3 distinct years here, records with earlier years are most probably added first so very many records would have to be scanned prior to returning the first 2010 record if not for the index.

  3. When you need ORDER BY / LIMIT:

    SELECT  *
    FROM    people
    ORDER BY
            gender, id
    LIMIT 1
    

    Without the index, a filesort would be required. Though it’s somewhat optimized do to the LIMIT, it would still need a full table scan.

  4. When the index covers all fields used in the query:

    CREATE INDEX (low_cardinality_record, value)
    
    SELECT  SUM(value)
    FROM    mytable
    WHERE   low_cardinality_record = 3
    
  5. When you need DISTINCT:

    SELECT  DISTINCT color
    FROM    tshirts
    

    MySQL will use INDEX FOR GROUP-BY, and if you have few colors, this query will be instant even with millions of records.

    This is an example of a scenario when the index on a low cardinality field is more efficient than that on a high cardinality field.

Note that if DML performance is not much on an issue, then it’s safe to create the index.

If optimizer thinks that the index is inefficient, the index just will not be used.

Leave a Comment