Database indexes and their Big-O notation

Most relational databases structure indices as B-trees.

If a table has a clustering index, the data pages are stored as the leaf nodes of the B-tree. Essentially, the clustering index becomes the table.

For tables w/o a clustering index, the data pages of the table are stored in a heap. Any non-clustered indices are B-trees where the leaf node of the B-tree identifies a particular page in the heap.

The worst case height of a B-tree is O(log n), and since a search is dependent on height, B-tree lookups run in something like (on the average)

O(logt n)

where t is the minimization factor ( each node must have at least t-1 keys and at most 2*t* -1 keys (e.g., 2*t* children).

That’s the way I understand it.

And different database systems, of course, may well use different data structures under the hood.

And if the query does not use an index, of course, then the search is an iteration over the heap or B-tree containing the data pages.

Searches are a little cheaper if the index used can satisfy the query; otherwise, a lookaside to fetch the corresponding datapage in memory is required.

Leave a Comment