Does mysql use B-tree,B+tree or both?

Posted on

Question :

I did some search on the matter and I found out that Mysql uses B+Tree index, but when I run “show index” the index type that I get is Btree. And I found in this article that Mysql uses both Btree and B+tree. If it is true that it uses both; why is it named Btree without mentioning B+tree, in which case each one is used. I know the difference between the two, and I want to do some queries to figure out the difference in performance between B-tree and B+tree indexes. Which leads to my second question, is there a big difference between the two in performing certain queries, if yes, please give an example.
Thank you in advance.

Answer :

InnoDB uses B+Tree indexes, not B-Tree. All details about InnoDB data structures can be found here. You may also want to look at these diagrams. The author of both the resources, Jeremy Cole, was head of MySQL team at Google.

Why is the syntax BTREE instead of B+TREE? This question should be posed to some MySQL or MariaDB engineer, but I see at least two possible reasons:

  • B+TREE would be a very bad keyword, because it contains +, which is usually an operator.
  • That syntax is older than InnoDB. It is probably as old as the ISAM storage engine, which exists no more. It is very possible that B-TREE was used at that time.

Why does the documentation state that InnoDB uses B-Tree? Well, not all MySQL users are supposed to know what B+Tree is. This may be an oversimplification, but in that context it seems to me acceptable.

You wrote that you know the difference between B-Tree and B+Tree. Than the different performance characteristics should be clear:

  • B+Tree is faster for sorting;
  • B-Tree is faster when you insert values in the middle.

But in general, B+Tree is considered superior. How much? I don’t know, but surely not orders of magnitude.

A B+Tree is a just a binary search tree, like a B-Tree, where,

  • The leaves (buckets) have links to the right and left siblings buckets), making the tree an index into a linked list. Usually each bucket is sized to be one disk read.
  • Data is only stored in the leaf.

A B-Tree for reference stores data in the nodes, and leaves, and has no such link because scanning requires backtracking.

The idea of a B+Tree is to maximize read-size for disk seeks. It’s unlikely that any database that implements a B+Tree would use B-Tree, unless they’re only ever interested in Index Seeks, and not Index Scans. The overhead of the link isn’t substantial. Not to mention, in the method of B+ tree gives rise to any model of concurrency, see the Lehman and Yao.

It’s important that all of these differences are mere optimizations on the same idea. For instance, that Lehman and Yao paper above references B- Trees in the abstract and then mistakenly (imho), says “we consider a simple variant of the B-tree (actually of the B*-tree, proposed by Wedekind)”. That’s weird, because I think Wedekind proposed a B+Tree.

These terms are grossly confusing. Check out this “The Ubiquitous B-Tree” published in 1979 if you really want a waste a day,

Perhaps the most misused term term in the B-Tree literature is the B*-Tree.

Leave a Reply

Your email address will not be published. Required fields are marked *