Why is a red-black tree needed?

Why is std::map implemented as a red-black tree?


Why is it implemented as a red-black tree?

There are several balanced binary search trees (BSTs). What were the design trade-offs in choosing a red-black tree?






Reply:


Probably the two most common self-balancing tree algorithms are red-black trees and AVL trees. To rebalance the tree after an insert or update, both algorithms use rotations, in which nodes of the tree are rotated to restore balance.

In both algorithms the insert and delete operations are O(log n). The difference is in the rebalancing: a red-black tree needs only O(1) rotations per operation, while an AVL tree can need O(log n) rotations in the worst case (on deletion). This makes the red-black tree more efficient in this aspect of the rebalancing phase, and is one of the possible reasons why it is used more frequently.
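
A single rotation, the primitive both algorithms share, could be sketched roughly as follows. This is only an illustration: the `Node` type and `rotate_left` function are hypothetical, and real implementations also maintain a color or height field and usually a parent pointer.

```cpp
// Hypothetical minimal node type, for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Left-rotate the subtree rooted at x and return the new subtree root.
// Precondition: x and x->right are non-null.
// Only a fixed number of pointers change, so one rotation costs O(1)
// regardless of the size of the tree. The caller is responsible for
// re-attaching the returned root to x's former parent.
Node* rotate_left(Node* x) {
    Node* y = x->right;   // y moves up to become the new subtree root
    x->right = y->left;   // y's left subtree becomes x's right subtree
    y->left = x;          // x moves down to become y's left child
    return y;
}
```

The in-order sequence of keys is unchanged by the rotation; only the shape of the subtree differs.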

Red-black trees are used in most collection libraries, including the offerings from Java and the Microsoft .NET Framework.







It really depends on the use. The AVL tree usually has more balancing rotations. If your application doesn't have too many inserts and deletes, but it is busy searching, the AVL tree is probably a good choice.

std::map uses a red-black tree because it makes a fair trade-off between the speed of adding or deleting nodes and the speed of searching.




AVL trees have a maximum height of 1.44 log n, while RB trees have a maximum height of 2 log n. Inserting an element into an AVL tree may require a rebalance at one point in the tree, and that rebalance completes the insertion. After a new leaf is inserted, its ancestors must be updated up to the root, or up to a point where the two subtrees are equally deep. The probability of having to update k nodes is 1/3^k. The rebalancing itself is O(1). Removing an element may require more than one rebalance (up to half the depth of the tree).

RB trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all of the nodes in the tree are 2-nodes, with just one chain of 3-nodes down to a leaf. That leaf is at a distance of 2 log n from the root.

Going down from the root to the insertion point, you have to change 4-nodes into 2-nodes to ensure that an insertion will not saturate a leaf. Coming back up from the insertion, all of these nodes need to be checked to ensure that they correctly represent 4-nodes. This can also be done on the way down the tree. The overall cost is the same. There is no free lunch! Removing an element from the tree is of the same order.

All of these trees require nodes to carry information about height, weight, color, and so on. Only splay trees are free from such additional information. But most people are afraid of splay trees because their structure is so unpredictable!

Finally, trees can also carry weight information in the nodes, which enables weight balancing. Various schemes can be used. You should rebalance when a subtree contains more than three times the number of elements in the other subtree. Rebalancing is again done by either a single or a double rotation. This means a worst case of 2.4 log n. You can get away with a ratio of 2 instead of 3, a much better ratio, but it can mean leaving a little less than 1% of the subtrees unbalanced here and there. Tricky!

Which type of tree is the best? AVL, for sure. They are the easiest to code and have the worst-case height closest to log n. For a tree with 1,000,000 elements, an AVL tree has a maximum height of 29, an RB tree 40, and a weight-balanced tree 36 or 50, depending on the ratio.
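
The AVL and RB figures can be checked with a few lines of arithmetic. The sketch below simply plugs the constants quoted in this answer (1.44 and 2) into a base-2 logarithm; the function names are made up for illustration, and these are approximate bounds rather than exact formulas.

```cpp
#include <cmath>

// Approximate worst-case heights, using the constants quoted above.
// Assumption: "log" in the text means the base-2 logarithm.
int avl_max_height(double n) {
    return static_cast<int>(std::ceil(1.44 * std::log2(n)));
}

int rb_max_height(double n) {
    return static_cast<int>(std::ceil(2.0 * std::log2(n)));
}
```

For n = 1,000,000, log2(n) is about 19.93, so these functions return 29 and 40, in line with the figures above.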

There are many other variables: randomness, the ratio of adds, deletes, and searches, and so on.






The previous answers only address tree alternatives, and red-black probably only remains for historical reasons.

Why not a hash table?

A type only needs operator< (comparison) to be used as a key in a tree. Hash tables, however, require a hash function to be defined for each key type. For generic programming, it is very important to keep type requirements to a minimum so that the container can be used with a wide variety of types and algorithms.
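
A minimal sketch of that difference in type requirements, assuming a made-up `Version` key type and helper function: defining `operator<` is enough for `std::map`, with no hash function involved.

```cpp
#include <map>
#include <string>
#include <tuple>

// Hypothetical key type, used only for illustration.
struct Version {
    int major_no;
    int minor_no;
};

// The only thing std::map needs from the key by default:
// a strict weak ordering, found through operator< (via std::less).
bool operator<(const Version& a, const Version& b) {
    return std::tie(a.major_no, a.minor_no) < std::tie(b.major_no, b.minor_no);
}

// Builds a small map keyed by Version.
std::map<Version, std::string> make_release_names() {
    std::map<Version, std::string> names;
    names[Version{1, 0}] = "first";
    names[Version{2, 1}] = "second";
    names[Version{1, 5}] = "patch";
    return names;
}
```

By contrast, `std::unordered_map<Version, std::string>` would not compile as written, because there is no `std::hash<Version>` specialization and no `operator==` for the key.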

Designing a good hash table requires a thorough understanding of the context in which it will be used. Should it use open addressing or linked chaining? What load factor should it accept before resizing? Should it use an expensive hash that avoids collisions or one that is rough and fast?

Since the STL cannot predict which is the best choice for your application, the default setting needs to be more flexible. Trees "just work" and scale well.

(C++11 did add hash tables with std::unordered_map. You can see from the documentation that policies must be set to configure many of these options.)
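
To give a feel for those knobs, here is a rough sketch of configuring a std::unordered_map. The `Point` key, the `PointHash` functor, and the specific numbers are assumptions chosen for illustration, not recommendations.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical key type; hash tables need equality plus a hash function.
struct Point {
    int x;
    int y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

// Hypothetical hasher: a cheap combination of the field hashes,
// i.e. the "rough and fast" end of the trade-off.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

using PointTable = std::unordered_map<Point, std::string, PointHash>;

PointTable make_table() {
    PointTable table;
    table.max_load_factor(0.5f);  // rehash earlier than the default of 1.0
    table.reserve(64);            // pre-size the buckets for 64 elements
    table[Point{1, 0}] = "first";
    table[Point{2, 1}] = "second";
    return table;
}
```

Each of these choices (hash quality, load factor, initial capacity) is left to the caller, which is the flexibility cost described above.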

What about other trees?

Red-black trees offer fast lookup and, unlike plain BSTs, are self-balancing. Another user pointed out their advantages over the self-balancing AVL tree.

Alexander Stepanov (the creator of the STL) said that he would use a B* tree instead of a red-black tree if he wrote std::map again, as it is friendlier to modern memory caches.

One of the biggest changes since then has been the growth of caches. Cache misses are very costly, so locality of reference is now much more important. Node-based data structures, which have low locality of reference, make much less sense. If I were to design STL today, I would have a different set of containers. For example, an in-memory B* tree is a far better choice than a red-black tree for implementing an associative container. - Alexander Stepanov

Should maps always use trees?

Another possible map implementation would be a sorted vector (insertion sort) and binary search. This works well for containers that are not changed often but are queried frequently. I often do this in C, as qsort and bsearch are built in.
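
In C++ the same idea could be sketched with std::vector and std::lower_bound. The `Entry` alias and the `flat_insert` / `flat_find` helpers below are hypothetical names for illustration, not part of any library.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flat map: key/value pairs kept sorted by key in a vector.
using Entry = std::pair<int, std::string>;

// Insert or overwrite while keeping the vector sorted.
// O(n) in the worst case, because later elements have to shift.
void flat_insert(std::vector<Entry>& v, int key, std::string value) {
    auto it = std::lower_bound(v.begin(), v.end(), key,
        [](const Entry& e, int k) { return e.first < k; });
    if (it != v.end() && it->first == key) {
        it->second = std::move(value);
    } else {
        v.insert(it, Entry(key, std::move(value)));
    }
}

// Binary search lookup, O(log n). Returns nullptr when the key is absent.
const std::string* flat_find(const std::vector<Entry>& v, int key) {
    auto it = std::lower_bound(v.begin(), v.end(), key,
        [](const Entry& e, int k) { return e.first < k; });
    return (it != v.end() && it->first == key) ? &it->second : nullptr;
}
```

Lookups touch contiguous memory, which is where the benefit over a node-based tree comes from; the price is the linear-time insert.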

Do I even have to use a map?

Cache considerations mean it seldom makes sense to use std::list or std::deque over std::vector, even for those situations we were taught about in school (such as removing an item from the middle of the list). By the same reasoning, using a for loop to search linearly through a list is often more efficient and cleaner than building a map for a few lookups.
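
For a handful of lookups, such a linear search might look like the sketch below. The `linear_find` helper is a made-up name, and whether this actually beats a map depends on the data size and access pattern, so measure before relying on it.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Linear search over a small, contiguous list of key/value pairs.
// Returns nullptr when the key is absent.
const std::string* linear_find(
        const std::vector<std::pair<int, std::string>>& items, int key) {
    auto it = std::find_if(items.begin(), items.end(),
        [key](const std::pair<int, std::string>& p) { return p.first == key; });
    return it != items.end() ? &it->second : nullptr;
}
```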

Of course, choosing a readable container is usually more important than performance.


Update 06/14/2017: webbertiger edited his answer after I submitted a comment. I should point out that the answer is now much better in my eyes. But I have kept my answer as additional information ...

I think the first answer is wrong (correction: not both anymore) and the third contains a wrong assertion, so I felt I had to clarify things ...

The two most popular trees are AVL and red-black (RB). The main difference is in the usage:

  • AVL: Better if lookups (reads) outnumber modifications. The memory footprint is slightly smaller than with RB (due to the bit required for coloring).
  • RB: Better in general cases where lookups (reads) and modifications are balanced, or where modifications outnumber lookups. A slightly larger memory footprint due to the storage of the red-black flag.

The main difference comes from the coloring. You have fewer rebalancing actions in an RB tree than in an AVL tree, as the coloring sometimes allows you to skip or shorten rebalancing actions, which are relatively expensive. Because of the coloring, an RB tree can also be deeper, as it can accommodate red nodes between black nodes (with the possibility of ~2x more levels), which makes searching (reading) a little less efficient ... but because this is a constant factor (2x), it stays in O(log n).

If you weigh the performance hit of modifying a tree (significant) against the performance hit of looking something up in a tree (almost insignificant), it is natural to prefer RB over AVL for the general case.


It's just a choice made by your implementation; std::map could be implemented as any balanced tree. The various choices are all comparable, with minor differences. Therefore any one is as good as any other.
