Programming: Trees in data Structures

A tree consists of nodes connected by edges. Figure below shows a tree. In such a picture of a tree (or in our Workshop applet) the nodes are represented as circles, and the edges as lines connecting the circles.

Trees have been studied extensively as abstract mathematical entities, so there’s a large amount of theoretical knowledge about them.

In computer programs, nodes often represent data items such as people, car parts, airline reservations, and so on; in other words, the typical items we store in any kind of data structure. In an OOP language like C++ these real-world entities are represented by objects. We’ve seen such data items stored in arrays and lists; now we’ll see them stored in the nodes of trees.

The lines (edges) between the nodes represent the way the nodes are related. Roughly speaking, the lines represent convenience: It’s easy (and fast) for a program to get from one node to another if there is a line connecting them. In fact, the only way to get from node to node is to follow a path along the lines. Often you are restricted to going in one direction along edges: from the root downward. Edges are likely to be represented in a program by pointers, if the program is written in C++.

Typically there is one node in the top row of a tree, with lines connecting to more nodes on the second row, even more on the third, and so on. Thus trees are small on the top and large on the bottom. This might seem upside-down compared with real trees, but generally a program starts an operation at the small end of the tree, and it’s (arguably) more natural to think about going from top to bottom, as in reading text.

There are different kinds of trees. We’ll be mostly interested in a particular kind of tree called a binary tree. The tree shown in Figure has more than two children per node. (We’ll see what children means in a moment.) Each node in a binary tree has a maximum of two children. More general trees, in which nodes can have more than two children, are called multiway trees. We’ve mentioned some aspects of trees in general; now let’s look at terms for various parts of trees.

Tree Terminology

Many terms are used to describe particular aspects of trees. You need to know a few of them so our discussion will be comprehensible. Fortunately, most of these terms are related to real-world trees or to family relationships (as in parents and children), so they’re not hard to remember. Figure below shows many of these terms applied to a binary tree.

Path

Think of someone walking from node to node along the edges that connect them. The resulting sequence of nodes is called a path.

Root

The node at the top of the tree is called the root. There is only one root in a tree. For a collection of nodes and edges to be defined as a tree, there must be one (and only one!) path from the root to any other node. Figure below shows a non-tree. You can see that it violates this rule.

Parent

Any node (except the root) has exactly one edge running upward to another node. The node above it is called the parent of the node.

NEW TERM

Child

Any node can have one or more lines running downward to other nodes. These nodes below a given node are called its children.

Leaf

A node that has no children is called a leaf node or simply a leaf. There can be only one root in a tree, but there can be many leaves.

Sub tree

Any node can be considered to be the root of a subtree, which consists of its children, and its children’s children, and so on. If you think in terms of families, a node’s subtree contains all its descendants.

Visiting

A node is visited when program control arrives at the node, usually for the purpose of carrying out some operation on the node, such as checking the value of one of its data members, or displaying it. Merely passing over a node on the path from one node to another is not considered to be visiting the node.

Traversing

To traverse a tree means to visit all the nodes in some specified order. For example, you might visit all the nodes in order of ascending key value. There are other ways to traverse a tree, as we’ll see in the next hour.

Levels

The level of a particular node refers to how many generations the node is from the root. If we assume the root is Level 0, its children will be Level 1, its grandchildren will be Level 2, and so on.

Keys

We’ve seen that one data item in an object is usually designated a key value. This value is used to search for the item or perform other operations on it. In tree diagrams, when a circle represents a node holding a data item, the key value of the item is typically shown in the circle. (We’ll see many figures later on that show how this looks.)

Binary Trees

If every node in a tree can have at most two children, the tree is called a binary tree. In this hour we’ll focus on binary trees because they are the simplest, the most common, and in many situations the most frequently used.

The two children of each node in a binary tree are called the left child and the right child, corresponding to their positions when you draw a picture of a tree, as shown in Figure. A node in a binary tree doesn’t necessarily have the maximum of two children; it might have only a left child, or only a right child, or it can have no children at all (which means it’s a leaf). The kind of binary tree we’ll be dealing with in this discussion is technically called a binary search tree. The defining characteristic of a binary search tree is this: A node’s left child must have a key less than its parent, and a node’s right child must have a key greater than or equal to its parent. Figure below shows a binary search tree.

C++ Program to Traverse a Binary tree

Trees in data Structures

Madurai