Data Structures & Algorithms

Chapter 1 Introduction

1.1 - Overview
1.2 - How to Create Programs
1.3 - How to Analyze Programs

1.1 Overview

Computer Science can be defined as the study of algorithms. This study encompasses four distinct areas:

Machines for executing algorithms.
Languages for describing algorithms.
Foundations of algorithms.
Analysis of algorithms.

It can be easily seen that "algorithm" is a fundamental notion in computer science. So it deserves a precise definition. The dictionary's definition, "any mechanical or recursive computational procedure," is not entirely satisfying since these terms are not basic enough.
Definition: An algorithm is a finite set of instructions which, if followed, accomplish a particular task. In addition every algorithm must satisfy the following criteria:

input: there are zero or more quantities which are externally supplied;
output: at least one quantity is produced;
definiteness: each instruction must be clear and unambiguous;
finiteness: if we trace out the instructions of an algorithm, then for all cases the algorithm will terminate after a finite number of steps;
effectiveness: every instruction must be sufficiently basic that it can in principle be carried out by a person using only pencil and paper. It is not enough that each operation be definite, but it must also be feasible.

In formal computer science, one distinguishes between an algorithm, and a program. A program does not necessarily satisy the fourth condition. One important example of such a program for a computer is its operating system which never terminates (except for system crashes) but continues in a wait loop until more jobs are entered.
An algorithm can be described in several ways. One way is to give it a graphical form of notation such as. This form places each processing step in a "box" and uses arrows to indicate the next step. Different shaped boxes stand for different kinds of operations.
Our first definition places great emphasis on the concept of algorithm, but never mentions the word "data". If a computer is merely a means to an end, then the means may be an algorithm but the end is the transformation of data. That is why we often hear a computer referred to as a data processing machine. Raw data is input and algorithms are used to transform it into refined data. So, instead of saying that computer science is the study of algorithms, alternatively, we might say that computer science is the study of data.

1.2 How to Create Programs
For an improvement in creating programs, some discipline should be applied. Creating a program can be processed better with the five phases stated below:

Requirements: Make sure you understand the information you are given (the input) and what results you are to produce (the output). Try to write down a rigorous description of the input and output which covers all cases.
Design: You may have several data objects (such as a maze, a polynomial, or a list of names). For each object there will be some basic operations to perform on it (such as print the maze, add two polynomials, or find a name in the list.) Assume that these operations already exist in the form of procedures and write an algorithm which solves the problem according to the requirements. Use a notation which is natural to the way you wish to describe the order of processing.
Analysis: If you can think of another algorithm, then write it down. Next try to compare the two algorithms you have in hand. It may already be possible to tell if one will be more desirable than the other. If you cannot distinguish between two, choose one to work on for now.
Refinement and coding: You must now choose representations for your data objects (a maze as a two dimensional array of degree and coefficients, a list of names possibly as an array) and write algorithms for each of the operations on these objects.
Verification: Verification consists of three distinct aspects: program proving, testing and debugging. Each of these is an art in itself. Before executing your program you should attempt tp prove it is correct. Testing is the art of creating sample data upon which to run your program. If the program fails to run correctly then debugging is needed to determine what went wrong and how to correct it.

Suppose we devise a program for sorting a set of n>=1 integers. One of the simplest solutions is given by the following:
"from those integers which remain unsorted, find the smallest and place it next in the sorted list."
This statement is sufficient to construct a sorting program. However several issues are not fully specified such as where and how the integers are initially stored and where result is to be placed. One solution is to store the values in an array in such a way that the i-th integer is stored in the i-th array position, a[i] 1<=i<=n. Below is a second refinement of the solution:
for i:=1 to n do
begin
examine a[i] to a[n] and suppose the smallest integer is at a[j];
interchange a[i] and a[j];
end;
There now remain two clearly defined subtasks: (i) to find the minimum integer and (ii) to interchange it with a[i]. This latter problem can be solved by the code

t:=a[i];    a[i]:=a[j];    a[j]:=t;
The first subtask can be solved by assuming the minimum is a[i], checking a[i] with a[i+1], a[i+2],...and whenever a smaller element is found, regarding it as the new minimum. Eventually a[n] is compared to the current minimum and we are done. Putting all these observations together we get the procedure sort.
Let us develop another program. We assume that we have n>=1 distinct integers which are already sorted and stored in the array a[1..n]. Our task is to determine if the integer x is present and if so to return j such that x=a[j]; otherwise return j=0. By making use of the fact that the set is sorted we conceive of the following efficient method:

"let a[mid] be the middle element. There are three possibilities. Either x<a[mid] in which case x can only occur as a[1] to a[mid-1]; or x>a[mid] in which case x can only occur as a[mid+1] to a[n]; or x=a[mid] in which case set j to mid and return. Continue in this way by keeping two pointers, lower and upper, to indicate the range of elements not yet tested."

The result is given in the procedure binsrch.

1.3 How to Analyze Programs
There are many criteria upon which we can judge a program, for instance:

Does it do what we want it to do?
Does it work correctly according to the original specifications of the task?
Is there documentation which describes how to use it and how it works?
Are procedures created in such a way that they perform logical sub-functions?
Is the code readable?

There are other criteria for judging programs which have a more direct relationship to performance. These have to do with computing time and storage requirements of the algorithms. Performance evaluation can be loosely divided into 2 major phases: (a) a priori estimates and (b) a postpriori testing. Both of these are equally important. Consider the examples below:
x:=x+1;
We assume that the statement x:=x+1 is not contained within any loop either explicit or implicit. Then its frequency count is one.
for i:=1 to n do
   x:=x+1;
Now, the same statement will be executed n times.
for i:=1 to n do
for i:=1 to n do
  x:=x+1;
It will be executed (n*n) times now.
To clarify some of the ideas stated in this section, take a look at a simple program for computing the n-th Fibonacci number. The Fibonacci sequence starts as

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55,....
Each new term is obtained by taking the sum of the two previous terms. If we call the first term of the sequence F[0] then F[0]=0, F[1]=1 and in general

F[n] = F[n-1] + F [n-2], n>=2.