Liberty Basic is develeopped by Carl Gundel
Original Newsletter compiled by Alyce Watson and Brosco
Translation to HTML: Raymond Roumeas

The Liberty Basic Newsletter - Issue #41 - JUN 99

"Knowledge is a gift we receive from others." - Michael T. Rankin

In this issue:

Searching Arrays - Part 1, by Ian Davies Thanks, Ian for this great article - and for all of your contributions to

the LB community!

In future issues:


Copyright (c) 1999, Ian Davies: davie-@ipc.kit.ac.jp

Searching arrays - Part 1

This edition of the Liberty BASIC newsletter will concentrate on how to efficiently search arrays in Liberty BASIC and introduces the concept of binary searching. The second part of the newsletter will focus on a more general example of binary searching.

If the explanation within this newsletter is difficult to understand then feel free just to adapt the code to your own needs. The main point should be that you try and incorporate the binary search technique within your programmes.

Array Basics

As we all probably know, arrays are very useful tools to store lists of information. A simple example would be an address book. If we want to store the names of 20 friends i n our programme then we can use an array in the following manner:

' First we should DIM the array to a value larger than we'll ever use
DIM name$(100)
' Next we fill in the array with names of our friends
name$(1)="John"
name$(2)="Janet"
.....
.....
.....
name$(20)="Susan"

Well, this isn't a very useful basis for an address book as it only contains the names of our friends. In fact, this example is the simplest type of array and called a "1 dimensional" array as each row, i.e., name$(1), name$(2), etc., can store only one piece of information - in this case the person's name.

It would be much more useful to have an array that can store several pieces of information for each row. For example, in an address book we might want to store the person's name, sex, birthday, telephone number, and address. In this case we need to store 5 pieces of information for each person so we need a 5 dimenional array as follows:

DIM address$(100,5)
address$(1,1)="John" : address$(1,2)="M" : address$(1,3)="June 4th"
address$(1,4)="(0123) 456789 : address$(1,5)="3 High Street, Anytown,
Anyplace"
 
address$(2,1)="Janet" : address$(2,2)="F" : address$(2,3)="October 23rd"
address$(2,4)="(0987) 654321 : address$(2,5)="45 Main Road, Bigtown,
Overthere"
.....
.....
.....
address$(20,1)="Susan" : address$(20,2)="F" : address$(20,3)="February 16th"
address$(20,4)="(0321) 987654 : address$(20,5)="87 River Avenue,
Smallvillage, Nowhere"
 

Why do we want to search an array?

Suppose that we want to write a programme to store and the names and addresses of our friends. The programme would need a way to search through this array in order, for example, to check if a new name we want to input is already in our address book. An easy way to search through an array would be the following:

' Number of rows occupied in the array
N=20
' The name of the person we wish to search for
Search$="Lisa"
' Uses a flag to check if we find a match
Flag=0
' Counts how many times the FOR ... NEXT loop is executed
Loops=0
FOR x=1 TO N
' Increments Loops by 1
Loops=Loops+1
IF Search$=address$(x,1) THEN Flag=1
NEXT x
IF Flag=1 THEN PRINT "This person is already in the database" ELSE _
PRINT "No match was found"

This simple programme will check all N names in our array (N=20 for our case) to search for a match. If there is a match then Flag=1 and we can use the value of the flag to print an appropriate message or jump to a subroutine. Note that in this case we were only interested in whether ANY duplicates existed. If we wanted to know HOW MANY duplicates existed then we could change "Flag=1" to "Flag=Flag+1".

In the case that we are concerned only whether ANY duplicates exist, we can see that the above code is very inefficient. The reason being that we search through all N rows of the array whether or not a duplicate is found. In this case the value of Loops will always be N at the end of the programme. However, if we want to search efficiently (i.e., quickly) through an array then we need to find a way to make the final value of "Loops" as small as possible.

One simple method to improve the efficiency of our search is to exit the FOR ... NEXT loop as soon as we find a match:

FOR x=1 to N
Loops=Loops+1
' Note the extra instruction in the following line
IF Search$=address$(x,1) THEN Flag=1 : GOTO [Match]
NEXT x
' The programme will only reach this point if there WAS NOT a match
PRINT "No match was found"
GOTO [No_match]
' The programme will only reach the following point if there WAS a match
[Match]
PRINT "This person is already in the database"

This routine might look fine to start with but in fact it contains an example of bad programming that we should try to avoid. The reason is that we should never use a GOTO statement to jump out of a FOR .... NEXT loop. Instead we should use something like the following:

FOR x=1 to N
Loops=Loops+1
' Note the change of x=N in the following
IF Search$=address$(x,1) THEN Flag=1 : x=N
NEXT x
IF Flag=1 then PRINT "This person is already in the database" ELSE _
PRINT "No match was found"

We can see that, by setting x=N when a match exists, the programme will exit the FOR .... NEXT loop at the next NEXT statement. Generally it is always advisable to exit a FOR .. NEXT loop by setting a flag variable and making x=N (or whatever is appropriate for your case).

In our very first example we saw that Loops=N when we exited the FOR ... NEXT loop. How about this new case? Well, sometimes we will find a match quickly (when x is small), sometimes we'll find a match near the end of searching (when x is nearly equal to N) and other times we will not find any match (in which case x will be equal to N). The answer is that, on average, if the array contains one match then Loops=N/2. Indeed, if the array contains m matches then typically Loops=N/(m+1).

If we assume there to be a single match then we have changed Loops from N in our first example to N/2 in the present example. This shows that we can decrease searching time by 50% with hardly any programming effort.

Well, if your arrays only have 10 or 20 rows then you could stop at this point and be satisfied with a 50% decrease in searching time. However, what if our address programme contains details for everyone in our street or town or country? What if, instead of N being 10 or 20, we have N equal to 1,000 or 1,000,000?

We can imagine that even a 50% decrease is searching time is not going to help us much if we still have to search through 500,000 rows. On a normal PC this might take 30 minutes or so. What if I said that a method exists to decrease our searching time by 99.998%, i.e., make the searching speed 50,000 times quicker?

I thought that you would be impressed.

Binary searching

Binary searching is a very efficient method to search ordered arrays. We can present the idea of binary searching through a simple example as follows.

Imagine that you have a ruler whose length is 32 centimetres instead of the normal 30 centimetres. Now think of a number between 1 and 32 ..... say 11.

Put your finger at the halfway (i.e., 16 cm) mark.

Is the number we want smaller or larger than where we are now? The answer is "smaller" as 11<16. Now move your finger 1/2*1/2 of the ruler (i.e., 1/4*32=8) towards the 0 cm mark. Your finger should now be on the 8 cm mark of the ruler.

Is the number that we want smaller or larger than where we are now? The answer is "larger" as 11>8. Now move your finger 1/2*1/2*1/2 of the ruler (i.e., 1/8 *32=4) towards the 32 cm mark. Your finger should now be on the 12 cm mark.

Is the number we want smaller or larger than where we are now? The answer is "smaller" as 11<12. Now move your finger 1/2*1/2*1/2*1/2 of the ruler (i.e., 1/16*32=2) towards the 0 cm mark. Your finger should now be on the 10 cm mark.

Is the number we want smaller or larger than where we are now? The answer is "larger" as 11>10. Now move your finger 1/2*1/2*1/2*1/2*1/2 of the ruler (i.e., 1/32*32=1) towards the 32 cm mark. Your finger should now be on the 11 cm mark.

You can see that we "zoomed" into our search value, i.e., 11, using only 4 steps (or 5 if you include putting our finger at the midpoint). Each step would be equivalent to one loop in our earlier code. We can also see that the amount we moved quickly became smaller ... 8 ... 4 ... 2 ... 1. Anyone familiar with computers should recognise these numbers to be the first four of the series 2^x where x is referred to as the number of bits. The series is

1 ... 2 ... 4 ... 8 ... 16 ... 32 ... 64 ... 128 ... 256 ... 1024 ... etc. For example, an 8 bit display can show 2^8 or 2x2x2x2x2x2x2x2 or 256 colours.

Well, what value of x do we need for the case of our ruler, i.e., 2^x=32. The answer is x=5 and not by coincidence was this was the number of steps we needed to find the 11 centimetre mark. In fact, if we want to binary search an array with N rows then the number of steps we need is given by 2^x=N. In fact, we'll see in the second part of this newsletter that this isn't always true but for the present examples we can assume it to be true.

What we are saying is that, for an array of size N, the value of "Loops" when our programme ends will be given by 2^Loops=N or we can write it another way as Loops=log(N)/log(2). Our first searching technique gave us Loops=N, then Loops=N/2, and now Loops=log(N)/log(2).

Now let's calaculate the value of "Loops" for arrays with different sizes, i.e., different values of N.

N=32

Method 1 : Loops=32

Method 2 : Loops=16

Binary method: Loops=4

N=1024

Method 1: Loops=1024

Method 2: Loops=512

Method 3: Loops=10

N=1,048,576

Method 1: Loops=1,048,576

Method 2: Loops=524,288

Method 3: Loops=20

If we assume that it takes about the same time to perform one loop for each searching method then we can say that an array with 1,048,576 rows can be searched more than 26,000 times quicker using the binary search technique. Even for a small array with 32 rows this technique is at least 4 times quicker.

By this point you might be asking the following questions:

(i) How can searching for numbers on a ruler be applied to a real array?

(ii) How can the binary method be applied to my array when it isn't 2^x rows in size?

(iii) What happens if I binary search an array where there aren't any matches?

In this newsletter we'll only consider the first question. The other questions will be answered in part two of the newsletter.

To answer the first question, imagine that our ruler is an array and each centimetre mark is one row such that we have ruler(1), ruler(2), ruler(3), ruler(4) ...... ruler(32). The actual values of each row are not important for the binary search; only that the array should be sorted with ruler(1) the smallest and ruler(32) the largest. This is important ....

Only sorted arrays can be searched using the binary method.

The SORT command in Liberty Basic will correctly and quickly sort arrays of any dimension.

Well, enough of this talking. Here is some code that shows the result of binary searching an array with 8192 (i.e., 2^13) rows. The array is called "address" and to make it easy, the array values are simply address(1)=1, address(2)=2, address(3)=3 .... address(8192)=8192. However, they could have been filled with names, dates, colours, country names, etc., and it wouldn't make any difference. Of course, you'd have to use address$ instead of address if the array contained text.

When you run the programme it will take a few seconds before anything happens. This is due to the filling of the array with its 16,384 values together with the SORT command (which was not strictly necessary in this case as the array was already sorted when created). A random number between 1 and 16,384 is then picked and this number is searched for within the array. Results for each search will be printed meaning that 20 searches have been performed between the time the first text appears and the time the text stops moving up the screen In fact, searching through an array with 268 million rows (i.e., 16384^2) would take only twice as long to complete but actually creating the array prior to searching would take forever even if the computer had sufficient memory.

If you want to be brave then try increasing the array size to N=32768 or N=65536. You will see that the search time itself is only slightly increased.

 ' Simple programme to show binary search method
' Ian Davies
'
' First we should DIM the array to a value
' larger than we'll ever use
DIM address(70000)
' Actual size of the array we use
N=16384
' How many times to perform the search
Cycles=20
 
GOSUB [Setup_Array]
FOR y=1 to Cycles
GOSUB [Compare]
GOSUB [Results]
NEXT y
END
 
[Setup_Array]
' Create an simple array filled with
' integer numbers
FOR x=1 to N
address(x)=x
NEXT x
' In this case the array is already sorted
' but you might need to SORT for your array
SORT address(,1,N
RETURN
 
[Compare]
' Picks a number between 1 and N
' to search for
SearchNumber=INT(RND(1)*N)+1
' Sets initial pointer to N/2
' i.e., the midpoint of the array
CurrentPointer=N/2
' Sets the first movement of the pointer to N/4
CurrentMove=N/4
' Sets the number of loops to zero
Loops=0
' The important part starts here
WHILE CurrentMove>0.25
OldPointer=CurrentPointer
' Moves the pointer up or down the array
IF SearchNumber>address(INT(CurrentPointer)) THEN _
CurrentPointer=CurrentPointer+CurrentMove ELSE _
CurrentPointer=CurrentPointer-CurrentMove
' Decreases the pointer movement value by 1/2
CurrentMove=CurrentMove/2
' Increments the Loops variable
Loops=Loops+1
WEND
RETURN
 
[Results]
IF SearchNumber=address(INT(OldPointer)) THEN _
PRINT "Value searched for ";SearchNumber: _
PRINT "Value found ";address(INT(OldPointer)) _
ELSE _
PRINT "Value searched for ";SearchNumber: _
PRINT "Value found ";address(INT(OldPointer+1))
PRINT "Number of loops ";Loops
' Checks that we really found the value
IF SearchNumber<>address(INT(OldPointer)) AND _
SearchNumber<>address(INT(OldPointer+1)) THEN _
PRINT "ERROR ERROR ERROR ERROR ERROR ERROR ERROR"
PRINT : PRINT
RETURN

If you want to understand exactly what is happening then try using N=4 or N=8 and follow the programme through with the debugger.

One result from this programme is that the value we are searching for, "SearchNumber", will be either address(OldPointer) or address(OldPointer+1). We have set up the array so that a match definitely exists which means that if SearchNumber is not equal to address(OldPointer) or address(OldPointer+1) then something went wrong and an error message will be printed. It would be possible to create a binary search that definitely picks the correct array row. However, it isn't that bad if we search through 16384 rows and end up with two possible matches as it's only an extra line of code to decide which of these possibilities is the correct one.

Unlike the non-binary search methods, we didn't check for a match each time that we went through the WHILE ..... WEND loop. You'll see that we only checked for a match when the "WHILE CurrentMove>0.25" became false, meaning that we had already zoomed into the likely target. The reason is that we are dealing with large arrays so it's unlikely that we'll accidently land on the correct array row until that condition is met. For example, if it takes us 20 loops to search through an array with 1 million rows then the chance of us accidently finding the correct row until the very end is extremely unlikely.

The most important line within the programme is:

WHILE CurrentMove>0.25

as this decides at what point we believe we are close enough to the target to stop searching.

You can try changing this value to something larger such as 0.5 but then you'll notice that some of the results show an error as the correct array row wasn't found. So, at least for the way this programme is written, "WHILE CurrentMove>0.25" is the correct condition that decides when to stop searching.

Summary

In summary, we considered several different methods to search arrays. If we deal with large arrays then the binary method is hundreds or thousand times quicker than other methods. The only constraint for binary searches is that the array must be sorted.

The example of binary searching we presented showed that an array with 16,384 rows can be searched very quickly. In the next part of this newsletter we will look at a more general case of binary searching of arrays.


Newsletter compiled and edited by: Brosco and Alyce.

Comments, requests or corrections: Hit 'REPLY' now!

mailto:brosc-@orac.net.au

or

mailto:awatso-@mail.wctc.net