LB NEWSLETTER #11

Liberty Basic is developed by Carl Gundel
Original Newsletter compiled by Alyce Watson and Brosco
Translation to HTML: Raymond Roumeas

Brosco's Liberty Basic Newsletter - Issue #11 - July 98

In this Issue:

A proposal for you to consider
Indexing Concepts

1) A proposal for you to consider

If you look at my previous newsletters you will see that most of them hover around the 10Kb mark. The reason for this is simple. ListBot provides this as a free service. If I wanted to write a larger Newsletter I would have to upgrade to their commercial service - around US$100 per year.

Considering that I am already spending extra money with my ISP to pay for accesses to my Web Site, plus I contribute a substantial amount of time to produce this Newsletter, I don't believe that I should incur yet another cost to provide a FREE service.

There are now over 80 subscribers to this newsletter. If everyone contributed a dollar or so - we would have 12 months access to a Newsletter service with NO restrictions and NO adverts.

In reality, the likelihood of everyone contributing $1.00 is non-existent. If there are any contributions at all, they would come from just a handful of generous souls!

So - does that handful of generous contributors exist? If you are interested in contributing to (what I believe to be) an important LB resource - please send me an email stating the amount that you could contribute. You never know, I might get sufficient contributors to allow me to happily reply to you that the contribution can be reduced! Regardless, at this stage, your email would NOT be a commitment - it would purely be an expression of interest. I would not ask for any money to be paid until I was sure that the full $100 could be collected. I also would not subscribe to the commercial service until the full amount had been sent in.

By the way, this offer is NOT open to LBers who are already substantial contributors in the way of Sites, code, assistance, etc. - you know who I mean - but I won't list them here for fear of leaving someone off the list (and a 10KB maximum).

This offer is for LBers who would like to contribute something back to the community - but so far have not found a way to do so.

WHAT WOULD YOU GET IN RETURN?

Basically, very little, other than the warm feeling that comes from knowing you are assisting others.

Your name (and amount contributed) would be listed in the Newsletter - every issue for the next 12 months. You may remain anonymous, if you desire.

Would you get any special privileges - like priority with support requests, or having your topic covered more quickly in the Newsletter? - ABSOLUTELY NOT!!!

Would the Newsletters become bigger than 10Kb? Not substantially, but regularly I need to edit out 1 or 2Kb just to make it fit. Also, sometimes I need to direct you to a separate download for extra material that accompanies the Newsletter. This would become unnecessary.

Could I potentially make money out of this? ABSOLUTELY NOT. All contributions will be listed in the Newsletter. You will be able to see the status on the 'fund' and exactly how much money has been donated. Excess collections will either be refunded, or, kept for the following year - depending on your preference.

If I get little or no response from this - I will not bring the matter up again - we will just continue the way we are. I know that most people in the community are hobbyists, and many are on limited budgets. It will not be a personal affront to me if we can't get this working, but it will be a major disappointment!

Please email me your thoughts on this topic.

2) Indexing Concepts.

Let me get one thing cleared up immediately.

In the PC world - many people refer to indexing a Random file as using a DataBase. This is technically incorrect. A 'True' database usually contains several files which are 'related' in some way. For example - in a package that looks after the accounts of a business there will be several files:

Customer info
Invoices
Inventory
BankBook info
 
etc.

It is all of these files in combination that make up the database - but in isolation - they are just indexed files.

We have been using the word 'Database' to describe the file that is holding our Movie Cassette collection - so we'll stick with it. However, to be technically correct - this newsletter is really about 'indexing' techniques - not Database - OK?

First of all - what is an Index?

An Index is just another file that contains shortcuts to information stored in a data file. For example - all your bank account information is stored in data files on your bank's computer. And when you use your ATM card to withdraw money - the program must locate the information about your account to verify that you have sufficient funds available to make the withdrawal. If the program had to scan the entire data file to find your account information - it would take far too long - because most banks have hundreds of thousands (even millions) of accounts.

So to speed up the process of finding your account information, there is a separate file - called an index. This index just contains a list of all the Account Numbers and the Record Number of where the information is stored on the data file. Now scanning an index file that contains millions of entries wouldn't be much faster than scanning the data file - so how does this help?

The index file is created in such a way as to make searching very fast. If you read my tutorial about array searching techniques - you will see a very fast technique called 'Binary Search'. An index works in a very similar way. You don't need to understand how this works - but if you are a glutton for technical detail - here's a very simplistic example:

Suppose that a bank only had 256 active accounts. The index would be constructed like this:

Entry #1: A pointer to another index entry that indexes Account Numbers in the range '1' to '128' - let's assume that's in Index Entry #2. And a pointer to another index entry for the Account numbers in the range '129' to '256'.

Entry #2: A pointer to another index entry that indexes Account Numbers in the range '1' to '64' - say entry #3. And a pointer to the index entry for account numbers '65' to '128'.

Entry #3: A pointer to another index entry that indexes Account Numbers in the range '1' to '32', and a pointer to the index entries in the range '33' to '64'.

etc.

This 'Halving the keys' process is continued until there is only one key left - and that index entry holds the Record number of the data in the data file.

So - to locate an Account Number - the Indexing software only needs to access 8 Index entries. Since these entries are very small (just the KEY and a KeyReference number), all the index entries will normally be held in memory. Even for a very large database with thousands of records, the number of disk accesses to the index file is minimal - just 2 or 3 at most.
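For the gluttons: the halving described above can be sketched in a few lines. This is Python rather than Liberty Basic, and it is only an illustration of the idea - it is NOT how DBdll stores its index. The function name halving_path is made up for this example.

```python
def halving_path(low, high, account):
    """Return the key ranges visited while halving [low, high] down to `account`.

    Each range in the list corresponds to one index entry in the
    simplified 256-account example: an entry covers a range of
    account numbers and points at its two halves.
    """
    path = []
    while low < high:
        path.append((low, high))
        middle = (low + high) // 2
        if account <= middle:
            high = middle        # follow the pointer to the lower half
        else:
            low = middle + 1     # follow the pointer to the upper half
    return path

# Walk the index for account number 40 in a bank with accounts 1..256.
for step, (low, high) in enumerate(halving_path(1, 256, 40), start=1):
    print(f"Entry visited #{step}: accounts {low} to {high}")

# Every account from 1 to 256 is reached after exactly 8 entries.
print(all(len(halving_path(1, 256, a)) == 8 for a in range(1, 257)))
```

The same function shows why the idea scales so well: halving_path(1, 1000000, 1) visits only 20 ranges, even though there are a million keys.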

On a test I did with DBdll - I created an index for a list of book titles. The title was 70 bytes long and there were 15,000 titles. The index file created was 2 Megabytes! BUT - to find any particular entry only required a maximum of 4 disk accesses! Usually, 2 of these entries were still in memory (in the buffer) so, in reality, there were only 2 physical disk accesses for any particular search!
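A little arithmetic (again in Python) puts those test figures in perspective. Halving alone would take more than 4 steps for 15,000 titles, so the sketch below ASSUMES that each disk access fetches a block holding many keys at once. The newsletter does not say how DBdll lays out its index, so treat the block idea and the function smallest_fan_out as guesses made for illustration only.

```python
import math

titles = 15_000
key_bytes = 70

# Raw key data alone, before any pointers or spare room in the index file.
raw_key_bytes = titles * key_bytes

# Number of steps if every step only split the keys in two.
two_way_steps = math.ceil(math.log2(titles))

def smallest_fan_out(entries, levels):
    """Smallest number of keys per block that covers `entries` keys in `levels` block reads."""
    fan_out = 2
    while fan_out ** levels < entries:
        fan_out += 1
    return fan_out

print(raw_key_bytes)                 # 1050000 - about half of the 2 Megabyte file
print(two_way_steps)                 # 14
print(smallest_fan_out(titles, 4))   # 12
```

So, under that assumption, a block would only need to hold a dozen or so keys for 4 accesses to be enough, which is very modest for 70 byte keys.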

The software required to maintain an index is incredibly complex and way beyond the scope of this newsletter. But the DBdll does all of this for you.

OK - enough technical stuff - let's get back to our Bank Account example.

You can understand that the data file needs to be indexed by the account number so that your account information can be located very quickly.

The Account Number is referred to as the "PRIMARY KEY". An Account Number is UNIQUE. That is, every person will have a different account number.
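If it helps to see the idea in code, here is a minimal sketch in Python (not Liberty Basic, and not the DBdll interface). It keeps fixed-length records in a pretend data file and uses a primary index to go from Account Number to Record Number. The record layout, the names add_account and find_account, and the sample accounts are all invented for this example.

```python
import io
import struct

# Hypothetical record layout: 4-byte account number, 20-byte name, 8-byte balance.
RECORD = struct.Struct("<i20sd")

data_file = io.BytesIO()   # stands in for the random file on disk
primary_index = {}         # Account Number -> Record Number

def add_account(account, name, balance):
    """Append a record and index it. A primary key may appear only once."""
    if account in primary_index:
        raise ValueError(f"account {account} already exists")
    record_number = len(primary_index) + 1
    data_file.seek((record_number - 1) * RECORD.size)
    data_file.write(RECORD.pack(account, name.encode("ascii"), balance))
    primary_index[account] = record_number
    return record_number

def find_account(account):
    """Look up the Record Number in the index, then read only that record."""
    record_number = primary_index[account]
    data_file.seek((record_number - 1) * RECORD.size)
    number, name, balance = RECORD.unpack(data_file.read(RECORD.size))
    return number, name.rstrip(b"\0").decode("ascii"), balance

add_account(5521, "Brosco", 120.50)
add_account(1007, "Alyce", 300.00)
add_account(8830, "Carl", 75.25)

print(find_account(1007))
```

The point to notice is that find_account never scans the data file. It asks the index where the record lives and jumps straight there. A real index file would be searched by halving rather than held in a Python dictionary, but the job it does is the same.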

You got that? Do NOT progress any further into this until you fully understand the above concepts. Read it over a few times if necessary.

OK - I believe you - it's not that hard to understand - is it?

Today we're having a bad day - we've lost our wallet and therefore our ATM card. So we walk into the bank to explain the problem and request a new card.

The conversation will go something like this:

Bank Teller:   "What's your Account Number?".                        
ME:            "How the hell do I know?  That's recorded on my ATM card
                - and that's lost!  Geeeez - what a stupid question."
Bank Teller:   "OK - calm down sir, what's your name?"
ME:            "Brosco"
Bank Teller:   (after typing something in the computer)
               "Ah yes, we have a couple of people by the name of Brosco
                with accounts here - could you just give me your Address
                and Date-of-birth so that I can verify which account
                number is yours."

OK - the Accounts file is indexed by Account Number. How did the bank teller find my account details so quickly? The secret is that there is a "SECONDARY INDEX". This is a second index file, but instead of using account number as the Key - it uses customer Name as a Key.

Now it's quite possible for people to have the same name - so this key is NOT UNIQUE. The indexing program must allow for DUPLICATE keys pointing to different account information.
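Here is a small sketch of what the teller's computer might be doing, written in Python for illustration. It is not DBdll code, and every name, address and date in it is made up. The secondary index keeps a LIST of Record Numbers for each name, because the key is not unique.

```python
from collections import defaultdict

# Made-up account records: Record Number -> details.
records = {
    1: {"account": 5521, "name": "Brosco", "address": "12 Wattle St", "born": "1950-03-14"},
    2: {"account": 1007, "name": "Alyce", "address": "3 Maple Ave", "born": "1955-07-21"},
    3: {"account": 7342, "name": "Brosco", "address": "8 Harbour Rd", "born": "1962-11-02"},
}

# Secondary index: customer Name -> every Record Number with that name.
name_index = defaultdict(list)
for record_number, details in records.items():
    name_index[details["name"]].append(record_number)

def find_by_name(name):
    """Return all Record Numbers filed under this name (possibly several, possibly none)."""
    return list(name_index.get(name, []))

def verify(name, address, born):
    """Narrow the duplicates down using Address and Date-of-birth, as the teller does."""
    return [
        records[n]["account"]
        for n in find_by_name(name)
        if records[n]["address"] == address and records[n]["born"] == born
    ]

print(find_by_name("Brosco"))                          # [1, 3]
print(verify("Brosco", "8 Harbour Rd", "1962-11-02"))  # [7342]
```

Compare this with a primary index, where each key leads to exactly one record. Here the name only narrows things down to a couple of candidates, and the extra questions pick the right one.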

The DBdll will allow you to create as many indexes as you need. The only restriction is the FILES limitation in Windows - usually 16.

(another example of a newsletter that is only 10KB - another 1 or 2K would have allowed just that little extra!)

Newsletter written by: Brosco.
Comments, requests or corrections to: brosco@orac.net.au
Translated from Australian to English by an American:
Alyce Watson.  Thanks Alyce.