********************************************************************* This article is being presented through the *StarBoard* Journal of the FlagShip/StarShip, SIGS (Special Interest Groups) on the Delphi and GEnie telecommunications networks. Permission is hereby granted to non-profit organizations only to reprint this article or pass it along electronically as long as proper credit is given to both the author and the *StarBoard* Journal. ********************************************************************* RELATIVE FILES MADE RELATIVELY EASY (Epilogue 1) by Bill Brier (TROUBLESOME on DELPHI) I. RECORD INDEXING RELative files operate by assigning each record a number. The record number tells the disk operating system (DOS) where to look in the file for the desired data. As a result, searches can be rapidly performed without a lot of overhead in the program. While the technique of assigning numbers to records is quite efficient from the disk drive's point of view, it is quite annoying from the user's point of view. This is because human beings find numbers rather difficult to associate with other information. Of course, the computer finds such association quite easy to deal with. Therefore, some sort of compromise must be worked out. That compromise is the FILENAME INDEX. In RELATIVE FILES MADE RELATIVELY EASY parts 1 and 2 we discussed the methods by which a mailing list could be implemented using a RELative file. So, let's see what it would take to index our mailing list. Each record contains the person's or company's name, street address, city and so forth. We can consider the person's or company's name to be the key field or element in the record. So, why not construct a filename index using filenames derived from each person's name? Then, once we've made a filename, we can simply append the record number to that filename and store it along with other filenames in a SEQuential file on the same disk. When the mailing list program is first RUN we would load the resulting filename index into memory and modify it each time a new record was added or removed from the RELative file. About this time you are saying that this is a great idea, but how do I create the filename? Let's do that right now. We'll introduce some new variables to the small army of variables that we already have. Here is how to create a name and a filename using simple string functions: 100 INPUT"FIRST NAME";GN$:IF GN$=""THEN100:REM NO NULLS 110 INPUT"LAST NAME";NA$:IF NA$=""THEN110 120 NA$=LEFT$(NA$,18):REM TRUNCATE LAST NAME 130 NF$=NA$+LEFT$(GN$,1):REM CREATE FILENAME 140 NA$=LEFT$(GN$,13)+CHR$(32)+NA$ This will result in NA$ containing the first and last name separated by a space so that a first name of Steven and a last name of Smith would look like this: STEVEN SMITH Notice how we truncated the first and last names so that the combined length of the two plus the space will not exceed the size of the name field in the record. This is necessary to avoid corrupting the next field with an overflow. In line 130 we create the filename by attaching the first letter of the first name to the end of the last name. Steven Smith's file name would be: SMITHS In order for our filename system to correctly operate, we must never allow a space in the filename and we may not duplicate filenames as the system will then be unable to distinguish one record from another. Lastly, we must organize the filenames into an array, and we must have some means of extracting the record number from the filename. In the process of accomplishing all of this, we will also incorporate a useful feature...filename PATTERN-MATCHING. II. FILENAME ARRAY STRUCTURE In order to manipulate the filenames we need to set up an string array in memory and store all of the filenames in it. This is initially accomplished with the DIMension statement. We will now introduce a few more variables for this purpose. The variable R will represent the number of the highest element in the array, and the variable N will represent the number of active filenames in the array. The variable R, in our mailing list, would equal 499 (not 500, as one might presume). The value of N could be anything from 0 (no names on file) to 500 (a full file). As you will see, these two variables will assume considerable importance in our program. First, let's DIMension our filename array: DIM NF$(R) That was simple, wasn't it? Remember that R has a value of 499. The DIMension statement sets up an array starting with NF$(0) and ending with NF$(499). Thus, the array actually has 500 elements. Of course, right now all 500 elements are nulls. For the sake of easy use we need to initialize the array, a task somewhat similar to the initializing of the RELative file when we first created it. Initializing the array will create 500 "dummy" filenames, complete with 500 record numbers. Later, as you add records, the dummy filenames will be replaced with actual filenames, as derived in the previous chapter. Here is how to initialize the index array: FOR I=0 TO R:NF$(I)=MID$(STR$(I+1),2):NEXT When this loop is done each element will contain a number from 1 to 500 (which corresponds with the available records in the RELative file). For example, NF$(212) would look like this: 213 The MID$ function eliminates the leading space that all positive numbers are stored with. The space is unnecessary and simply consumes more room in both the disk file and in memory. Initializing of the index array should be performed only when the new RELative file is first created. Thereafter, the index will be manipulated by the insertion or removal of filenames. III. APPENDING AND EXTRACTING RECORD NUMBERS Once the dummy filenames have been created we can start to fill our RELative file with the actual records. To do this requires that we know which records are available. Assuming that the index is always kept in alphabetic order (we'll discuss later how to do this from BASIC without a lot of hassle), the variable N will always "point" to the next dummy filename and thus the next available record. In our newly-created dummy filename index, variable N would initially have a value of zero. So, if we "look" at NF$(N) we would find: 1 The 1 is the unused and therefore available record number. Extracting the record number from a dummy filename is quite easy as the VAL function can be used: J = VAL(NF$(N)) Once we have extracted the unused record number we can then append it to the previously created filename with string functions. Assuming that the record number is represented by the variable J, we would append this record number to Steven Smith's filename (SMITHS) as follows (SMITHS is contained in NF$): NF$=NF$+STR$(J) This will result in NF$ containing: SMITHS 1 The space between the filename and the record number occurs because a string equivalent of a positive numeric variable takes the space as well as the number itself. This works out very nicely for us as we need the space to separate the filename and record number for easy location and extraction of the number. Obviously, we can't extract the record number of a "real" or ACTIVE filename with VAL as we did with the dummy filename as VAL will return a zero. Extracting the record number from an active filename takes a bit of programming with string functions. This is because the location of the record number depends on the length of the filename itself. Therefore, to find the record number we have to PARSE the filename in a loop. Parsing a filename successively looks at each character within the filename by taking repeated substrings of the filename. When doing searches of the mailing list you will have to extract the record number from each active filename in order to look at all records. This is best accomplished within a FOR-NEXT loop (or DO-LOOP in BASIC 7.0). Assuming that the loop variable I is pointing to the filename, here is how you get the record number into J: 8550 F=2 8555 IF MID$(NF$(I),F,1)<>CHR$(32)THEN F=F+1:GOTO 8555 8560 J = VAL(MID$(NF$(I),F)):RETURN Line 8555 does the actual parsing of the filename. It scans the filename until the space separating the name from the record number is located. When the space is found the routine falls through to 8560 where the actual value of the record number is derived. The reason for setting F to 2 is because that is the lowest position in the filename in which a space is to be found. The routine will operate slightly faster. In line 8560 we take advantage of a little-known function of Commodore BASIC...the REMAINDER STRING. A MID$ function without the second value returns all of the characters remaining at the point determined by the first value. The result of this in 8560 is to extract the entire record number rather than just a character or two. Let's see how this routine would work on Steven Smith's filename: SMITHS 1 When the routine is first entered, variable F will point at the M in SMITHS. Since the PETASCII value of M is not 32, F would be incremented and would successively point at I, T, H and S. Finally, F would point at the space between SMITHS and 1, and the loop would end. We would exit the routine with the variable J equal to the record number (1 in this example). NOTE: In BASIC 7.0 on the C-128 the INSTR function can be used to locate the space. The method used would be: F=INSTR(NF$(I),CHR$(32)). This would replace lines 8550 and 8555 in the above example. The reason for placing this sequence into a subroutine is so your program can easily extract the record number out of any filename in the index. This function will play great importance in searches. IV. MANIPULATING THE FILENAME INDEX Up to this point we have seen how to create a filename and how to append a record number to it. Additional examples will now be presented that scan the index for a duplicate filename, insert the new filename into the index at the proper point to keep the index in alphabetic order, and save the index on the file disk. As previously mentioned the variables N and R are quite important to the manipulation of the index. Variable N indicates the number of active filenames in the index and also "points" to the next dummy filename. Variable R is a constant of 499, which is the maximum number of records allowed in the mailing list. Prior to actually accepting a new record and filename for inclusion in the mailing list it is necessary to determine if space is available for the new record. This is easily accomplished by comparing N to R and setting a flag to indicate the result: F=0:IF N > R THEN F=1 If N is less than 500 then it is not greater than R (which is equal to 499) and flag F remains zero. If the file is full, N will equal 500 and the flag F will be set to one, indicating that no more entries can be accommodated. It is important to trap this condition as referencing NF$(N) when N=500 will cause a BAD SUBSCRIPT error (NF$ was DIMensioned to 499). If the flag F equals one then we know that the file is full, and a suitable warning can be displayed to the user indicating this fact. If room exists for a new entry then the new filename can be inserted in the index, and the counter N can be increased by one to show that the index has also increased by one. Insertion of a new name into the index should not be performed until after the record has been written into the RELative file. This is to avoid modifying the index in the event a disk error aborts the writing of the record. So, the procedure for adding a record to the mailing list is something like this: 1. Check variable N for space. If no space exists then abort the entry process. 2. Have the user type in the first and last name. 3. Construct the filename (SMITHS for example) and scan the active index for a duplicate. 4. If a duplicate is found advise the user and abort. Otherwise, have the user type in the remaining information for the record (address, city, etc.). 5. Extract the next available record number from the index as demonstrated above. This is variable J. 6. Write the record to the disk using J for the record number. 7. Upon a successful write to the disk append the record number in variable J to the filename and insert the filename into the index. 8. When all entries have been made save the filename index to the disk. Steps 1,2, and part of 3 have already been described in the chapters on manipulating the filename. We will now look at how to determine if a filename is a duplicate or is unique. The technique of PATTERN-MATCHING will also be presented. V. SCANNING THE INDEX FOR A DUPLICATE FILENAME Before we actually insert the new record into the mailing list and insert the new filename into the index, we must scan the existing filenames for a pattern match. If two identical filenames were permitted to exist in the index, it would be impossible to differentiate one from the other. To prevent such an occurrence we must scan the index from beginning to end and attempt to locate a duplicate filename. In the process of doing this we can also determine at what point in the index we should insert the new filename to keep everything in alphabetic order. Let's see how we can accomplish these operations. We know from previous discussion that variable N points to the next dummy filename in the index. This implies that the active index spans from NF$(0) up to NF$(N-1), containing all of the active filenames. Therefore, what needs to be done is to compare the new filename against each of the active filenames using a loop. If a duplicate is found we terminate the loop and set a flag to indicate the match. Here's a method of accomplishing this: 8570 I=0:F=0:L=LEN(NF$):IF N=0 THEN 8590 8575 IF LEFT$(NF$(I),L) = NF$ THEN F=1:GOTO 8590 8580 IF NF$ < NF$(I) THEN 8590 8585 I=I+1:IF I 19 THEN 1540 1520 X=0:PRINT#2,MID$(STR(N),2) 1530 FOR I = 0 TO R:PRINT#2,NF$(I):NEXT 1540 CLOSE2:IF X THEN 1580 1550 PRINT#1,"S0:INDEX":INPUT#1,X,X$ 1560 IF X > 19 THEN 1580 1570 PRINT#1,"R0:INDEX=INDEX.TEMP":FF=0:INPUT#1,X,X$ 1580 CLOSE1:RETURN Lines 1500 and 1510 OPEN the required files and check for errors. Variable X will contain the DOS error number while X$ will contain the plain English error message. If X is at least equal to 20 some type of DOS error has occurred and the routine aborts. Otherwise the string equivalent of variable N is saved in the file along with the 500 filenames. All of this data goes into temporary file INDEX.TEMP. Only after the save has been completed is the old index SCRATCHed from the disk. Then INDEX.TEMP is RENAMEd to INDEX. The purpose of this is to avoid the loss of the existing index in the event that power or hardware failure aborts the save of the new index. This is similar to the SAVE with replace (@0:) DOS function but avoids the bug that the DOS equivalent contains. Upon completion of the RENAME of the INDEX.TEMP file, the "file flag" FF is reset to zero. If a DOS error occurs during the save of the index, the routine will abort and report the error in X$. NOTE: In BASIC 3.5, 4.0 or 7.0 the DOS variables DS and DS$ perform the functions of reading the error channel and may be used in place of X and X$. The INPUT#1 routines can be likewise removed and the SCRATCH and RENAME functions used in place of lines 1560 and 1580 respectively. To load the index into memory the following routine may be used: 1600 DIM NF$(R):OPEN1,8,15:OPEN2,8,2,"0:INDEX" 1610 INPUT#1,X,X$:IF X > 19 THEN 1640 1620 INPUT#2,N$:N=VAL(N$):FOR I = 0 TO R 1630 INPUT#2,NF$(I):NEXT 1640 CLOSE2:CLOSE1:RETURN Upon exit from this routine variables X and X$ will report any disk errors as above. The error you would want to watch for is error 62 (file not found) which would indicate that the file disk has no mailing list on it. You could then create the new RELative file and filename index as we have discussed. VIII. DELETING RECORDS FROM THE MAILING LIST From time to time it may be necessary to remove a record from the mailing list. This may be accomplished by simply removing the corresponding filename from the index and placing the corresponding record number back into the "pool" of available records. To do this requires that you first locate the desired filename to be removed, noting its position in the index. Then you would extract the record number from it, shift the rest of the index down one position to close the gap where the filename used to be located and then assign the record number to an unused filename in the index. You would follow this by reducing the filename count (variable N) by one and setting the file flag FF for writing the revised index back to the disk. Fortunately, some of the subroutines that we have already developed for other functions can be used for deleting filenames. To delete a filename you need to know what that name is: 1700 INPUT"FILENAME";NF$:IF NF$=""THEN 1700:REM NO NULLS Following the input of the filename you should call the subroutine at 8570 (see above) and test the flag variable F. If F is zero then nothing matching the input filename was located, and the user should be advised with a suitable error response from your program. It is important to consider that if flag F is set to one upon exit from the subroutine at 8570, it means that the input filename patter-matched with an index filename. Therefore the index filename, which will be the one that will be deleted, should be displayed to the user for confirmation before it is actually removed. For example, the user may wish to delete Steven Smith's record. In response to the input prompt the user types SMITH. However, let's suppose that in the index the filenames SMITHD, SMITHF, SMITHR, SMITHS (the filename that the user wishes to delete) and SMITHT all exist. Because the index is scanned from bottom to top by the subroutine at 8570, and because the SMITH input filename will pattern-match with anything starting with SMITH, the program will flag SMITHD as the one to delete. Hence the need for confirmation. Once the user has confirmed that the displayed filename is to be deleted you must extract the record number from the filename so that the record number can be returned to the index for reuse. This of course may be performed by calling the subroutine at 8550 (see above). Upon exit from this subroutine variable J will contain the record number of the filename to be deleted. With these operations completed the next step would be to shift all index filenames down one position starting with the filename immediately above the one to be deleted. Then, after the shift had been completed, the record number stored in variable J would be inserted at the top of the active index and the value of variable N would be reduced by one. Here is a routine that would accomplish this 1800 IF N=1 OR I=N-1 THEN 1820 1810 FOR F = I+1 TO N-1:NF$(F-1)=NF$(F):NEXT 1820 NF$(N-1)=MID$(STR$(J),2):N=N-1:FF=1 Line 1800 bypasses the shift routine in line 1810 if N equals 1 or if I equals N-1 because in those two cases a shift isn't required. If N equals one this implies that the last filename remaining in the index is to be deleted, which of course obviates the need for a shift. If variable I is the same as N-1 this implies that the filename being deleted is the last one in the active portion of the index and also obviates the need for a shift. If neither one of these conditions exist then the index will be shifted down one position until the end of the active index (N-1) is reached. Finally the value of N is reduced as there is one less filename in the index and the file flag FF is set to one indicating that the index must be written back to the disk. With the filename deleted from the index there is no way to retrieve the corresponding record from the RELative file. Although the RELative file itself was not disturbed, as far as the mailing list is concerned, the record no longer exists and the record is now available for use by some other entry. IX. STRING MANIPULATION AND GARBAGE COLLECTION ON THE C-64 The foregoing routines all involve considerable string manipulation in order to structure and control the filename index. In BASIC 2.0, as implemented on the C-64, this may result in garbage collection and program delays. There is really no simple method of avoiding garbage collection from BASIC due to the manner in which BASIC stores strings in memory. A good compiler can be of assistance in this case. Not only will the compiler perform much more efficient garbage collection, it will also make the search and index shift routines execute much faster, resulting in a substantial improvement to the overall program speed. The main disadvantage of using a compiler is that the compiled program usually is quite large compared to the BASIC source program. This will reduce the available memory for variable storage and may cause the program to crash with an OUT OF MEMORY error. The time required to locate a filename in the index and the time required to shift the index when inserting or removing filenames depends on the number of elements in the index and the relative position of the affect index entry in the entire array. If the index is nearly full (500 entries) eight to ten seconds may be required to locate the desired index filename. A compiled version of the program can improve that by a factor of ten or more. The same holds true for index shifting. Therefore in the interest of speed, compiling is recommended. X. WRAP-UP In the RELATIVE FILES MADE RELATIVELY EASY series we have seen how to create a RELative file, how to enter records into that file, how to do field searches and, with this article, how to construct and maintain a filename index of the records. Although we used a mailing list as an example of how to perform these functions, the principles set forth in these articles may be applied to virtually any kind of database system that requires a RELative file and index approach. Whether the "filenames" are actually names, part numbers of a stock room inventory or a list of files on a disk, the essential elements of a successful program remain the same. I hope that this series of articles has been beneficial and instructive. If you have questions on anything that has not been covered in one of these articles, please direct it to me via the DELPHI FLAGSHIP FORUM. W.J. Brier 7-86