cbaseTM The C Database Library Citadel Brookville, Indiana Copyright  1989 by Citadel. All rights reserved. Citadel 241 East Eleventh Street Brookville, IN 47012 317-647-4720 BBS 317-647-2403 Version 1.0.1 This manual is protected by United States copyright law. No part of it may be reproduced without the express written permission of Citadel. Technical Support The Citadel BBS is available 24 hours a day. Voice support is available between 10 a.m. and 4 p.m. EST. When calling for technical support, please have ready the following information: - product name and version number - operating system and version number - C compiler and version number - computer brand and model UNIX is a trademark of AT&T. MS-DOS is a trademark of Microsoft Corporation. Turbo C is a trademark of Borland International, Inc. Contents Chapter 1. Introduction 1 Chapter 2. Database Basics 3 2.1 Sequential File Structures 2.2 Inverted Files 2.3 B-trees Chapter 3. cbase Library Functions 7 3.1 Access Control Functions 3.2 Lock Functions 3.3 Record Cursor Position Functions 3.4 Key Cursor Position Functions 3.5 Input/Output Functions 3.6 Data Import/Export Functions Chapter 4. An Example Program 15 4.1 Data Definition 4.2 Opening a cbase 4.3 Locking a cbase 4.4 Accessing a cbase 4.5 Closing a cbase 4.6 Manipulating Exported Data Appendix A. Installation Instructions 27 A1 The blkio Library A2 The btree Library A3 The lseq Library A4 The cbase library Appendix B. Defining New Data Types 33 B1 The Type Name B2 The Comparison Function B3 The Export and Import Functions B4 The Type Count Appendix C. Porting to a New Operating System 37 C1 The HOST macro C2 The File Descriptor Type C3 File Access System Calls C4 File Locking System Calls References 41 Chapter 1: Introduction cbase is a C database file management library. Records may be accessed both randomly and sequentially through indexes stored in B+-trees. Records may also be accessed sequentially in the order in which they are stored. Multiuser access is supported under any operating system with file locking capabilities. cbase is designed for portability. It is written in strict adherence to the ANSI C standard while retaining compatibility with with K&R C compilers. All system dependent code is isolated in order to make porting easy. Many of the operations performed by cbase internally represent independently useful tools, and the software has been designed with this in mind. cbase actually comprises four individual libraries, each complete and independently accessible. Figure 1.1 shows these libraries and their relationships. ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¿ ³ cbase ³ ÀÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÙ ÚÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄ¿ ÚÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄ¿ ³ lseq ³ ³ btree ³ ÀÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÙ ÀÄÄÄÄÄÄÄÂÄÄÄÄÄÄÄÙ ÚÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄ¿ ³ blkio ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ Figure 1.1. cbase and Underlying Libraries At the foundation of cbase is the blkio (block buffered I/O) library. blkio is a buffered I/O library similar to stdio, but based on a file model more appropriate for structured files such as used in database software. While stdio models a file as an unstructured stream of characters, blkio models a file as a collection of blocks made up of fields (see FROS89 for a complete description of blkio). The lseq (linked sequential file) library provides all the facilities necessary for the creation and manipulation of doubly linked sequential files. The btree (B-tree) library provides the same for B+-tree files. The cbase library uses lseq and btree to perform all structured file management operations. The lseq library is used for record storage and the btree library for inverted file key storage. When using a particular library, all operations are performed with functions provided by that library. No references need be made to underlying libraries. When using the cbase library, it is therefore not necessary to know the functions included in the other libraries. Chapter 2: Database Basics This chapter describes some of the basic database concepts relevant to cbase. It is intended to be a brief and readily accessible introduction, and can be easily skipped by one already familiar with database fundamentals. References are given where in-depth discussions may be found. 2.1 Sequential File Structures One of the simplest file organizations is the physical sequential file. In this organization, records are simply written one after the other and sequential access is done in the order in which the records are physically stored. The physical sequential file works very well in cases where the data is static, but it is extremely inefficient when the data is dynamic. Consider a file containing 100,000 records stored in a sorted order. The insertion of a single record at the beginning of this file would result in 100,000 records being moved to make room at the beginning of the file for the new record. This identical problem also occurs when large ordered lists are stored in memory, and the same solution, the linked list, can be used for files (see pp. 106-12 of HORO76 for an explanation of linked lists in memory). In a linked list, the record order is determined by pointers stored with each record, not by the physical record locations; a record logically located at the beginning of a file can be physically located anywhere within the file. In a singly linked list each record has only a pointer to the next record and so the file can be traversed in just one direction. In a doubly linked list, each record has both a next and a previous pointer, allowing bidirectional access. 2.2 Inverted Files Assume a data file containing member records for an organization, and that the record format for this file is typedef struct { long number; char name[24]; char address[81]; char city[24]; char state[3]; char zip[11]; } member_t; Assume further that the data file has a physical sequential file structure, and that the records are sorted by the number field, therefore a binary search may be performed to quickly find a record with a given number field. The field that the records are sorted on is called the primary index. The problem arises when a query is made on another field besides the primary index, in which case a search of the entire data file is required. The problem of being able to perform queries efficiently on more than one field can be resolved through the use of inverted files. For each secondary index exists an inverted file containing an entry for each record in the data file. The inverted file for the name field, for example, would have entries containing the name field and record position, and these would be sorted by the name field. To find a record with a given value in the name field, the inverted file for the name field would be searched. The record position paired with the specified name value would then be used to locate the record in the data file. See pp. 531-533 of HORO76 and pp. 75-78 of ULLM82 for more information on inverted files. 2.3 B-trees As discussed above, the physical sequential file provides unacceptable performance for dynamic data. For static data, however, it provides efficient sequential and passable random access (using a binary search). The linked sequential file solves the problem of performance for dynamic data while preserving efficient sequential access, but the ability to randomly access records is lost. The need to store dynamic data which may be randomly accessed led to the development of the B-tree. As the name implies, the B-tree stores data in a tree structure, which allows random access. And since the tree nodes are connected by pointers, the update problem is solved in the same way as by the linked sequential file organization. The basic B-tree does not provide very efficient sequential access, however. This led to a B-tree variant called the B+-tree. The B+-tree incorporates a linked list into its structure to achieve efficient sequential as well as random access. While the B+-tree does come close to providing the best of all worlds, it has two important drawbacks. First, there is significantly more storage overhead for a B+-tree than for a sequential file. Every entry in a B+-tree is stored in a leaf node, and the rest of the tree is simply scaffolding used when the tree is searched. Second, each entry in any type of B-tree must be unique. These characteristics make B-tree file organizations inappropriate for data files containing records, which may be both large and duplicated. But for inverted files, generally having relatively short entries that are by their very nature unique (because the file position of each record in the data file would be unique), the B+-tree is an ideal choice. cbase therefore uses the linked sequential file organization for record storage and B+-trees for inverted files. The mechanics of B-trees is beyond the scope of this chapter. A good discussion of the B-tree and its major variants may be found in COME79. Chapter 3: cbase Library Functions 3.1 Access Control Functions The cbcreate function is used to create a new cbase. int cbcreate(const char *cbname, size_t recsize, int fldc, const cbfield_t fldv[]); cbname points to a character string which is the name of the cbase. This name is used as the name of the data file containing the records in the cbase. recsize specifies the record size to be used. fldc is the number of fields in the cbase, and fldv is an array of fldc field definition structures. Each structure in the array contains the definition for one field. field definitions must be in the order the fields occur in the record. The field definition structure type cbfield_t is defined in . typedef struct { /* field definition */ size_t offset; /* field offset */ size_t size; /* size of field */ int type; /* type of field */ int flags; /* flags */ char filename[FILENAME_MAX + 1]; /* data file name */ } cbfield_t; offset is the location of the field within the record and size is the size of the field. type specifies the field data type, legal values for which are shown in Table 3.1. flags values are constructed by bitwise ORing together flags from the following list. CB_FKEY Field is to be a key. CB_FUNIQ Only for use with CB_FKEY. Indicates that the key is constrained to be unique. FILENAME_MAX is an ANSI C definition indicating the maximum length of a filename string. It is defined in . t_char signed character t_charv signed character array t_uchar unsigned character t_ucharv unsigned character array t_short signed short integer t_shortv signed short integer array t_ushort unsigned short integer t_ushortv unsigned short integer array t_int signed integer t_intv signed integer array t_uint unsigned integer t_uintv unsigned integer array t_long signed long integer t_longv signed long integer array t_ulong unsigned long integer t_ulongv unsigned long integer array t_float floating point t_floatv floating point array t_double double precision t_doublev double precision array t_ldouble long double t_ldoublev long double array t_pointer pointer t_string character string t_cistring case-insensitive character string t_binary block of binary data (e.g., graphics) Table 3.1. cbase Data Types The first step in accessing an existing cbase is to open it, which is done using the function cbase_t *cbopen(const char *cbname, const char *type, int fldc, const cbfield_t fldv[]); cbname, fldc, and fldv are the same as for cbcreate, and must be given the same values as when the cbase was created. type points to a character string specifying the type of access for which the cbase is to be opened (as for the stdio function fopen). Legal values for type are "r" open for reading "r+" open for update (reading and writing) cbopen returns a pointer to the open cbase. The cbsync function causes any buffered data for a cbase to be written out. int cbsync(cbase_t *cbp); The cbase remains open and the buffers retain their contents. After processing is completed on an open cbase, it must be closed using the function int cbclose(cbase_t *cbp); The cbclose function causes any buffered data for the cbase to be written out, unlocks it, closes it, and frees the cbase pointer. 3.2 Lock Functions Before an open cbase can be accessed, it must be locked. The function used to control the lock status of a cbase is int cblock(cbase_t *cbp, int ltype); where cbp is a pointer to an open cbase and ltype is the lock type to be placed on the cbase. The legal values for ltype are CB_RDLCK lock cbase for reading CB_WRLCK lock cbase for reading and writing CB_RDLKW lock cbase for reading (wait) CB_WRLKW lock cbase for reading and writing (wait) CB_UNLCK unlock cbase If ltype is CB_RDLCK and the cbase is currently write locked by another process, or if ltype is CB_WRLCK and the cbase is currently read or write locked by another process, cblock will fail and set errno to EAGAIN. For the wait lock types, cblock will not return until the lock is available. The cbgetlck function reports the lock status held by the calling process on a cbase. int cbgetlck(cbase_t *cbp); It returns one of the legal values for the ltype argument in the cblock function. 3.3 Record Cursor Position Functions Each open cbase has a record cursor. At any given time the record cursor is positioned either on a record in that cbase or on a special position called null. The record on which the cursor is located is referred to as the current record. The operations performed by most cbase functions are either on or relative to the current record, so the initial step in a transaction on a cbase is usually to position the record cursor on the desired record. When accessing the records in a cbase in the order that they are stored, the following functions are used to move the record cursor. int cbrecfirst(cbase_t *cbp); int cbreclast(cbase_t *cbp); int cbrecnext(cbase_t *cbp); int cbrecprev(cbase_t *cbp); The cbrecfirst function positions the record cursor to the first record, and cbreclast to the last record. Before calling either of these functions cbreccnt should be used to test if the cbase is empty. unsigned long cbreccnt(cbase_t *cbp); If the cbase is empty, there is no first or last record and so these functions would return an error. The cbrecnext function advances the record cursor to the succeeding record, and cbrecprev retreats it to the preceding record. In the record ordering, null is located before the first record and after the last. There are also functions for saving the current position of the record cursor and resetting it to that position. int cbgetrcur(cbase_t *cbp, cbrpos_t *cbrposp); int cbsetrcur(cbase_t *cbp, const cbrpos_t*cbrposp); The cbgetrcur function gets the current position of the record cursor and saves it in the variable pointed to by cbrposp. cbrpos_t is the cbase record position type, defined in . cbsetrcur can then be used later to set the record cursor back to that position. The record cursor can be positioned on null by passing cbsetrcur the NULL pointer rather than a pointer to a variable. Other than this special case, cbsetrcur should only be called with record cursor positions previously saved with cbgetrcur. The cbrcursor macro is used to test if the record cursor for a cbase is positioned on a record or on null. void *cbrcursor(cbase_t *cbp); If the record cursor of the cbase pointed to by cbp is positioned on null, cbrcursor returns the NULL pointer. If it is on a record, cbrcursor returns a value not equal to the NULL pointer. This function is useful for loops needing to test when the last (or first) record has been reached. The cbrecalign function aligns the record cursor with a specified key cursor. int cbrecalign(cbase_t *cbp, int field); field is the key with which to align the record cursor. The relationship between the key cursors and the record cursor is explained in the next section. 3.4 Key Cursor Position Functions In addition to a record cursor, each open cbase also has a key cursor for each key defined for that cbase. Like the record cursor, a key cursor is positioned either on a record in that cbase or on null. To access a cbase in the sort order of a certain key, the appropriate key cursor is used instead of the record cursor. Each key cursor moves independently of the others, but whenever a key cursor position is set, the record cursor is moved to the same record. The key cursors are not affected by moving the record cursor. The following functions are used to move a key cursor. int cbkeyfirst(cbase_t *cbp, int field); int cbkeylast(cbase_t *cbp, int field); int cbkeynext(cbase_t *cbp, int field); int cbkeyprev(cbase_t *cbp, int field); These perform as do the corresponding functions for the record cursor. Note that the key cursor functions can be used only with fields defined to be keys (see cbcreate in section 3.1). The following function is used to search for a key of a certain value. int cbkeysrch(cbase_t *cbp, int field, const void *buf); field is the key to search for the data item pointed to by buf. If the key is found, 1 is returned and the key and record cursors positioned to the record having that key. If there is no record with that key, 0 is returned and the key and record cursor positioned to the record (possibly null) that would follow a record with that key value. Since the key cursors do not automatically follow the record cursor, the situation sometimes occurs where the record cursor is positioned to the desired record, but the cursor for the key to be used next is not. The cbkeyalign function is used to align a specified key cursor with the record cursor. int cbkeyalign(cbase_t *cbp, int field); The reason the key cursors are not updated every time the record cursor moves is not because it would be in any way difficult to do so, but because this would increase the overhead enormously. And since only one key cursor is normally used at a time, this extra overhead would almost never provide any benefit in return. As for the record cursor, each key cursor position can be tested to be positioned on a record or on null. void *cbkcursor(cbase_t *cbp, int field); If the key cursor specified by field of the cbase pointed to by cbp is positioned on null, cbkcursor returns the NULL pointer. If it is on a record, cbkcursor returns a value not equal to the NULL pointer. 3.5 Input/Output Functions To read a record from a cbase, the record cursor for that cbase is first positioned to the desired record using either the record cursor position functions or the key cursor position functions. One of the following functions is then called to read from the current record. int cbgetr(cbase_t *cbp, void *buf); int cbgetrf(cbase_t *cbp, int field, void *buf); cbp is a pointer to an open cbase and buf points to the storage area to receive the data read from the cbase. The cbgetr function reads the entire current record, while cbgetrf reads the specified field from the current record. The function for inserting a new record into a cbase is int cbinsert(cbase_t *cbp, const void *buf); where buf points to the record to be inserted. When a new record is inserted into a cbase, the position it holds relative to each key cursor is defined by the sort order for that key field. There is no predefined sort order associated with the record cursor, however, and it is up to the user whether or not to store the records for each cbase in a sorted or unsorted order. To store records in a sorted order, the record cursor is first positioned the the record after which to insert the new record. cbinsert is then called to insert the record pointed to by buf after the current record. If no sort order is desired, the step to position the record cursor is skipped. The cbdelcur function is used to delete a record. int cbdelcur(cbase_t *cbp); The record cursor must first be positioned on the record to delete, then cbdelcur called to delete the current record. The cbputr function writes over an existing record. int cbputr(cbase_t *cbp, const void *buf); buf points to the new record contents. Writing over an existing record is equivalent to deleting the record and inserting a new one in the same position in the file. If the new record contains an illegal duplicate key, this will cause the insert to fail, resulting in the record having been deleted from the cbase. The exact behaviour that a program should have in such a circumstance is different for different applicatons, and so it is often desirable to use cbdelcur and cbinsert directly rather than cbputr. 3.6 Data Import/Export Functions cbase data can be exported to a text file using the cbexport function. int cbexport(cbase_t *cbp, const char *filename); Every record in cbase cbp is converted to a text format and written to the file filename. The export file format is defined as follows. - Each record is terminated by a newline ('\n'). - The fields in a record are delimited by vertical bars ('|'). - Each field contains only printable characters. - If a field contains the field delimiter character, that character is replaced with \F. - The individual elements of array data types are exported as individual fields. Data may be imported from a text file using the cbimport function. int cbimport(cbase_t *cbp, const char *filename); cbimport reads each record from the text file filename and inserts it into the cbase cbp. Data import/export is primarily used to move data between different database formats. This sometimes requires some slight rearranging of the text before importing. This can be most easily done using awk, a language designed specifically for manipulating records stored in text files. awk is normally included with UNIX, and is also available for MS-DOS. Chapter 4: An Example Program Included with cbase is rolodeck, an example program illustrating the use of cbase. Rolodeck is a program for storing business cards. To allow it to be compiled without requiring any additional libraries for displays, and because the purpose of the program is purely instructional, the program has been given only a simple user interface. 4.1 Data Definition The first step in writing a program using cbase is to define the data to be stored. This should be done in a separate header file for each record type to be used. Figure 4.1 shows rolodeck.h, the data definition header for the business card record type used by the rolodeck program. Starting at the top, the first thing to note is the definition of a macro to prevent multiple includes. This is a general practice applicable to any header file, cbase or otherwise, whose purpose is to allow a header to be specified for inclusion multiple times in a file while ensuring that the definitions within the file are processed only once. For a header containing nothing but macro definitions, the only effect of preventing multiple includes will be reduced compile times. But for a header containing, for instance, type definitions, multiple includes would produce compilation errors. Notice how the include macro is related to the header file name; the file name is converted to upper case and the period replaced by an underscore ('.' is not a valid character for C identifiers). These macros should be named consistently to minimize the possibility of accidentally defining the same macro elsewhere for some other purpose. At the beginning of every data definition header file, should be included. Following this is defined the name of the cbase. This character string is the name used for the data file. This macro is to be used for the cbname argument when calling cbcreate and cbopen. #ifndef ROLODECK_H /* prevent multiple includes */ #define ROLODECK_H #include /* cbase name */ #define ROLODECK ("rolodeck.dat") /* record definition */ typedef struct { char rd_contact[41]; /* contact name */ char rd_title[41]; /* contact title */ char rd_company[41]; /* company name */ char rd_addr[2*40+1]; /* company address */ char rd_city[26]; /* city */ char rd_state[3]; /* state */ char rd_zip[11]; /* zip code */ char rd_phone[13]; /* phone number */ char rd_ext[5]; /* phone extension */ char rd_fax[13]; /* fax number */ char rd_notes[4*40+1]; /* notes */ } rolodeck_t; /* field names */ #define RD_CONTACT (0) #define RD_TITLE (1) #define RD_COMPANY (2) #define RD_ADDR (3) #define RD_CITY (4) #define RD_STATE (5) #define RD_ZIP (6) #define RD_PHONE (7) #define RD_EXT (8) #define RD_FAX (9) #define RD_NOTES (10) #define RDFLDC (11) /* field definition list */ extern const cbfield_t rdfldv[RDFLDC]; #endif /* #ifndef ROLODECK_H */ Figure 4.1. rolodeck.h #ifndef ROLODECK_I /* prevent multiple includes */ #define ROLODECK_I #include #include #include "rolodeck.h" /* field definition list */ static cbfield_t rd_fldv[] = { {offsetof(rolodeck_t, rd_contact), sizeofm(rolodeck_t, rd_contact), t_string, CB_FKEY | CB_FUNIQ, "rdcont.ndx"}, {offsetof(rolodeck_t, rd_title), sizeofm(rolodeck_t, rd_title), t_string, 0, ""}, {offsetof(rolodeck_t, rd_company), sizeofm(rolodeck_t, rd_company), t_string, CB_FKEY, "rdcomp.ndx"}, {offsetof(rolodeck_t, rd_addr), sizeofm(rolodeck_t, rd_addr), t_string, 0, ""}, {offsetof(rolodeck_t, rd_city), sizeofm(rolodeck_t, rd_city), t_string, 0, ""}, {offsetof(rolodeck_t, rd_state), sizeofm(rolodeck_t, rd_state), t_string, 0, ""}, {offsetof(rolodeck_t, rd_zip), sizeofm(rolodeck_t, rd_zip), t_string, 0, ""}, {offsetof(rolodeck_t, rd_phone), sizeofm(rolodeck_t, rd_phone), t_string, 0, ""}, {offsetof(rolodeck_t, rd_ext), sizeofm(rolodeck_t, rd_ext), t_string, 0, ""}, {offsetof(rolodeck_t, rd_fax), sizeofm(rolodeck_t, rd_fax), t_string, 0, ""}, {offsetof(rolodeck_t, rd_notes), sizeofm(rolodeck_t, rd_notes), t_string, 0, ""} }; #endif /* #ifdef ROLODECK_I */ Figure 4.2. rolodeck.i Next is the type definition for the record being defined. Each field in the record (member in the structure) begins with a character sequence derived from the cbase name, usually two characters, followed by an underscore. This character sequence should be unique across all cbase record types used by a given program. This type is for use when declaring record variables. Variables of this type should be used for functions such as cbgetr requiring a pointer to a record. Following the record type definition is a list of macros used to identify a field when calling cbase functions. The names of these macros are constructed by taking the field names from the record type definition and converting them to uppercase; this is why the field names must be made unique. Each field name macro must be an integer indicating the number of the field in the record type definition, starting at zero. A macro for the total number of fields is also defined. This macro is to be used for the fldc argument when calling cbcreate and cbopen. Finally, the field definition array is declared. The elements of the field definition list are of type cbfield_t, defined in . Since more than one source file will in general need to include the same header, the actual definition of the field definition array must be placed in a separate file. This file is given a .i extension to indicate that it contains data or code as opposed to a .h file containing only definitions, and it is included by only one source file, normally the one containing main. rolodeck.i is shown in figure 4.2. The field definition array specifies, for each field in the record, that field's offset in the record, size, type, some flags, and a filename to be used if the field is to be a key. The offsetof macro should be used to specify the offset. A macro sizeofm (size of member) is defined in to give the size of a member of a structure. size_t sizeofm(struct_t, member); The arguments to sizeofm are the same as for offsetof. The size cannot be calculated from the difference between field offsets because padding characters may be added between structure members to maintain proper alignment (see KERN88). The field definition array is to be used as the fldv argument to cbcreate and cbopen. It should be noted that every record type should normally have at least one unique key field. This field is used to uniquely identify records. The physical record position is often used for this purpose, but, as will be discussed later in this chapter, a unique key is required for multiuser applications. In addition to the data definition header, all programs using cbase normally require that the following header files also be included. #include #include #include is the header for the blkio library. It is included to provide the definition of the NULL pointer and the declaration of the function bexit. bexit is for use in place of exit in any program using the blkio library for file access. It writes out any buffered data for any open block file then calls exit (see bexit in the blkio reference manual). is the cbase header file containing all the constant and type definitions and function declarations for using the cbase library. contains the declaration of errno as well as definitions of error code values; all cbase functions use errno for error reporting. 4.2 Opening a cbase The first step in accessing an existing cbase is to open it. Figure 4.3 shows the code from rolodeck.c to open the rolodeck cbase. rolodeck is opened with a type argument of "r+" to allow both reading and writing. The other arguments are the cbase name, ROLODECK, the field count, RDFLDC, and the field definition list, rdfldv, all defined in the data definition header file, rolodeck.h. On error cbopen returns the NULL pointer. For this program there is only one cbase, but most applications will require more. If the named cbase does not exist, cbopen will fail and set errno to ENOENT. In this example, if the rolodeck cbase does not exist, it is created and the program continues as normal. Note that the cbase must still be opened after it is created. In some cases a separate program is written to create all the cbases required by an application; in this case the main program would probably interpret ENOENT as an error. /* open rolodeck cbase */ cbp = cbopen(ROLODECK, "r+", RDFLDC, rdfldv); if (cbp == NULL) { if (errno != ENOENT) { bexit(EXIT_FAILURE); } /* create rolodeck cbase */ printf("Rolodeck does not exist. Creating...\n"); if (cbcreate(ROLODECK, sizeof(rolodeck_t), RDFLDC, rdfldv) == -1) { bexit(EXIT_FAILURE); } cbp = cbopen(ROLODECK, "r+", RDFLDC, rdfldv); if (cbp == NULL) { bexit(EXIT_FAILURE); } } Figure 4.3. Opening a cbase 4.3 Locking a cbase Before accessing an open cbase, it must first be locked. If data is to be written to the cbase, it must be write locked, otherwise only a read lock is required. A cbase can be read locked by more than one process at the same time, and read locks are therefore also called shared locks. A write lock, on the other hand, is an exclusive lock; a write locked cbase can be neither read nor write locked by any other process. Write locks are exclusive because, if one process tried to read data while it was partially modified by another, the data would probably be in an inconsistent state. Processes that will only read data, however, can safely do so concurrently. While a cbase is write locked, other processes needing to access that cbase must wait until it is unlocked so that they can in turn lock it themselves to complete their processing. While a cbase is read locked, only processes needing to write must wait. Using a write lock when a read lock would suffice will therefore delay other processes unnecessarily. Locks of either type should be held for the shortest time possible; a common mistake in writing multiuser applications is to pause for use input while holding a lock, causing that lock to be held indefinitely. If an attempt is made to obtain a lock on a cbase, but is blocked by a lock held by another process, cblock will fail and set errno to EAGAIN. The call to cblock is therefore usually made in a loop with a predefined maximum number of tries. It is convenient to place this in a function configured for the application being developed. Figure 4.4 shows this function from rolodeck.c. It may also be suitable in some instances to sleep for a short (possibly random) time between attempts to lock. #define LMAXTRIES (20) /* max lock tries */ /* rdlock: rolodeck lock */ int rdlock(cbase_t *cbp, int ltype) { for (int i = 0; i < LMAXTRIES; ++i) { if (cblock(cbp, ltype) == -1) { if (errno == EAGAIN) { continue; } return -1; } else { errno = 0; return 0; } } errno = EAGAIN; return -1; } Figure 4.4. Rolodeck Locking Function There are also two lock types (CB_RDLKW and CB_WRLCKW) which, if the requested lock is blocked, will wait until it can be obtained. These are not usually used, however, because if the lock does not become free in a reasonable time, the process waiting for the lock will be hung. For an applications where there will be only a single process, the necessary locks can be set immediately after opening the cbases to be accessed and left locked. One critical concern when locking multiple cbases is the possibility of deadlock. Deadlock is an extensive subject, and there are a number of ways of dealing with it. Most texts on operating systems (see CALI82) and database theory cover the subject in detail. 4.4 Accessing a cbase The gross structure of the rolodeck program is a case statement within a loop. At the start of the loop a user request is read and used to select the action performed in the case statement. Each individual action performed in the case statement illustrates the use of cbase to perform a basic operation, e.g., inserting a record, deleting a record, finding the next record, exporting data to a text file, etc. The operation of finding the next record serves as a good general example. The code for this from rolodeck.c is shown in figure 4.5. One of the most important points to notice in the example code is that a unique key (the contact name, here) is used to relocate the current record when a cbase is locked. This is because, when a cbase is unlocked, it may be modified by another process. A record at a given location may be deleted, and the empty slot possibly reused for a new record. Because of this, cbsetrpos cannot be used with a record position obtained during a previously held lock. As mentioned in the section 4.3, applications that do not involve multiple processes accessing the same data can simply lock a cbase and leave it locked rather than locking the cbase immediately prior to each transaction. Another central point is the use of multiple keys. In the rolodeck program, both the contact and the company names are keys. A variable sf is used in rolodeck.c to identify the current sort field, which can be changed interactively. Before using the cbkeynext function, the appropriate key cursor must first be positioned. cbkeysrch positions only the key being searched, here being the unique key. If the next card is to be found using the sort order of a different key, cbkeyalign must first be used to align that key cursor with the current record. case REQ_NEXT_CARD: /* next card */ rdlock(cbp, CB_RDLCK); if (cbreccnt(cbp) == 0) { printf("The rolodeck is empty.\n\n"); rdlock(cbp, CB_UNLCK); continue; } /* use unique key field to set rec cursor */ found=cbkeysrch(cbp,RD_CONTACT,rd.rd_contact); if (sf != RD_CONTACT) { /* align cursor of sort key */ cbkeyalign(cbp, sf); } if (found == 1) { /* advance key (and rec) cursor 1 pos */ cbkeynext(cbp, sf); } if (cbrcursor(cbp) == NULL) { printf("End of deck.\n\n"); rdlock(cbp, CB_UNLCK); continue; } cbgetr(cbp, &rd); rdlock(cbp, CB_UNLCK); break; /* case REQ_NEXT_CARD: */ Figure 4.5. Next Rolodeck Record 4.5 Closing a cbase When a program is through accessing a cbase, the cbase should be closed. Figure 4.6 shows this code from rolodeck.c. /* close cbase */ if (cbclose(cbp) == -1) { bexit(EXIT_FAILURE); } Figure 4.6. Closing a cbase A cbase is automatically unlocked when it is closed. A cbase is normally, but not necessarily, opened and closed only once in a program. 4.6 Manipulating Exported Data Exported data often requires some processing before it is used. For instance, consider modifying an application to add a new field to an existing cbase. If there is a considerable amount of data already stored, it would be desirable to use the import/export functions rather than manually entering the data again. Another common example is moving data between cbase and another database system, which may use different field and record separators for exported data. An ideal tool for processing records in text files is awk. Figure 4.7 shows an awk program for inserting a new field at position two in all the records in a text file (note that awk field numbering starts at one, not zero). The predefined variables FS and OFS are used to set the input and output field separators, respectively. The predefined variables RS and ORS are used to set the input and output record separators, respectively. Setting these variables appropriately is all that is necessary to convert between text file formats using different field and record separators. The awk program in figure 4.8 converts text files exported from a database using the tab character as a field separator for import by cbase. BEGIN { FS = "|"; # set input and output field OFS = FS; # and record separators RS = "\n"; ORS = RS; NEWFIELD = 2; # field to insert } # insfld: insert field n of current record function insfld(n) { if (n < 1 || n > NF + 1) { return -1; } for (i = NF; i >= n; --i) { $(i + 1) = $i; } $n = ""; return 0; } { if (insfld(NEWFIELD) == -1) { printf "Error inserting new 2nd fld.\n"; exit 1; } print $0; } END { exit 0; } Figure 4.7. awk Program to Insert Field BEGIN { FS = "\t"; # set input and output field OFS = "|"; # and record separators RS = "\n"; ORS = RS; } { print $0 } END { exit 0; } Figure 4.8. awk Program to Change Field Separator Appendix A: Installation Instructions The cbase library is distributed in MS-DOS format on either two 360K 5.25" diskettes or one 720K 3.5" diskette. On the distribution diskettes is a directory containing the files for each library and for rolodeck, the example program. There is also a directory for manx, the utility used to extract an on-line copy of the reference manual for each library, and a directory containing installation batch files for different MS-DOS compilers. blkio blkio library btree btree library lseq lseq library cbase cbase library manx manx utility rolodeck example program bats batch files for different compilers If cbase is being installed on an MS-DOS system, the following two commands will copy the contents of the distribution diskettes onto the main drive. > mkdir cbase > xcopy a:\ cbase /s /v An operating system besides MS-DOS will require either a facility to read MS-DOS diskettes or access to an MS-DOS machine from which files can be transferred by a serial link or network to the target machine. If the transfer process does not automatically convert the text files to the format of the target system, an additional conversion utility may be necessary. The second step in the installation is set a switch specifying the host operating system. This switch is the HOST macro in the blkio file blkio_.h. If the host operating system is MS-DOS, the MSDOSC macro in the same file must also be set to specify the C compiler being used. If the compiler being used is not completely ANSI compatible, some additional switches must also be set. These are a set of macros located in the blkio header file blkio.h, each of which specifies a particular ANSI feature. For instance, the macro AC_PROTO is used to indicate function prototype support. The remaining installation instructions for each library, which should be performed in the directory containing the files for that library, are given below. Before proceeding, the manx utility should be compiled and placed in a directory in the path. After all the libraries have been installed, the final step is to compile the example program; the instructions for doing this are given in the example program's readme file. The MS-DOS installation batch files, install.bat, each take two arguments. The first specifies the memory model, legal values for which are s, m, c, l, and h; the library file is named libm.lib, where lib would be the library name and m would correspond to the memory model of the library. The second, if present, causes the reference manual to be extracted from the source code into the file lib.man, where lib would again be the library name. The main batch file included with each library is written for Borland Turbo C. Because there is so little uniformity among C compilers for MS-DOS, modifications will be required for other compilers. Instructions for making these straightforward modifications are given at the beginning of each install.bat. Some batch files modified for other compilers can be found in the bats directory. Also, if a make utility is available, the UNIX makefiles may instead be adapted. A1. The blkio Library UNIX 1. Set the HOST macro in blkio_.h to UNIX. 2. Install the boolean header file. $ su # cp bool.h /usr/include # ^d 3. Extract the on-line reference manual. $ make man 4. Build the blkio library. $ make blkio 5. Install the blkio library. This will copy the blkio header file blkio.h to /usr/include and the blkio library archive to /usr/lib. $ su # make install # ^d MS-DOS 1. Set the HOST macro in blkio_.h to MS_DOS. 2. Set the MSDOSC macro in blkio_.h to the C compiler being used. 3. If necessary, modify install.bat for the C compiler being used. 4. Extract the reference manual and build and install the blkio library. > install s x Run again for each additional memory model desired. A2. The btree Library UNIX 1. Install the blkio library. 2. Extract the on-line reference manual. $ make man 3. Build the btree library. $ make btree 4. Install the btree library. This will copy btree.h to /usr/include and the btree library archive to /usr/lib. $ su # make install # ^d MS-DOS 1. Install the blkio library. 2. If necessary, modify install.bat for the C compiler being used. 3. Install the btree library. > install s x Run again for each additional memory model desired. A3. The lseq Library UNIX 1. Install the blkio library. 2. Extract the on-line reference manual. $ make man 3. Build the lseq library. $ make lseq 4. Install the lseq library. This will copy lseq.h to /usr/include and the lseq library archive to /usr/lib. $ su # make install # ^d MS-DOS 1. Install the blkio library. 2. If necessary, modify install.bat for the C compiler being used. 3. Install the lseq library. > install Run again for each additional memory model desired. A4. The cbase library UNIX 1. Install the btree and lseq libraries. 2. Extract the on-line reference manual. $ make man 3. Build the cbase library. $ make cbase 4. Install the cbase library. This will copy cbase.h to /usr/include and the cbase library archive to /usr/lib. $ su # make install # ^d MS-DOS 1. Install the btree and lseq libraries. 2. If necessary, modify install.bat for the C compiler being used. 3. Install the cbase library. > install s x Run again for each additional memory model desired. Appendix B: Defining New Data Types cbase is designed to allow custom data types to be defined by the user. Custom data types are completely integrated, becoming indistinguishable from those predefined. A data type definition consists of a macro used as the type name (e.g., t_string), and three functions: a comparison function, an export function, and an import function. The comparison function is the most important; it determines the sort order for data of that type. The export function is used to export data of the associated type to a text file, and the import function to import data. Below are given step-by-step instructions for defining a new cbase data type. B1. The Type Name For each cbase data type there is a corresponding type name by which the user refers to that data type. The type names are defined in cbase.h. #define t_char (0) /* signed character */ ... #define t_binary (26) /* binary data */ Type names must be defined as integers, starting at zero and increasing in steps of one. The type name for a new data type would be added at the end of this list, and be defined as an integer one greater than the last data type in the list. To avoid possible conflict with future predefined types, user defined type names should not start with t_. The prefix ut_ is recommended for user defined types. #define ut_new (27) /* new data type */ B2. The Comparison Function A data type is characterized primarily by its sort order. Each data type is given a comparison function defining this sort order. Comparison functions are of the form int cmp(const void *p1, const void *p2, size_t n); p1 and p2 are pointers to two data items to be compared, and n is the size of the data items. The value returned must be less than, equal to, or greater than zero if the data item pointed to by p1 is less than, equal to, or greater than, respectively, that pointed to by p2. The C standard library function memcmp would be a valid cbase comparison function. All cbase comparison functions are located in the file cbcmp.c. For a new data type, a comparison function would be added in this file. static int newcmp(const void *p1, const void *p2, size_t n) { ... } Comparison functions are made static because they are accessed by cbase only through an array of function pointers, cbcmpv, also defined in cbcmp.c. This array contains the comparison function for each cbase data type. The integer value of the type name is used by cbase as an index into this array, and so the comparison functions must be in the same order as the type names. A pointer to the comparison function for a new data type would be added at the end of this array. /* cbase comparison function table */ cbcmp_t cbcmpv[] = { charcmp, ... binarycmp, newcmp }; B3. The Export and Import Functions Each data type has an associated export function. This export function takes a data item of the associated type and writes it to a file in a text format. Export functions are of the form int exp(FILE *fp, const void *p, size_t n); p is a pointer to the data item of size n to be exported. The export function converts the data item to a text format, then writes it to the current position in file fp. Upon successful completion, a value of zero is returned. Otherwise, a value of -1 is returned. See cbexport in the cbase Programmer's Reference Manual for a description of the format of exported data. All cbase export functions are located in the file cbexp.c. For a new data type, an export function would be added in this file. static int newexp(FILE *fp, const void *p, size_t n) { ... } Just like comparison functions, export functions are accessed by cbase through an array. This array, cbexpv, is defined in cbexp.c. A pointer to the export function for the new data type would be added at the end of this array. The import function reads a data item from a text file. Import functions are of the form int imp(FILE *fp, const void *p, size_t n); The parameters and return value are the same as for the export function. Import functions are located in cbimp.c. Pointers to the import functions are stored in the array cbimpv. B4. The Type Count The macro CBTYPECNT is defined in cbase_.h as the number of data types defined. It must be incremented by one for each new data type added. After completing these steps, rebuild the cbase library (see Appendix A). The underlying libraries do not need to be rebuilt. Appendix C: Porting to a New Operating System The blkio library provides a means for portable access to structured files just as the stdio library does for text files. blkio is thus the only library requiring modification to port to a new operating system. The steps necessary to perform this port are outlined below. C1. The HOST Macro In the blkio library's private header file blkio_.h, a macro is defined for each supported operating system. When installing the blkio library, the host operating system is selected by defining the HOST macro using one of these system macros. When porting to a new operating system, a new system macro must be defined in blkio_.h. #define UNIX (1) /* UNIX */ #define MS_DOS (2) /* MS-DOS */ #define NEWOS (3) /* new os */ #define HOST NEWOS In some instances it will be necessary to take into account variations among the C compilers available for a system. For MS-DOS, the macro MSDOSC, also defined in blkio_.h, is used to select a compiler in the same way as the HOST macro is used to select the operating system. To port to a new C compiler under MS-DOS, a new compiler macro must be defined. #define TURBOC (1) /* Turbo C */ #define MSC (2) /* Microsoft C */ #define NEWC (3) /* new C compiler */ #define MSDOSC NEWC To port to multiple incompatible compilers under a new operating system, a new compiler selection macro analogous to MSDOSC would need to be defined. C2. The File Descriptor Type In most operating systems, an open file is accessed not by name, but through some type of handle, usually called a file descriptor. File descriptors are normally of type int, but this is not guaranteed. A union is therefore defined (in blkio.h) for the file descriptor. typedef union { /* fildes type */ char cfd; /* character fildes */ short sfd; /* short int fildes */ int ifd; /* int fildes */ } fd_t; This type is used to define the fd member of the BLKFILE structure. typedef struct { /* block file ctl struct */ fd_t fd; /* file descriptor */ ... } BLKFILE; When modifying the code in subsequent sections, the appropriate member of the union fd_t would be used to access a file descriptor. If the file descriptor type for the new system is short, for instance, the file descriptor for BLKFILE *bp would be accessed as bp->fd.sfd. It will be necessary to add a member to the fd_t union if one of the required type does not already exist. C3. File Access System Calls The bulk of the operating system specific code is related to the system calls used to access the file system. These system calls perform basic operations such as opening, reading, and writing a file, and are conceptually the same on most systems. In fact, they can usually be directly translated to a corresponding call on the new system. All system calls accessing the file system are isolated in the file buops.c (blkio unbuffered operations). The HOST macro is used to separate sections of code used for different operating systems. Similarly, compiler selection macros such as MSDOSC are used to separate sections of code for different compilers under the same operating system. #if HOST == UNIX /* code for UNIX */ . . #elif HOST == MS_DOS /* code for MS-DOS */ #if MSDOSC == TURBOC /* code for Turbo C */ . . #elif MSDOSC == MSC /* code for Microsoft C */ . . #endif #endif When porting to a new operating system (or compiler), each of these conditional compilations using HOST (or the relevant compiler selection macro) must be located and an additional #elif added for the new system (or compiler). C4. File Locking System Calls System calls are also used to perform file locking. All system calls for file locking are located in the file lockb.c. This file must be modified as was buops.c. If file locking will not be used on the new system, lockb.c need not be altered. References CALI82 Calingaert, P. Operating System Elements. Englewood Cliffs, NJ: Prentice Hall, 1982. COME79 Comer, D. The Ubiquitous B-tree. ACM Computing Surveys, June 1979. FROS89 Frost, L. A Buffered I/O Library for Structured Files. The C Users Journal, October 1989. HORO76 Horowitz, E. and S. Sahni. Fundamentals of Data Structures. Rockville, MD: Computer Science Press, 1976. KERN88 Kernighan, B. and D. Ritchie. The C Programming Language. Englewood Cliffs, NJ: Prentice Hall, 1988. KNUT68 Knuth D. The Art of Computer Programming Volume 3 / Sorting and Searching. Reading, MA: Addison-Wesley, 1968. ULLM82 Ullman, J. Principles of Database Systems. Rockville, MD: Computer Science Press, 1982.