PARTIAL INDEXING Ira Emus Sep. 19, 1988 BIX irae CIS 76702,672 By now I assume you've all heard of partial indexing, the process of creating an index that contains pointers to only a subset of the records in the database. If not, you might ask why you would want an index that only contains a subset of the records and as a not-so-pleasant side-effect will boot you to DOS if you do not treat it with kid gloves. There are two answers; Speed; the time needed to create a partial index may be as little as 2% of the time needed to create a full index, and Speed; a database with an active partial index only looks as big as a database which contains the number of records included in the index, usually a small subset of the whole file. With the creation time reduced in this fashion, it becomes quite feasible to create an index for ad-hoc queries and reports. The simplest partial indexing routine looks something like this: INDEX ON partial1() TO temp FUNCTION partial1 DO WHILE ! .AND. RECNO() < LASTREC() GOTO RECNO()+1 ENDDO RETURN This is faster if you only extract a small subset of the file since Clipper only has to create an index on the number of returned records. This may cut indexing time up to 50%, depending upon the size of the file and percentage of records placed into the index. The GOTO recno(), instead of SKIP, is necessary since during the creation of an index, it is unclear what order the file is really in, is it in record number order, the order of the last controlling index, or the order of the index that is corrently being built? Since the UDF is executeing in the normal Clipper environment which is possibly at a different state than the internal indexing routine, it is important to realize that things may not be exactly as they seem. An example that shows a much larger time saving is creating an index with a relation set. I will use as an example an order entry system with three files, a name file that contains buyer and receiver information; an order header file that contains order info and an entry that points to the buyer in the name file; and a line item file that contains the item that was ordered and an entry that points to the receiver in the name file. It is not too far-fetched to assume that I might want to generate a report, ordered on the buyer's name or ZIP code, based on the people who purchased a particular item. To accomplish this feat in the past might have required something like this: SELE 0 USE names INDEX custcode SELE 0 USE ordrhedr INDEX sender SET RELATION TO sender INTO names SELE 0 USE lineitem INDEX orderno SET RELATION TO orderno INTO ordrhedr INDEX ON names->zip TO temp This works great, but it's slow, because for every record in the lineitem database, 2 SEEKs were performed, one for each relation. If you only need the index for a report and only want a subset of the records, a partial index can assure that the 2 SEEKs are only performed for the records that are to be included in the report. If the subset to be extracted is a small percentage of the database, you may see reductions in indexing time of up to 90% This is a sample of the code needed to do this type of index: SELE 0 USE names INDEX custcode SELE 0 USE ordrhedr INDEX sender SELE 0 USE lineitem INDEX ON partial2() TO temp FUNCTION partial2 DO WHILE (lineitem->RECNO()) < lineitem->(LASTREC()) SELE lineitem DO WHILE product # "BASKET"; .AND. RECNO() < LASTREC() GOTO RECNO() + 1 ENDDO SELE ordrhedr SEEK lineitem->orderno IF FOUND() SELE names SEEK ordrhedr->sender IF FOUND() EXIT ENDIF ENDIF SELE lineitem GOTO RECNO() + 1 ENDDO SELE lineitem RETURN names->zip A problem that manifests itself while using partial indexes is that the last record is always included in the index, a key must be returned for the last record so that Clipper knows that it's time to stop creating keys and start generating the index, and quite that last record is not a part of the selected set. One solution is to set a filter on the selection criteria you used to create the index; in the last example it would be something like this: SET FILTER TO (lineitem->product) = "BASKET" This will have an almost unnoticeable effect on the speed of the application as long as you only keep the filter set while the partial index is the controlling index, under these conditions you can never be more than a SKIP away from a good record. The only possible record in the index that does not meet the condition is the last record in the file, and since Clipper considers a database file in index order when it analyzes a filter, you can never be more than 1 logical record away. The reason we've done all of this is that we now have an index that contains only the records in the lineitem file that contain the product code "BASKET" that is indexed in ZIP code order on the ZIP code field from a related database. Some caveats that apply when using partial indexes are: 1> Be very careful with GOTOs, because it is easy to go to records that are not in the index. 2> Do not change anything that affects the key expression while the index is open. 3> Do not try to use a partial index any time other than immediately after its creation, or be aware that it may point to records that do not really belong in the set you think it points to. 4> A partial index require the use of a UDF and will bomb if you try to look at them in DBU unless you are very careful to never try to go past the ends of the file, or in some cases, the end of the current screen. Most of the problems that occur with partial indexes occur because Clipper considers that an index that does not contain an entry for every record is corrupt and gets highly confused while trying to update it or figure out where to go next from a record that is not contained in the index. Usually this looks like a "corrupt index" error message, and soon after, the DOS prompt. Another use for a partial index is to view a subset of a file in a DBEDIT window without having to deal with the problems of trying to limit access to non-requested areas of the file. Since Clipper is only aware of the records in the index when looking at a file with an open index, DBEDIT will only display the records in the partial index. The two methods shown above will work just fine for creating a partial index for this use, but if there is already an index on the whole file that allows access to the subset that we wish to look at,i.e. if I have an index on ZIP code and I want to only look at those people who have the ZIP code 90543, there is an even faster way to generate that partial index. The records I want can be easily located in the file, and the following piece of code will easily store them to an array so I can use them later. USE names INDEX zip SEEK "90543" COUNT TO numrecs WHILE zip = "90543" PRIVATE ary_of_recs[numrecs+1] SEEK "90543" FOR i = 1 TO numrecs ary_of_recs = RECNO() SKIP NEXT Now that I've got this array of record numbers I can create an index that not only includes only those records I want in the index, but I can create it by going only to the records I want in the index. First of all, I sort the array using ASORT so that all movement in the file is in the forward direction and so that if by chance the last record in the file is in the array, the creation of the index does not abort prematurely. Clipper knows that as soon as it reaches the last record, it's time to stop scanning the file for keys and time to start building the index from the keys it's found. In the following code I use the information in the array to build the index. The variable "isindexing" is there so that if the index is evaluated outside the INDEX ON, problems with the array or the absence or bogus value of i do not cause problems. I also store the value of LASTREC() into the array so that Clipper will know it's time to stop looking for keys and time to start building the index. PUBLIC isindexing ary_of_recs[numrecs+1] = LASTREC() ASORT(ary_of_recs) isindexing = .T. i = 1 INDEX ON partial3() to temp isindexing = .F. SET FILTER TO zip = "90543" FUNCTION partial3 IF isindexing GOTO ary_of_recs[i] i = i + 1 ENDIF RETURN names->zip This last method of creating partial indexes can be so fast, that using them becomes almost mandatory in those places where they are useful, things like limiting the scope in a DBEDIT() window. And don't forget that this method, getting all of the record numbers I want to put in the index before I start indexing, could be applied to the example function, partial2, though in that case the improvement might not be quite so dramatic due to the time spent looking in other files. And always remember that these ideas can be applied in places and conditions other than these.