PARTIAL INDEXING

          Ira Emus
          Sep. 19, 1988
          BIX irae
          CIS 76702,672

          By now I assume you've all heard of partial indexing, the
          process of creating an index that contains pointers to only
          a subset of the records in the database. If not, you might
          ask why you would want an index that only contains a subset
          of the records and as a not-so-pleasant side-effect will
          boot you to DOS if you do not treat it with kid gloves.
          There are two answers; Speed; the time needed to create a
          partial index may be as little as 2% of the time needed to
          create a full index, and Speed; a database with an active
          partial index only looks as big as a database which contains
          the number of records included in the index, usually a small
          subset of the whole file.  With the creation time reduced in
          this fashion, it becomes quite feasible to create an index
          for ad-hoc queries and reports. The simplest partial
          indexing routine looks something like this:

          INDEX ON partial1() TO temp

          FUNCTION partial1
                DO WHILE ! <condition> .AND. RECNO() < LASTREC()
                        GOTO RECNO()+1
                ENDDO
                RETURN <key>

          This is faster if you only extract a small subset of the
          file since Clipper only has to create an index on the number
          of returned records. This may cut indexing time up to 50%,
          depending upon the size of the file and percentage of
          records placed into the index. The GOTO recno(), instead of
          SKIP, is necessary since during the creation of an index, it
          is unclear what order the file is really in, is it in record
          number order, the order of the last controlling index, or
          the order of the index that is corrently being built? Since
          the UDF is executeing in the normal Clipper environment
          which is possibly at a different state than the internal
          indexing routine, it is important to realize that things may
          not be exactly as they seem.

          An example that shows a much larger time saving is creating
          an index with a relation set. I will use as an example an
          order entry system with three files, a name file that
          contains buyer and receiver information; an order header
          file that contains order info and an entry that points to
          the buyer in the name file; and a line item file that
          contains the item that was ordered and an entry that points
          to the receiver in the name file. It is not too far-fetched
          to assume that I might want to generate a report, ordered on
          the buyer's name or ZIP code, based on the people who
          purchased a particular item. To accomplish this feat in the
          past might have required something like this:

          SELE 0
          USE names INDEX custcode

          SELE 0
          USE ordrhedr INDEX sender
          SET RELATION TO sender INTO names

          SELE 0
          USE lineitem INDEX orderno
          SET RELATION TO orderno INTO ordrhedr

          INDEX ON names->zip TO temp

          This works great, but it's slow, because for every record in
          the lineitem database, 2 SEEKs were performed, one for each
          relation. If you only need the index for a report and only
          want a subset of the records, a partial index can assure
          that the 2 SEEKs are only performed for the records that are
          to be included in the report. If the subset to be extracted
          is a small percentage of the database, you may see
          reductions in indexing time of up to 90%
          This is a sample of the code needed to do this type of
          index:

          SELE 0
          USE names INDEX custcode

          SELE 0
          USE ordrhedr INDEX sender

          SELE 0
          USE lineitem

          INDEX ON partial2() TO temp


          FUNCTION partial2
                DO WHILE (lineitem->RECNO()) < lineitem->(LASTREC())

                        SELE lineitem
                        DO WHILE product # "BASKET";
                        .AND. RECNO() < LASTREC()
                                GOTO RECNO() + 1
                        ENDDO

                        SELE ordrhedr
                        SEEK lineitem->orderno

                        IF FOUND()
                                SELE names
                                SEEK ordrhedr->sender

                                IF FOUND()
                                        EXIT
                                ENDIF

                        ENDIF

                        SELE lineitem
                        GOTO RECNO() + 1

                ENDDO

                SELE lineitem
                RETURN names->zip

          A problem that manifests itself while using partial indexes
          is that the last record is always included in the index, a
          key must be returned for the last record so that Clipper
          knows that it's time to stop creating keys and start
          generating the index, and quite that last record is not a
          part of the selected set. One solution is to set a filter on
          the selection criteria you used to create the index; in the
          last example it would be something like this:

          SET FILTER TO (lineitem->product) = "BASKET"

          This will have an almost unnoticeable effect on the speed of
          the application as long as you only keep the filter set
          while the partial index is the controlling index, under
          these conditions you can never be more than a SKIP away from
          a good record. The only possible record in the index that
          does not meet the condition is the last record in the file,
          and since Clipper considers a database file in index order
          when it analyzes a filter, you can never be more than 1
          logical record away. The reason we've done all of this is
          that we now have an index that contains only the records in
          the lineitem file that contain the product code "BASKET"
          that is indexed in ZIP code order on the ZIP code field from
          a related database.

          Some caveats that apply when using partial indexes are:

          1> Be very careful with GOTOs, because it is easy to
             go to records that are not in the index.

          2> Do not change anything that affects the key expression
             while the index is open.

          3> Do not try to use a partial index any time other than
             immediately after its creation, or be aware that it may
             point to records that do not really belong in the set you
             think it points to.

          4> A partial index require the use of a UDF and will bomb if
             you try to look at them in DBU unless you are very
             careful to never try to go past the ends of the file, or
             in some cases, the end of the current screen.

          Most of the problems that occur with partial indexes occur
          because Clipper considers that an index that does not
          contain an entry for every record is corrupt and gets highly
          confused while trying to update it or figure out where to go
          next from a record that is not contained in the index.
          Usually this looks like a "corrupt index" error message, and
          soon after, the DOS prompt.

          Another use for a partial index is to view a subset of a
          file in a DBEDIT window without having to deal with the
          problems of trying to limit access to non-requested areas of
          the file. Since Clipper is only aware of the records in the
          index when looking at a file with an open index, DBEDIT will
          only display the records in the partial index. The two
          methods shown above will work just fine for creating a
          partial index for this use, but if there is already an index
          on the whole file that allows access to the subset that we
          wish to look at,i.e. if I have an index on ZIP code and I
          want to only look at those people who have the ZIP code
          90543, there is an even faster way to generate that partial
          index.  The records I want can be easily located in the
          file, and the following piece of code will easily store them
          to an array so I can use them later.

          USE names INDEX zip
          SEEK "90543"

          COUNT TO numrecs WHILE zip = "90543"

          PRIVATE ary_of_recs[numrecs+1]

          SEEK "90543"

          FOR i = 1 TO numrecs
                ary_of_recs = RECNO()
                SKIP
          NEXT

          Now that I've got this array of record numbers I can create
          an index that not only includes only those records I want in
          the index, but I can create it by going only to the records
          I want in the index. First of all, I sort the array using
          ASORT so that all movement in the file is in the forward
          direction and so that if by chance the last record in the
          file is in the array, the creation of the index does not
          abort prematurely. Clipper knows that as soon as it reaches
          the last record, it's time to stop scanning the file for
          keys and time to start building the index from the keys it's
          found. In the following code I use the information in the
          array to build the index. The variable "isindexing" is there
          so that if the index is evaluated outside the INDEX ON,
          problems with the array or the absence or bogus value of i
          do not cause problems. I also store the value of LASTREC()
          into the array so that Clipper will know it's time to stop
          looking for keys and time to start building the index.

          PUBLIC isindexing
          ary_of_recs[numrecs+1] = LASTREC()

          ASORT(ary_of_recs)

          isindexing = .T.
          i = 1
          INDEX ON partial3() to temp
          isindexing = .F.

          SET FILTER TO zip = "90543"


          FUNCTION partial3

                IF isindexing
                        GOTO ary_of_recs[i]
                        i = i + 1
                ENDIF
                RETURN names->zip

          This last method of creating partial indexes can be so fast,
          that using them becomes almost mandatory in those places
          where they are useful, things like limiting the scope in a
          DBEDIT() window. And don't forget that this method, getting
          all of the record numbers I want to put in the index before
          I start indexing, could be applied to the example function,
          partial2, though in that case the improvement might not be
          quite so dramatic due to the time spent looking in other
          files. And always remember that these ideas can be applied
          in places and conditions other than these.