armr

Rom List Processing

This paper describes the list processing techniques devised by Arnold Rom for the ARMR database engine.  List processing techniques rank among the most significant tools for speeding up program execution.  Lists allow us to organize data without physically moving it.  They give us the capability to generate dynamic data views.  Use of the basic list structure allows us to greatly simplify the programming of set logic.

The List Structure

Quite simply, the list structure consists of two integer arrays that represent row numbers in a dataset.  The elements of the first array contain starting row numbers.  In the second array, each element represents the next row in a set of rows.  A zero value signifies the end of a row set.

For example, in the list represented below, the first row is 3.  We move to the 3rd element of the Next array to find that the next row is 1.  We move to the 1st element of Next to find that the next row is 4, followed by 2, which is the last row in our set order.

element First Next
1 3 4
2   0
3   1
4   2

It's important to note that the list structure should always be separate from your data structure!  This separation of the list from the data structure allows a degree of flexibility that is unattainable with the classic textbook, linked list structure. 

Sorting

One of the primary uses of lists is to represent data sorts.  ARMR uses the Roots functions mrsort and list1 for creating sorted lists.  Here, we've sorted a sample dataset by the values of the Category and Item columns.  We can examine the list to find that the first row in the sort is 2, followed by 6, followed by 7, followed by 3, etc.  With the list, we didn't need to physically rearrange our data in order to sort it!

element Category Item First Next
1 Vegetable Tree 2 0
2 Animal Bird   6
3 Mineral Diamond   9
4 Vegetable Flower   5
5 Vegetable Grass   1
6 Animal Cat   7
7 Animal Dog   3
8 Mineral Ruby   4
9 Mineral Quartz   8

Grouping

Once our dataset is sorted, we can examine "adjacent" rows and group them based on column values.  A list structure may represent one or more mutually exclusive row sets.  Roots function mcolct is used to perform the grouping.  Starting with our original list, we've now grouped the rows by Category.

element Category Item First Next
1 Vegetable Tree 2 0
2 Animal Bird 3 6
3 Mineral Diamond 4 9
4 Vegetable Flower   5
5 Vegetable Grass   1
6 Animal Cat   7
7 Animal Dog   0
8 Mineral Ruby   0
9 Mineral Quartz   8

Selecting

Another common use of lists is for tracking row selection.  When selecting, we want to identify matching rows, and perhaps non-matching rows as well.

Roots function seqlst is used to prime a selection list.  After a call to seqlst, all the rows of our sample dataset are part of the matching list.

element Category Item First Next
1 Vegetable Tree 1 2
2 Animal Bird   3
3 Mineral Diamond   4
4 Vegetable Flower   5
5 Vegetable Grass   6
6 Animal Cat   7
7 Animal Dog   8
8 Mineral Ruby   9
9 Mineral Quartz   0

Now suppose we want to prune for rows with a category value of "Animal."  Prune is an ARMR idiom that means to select or filter data.  It's easy if you think in terms of gardening.  One prunes to either collect a bouquet of flowers, or to discard weeds.

We might call Roots function txtprn to select all the "Animal" rows.  After pruning, match points to the first matching row, while first points to the start of the leftovers.

element Category Item First Next Match
1 Vegetable Tree 1 3 2
2 Animal Bird   6  
3 Mineral Diamond   4  
4 Vegetable Flower   5  
5 Vegetable Grass   8  
6 Animal Cat   7  
7 Animal Dog   0  
8 Mineral Ruby   9  
9 Mineral Quartz   0  

Bit Vector representation of a list

List sets are easily represented in bit vectors.  A bit vector is an array of bits.  The left to right bit positions of the bit vector represent the row numbers of our dataset.

We use Roots function setbit to store the row numbers of a list in a bit vector.  Setbit sets bitVector[row] to "1" for all rows found in the input list.  Note that bitVector is not cleared first.

Here, we've started with a clear bit vector, and used setbit to store our match list.  Bit positions, 2, 6, and 7 are turned on.

bitVector

0 1 0 0 0 1 1 0 0

Deriving lists from a bit vector

Likewise, we can derive lists from bit vectors.  We start by priming a list with Roots function seqlst.  Then a call to Roots function setprn is used to split our input list.  Members whose bitVector[row] is "1" are moved to an output match list.  Non-matching members, the set complement, remain in the input list.

Set Manipulations

With our lists stored in bit vectors, we can easily perform set manipulations.  The Roots library contains functions to help us out with these tasks.

A union of two bit vectors can be performed with a call to Roots function lgor.  An intersection is performed with lgand.  Roots function lgexcl is used to determine the difference between two bit vectors. 

Input bitVectors
Set A 1 0 0 1 1 0 1 0
 
Set B 0 1 1 0 1 0 1 1

 lgor
 
Set B 1 1 1 1 1 0 1 1
 lgand
 
Set B 0 0 0 0 1 0 1 0
 lgexcl
 
Set B 0 1 1 0 0 0 0 1

Summary

List processing techniques are easy, yet quite versatile.  They can be adapted to suit many different requirements. 

List structures can be generated very cheaply, without compromising the integrity of our data files.  They allow us to rapidly sort, group, and select data.  Most importantly, they allow us to simplify our programming logic.  You'll find them a valuable addition to your programming toolbox.


armr Get ARMR at SourceForge.net. Fast, secure and Free Open Source software downloads