Wed Jun 17 1998

ARMR MODULE

Subject: ARMRTRAN

Call: armrtran -ht specFile

Purpose

Armrtran lets you define a variety of subsets or new views from an existing data base. You may use the new views or subsets as new data bases depending on the your needs.

The armrtran module has two basic functions. First, armrtran may be used to build a summary base, or to rearrange a data base's field sequence. Second, armrtran may be used to convert a linear data base structure to a spread-sheet like matrix structure.

Usage

armrtran -ht specFile

The call arguments are as follows:

-ht	is an option denoting "halt type"
specFile	is the file name of a file in which you have written processing specs for the current armrtran run.

Armrtran Macro File

Armrtran recognizes the following macro statements:

Statement
Description

source= The specified item is the name of the data base the to be read and transposed.
target= The specified item is the name of the output data base.
trans= The specified items are field names of the transfer and/or summary fields.
Note: append a plus to the field name in order to specify summarization. This is, of course, only valid for numeric fields!
You may enter any number of field names on a trans= line. If all of the names do not fit on a line, continue by entering another statement starting with "trans=".
pivot= This item specifies the name of the source pivot field. You may enter a single value only; moreover only one pivot field may be specified for a given transpose operation.
vector= The specified items are numeric vector fields. Any number of fields may be specified per line, and you may enter several vector= lines per transpose.
target_pivot= The specified item will be used as the name of the output pivot field; i.e. the current item overrides the default generic output name of "pivot".
target_total= The specified item will be used as the name of the output totals field; i.e. the current item overrides the default output name of "total".
end= This statement marks the end of the current transpose cycle. You may append any number of additional transpositions. End each cycle by entering an end= statement.

Statement	Description
source=	The specified item is the name of the data base the to be read and transposed.
target=	The specified item is the name of the output data base.
trans=	The specified items are field names of the transfer and/or summary fields. Note: append a plus to the field name in order to specify summarization. This is, of course, only valid for numeric fields! You may enter any number of field names on a trans= line. If all of the names do not fit on a line, continue by entering another statement starting with "trans=".
pivot=	This item specifies the name of the source pivot field. You may enter a single value only; moreover only one pivot field may be specified for a given transpose operation.
vector=	The specified items are numeric vector fields. Any number of fields may be specified per line, and you may enter several vector= lines per transpose.
target_pivot=	The specified item will be used as the name of the output pivot field; i.e. the current item overrides the default generic output name of "pivot".
target_total=	The specified item will be used as the name of the output totals field; i.e. the current item overrides the default output name of "total".
end=	This statement marks the end of the current transpose cycle. You may append any number of additional transpositions. End each cycle by entering an end= statement.

Summary Bases

In its simplest form, transposition allows you to build a summary base from an existing base. Consider the resulting summary base to be an accumulator which summarizes facts and details contained in the source data base. Suppose we wanted to build a base which contains a roster of employees; for each employee we also want to record his or her number of months of service. To do this you provide the following information:

       source= employee
       target= roster
       trans= ssn last first tally+

"source=" identifies the input data base containing the detail to be summarized.

"target=" identifies the name of the output base.

"trans=" specifies which fields are to be moved from source base to target base.

Armrtran will apply the canonization rules to the employee base. The output from this operation will be a brand new data base called "roster"; i.e. Armrtran will generate and write the three files:

       roster     - the tables file
       roster.bas - the base file
       roster.dir - the directory file

The generated directory file of the target base as follows:

       roster
       02 T01 ssn   11
       02 T02 last  14
       02 T03 first 12
       01 B   tally 03

To view the output data base as a single page columnar report we enter:

       arxrep roster base ssn last first tally

The resulting report is shown in figure 1.

             ssn          last         first     tally
         ----------- -------------- ------------ -----
         327-12-3492 WONDERLAND     ALICE           5
         390-32-2331 BLOW           JOE             5
         729-49-9876 HALL           ANNIE           5
         773-21-9321 DOE            JOHN            3

Fig. 1. Generated Roster Base

Note: You can, of course, generate the identical report directly from the employee base. To do so simply enter:

       arxrep employee base ssn last first tally+

Why then bother to build a summary base in the first place? Sometimes the solution of complicated problems can be simplified by introducing summary bases at an intermediate level. This is often the case when several data bases are combined to form a single output base.

The summary function can also be used without specifying a counter. Use it in this manner, when you want to delete one or more fields from a data base; e.g. to remove the tally field from the employee data base specify the following:

       source= employee
       target= employee
       trans= ssn last first month dep rate

Finally, use the summary function simply to rearrange the data base field sequence.

Vectorization

Vectorization is the most interesting transposition function. With the vector function you can convert your data base to an equivalent spread-sheet like data base. An introductory example will show you the essence of the vectorization function. Below we show an input data base whose fields are "f1" "f2" and "f3". The vectorization function transforms the input base to the equivalent output base as shown below:

In the following generic example "f1" was selected as the input pivot; "f2" and "f3" were selected as the input vector fields:

                                                                                                       
           input             output
         -----------      ----------------------                         
         f1  f2  f3       pivot   A  B  C   total
         --  --  --       -----   -- -- --  -----
         A   20  -1        f2     20 30 40    90
         B   30  -2        f3     -1 -2 -3    -6
         C   40  -3

Note: the output field name "pivot" is a default name produced by the trans operation. You may supply your own (less generic) name. The "total" field is an extra bonus produced automatically by the transpose vectorization function.

The transformation can be reversed simply by applying the vectorization function to the output structure as shown below.

                                                                                                       
          input                       output
        -------------------------   -------------                        
        pivot  A   B   C   total    pivot  f2  f3
        -----  --  --  --  -----    -----  --  --
         f2    20  30  40    90     A      20  -1
         f3    -1  -2  -3    -6     B      30  -2
                                    C      40  -3
                                    total  90  -6

Note: we would typically override the generic output "pivot" name with "f1". Note also that we chose to include the generated "total" field when we reversed the structure.

For a more concrete example of vectorization, look ahead and examine the vectorization example.

Now let's look at the rules behind the vectorization function.

The vectorization function requires two types of data fields, a tabular attribute field ("f1" in our example above), and any number of numeric counter fields ("f2" and "f3" above). We refer to the tabular data field as the (input) pivot; the numeric counter fields are called the (input) vector fields.

The values found in the input pivot field will become data field names in the output base; e.g. "f1" spawned the output field named "A", "B" and "C".

The field names of the input vector will appear as values in the output base; e.g. assume the input base contains three costs fields named "material", "labor", and "travel", respectively. Assume that the three cost fields are specified as the input vector. The output base will then contain a generated output pivot field; its content will be "material", "labor", and "travel". The numerical values of the input vector will be distributed such that the original meaning of your data base remains intact.

In general, vectorization converts the input pivot to an output vector; the input vector, on the other hand, is converted to an output pivot.

A Vectorization Example

                                                                                                       
      Employee Base                Matrix1 Base
     ----------------   ----------------------------------------------   
     month  dep tally   dep 01_JAN 02_FEB 03_MAR 04_APR 05_MAY total
     ------ --- -----   --- ------ ------ ------ ------ ------ ------
     01_JAN  x     2     x      2      2      1      1      1      7
     01_JAN  y     1     y      1      1      2      2      2      8
     02_FEB  x     2     z             1      1      1             3
     02_FEB  y     1
     02_FEB  z     1
     03_MAR  x     1
     03_MAR  y     2          Matrix2 Base                            
     03_MAR  z     1          ------------------------                
     04_APR  x     1          month   x   y   z  total
     04_APR  y     2          ------ --- --- --- -----
     04_APR  z     1          01_JAN   2   1        3
     05_MAY  x     1          02_FEB   2   1   1    4
     05_MAY  y     2          03_MAR   1   2   1    4
                              04_APR   1   2   1    4
                              05_MAY   1   2        3
                              total    7   8   3   18

Fig. 2. Employee Head Count And Transposed Matrix Bases

Suppose we have an employee data base. For the employee data base, let "month" be declared to be the input pivot; while "tally" is declared as a single element input vector. The resulting output vector, created from the input pivot, will consist of fields named 01_JAN, 02_FEB, 03_MAR, 04_MAY; the output pivot, created from the input vector, will be named "pivot" by default; its only value will be "tally". The latter is the field name of the single element input vector. The input tally values will be properly summarized and distributed such that the original expression of the data base is retained.

The vectorization function always adds two or three additional output vector fields.

The first of these has a default name of "total". The total field is made to contain the sum of the output vector values.

The remaining generated fields were not shown in the above example! The second field generated is always named "_NULL_"; it represents null input pivot values; e.g. the input employee base may contain records with null month field values.

The third additional output vector field will be present only if the input pivot's table contains a blank entry. The name of this field is always "_BLANK_".

Lets convert our employee data base to an equivalent matrix base. We specify the operation as follows:

       source= employee
       target= matrix1
       pivot= month
       vector= tally
       trans= dep
       end=

The resulting matrix1 data base is a standard ARMR data base. Its directory will be created as part of the transpose operation. The generated matrix1 directory is as follows:

       matrix1
       01 T05 dep     01
       02 T01 pivot   05
       01 B   _NULL_  03
       01 B   01_JAN  03
       01 B   02_FEB  03
       01 B   03_MAR  03
       01 B   04_APR  03
       01 B   05_MAY  03
       01 B   total   03

We can produce a report of matrix1 via the following specifications:

       arxrep matrix1 base dep 01_JAN 02_FEB 03_MAR 04_APR 05_MAY total

The resulting report is shown under Matrix1 Base in figure 2.

Pivot and vectors are the basic inputs required by the vectorization function. In addition to these, you may also specify any number of transfer fields; e.g. in our example we transferred "dep" when we created matrix1. Numeric transfer fields will be summarized if you supply a "+" suffix. The optional summaries are done with respect to the transfer fields so that ARMR's canonical form is realized.

Finally, lets vectorize matrix1 to build matrix2. This exercise illustrates the use of multiple input vector fields.

To build matrix2 from matrix1 specify the following:

       source= matrix1
       target= matrix2
       pivot= dep
       vector= 01_JAN 02_FEB    03_MAR
       vector= 04_APR 05_MAY    total
       target_pivot=    month
       end=

The resulting matrix2 directory is as follows:

       matrix2
       02 T01 month    06
       01 B   _NULL_   03
       01 B   x        03
       01 B   y        03
       01 B   z        03
       01 B   total    03

The data of matrix2 is displayed in figure 2 under Matrix2 Base.

Matrix2 contains row and column totals; while matrix1 contains only row totals. Row totals are automatically formed as part of the vectorization process. By declaring matrix1's "total" field as an input vector field, we created columnar totals for matrix2.

Notes

The input pivot table's width should ideally not exceed 12 characters. Keep in mind that the table entries will be converted to data base field names. The latter can be at most 12 characters wide.

Armrtran will truncate the generated field names if the input pivot table width exceeds 12. This in turn can lead to fatal field name redundancies; i.e. the output field names must be unique.