Subject: ARMRTRAN
Call: armrtran -ht specFile
Armrtran lets you define a variety of subsets or new views from an existing data base. You may use the new views or subsets as new data bases depending on the your needs.
The armrtran module has two basic functions. First, armrtran may be used to build a summary base, or to rearrange a data base's field sequence. Second, armrtran may be used to convert a linear data base structure to a spread-sheet like matrix structure.
The call arguments are as follows:
-ht | is an option denoting "halt type" |
specFile | is the file name of a file in which you have written processing specs for the current armrtran run. |
Armrtran recognizes the following macro statements:
Statement | Description |
---|---|
source= | The specified item is the name of the data base the to be read and transposed. |
target= | The specified item is the name of the output data base. |
trans= | The specified items are field names of the transfer
and/or summary fields.
Note: append a plus to the field name in order to specify summarization. This is, of course, only valid for numeric fields! You may enter any number of field names on a trans= line. If all of the names do not fit on a line, continue by entering another statement starting with "trans=". |
pivot= | This item specifies the name of the source pivot field. You may enter a single value only; moreover only one pivot field may be specified for a given transpose operation. |
vector= | The specified items are numeric vector fields. Any number of fields may be specified per line, and you may enter several vector= lines per transpose. |
target_pivot= | The specified item will be used as the name of the output pivot field; i.e. the current item overrides the default generic output name of "pivot". |
target_total= | The specified item will be used as the name of the output totals field; i.e. the current item overrides the default output name of "total". |
end= | This statement marks the end of the current transpose cycle. You may append any number of additional transpositions. End each cycle by entering an end= statement. |
In its simplest form, transposition allows you to build a summary base from an existing base. Consider the resulting summary base to be an accumulator which summarizes facts and details contained in the source data base. Suppose we wanted to build a base which contains a roster of employees; for each employee we also want to record his or her number of months of service. To do this you provide the following information:
source= employee target= roster trans= ssn last first tally+
"source=" identifies the input data base containing the detail to be summarized.
"target=" identifies the name of the output base.
"trans=" specifies which fields are to be moved from source base to target base.
Armrtran will apply the canonization rules to the employee base. The output from this operation will be a brand new data base called "roster"; i.e. Armrtran will generate and write the three files:
roster - the tables file roster.bas - the base file roster.dir - the directory file
The generated directory file of the target base as follows:
roster 02 T01 ssn 11 02 T02 last 14 02 T03 first 12 01 B tally 03
To view the output data base as a single page columnar report we enter:
arxrep roster base ssn last first tally
The resulting report is shown in figure 1.
ssn last first tally ----------- -------------- ------------ ----- 327-12-3492 WONDERLAND ALICE 5 390-32-2331 BLOW JOE 5 729-49-9876 HALL ANNIE 5 773-21-9321 DOE JOHN 3Fig. 1. Generated Roster Base
Note: You can, of course, generate the identical report directly from the employee base. To do so simply enter:
arxrep employee base ssn last first tally+
Why then bother to build a summary base in the first place? Sometimes the solution of complicated problems can be simplified by introducing summary bases at an intermediate level. This is often the case when several data bases are combined to form a single output base.
The summary function can also be used without specifying a counter. Use it in this manner, when you want to delete one or more fields from a data base; e.g. to remove the tally field from the employee data base specify the following:
source= employee target= employee trans= ssn last first month dep rate
Finally, use the summary function simply to rearrange the data base field sequence.
Vectorization is the most interesting transposition function. With the vector function you can convert your data base to an equivalent spread-sheet like data base. An introductory example will show you the essence of the vectorization function. Below we show an input data base whose fields are "f1" "f2" and "f3". The vectorization function transforms the input base to the equivalent output base as shown below:
In the following generic example "f1" was selected as the input pivot; "f2" and "f3" were selected as the input vector fields:
input output ----------- ---------------------- f1 f2 f3 pivot A B C total -- -- -- ----- -- -- -- ----- A 20 -1 f2 20 30 40 90 B 30 -2 f3 -1 -2 -3 -6 C 40 -3
Note: the output field name "pivot" is a default name produced by the trans operation. You may supply your own (less generic) name. The "total" field is an extra bonus produced automatically by the transpose vectorization function.
The transformation can be reversed simply by applying the vectorization function to the output structure as shown below.
input output ------------------------- ------------- pivot A B C total pivot f2 f3 ----- -- -- -- ----- ----- -- -- f2 20 30 40 90 A 20 -1 f3 -1 -2 -3 -6 B 30 -2 C 40 -3 total 90 -6
Note: we would typically override the generic output "pivot" name with "f1". Note also that we chose to include the generated "total" field when we reversed the structure.
For a more concrete example of vectorization, look ahead and examine the vectorization example.
Now let's look at the rules behind the vectorization function.
The vectorization function requires two types of data fields, a tabular attribute field ("f1" in our example above), and any number of numeric counter fields ("f2" and "f3" above). We refer to the tabular data field as the (input) pivot; the numeric counter fields are called the (input) vector fields.
The values found in the input pivot field will become data field names in the output base; e.g. "f1" spawned the output field named "A", "B" and "C".
The field names of the input vector will appear as values in the output base; e.g. assume the input base contains three costs fields named "material", "labor", and "travel", respectively. Assume that the three cost fields are specified as the input vector. The output base will then contain a generated output pivot field; its content will be "material", "labor", and "travel". The numerical values of the input vector will be distributed such that the original meaning of your data base remains intact.
In general, vectorization converts the input pivot to an output vector; the input vector, on the other hand, is converted to an output pivot.
Employee Base Matrix1 Base ---------------- ---------------------------------------------- month dep tally dep 01_JAN 02_FEB 03_MAR 04_APR 05_MAY total ------ --- ----- --- ------ ------ ------ ------ ------ ------ 01_JAN x 2 x 2 2 1 1 1 7 01_JAN y 1 y 1 1 2 2 2 8 02_FEB x 2 z 1 1 1 3 02_FEB y 1 02_FEB z 1 03_MAR x 1 03_MAR y 2 Matrix2 Base 03_MAR z 1 ------------------------ 04_APR x 1 month x y z total 04_APR y 2 ------ --- --- --- ----- 04_APR z 1 01_JAN 2 1 3 05_MAY x 1 02_FEB 2 1 1 4 05_MAY y 2 03_MAR 1 2 1 4 04_APR 1 2 1 4 05_MAY 1 2 3 total 7 8 3 18Fig. 2. Employee Head Count And Transposed Matrix Bases
Suppose we have an employee data base. For the employee data base, let "month" be declared to be the input pivot; while "tally" is declared as a single element input vector. The resulting output vector, created from the input pivot, will consist of fields named 01_JAN, 02_FEB, 03_MAR, 04_MAY; the output pivot, created from the input vector, will be named "pivot" by default; its only value will be "tally". The latter is the field name of the single element input vector. The input tally values will be properly summarized and distributed such that the original expression of the data base is retained.
The vectorization function always adds two or three additional output vector fields.
The first of these has a default name of "total". The total field is made to contain the sum of the output vector values.
The remaining generated fields were not shown in the above example! The second field generated is always named "_NULL_"; it represents null input pivot values; e.g. the input employee base may contain records with null month field values.
The third additional output vector field will be present only if the input pivot's table contains a blank entry. The name of this field is always "_BLANK_".
Lets convert our employee data base to an equivalent matrix base. We specify the operation as follows:
source= employee target= matrix1 pivot= month vector= tally trans= dep end=
The resulting matrix1 data base is a standard ARMR data base. Its directory will be created as part of the transpose operation. The generated matrix1 directory is as follows:
matrix1 01 T05 dep 01 02 T01 pivot 05 01 B _NULL_ 03 01 B 01_JAN 03 01 B 02_FEB 03 01 B 03_MAR 03 01 B 04_APR 03 01 B 05_MAY 03 01 B total 03
We can produce a report of matrix1 via the following specifications:
arxrep matrix1 base dep 01_JAN 02_FEB 03_MAR 04_APR 05_MAY total
The resulting report is shown under Matrix1 Base in figure 2.
Pivot and vectors are the basic inputs required by the vectorization function. In addition to these, you may also specify any number of transfer fields; e.g. in our example we transferred "dep" when we created matrix1. Numeric transfer fields will be summarized if you supply a "+" suffix. The optional summaries are done with respect to the transfer fields so that ARMR's canonical form is realized.
Finally, lets vectorize matrix1 to build matrix2. This exercise illustrates the use of multiple input vector fields.
To build matrix2 from matrix1 specify the following:
source= matrix1 target= matrix2 pivot= dep vector= 01_JAN 02_FEB 03_MAR vector= 04_APR 05_MAY total target_pivot= month end=
The resulting matrix2 directory is as follows:
matrix2 02 T01 month 06 01 B _NULL_ 03 01 B x 03 01 B y 03 01 B z 03 01 B total 03
The data of matrix2 is displayed in figure 2 under Matrix2 Base.
Matrix2 contains row and column totals; while matrix1 contains only row totals. Row totals are automatically formed as part of the vectorization process. By declaring matrix1's "total" field as an input vector field, we created columnar totals for matrix2.
The input pivot table's width should ideally not exceed 12 characters. Keep in mind that the table entries will be converted to data base field names. The latter can be at most 12 characters wide.
Armrtran will truncate the generated field names if the input pivot table width exceeds 12. This in turn can lead to fatal field name redundancies; i.e. the output field names must be unique.