Subject: ARMRSUB
Call: armrsub -ht -mappend -product -tsplit -zeroup specFile
ARMRSUB is used for updating data bases. It's primary use is for performing simple data substitution. One may also use ARMRSUB to perform a cartesian file product.
-ht | an optional argument denoting "halt type." |
-mappend | use mappend to append one data base to another. Mappend is a special case of the cartesian file product process. With mappend, no link fields are required. |
-product | an optional argument that invokes the cartesian file product operation. |
-tsplit | specifies that the non-matching source data base records are to be written out as SFN TSPLIT. |
-zeroup | specifies that all substitution field values are to be moved from the source data base to the target data base. When -zeroup is not specified, only non-zero field values are moved. |
specName | is the name of a file in which you have written processing specs for the current ARMRSUB run. |
The sample, below, shows how one might update a project data base with data from an employee data base. For each matching ID value, update the project data base with the name, phone, and mailstop data from the employee data base. The MS field in the employee data base is temporarily renamed to mailstop.
source= employee target= project alt= ms mailstop link= id sub= name phone mailstop end=
ARMRSUB recognizes the following statements
Statement | Description |
---|---|
source= sfn | sfn specifies the name of the source data base from which data will be copied. |
target= tfn+ | tfn specifies the name of the target data base that
will receive data copied from the source.
Use plus signs '+' to denote segmentation; e.g. if target base name project++ is specified, the output base names will be project01, project02, and so on. |
alt= f1 n1 | this statement is used for renaming source data base fields. F1 specifies the old field name, and n1 specifies the new field name. Any number of renaming pairs may be entered. |
link= f1 f2 ... | the specified items are the names of the link fields (target base) |
sub= f1 f2 ... | the specified items are the names of the fields whose data will be copied from the source base to the target base. If the sub= statement is ommitted, then by default all matching non-link field names are substituted. One may also code sub= * to specify all fields. |
end= | This statement marks the end of the current ARMRSUB cycle. You may append any number of additional ARMRSUB cycles. Each cycle must be marked with an end= statement. |
input source input target output target f1 f2 f3 f1 f2 f3 f1 f2 f3 -- -- -- -- -- -- -- -- -- A Q V1 ... A B X1 A B X1 B A V2 :.. A Q X2 A Q V1 B P V3 ..... B P X3 B P V3
ARMRSUB's substitution function lets you transfer data from a "source" data base to a "target" data base. In the above example we transfer data from the source's "f3" field to the target's "f3" field. The records associated with source and target are linked based on fields specified by you; e.g. in the example, "f1" and "f2" are the link fields. The substitution function will first identify all matching source-target records. For these, the data of "f3" is moved from source to target. Non matching records are left alone. Notice that the first and third source records match the second and third target records. Therefore, V1 and V3 are moved to the target. The first target record is not matched; therefore X1 remains in "f3".
A general description of the substitution function is as follows:
The substitute operation can be performed with any pair of data bases, provided that they have two or more fields names in common.
When you substitute data from base A to base B you will identify one or more link fields and one or more substitution fields. The source substitution fields may optionally be summarized.
Let the set of specified link fields be "L" and let the substitution fields be "S"; e.g. abstracting from our example above think of "f1" and "f2" as L and "f3" as S. The substitution function, hereafter referred to simply as "sub", proceeds as follows.
Our simple example did not show it, but in practice we have any number of records with identical link field values, either in the source, or in the target, or in both. For this reason we must generalize the sub procedure as follows:
Sub views source base A by L and S such that the canonical form is realized; i.e. sub access all unique combinations of L and S of A. Redundant records are ignored unless you specify summarization of S such that all records are combined!
Sub generates source view groups of the form:
A(L1) A(S1) A(S2) A(S3) . . A(L2) A(S1) A(S2) . .
The above symbolism implies that the source base (A) is organized hierarchically by unique link field values. For each unique link field value we form sub groups by unique substitution field values. When you specify that one or more of the substitution fields are to be summarized, we will summarize them at the second hierarchical level in order to realize ARMR's canonical form.
For example, let's assume that L consists of f1 and f2 and that S consists of a numeric field f3. Two possible source substitution views are as follows:
sub view summarized source (no summary) sub view f1 f2 f3 f1 f2 f3 f1 f2 f3 -- -- -- -- -- -- -- -- -- A Q V1 A Q V1 A Q V1+V2+V3 A Q V2 V2 A Q V3 V3
Please note; the summarized view leads to a straightforward substitution; i.e. whenever we match a target record with f1=A and f2=Q, we move the sum of V1+V2+V3 to the target's f3 field.
On the other hand, when a summary was not specified, we must substitute several source records into a matching target record. By default, an error occurs in ARMRSUB when the link field combinations are not unique.
To override this error condition, one uses the -product option. This function allows the creation of new target records. The proliferation of records resulting from this kind of a sub is performed according to the rules of the Cartesian product. If you recall your High School Algebra, you will find that this is very much like multiplying a pair of factored algebraic expressions. More about this later.
Sub views B by L alone. As was done with the source base we also form data groups with L at the top of the hierarchy. The sub groups are simply the target records associated with the given link field value combination. Think of the target groups as having the following general form:
B(L1) B(1) B(2) B(3) . . B(L2) B(1) B(2) . .
Above we mean to imply that the target base is grouped by unique link field combinations. The members of each link group are simply the data base records which are associated with the given link field values.
The example below illustrates how we organize the target view. Note: only the link fields are important here. All records of a particular link field will be updated whenever a match occurs.
target other fields link of target base f1 f2 f3 f4 f5 f6 -- -- -- -- -- -- A Q . . . . . . . . . . . . B A . . . . . . . . . . . .
Recall that we refer to the source base as "A" and the target base as "B".
Sub compares the link field values of A with the link field values of B. Substitution is performed for all matching groups. Non matching groups are ignored; i.e. source data not matched by target data is ignored. Target data not matched by source data remains unchanged.
Let A(L1) be equal to B(L1); then sub will form the Cartesian product of A(L1) with B(L1). That is to say, over and above the basic data transfer expected from a substitution function, sub also generates new records according to the following rules:
Each group member of A(L1) is combined with all group members of B(L1). For each combination we transfer data from A to B.
Let A(L1) contain N members and B(L1) contain M members; then the sub processes yields N*M output records.
For example, assume we have the following matching group pair:
source group target group ----------- ------------ A(L1) A(S1) B(L1) B(1) A(S2) B(2) A(S3)
Remember the product of two factored Algebraic expressions! Sub combines A(S1) with B(1) and B(2); A(S2) is combined with B(1) and B(2); finally we combine A(S3) with B(1) and B(2).
The resulting output target group can be represented as follows:
B(L1) B(1,S1) B(2,S1) B(1,S2) B(2,S2) B(1,S3) B(2,S3)
In practice one does not always utilize the full generality of the sub function. A typical substitution usually involves single member link groups only. In this case, the sub will not need to generate new records. The ability to generate new records via the Cartesian product is nevertheless needed for certain applications, as will be shown in our example.
Our employee data base contains personnel information. Let's assume that our employees work on a number of projects. These projects in turn are associated with a set of customers.
The employees' work records are tracked in a data base called "wrec"; the directory of wrec is as follows:
wrec 02 T01 ssn 11 01 T02 month 06 03 D date 08 02 T03 project 08 02 T04 remarks 40 04 R02 hours 08
The work record book is constructed on a monthly basis. In it, every employee records the number of hours spent by project code.
Employees work on various jobs or projects during the day and management has the requirement to keep track of the number of hours that each employee works on a given project. Hence each project is given a code so that the total number of hours for each project can be accumulated in order that the actual hours spent can be compared to the hours budgeted.
ssn: 327-12-3492 month: 01_JAN ------------------------------------------------------------------- date project remarks hours -------- -------- ---------------------------------------- -------- 01/03/86 alpha Phone consultation 1.50 01/03/86 beta Prepare Resume for client 5.00 01/09/86 alpha Prepare Estimate 2.75 |
ssn: 390-32-2331 month: 01_JAN ------------------------------------------------------------------- date project remarks hours -------- -------- ---------------------------------------- -------- 01/08/86 gamma Prepare legal documents 8.00 01/09/86 gamma Complete legal documents 2.00 01/10/86 gamma Prepare legal documents 8.00 01/11/86 alpha Start document 6.00 01/12/86 alpha Complete document 8.00 |
ssn: 729-49-9876 month: 01_JAN ------------------------------------------------------------------- date project remarks hours -------- -------- ---------------------------------------- -------- 01/24/86 delta Hold seminar 8.00 01/24/86 delta Train new employees 7.00 01/27/86 beta Review 6.00 |
The project codes for our sample data base sorted in alpha sequence are "alpha", "beta", "delta", and "gamma".
Figure 1 shows the work record book by employee for the month of January.
The customer data are kept in another data base called "project". Its directory is as follows:
project 02 T01 project 08 02 T02 client 12 02 T06 address 20 02 T04 city 12 01 T05 state 06 03 Z zip 05
In this base we carry client data as a function of project code. The project book is shown in figure 2.
project: alpha address: 301 Tuffet Street state: Idaho client: Miss Muffet city: Boise zip: 80329 |
project: beta address: 1205 Carrot Lane state: Neb. client: P. Rabbit city: Platt zip: 70321 |
project: delta address: 1299 Corner Place state: Ohio client: Jack Horner city: Cleveland zip: 25029 |
project: gamma address: 71 Honey Drive state: Alaska client: Ted E. Bear city: Nome zip: 00231 |
It is simple and natural to maintain separate data bases, one of the employees' work records, another of the employees' personnel data, and a third containing client data. Nevertheless, when it comes time to preparing itemized customer billings, we must combine a pertinent subset of information from all three data bases; thus we will generate a fourth data base which we call a reporting base.
If we combine data from the work record base with data from the employee and the project bases we can construct a reporting base. The latter will show us the employees and their hours by client.
Let's name the reporting base to be generated simply "report". Its directory is as follows:
report directory employee wrec project ----------------- -------- ---- ------- report 01 T01 month 06 * * 03 B date 08 * 02 T02 project 08 * * 02 T03 client 12 * 02 T04 address 20 * 02 T05 city 12 * 01 T06 state 06 * 03 B zip 05 * 02 T07 ssn 11 * * 02 T08 last 14 * 02 T10 remarks 40 * 04 R hours 0802 * 04 R rate 0502 * 04 R bill 1002 01 T11 dep 1 *
Beside showing the directory of "report" base, we also show which of its fields are shared by the other data bases. In order to get report's data we must extract data from the other data bases as shown in the above chart; e.g. customer addresses are obtained from the project base, hours come from the wrec base, etc.
The desired end result of our sample exercise is shown figure 3. Our example shows how ARMRSUB can be used to generate the new data base called report from data bases wrec, employee and project.
month: 01_JAN client: Miss Muffet city: Boise project: alpha address: 301 Tuffet Street state: Idaho -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/03/86 WONDERLAND Phone consultation 1.50 30.00 01/09/86 WONDERLAND Prepare Estimate 2.75 30.00 01/11/86 BLOW Start document 6.00 20.50 01/12/86 BLOW Complete document 8.00 20.50 |
month: 01_JAN client: P. Rabbit city: Platt project: beta address: 1205 Carrot Lane state: Neb. -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/03/86 WONDERLAND Prepare Resume for client 5.00 30.00 01/27/86 HALL Review 6.00 32.50 |
month: 01_JAN client: Jack Horner city: Cleveland project: delta address: 1299 Corner Place state: Ohio -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/24/86 HALL Hold seminar 8.00 32.50 01/24/86 HALL Train new employees 7.00 32.50 |
month: 01_JAN client: Ted E. Bear city: Nome project: gamma address: 71 Honey Drive state: Alaska -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/08/86 BLOW Prepare legal documents 8.00 20.50 01/09/86 BLOW Complete legal documents 2.00 20.50 01/10/86 BLOW Prepare legal documents 8.00 20.50 |
Initially, the report base is empty. Since we want to build January's report we enter a single value into report base; i.e. using a data entry module, we set up a single record. In its month field, we enter the value "01_JAN". That is the only required input. The remaining data is extracted with ARMRSUB as follows:
1) fetch wrec's data for January
To do this we specify:
step1.sub source= wrec target= report link= month sub= date project ssn remarks hours end= armrsub -product step1.sub
This step moves all records whose month contains 01_JAN from wrec to report. The Cartesian product is applicable here. Keep in mind that the input report base contains a single record. The output, on the other hand will contain all of wrec's January records; i.e. "wrec" is the source base; "report" is the target base as shown in the following sketch:
source "wrec" target "report" month other data month other fields ------ -- --- --- ------ -- -- -- -- 01_JAN . . . 01_JAN . . . . . . 02_FEB . . . . . . . . .
It should be noted that, the Cartesian product does not come into play in the remaining steps.
2) fetch employee data
The employee data base can supply last name, rate, and department as a function of ssn and month. To get this data we specify the following parameters to the sub function:
step2.sub source= employee target= report link= month ssn sub= last rate dep end= armrsub step2.sub
3) fetch client data
Finally, we insert client information; the latter is obtained from the project base as a function of project code. To get this data we specify the following:
step3.sub source= project target= report link= project sub= client address city state zip end= armrsub step3.sub
Mappend is an ARMR acronym that stands for "map and append." When we append one data base to another, we must first map the index pointer values of tabular fields to fit in the resultant data base. With the -mappend option, Armrsub does this for us.
Suppose our employee data is seperated into two seperate files, one containing data for employees of the West region and another with data for East region employees. We might want to combine data from these two data bases into a single data base for reporting. Here's how.
Make a copy of the west data base. Let's call the copy report.bas.
copy west.bas report.bas
Create a specification file to copy the data from east.bas into report.bas. Note that no link fields are required.
append.sub source= east target= report end=
Append east.bas to report.bas
armrsub -mappend append.sub