Thu Jan 7 1999

ARMR MODULE

Subject: ARMRSUB

Call: armrsub -ht -mappend -product -tsplit -zeroup specFile

Purpose

ARMRSUB is used for updating data bases. It's primary use is for performing simple data substitution. One may also use ARMRSUB to perform a cartesian file product.

Usage

armrsub -ht -mappend -product -tsplit -zeroup specName

-ht	an optional argument denoting "halt type."
-mappend	use mappend to append one data base to another. Mappend is a special case of the cartesian file product process. With mappend, no link fields are required.
-product	an optional argument that invokes the cartesian file product operation.
-tsplit	specifies that the non-matching source data base records are to be written out as SFN TSPLIT.
-zeroup	specifies that all substitution field values are to be moved from the source data base to the target data base. When -zeroup is not specified, only non-zero field values are moved.
specName	is the name of a file in which you have written processing specs for the current ARMRSUB run.

Specification File

The sample, below, shows how one might update a project data base with data from an employee data base. For each matching ID value, update the project data base with the name, phone, and mailstop data from the employee data base. The MS field in the employee data base is temporarily renamed to mailstop.

      source= employee
      target= project
      alt= ms mailstop
      link= id
      sub= name phone mailstop
      end=

ARMRSUB recognizes the following statements

Statement
Description

source= sfn sfn specifies the name of the source data base from which data will be copied.
target= tfn+ tfn specifies the name of the target data base that will receive data copied from the source.
Use plus signs '+' to denote segmentation; e.g. if target base name project++ is specified, the output base names will be project01, project02, and so on.
alt= f1 n1 this statement is used for renaming source data base fields. F1 specifies the old field name, and n1 specifies the new field name. Any number of renaming pairs may be entered.
link= f1 f2 ... the specified items are the names of the link fields (target base)
sub= f1 f2 ... the specified items are the names of the fields whose data will be copied from the source base to the target base. If the sub= statement is ommitted, then by default all matching non-link field names are substituted. One may also code sub= * to specify all fields.
end= This statement marks the end of the current ARMRSUB cycle. You may append any number of additional ARMRSUB cycles. Each cycle must be marked with an end= statement.

Statement	Description
source= sfn	sfn specifies the name of the source data base from which data will be copied.
target= tfn+	tfn specifies the name of the target data base that will receive data copied from the source. Use plus signs '+' to denote segmentation; e.g. if target base name project++ is specified, the output base names will be project01, project02, and so on.
alt= f1 n1	this statement is used for renaming source data base fields. F1 specifies the old field name, and n1 specifies the new field name. Any number of renaming pairs may be entered.
link= f1 f2 ...	the specified items are the names of the link fields (target base)
sub= f1 f2 ...	the specified items are the names of the fields whose data will be copied from the source base to the target base. If the sub= statement is ommitted, then by default all matching non-link field names are substituted. One may also code sub= * to specify all fields.
end=	This statement marks the end of the current ARMRSUB cycle. You may append any number of additional ARMRSUB cycles. Each cycle must be marked with an end= statement.

Data Base Substitution

One may use ARMRSUB to update an existing data base with data from a source data base. A simple illustration of the substitution operation is as follows:


      input source   input target   output target                          

        f1 f2 f3       f1 f2 f3       f1 f2 f3
        -- -- --       -- -- --       -- -- --
        A  Q  V1 ...   A  B  X1       A  B  X1
        B  A  V2   :.. A  Q  X2       A  Q  V1
        B  P  V3 ..... B  P  X3       B  P  V3

ARMRSUB's substitution function lets you transfer data from a "source" data base to a "target" data base. In the above example we transfer data from the source's "f3" field to the target's "f3" field. The records associated with source and target are linked based on fields specified by you; e.g. in the example, "f1" and "f2" are the link fields. The substitution function will first identify all matching source-target records. For these, the data of "f3" is moved from source to target. Non matching records are left alone. Notice that the first and third source records match the second and third target records. Therefore, V1 and V3 are moved to the target. The first target record is not matched; therefore X1 remains in "f3".

A general description of the substitution function is as follows:

The substitute operation can be performed with any pair of data bases, provided that they have two or more fields names in common.

When you substitute data from base A to base B you will identify one or more link fields and one or more substitution fields. The source substitution fields may optionally be summarized.

Let the set of specified link fields be "L" and let the substitution fields be "S"; e.g. abstracting from our example above think of "f1" and "f2" as L and "f3" as S. The substitution function, hereafter referred to simply as "sub", proceeds as follows.

Source Substitution View

Our simple example did not show it, but in practice we have any number of records with identical link field values, either in the source, or in the target, or in both. For this reason we must generalize the sub procedure as follows:

Sub views source base A by L and S such that the canonical form is realized; i.e. sub access all unique combinations of L and S of A. Redundant records are ignored unless you specify summarization of S such that all records are combined!

Sub generates source view groups of the form:

      A(L1) A(S1)
            A(S2)
            A(S3)
            .
            .

      A(L2) A(S1)
            A(S2)
            .
            .

The above symbolism implies that the source base (A) is organized hierarchically by unique link field values. For each unique link field value we form sub groups by unique substitution field values. When you specify that one or more of the substitution fields are to be summarized, we will summarize them at the second hierarchical level in order to realize ARMR's canonical form.

For example, let's assume that L consists of f1 and f2 and that S consists of a numeric field f3. Two possible source substitution views are as follows:


                  sub view    summarized
       source   (no summary)   sub view                           

      f1 f2 f3    f1 f2 f3     f1 f2 f3
      -- -- --    -- -- --     -- -- --
       A Q  V1     A Q  V1      A Q  V1+V2+V3
       A Q  V2          V2
       A Q  V3          V3

Please note; the summarized view leads to a straightforward substitution; i.e. whenever we match a target record with f1=A and f2=Q, we move the sum of V1+V2+V3 to the target's f3 field.

On the other hand, when a summary was not specified, we must substitute several source records into a matching target record. By default, an error occurs in ARMRSUB when the link field combinations are not unique.

To override this error condition, one uses the -product option. This function allows the creation of new target records. The proliferation of records resulting from this kind of a sub is performed according to the rules of the Cartesian product. If you recall your High School Algebra, you will find that this is very much like multiplying a pair of factored algebraic expressions. More about this later.

Target Substitution View

Sub views B by L alone. As was done with the source base we also form data groups with L at the top of the hierarchy. The sub groups are simply the target records associated with the given link field value combination. Think of the target groups as having the following general form:

Above we mean to imply that the target base is grouped by unique link field combinations. The members of each link group are simply the data base records which are associated with the given link field values.

The example below illustrates how we organize the target view. Note: only the link fields are important here. All records of a particular link field will be updated whenever a match occurs.


      target    other fields
      link      of target base
      
      f1  f2    f3  f4  f5  f6
      --  --    --  --  --  --
      A   Q     .   .   .   .
                .   .   .   .
                .   .   .   .

      B   A     .   .   .   .
                .   .   .   .
                .   .   .   .

Match Links And Substitute

Recall that we refer to the source base as "A" and the target base as "B".

Sub compares the link field values of A with the link field values of B. Substitution is performed for all matching groups. Non matching groups are ignored; i.e. source data not matched by target data is ignored. Target data not matched by source data remains unchanged.

The Cartesian Product

Let A(L1) be equal to B(L1); then sub will form the Cartesian product of A(L1) with B(L1). That is to say, over and above the basic data transfer expected from a substitution function, sub also generates new records according to the following rules:

Each group member of A(L1) is combined with all group members of B(L1). For each combination we transfer data from A to B.

Let A(L1) contain N members and B(L1) contain M members; then the sub processes yields N*M output records.

For example, assume we have the following matching group pair:

      source group     target group
      -----------      ------------
      A(L1) A(S1)       B(L1) B(1)
            A(S2)             B(2)
            A(S3)

Remember the product of two factored Algebraic expressions! Sub combines A(S1) with B(1) and B(2); A(S2) is combined with B(1) and B(2); finally we combine A(S3) with B(1) and B(2).

The resulting output target group can be represented as follows:

      B(L1) B(1,S1)
            B(2,S1)
            B(1,S2)
            B(2,S2)
            B(1,S3)
            B(2,S3)

In practice one does not always utilize the full generality of the sub function. A typical substitution usually involves single member link groups only. In this case, the sub will not need to generate new records. The ability to generate new records via the Cartesian product is nevertheless needed for certain applications, as will be shown in our example.

An Example

Our employee data base contains personnel information. Let's assume that our employees work on a number of projects. These projects in turn are associated with a set of customers.

The employees' work records are tracked in a data base called "wrec"; the directory of wrec is as follows:

      wrec
      02 T01 ssn        11
      01 T02 month      06
      03 D   date       08
      02 T03 project    08
      02 T04 remarks    40
      04 R02 hours      08

The work record book is constructed on a monthly basis. In it, every employee records the number of hours spent by project code.

Employees work on various jobs or projects during the day and management has the requirement to keep track of the number of hours that each employee works on a given project. Hence each project is given a code so that the total number of hours for each project can be accumulated in order that the actual hours spent can be compared to the hours budgeted.

ssn: 327-12-3492 month: 01_JAN ------------------------------------------------------------------- date project remarks hours -------- -------- ---------------------------------------- -------- 01/03/86 alpha Phone consultation 1.50 01/03/86 beta Prepare Resume for client 5.00 01/09/86 alpha Prepare Estimate 2.75

ssn: 390-32-2331 month: 01_JAN ------------------------------------------------------------------- date project remarks hours -------- -------- ---------------------------------------- -------- 01/08/86 gamma Prepare legal documents 8.00 01/09/86 gamma Complete legal documents 2.00 01/10/86 gamma Prepare legal documents 8.00 01/11/86 alpha Start document 6.00 01/12/86 alpha Complete document 8.00

ssn: 729-49-9876 month: 01_JAN ------------------------------------------------------------------- date project remarks hours -------- -------- ---------------------------------------- -------- 01/24/86 delta Hold seminar 8.00 01/24/86 delta Train new employees 7.00 01/27/86 beta Review 6.00
Fig. 1. Work Records

The project codes for our sample data base sorted in alpha sequence are "alpha", "beta", "delta", and "gamma".

Figure 1 shows the work record book by employee for the month of January.

The customer data are kept in another data base called "project". Its directory is as follows:

      project
      02 T01 project    08
      02 T02 client     12
      02 T06 address    20
      02 T04 city       12
      01 T05 state      06
      03 Z   zip        05

In this base we carry client data as a function of project code. The project book is shown in figure 2.

project: alpha address: 301 Tuffet Street state: Idaho client: Miss Muffet city: Boise zip: 80329

project: beta address: 1205 Carrot Lane state: Neb. client: P. Rabbit city: Platt zip: 70321

project: delta address: 1299 Corner Place state: Ohio client: Jack Horner city: Cleveland zip: 25029

project: gamma address: 71 Honey Drive state: Alaska client: Ted E. Bear city: Nome zip: 00231
Fig. 2. Client Data Arranged By Project Code.

It is simple and natural to maintain separate data bases, one of the employees' work records, another of the employees' personnel data, and a third containing client data. Nevertheless, when it comes time to preparing itemized customer billings, we must combine a pertinent subset of information from all three data bases; thus we will generate a fourth data base which we call a reporting base.

If we combine data from the work record base with data from the employee and the project bases we can construct a reporting base. The latter will show us the employees and their hours by client.

Let's name the reporting base to be generated simply "report". Its directory is as follows:

      report directory  employee  wrec project
      ----------------- --------  ---- -------
      report
      01 T01 month   06    *       *
      03 B   date    08            *
      02 T02 project 08            *      *
      02 T03 client  12                   *
      02 T04 address 20                   *
      02 T05 city    12                   *
      01 T06 state   06                   *
      03 B   zip     05                   *
      02 T07 ssn     11    *       *
      02 T08 last    14    *
      02 T10 remarks 40            *
      04 R   hours   0802          *
      04 R   rate    0502  *
      04 R   bill    1002
      01 T11 dep     1     *

Beside showing the directory of "report" base, we also show which of its fields are shared by the other data bases. In order to get report's data we must extract data from the other data bases as shown in the above chart; e.g. customer addresses are obtained from the project base, hours come from the wrec base, etc.

The desired end result of our sample exercise is shown figure 3. Our example shows how ARMRSUB can be used to generate the new data base called report from data bases wrec, employee and project.

month: 01_JAN client: Miss Muffet city: Boise project: alpha address: 301 Tuffet Street state: Idaho -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/03/86 WONDERLAND Phone consultation 1.50 30.00 01/09/86 WONDERLAND Prepare Estimate 2.75 30.00 01/11/86 BLOW Start document 6.00 20.50 01/12/86 BLOW Complete document 8.00 20.50

month: 01_JAN client: P. Rabbit city: Platt project: beta address: 1205 Carrot Lane state: Neb. -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/03/86 WONDERLAND Prepare Resume for client 5.00 30.00 01/27/86 HALL Review 6.00 32.50

month: 01_JAN client: Jack Horner city: Cleveland project: delta address: 1299 Corner Place state: Ohio -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/24/86 HALL Hold seminar 8.00 32.50 01/24/86 HALL Train new employees 7.00 32.50

month: 01_JAN client: Ted E. Bear city: Nome project: gamma address: 71 Honey Drive state: Alaska -------------------------------------------------------------------- date last remarks hours rate -------- -------------- ----------------------------- -------- ----- 01/08/86 BLOW Prepare legal documents 8.00 20.50 01/09/86 BLOW Complete legal documents 2.00 20.50 01/10/86 BLOW Prepare legal documents 8.00 20.50
Fig. 3. Client Billing Book

Initially, the report base is empty. Since we want to build January's report we enter a single value into report base; i.e. using a data entry module, we set up a single record. In its month field, we enter the value "01_JAN". That is the only required input. The remaining data is extracted with ARMRSUB as follows:

1) fetch wrec's data for January

To do this we specify:

      step1.sub
      source= wrec
      target= report
      link= month
      sub= date project ssn remarks hours
      end=

      armrsub -product step1.sub

This step moves all records whose month contains 01_JAN from wrec to report. The Cartesian product is applicable here. Keep in mind that the input report base contains a single record. The output, on the other hand will contain all of wrec's January records; i.e. "wrec" is the source base; "report" is the target base as shown in the following sketch:


      source "wrec"       target "report"
      
      month  other data   month  other fields
      ------ -- --- ---   ------ -- -- -- --
      01_JAN  .  .   .    01_JAN
              .  .   .
              .  .   .
      02_FEB  .  .   .
              .  .   .
              .  .   .

It should be noted that, the Cartesian product does not come into play in the remaining steps.

2) fetch employee data

The employee data base can supply last name, rate, and department as a function of ssn and month. To get this data we specify the following parameters to the sub function:

      step2.sub
      source= employee
      target= report
      link= month ssn
      sub= last rate dep
      end=

      armrsub step2.sub

3) fetch client data

Finally, we insert client information; the latter is obtained from the project base as a function of project code. To get this data we specify the following:

      step3.sub
      source= project
      target= report
      link= project
      sub= client address city state zip
      end=

      armrsub step3.sub

A Mappend Example

Mappend is an ARMR acronym that stands for "map and append." When we append one data base to another, we must first map the index pointer values of tabular fields to fit in the resultant data base. With the -mappend option, Armrsub does this for us.

Suppose our employee data is seperated into two seperate files, one containing data for employees of the West region and another with data for East region employees. We might want to combine data from these two data bases into a single data base for reporting. Here's how.

Make a copy of the west data base. Let's call the copy report.bas.

      copy west.bas report.bas

Create a specification file to copy the data from east.bas into report.bas. Note that no link fields are required.

      append.sub
      source= east
      target= report
      end=

Append east.bas to report.bas

      armrsub -mappend append.sub