Tue Jan 5 1999

ARMR MODULE

Subject: ARMRLOAD

Call: armrload -ht -append -nocan specName

Purpose

Armrload is a general ARMR data base builder. It accepts input from a file or from standard input. The output is either a single data base or a segmented data base. One may also use Armrload to append new data to an existing data base.

Armrload is typically used to interface ARMR with other systems, for example to load data exported from another system.

Usage

armrload -ht -append -nocan specName
-ht	an optional argument denoting "halt type"; it suppresses the diagnostic messages that Armrload normally prints.
-append	an optional argument that directs Armrload to append to the target data base. By default, Armrload replaces the target data base.
-nocan	an optional argument that directs Armrload to retain duplicate records. By default, duplicate records are removed.
specName	the name of the specification file containing the load specifications for the current Armrload run.
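The default duplicate removal, and what -nocan turns off, can be pictured with the minimal Python sketch below. It is not Armrload's code: the function name drop_duplicates is hypothetical, records are modeled as tuples of field values, and it assumes a "duplicate" means a record identical in every field, with the first occurrence kept and input order otherwise preserved. How counter fields interact with duplicate removal is not modeled.

```python
from typing import Hashable, Iterable, List, Tuple

# Hypothetical model of a loaded record: a tuple of converted field values.
Record = Tuple[Hashable, ...]


def drop_duplicates(records: Iterable[Record], nocan: bool = False) -> List[Record]:
    """Remove exact duplicate records unless nocan (the -nocan flag) is set.

    Assumption: a duplicate is identical in every field; the first
    occurrence is kept and the input order is otherwise preserved.
    """
    if nocan:
        return list(records)  # -nocan: retain redundant records
    seen = set()
    kept: List[Record] = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            kept.append(rec)
    return kept


if __name__ == "__main__":
    rows = [("mem", 512), ("star.l", 260), ("mem", 512)]
    print(len(drop_duplicates(rows)))              # 2
    print(len(drop_duplicates(rows, nocan=True)))  # 3
```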

Files For Building Data Bases

Armrload Specification File

The specification file defines how the load process should proceed. Here's a sample specification file that we'll use later:

      find.in
      --------
      source= find.txt
      target= find
      org_def= find.org
      end=

Armrload recognizes the following macro statements:

Statement	Description

source= fn	fn is the name of the source data file, or "stdin" to read from standard input.
target= fn+	fn+ represents the name of the data base to be created or appended to. Use plus signs '+' to denote segmentation; e.g. if base name find++ is specified, the output base names will be find01.bas, find02.bas, and so on. Default ARMR naming conventions apply here; i.e. the data base is assumed to have a file extension of "bas" (file type of "BASE" on VM systems); the associated directory's file extension is assumed to be "dir" (file type of "DIRECT" on VM).
org_def= fn	fn is the name of a field origins ("org") file. If an org file is specified, its loading information replaces any loading information in the data base directory.
counter_fields= f1 f2 ...	A list of fields that are summed before output.
segment_size= n	n sets the size of the output segments. By default, Armrload attempts to make the output segments as large as possible.
end=	This statement marks the end of a particular load specification. You may load any number of data bases in one run; a single specification file may contain any number of end= statements, one for each load.
You may also include any number of comment statements; these start with a ';' (semicolon) in column 1.
Note: A full directory path may be specified for any of the file names.
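To make the statement syntax concrete, here is a minimal Python sketch of a reader for specification files; it is an illustration, not Armrload's own parser. The names parse_spec and segment_names are hypothetical. Beyond the rules listed above, it assumes that lines without '=' and unknown keywords are errors, that statements after the last end= are ignored, and that each '+' in a target= name stands for one digit of the segment number, as in the find++ example.

```python
from typing import Dict, List

# Statements recognized in this sketch (from the table above).
KEYWORDS = {"source", "target", "org_def", "counter_fields", "segment_size"}


def parse_spec(text: str) -> List[Dict[str, object]]:
    """Return one dict per load specification, each closed by 'end='.

    Assumption: statements after the last end= are ignored.
    """
    loads: List[Dict[str, object]] = []
    current: Dict[str, object] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith(";"):
            continue  # blank line, or comment with ';' in column 1
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep:
            raise ValueError(f"line {lineno}: expected 'keyword= value'")
        if key == "end":
            loads.append(current)
            current = {}
        elif key == "counter_fields":
            current[key] = value.split()  # f1 f2 ...
        elif key == "segment_size":
            current[key] = int(value)
        elif key in KEYWORDS:
            current[key] = value  # source, target, org_def: a file name
        else:
            raise ValueError(f"line {lineno}: unknown keyword {key!r}")
    return loads


def segment_names(target: str, count: int) -> List[str]:
    """Expand a target such as 'find++' into 'find01.bas', 'find02.bas', ..."""
    base = target.rstrip("+")
    digits = len(target) - len(base)  # assumption: one digit per '+'
    if digits == 0:
        return [base + ".bas"]  # unsegmented data base
    return [f"{base}{i:0{digits}d}.bas" for i in range(1, count + 1)]


if __name__ == "__main__":
    spec = "; sample load\nsource= find.txt\ntarget= find++\norg_def= find.org\nend=\n"
    load = parse_spec(spec)[0]
    print(load["source"], load["org_def"])   # find.txt find.org
    print(segment_names(load["target"], 2))  # ['find01.bas', 'find02.bas']
```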

The Input File

The data to be imported via Armrload is assumed to be arranged in fixed columns: each field occupies the same column positions on every input line. Therefore, no delimiters are required between data fields.
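Because every field sits at fixed column positions, pulling a field out of an input line is a matter of slicing. The Python sketch below illustrates this; the function extract and the three-field layout are hypothetical (the layout is not the one in find.org). Columns are numbered from 1, as in the directory and org files, and fields may overlap, just as "type" and "mode" do in the sample exercise. Treating a too-short line as yielding a shorter or empty field is an assumption.

```python
from typing import Dict, Tuple

# Field name -> (start column, width); columns are numbered from 1.
Layout = Dict[str, Tuple[int, int]]


def extract(line: str, layout: Layout) -> Dict[str, str]:
    """Cut each field's source text out of one fixed-format input line.

    Fields may overlap. A line too short for a field yields a shorter
    (possibly empty) string; that behavior is an assumption. Lines read
    from a file should have their newline removed before the call.
    """
    return {name: line[start - 1:start - 1 + width]
            for name, (start, width) in layout.items()}


if __name__ == "__main__":
    layout = {"inode": (1, 6), "type": (8, 1), "mode": (8, 10)}
    print(extract("512452 drwx------", layout))
    # {'inode': '512452', 'type': 'd', 'mode': 'drwx------'}
```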

Below is the input file used in our sample Armrload exercise. (Some spaces have been removed from this listing.)

      $ find * -ls >find.txt
      $ cat find.txt
      512452  1 drwx------  2 lake12 og3860  512 Mar  5  1998 Mail
      747814  2 -rw-rw-rw-  1 lake12 og3860 1318 Jun  6  1995 client.c
      747820  3 -rw-rw-rw-  1 lake12 og3860 2240 Jan 19  1995 config.c
      747819  1 -rw-r-----  1 lake12 og3860  416 Jul 20  1995 cppcmt.l
      747829  1 -rw-rw-rw-  1 lake12 og3860  988 Aug 15  1995 dkpsbold.l
      747815  1 -rw-r-----  1 lake12 og3860  358 Jan  4  1995 f2carx.l
      740053  0 -rw-r-----  1 lake12 og3860    0 Sep 29 16:33 find.txt
      747834  1 -rw-r-----  1 lake12 og3860  110 Dec  6  1995 join.l
      747827  7 -rw-r-----  1 lake12 og3860 6610 Aug 18  1997 lex.yy.c
      666496  1 drwxrwxr-x  2 lake12 og3860  512 Jul  1 16:43 mem
      666609  1 -rw-rw-r--  1 lake12 og3860  860 Jul  1 16:42 mem/calloc.c
      666610  5 -rw-rw-r--  1 lake12 og3860 4473 Jul  1 16:42 mem/free.c
      666611  8 -rw-rw-r--  1 lake12 og3860 7359 Jul  1 16:42 mem/malloc.c
      666652  1 -rw-rw-r--  1 lake12 og3860  886 Jul  1 16:42 mem/morecore.c
      666654  4 -rw-rw-r--  1 lake12 og3860 3483 Jul  1 16:42 mem/realloc.c
      666655  1 -rw-rw-r--  1 lake12 og3860  963 Jul  1 16:42 mem/valloc.c
      666657  3 -rw-rw-r--  1 lake12 og3860 3068 Jul  1 16:42 mem/malloc.h
      666658  0 -rw-r-----  1 lake12 og3860    0 Jul  1 16:43 mem/temp.c
      747817  1 -rw-r-----  1 lake12 og3860   81 Mar 19  1996 mytest.c
      747812  1 -rw-r--r--  1 lake12 og3860  260 Sep  7  1994 star.l
      747828  1 -rw-r-----  1 lake12 og3860  280 Jul 24  1995 tabs.l
      $

Data Base Directory

The directory file defines the structure of the ARMR data base we will be loading. You define this layout yourself by preparing a data base directory file with your favorite text editor.

Here's our sample directory.

      find.dir
      --------
      find
      2 T1 path    60 (67 60 E
      2 T2 inode    6 ( 1  6 E
      4 B  K        4 ( 8  4 E
      1 T3 type     1 (13  1 E
      2 T4 mode    10 (13 10 E
      2 B  links    2 (24  2 E
      2 T5 user     8 (27  8 E
      2 T5 group    8 (36  8 E
      4 B  size     8 (45  8 E
      3 B  modified 8 (54 12 D
      1 B  tally    5

The first line of the directory file identifies the file name of the associated ARMR tables file. For our example, the tables file will be named "find".

The remaining directory lines define the data base's fields. Each directory line defines one field.

Each field is defined by four required items, as follows:

Base field width - this number denotes the byte width of the data field as stored in the ARMR base file.

Data type designator - the first character of this item must be one of the following: B, R, D, Z, T, or E; these correspond to the ARMR data types.

B for binary integers	whole numbers like 1, 58, -25, 3000
R for real	floating point numbers like 1.245, -3.0, 10.5
D for date	for example, 7/22/93, July 22, 1993, etc.
Z for zero filled integers	fixed width numbers like 0123, 0001, 5432
T for tabular	character strings
E for external types	character strings, or small binary objects

T-types always carry a numeric suffix, which denotes the table number associated with the field. R-types may have a numeric suffix; if present, it denotes the number of decimals to display.

Field name - This is an alphanumeric string of up to 12 characters. The name may contain upper or lower case letters and digits. The first character of the field name should be a letter.

External display width - this number denotes the columnar width to be used when the field is displayed.
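The field-name rules above can be expressed as a single regular expression. This hypothetical Python helper is only a sketch: it treats "letters" as the ASCII letters A-Z and a-z, and it enforces the "should" rule for the first character as a hard requirement.

```python
import re

# 1 to 12 characters: a letter, then up to 11 more letters or digits.
_FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]{0,11}")


def is_valid_field_name(name: str) -> bool:
    """Check a name against the field-name rules (hypothetical helper)."""
    return _FIELD_NAME.fullmatch(name) is not None
```

With this helper, "modified" and "K" are accepted, while "2nd" (leading digit) and "a_name" (underscore) are rejected.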

The "path" field of the find data base has the following elements:

base field width = 2
data type designator = T1
field name = path
external display width = 60
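Taken together, a directory line is four whitespace-separated items, optionally followed by '(' and the loading information described in the next section. The following Python sketch parses the sample find.dir this way. FieldDef, parse_field_line, and parse_directory are hypothetical names; the sketch assumes blank lines can be skipped and that a missing source type means E, the default conversion.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FieldDef:
    base_width: int                # bytes in the ARMR base file
    dtype: str                     # data type designator, e.g. "B", "T1"
    name: str
    display_width: int             # external display width
    load: Optional[Tuple[int, int, str]] = None  # (start, width, source type)


def parse_field_line(line: str) -> FieldDef:
    """Parse a line such as '2 T1 path    60 (67 60 E'."""
    head, sep, tail = line.partition("(")
    width, dtype, name, display = head.split()
    if dtype[0] not in "BRDZTE":
        raise ValueError(f"{name}: unknown data type {dtype!r}")
    if dtype[0] == "T" and not dtype[1:].isdigit():
        raise ValueError(f"{name}: T-type needs a table number suffix")
    load = None
    if sep:  # loading information follows '('
        parts = tail.split()
        source_type = parts[2] if len(parts) > 2 else "E"  # E is the default
        load = (int(parts[0]), int(parts[1]), source_type)
    return FieldDef(int(width), dtype, name, int(display), load)


def parse_directory(text: str) -> Tuple[str, List[FieldDef]]:
    """Return the tables file name (first line) and the field definitions."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return lines[0].strip(), [parse_field_line(ln) for ln in lines[1:]]
```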

Directory loading information

The directory file may also contain data base loading information. A field's loading information follows the '(' delimiter. Loading information consists of the following data:

Field-start - This number specifies the starting column position of the field's source data in the input file.
Field-width - This number specifies the width of the source input data.
Source type - Specifies the data conversion that is required.
Armrload currently recognizes two conversion types, E and D.
"E" conversion is the default. Armrload converts the source character string into an "equivalent" ARMR representation; e.g. if the ARMR field is B-Type, the character string "245" is converted to the binary integer 245.
"D" represents date conversion. The input character string is converted to ARMR date format.
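The two conversions can be sketched in Python as follows. This is an illustration only: ARMR's internal binary and date formats are not described here, so the sketch returns Python int, float, str, and datetime.date values as stand-ins. The function names are hypothetical. The date parser assumes the 12-character dates written by find -ls in the sample input (for example "Mar  5  1998" or "Jul  1 16:43") and that the caller supplies the year for dates that show a time instead of a year. Treating a blank numeric source as 0 and dropping trailing blanks from character data are also assumptions.

```python
from datetime import date

_MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def convert_e(text: str, dtype: str):
    """'E' conversion: turn the source string into an equivalent value."""
    kind = dtype[0]
    if kind == "B":
        return int(text.strip() or 0)    # "245" -> 245; blank -> 0 (assumed)
    if kind == "R":
        return float(text.strip() or 0)  # "10.5" -> 10.5
    if kind in "TE":
        return text.rstrip()             # character data; trailing blanks dropped (assumed)
    raise NotImplementedError(f"E conversion to {kind}-type is not sketched")


def convert_d(text: str, default_year: int) -> date:
    """'D' conversion for find -ls style dates: 'Mon DD YYYY' or 'Mon DD HH:MM'."""
    month, day, year_or_time = text.split()
    year = default_year if ":" in year_or_time else int(year_or_time)
    return date(year, _MONTHS[month], int(day))
```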

One need not specify loading information for every field. E-Type and T-Type fields without loading information are set to blank. Numeric B-Type and R-Type fields without loading information are set to 0 if they are not counter fields. Counter fields without loading information are initialized to 1.
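These default rules amount to a small lookup. In the hypothetical Python helper below, "blank" is represented as an empty string and the counter value for an R-type field is returned as 1.0; types not covered by the rules above (D and Z) are rejected rather than guessed.

```python
def default_value(dtype: str, is_counter: bool):
    """Initial value of a field that has no loading information."""
    kind = dtype[0]
    if kind in "TE":
        return ""                          # set to blank
    if kind == "B":
        return 1 if is_counter else 0      # counters start at 1
    if kind == "R":
        return 1.0 if is_counter else 0.0
    raise ValueError(f"no documented default for {kind}-type fields")
```

For instance, the tally field in find.dir has no loading information; if it were listed in a counter_fields= statement, default_value("B", True) would give each record a tally of 1.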

Field Origins File (optional)

Optionally, one may use a separate file in which the fields' loading information is specified. Let's call it an "org" file for short.

The org file has one entry for each data base field to be loaded. Each entry defines the origin of a particular data field. The org file items are listed below. The first three items are required for each entry; the fourth is optional.

Field-name - The field name must match exactly one of the data base field names.
Field-start - This number specifies the starting column position of the current field's source data in the input file.
Field-width - This number specifies the width of the source input data.
Source type (optional) - Specifies the data conversion that is required. See the "Data Base Directory" description for more details on this item.
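An org file can be read with a few lines of Python. In this hypothetical sketch, Origin and parse_org are illustrative names; it enforces the rule that each field name must match a data base field, and assumes that blank lines are skipped and that an omitted source type means E, the directory's default conversion.

```python
from typing import Iterable, List, NamedTuple


class Origin(NamedTuple):
    name: str
    start: int             # starting column in the input file (from 1)
    width: int             # width of the source data
    source_type: str = "E"


def parse_org(text: str, field_names: Iterable[str]) -> List[Origin]:
    """Parse org file entries: name start width [source type]."""
    known = set(field_names)
    origins: List[Origin] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # blank line
        if len(parts) not in (3, 4):
            raise ValueError(f"line {lineno}: expected 3 or 4 items")
        if parts[0] not in known:
            raise ValueError(f"line {lineno}: {parts[0]!r} is not a data base field")
        source_type = parts[3] if len(parts) == 4 else "E"
        origins.append(Origin(parts[0], int(parts[1]), int(parts[2]), source_type))
    return origins
```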

To illustrate, we set up an org file that describes the source data in the output produced by find. This file will be used in our sample exercise. The entries of find.org are as follows:

      find.org
      --------
      inode       1  6 E
      K           8  4 E
      type       13  1 E
      mode       13 10 E
      links      24  2 E
      user       27  8 E
      group      36  8 E
      size       45  8 E
      modified   54 12 D
      path       67 60 E

The first org file line tells us that the source information for field "inode" starts in column 1 of the input file; the inode field's data is 6 bytes (characters) wide; and inode will be stored in a manner "equivalent" to its string representation.

Load fields may be specified in any order.

Several fields may be loaded with the same, or overlapping, source information; for example, as shown above, "type" and "mode" originate from overlapping source fields.

An Armrload Sample Exercise

Given the four files, find.in, find.txt, find.dir, and find.org, we are ready to run the Armrload module.

      armrload find.in 
      1 data base segment(s) written.
      21 lines read.
      21 lines written.
      Elapsed cpu seconds= 0.001 at 16:15:35

The find data base has been created.

Armrload prints diagnostics about the load process. To run a load without the diagnostics, enter the command:

      armrload -ht find.in

The original find data base is replaced with a new one.

Suppose we create another input file using find on a different directory structure. Assuming this file is also called find.txt, we can append this new data to our find data base using the following command:

      armrload -append find.in 
      1 data base segment(s) written.
      302 lines read.
      323 lines written.
      Elapsed cpu seconds= 0.001 at 16:17:00

The 302 new records are added to the original 21 records, for a total of 323 records in the find data base.

Sample 2

Suppose our data is not in a fixed format. We might write a filter to format the data so that Armrload can use it. Rather than writing a file to disk that we'll throw away later, we might want Armrload to read from standard input instead.

Let's create a load spec file that looks like this:

      sample2.in
      -----------
      source= stdin
      target= find
      org_def= find.org
      end=

The keyword "stdin" tells Armrload to read its input from standard input. When Armrload reads standard input, we can "pipe" another program's output into it like this:

      myfilter input.txt | armrload sample2.in 
      1 data base segment(s) written.
      5630 lines read.
      5630 lines written.
      Elapsed cpu seconds= 0.035 at 10:10:03

The "myfilter" program processes input.txt to create a fixed format version of it. Myfilter writes its output to standard output. When the two are piped together, myfilter's standard output becomes Armrload's standard input. In this example, Armrload reads standard input (myfilter's output) to build the find data base.
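For illustration, here is a minimal Python sketch of a filter in the spirit of myfilter. Everything about it is hypothetical: it assumes input.txt holds comma-separated values, and the LAYOUT table (start column and width for each input column) is an example, not part of Armrload. Each value is placed at its start column, padded with blanks or truncated to its width, and the result is written to standard output, where Armrload can read it.

```python
import csv
import sys
from typing import List, Sequence, Tuple

# Hypothetical output layout: (start column, width) for each input column,
# with columns numbered from 1 as in an org file.
LAYOUT: List[Tuple[int, int]] = [(1, 6), (8, 4), (13, 10)]


def to_fixed(values: Sequence[str], layout: Sequence[Tuple[int, int]]) -> str:
    """Lay values out in fixed columns, padding or truncating each to its width."""
    total = max(start - 1 + width for start, width in layout)
    buf = [" "] * total
    for value, (start, width) in zip(values, layout):
        buf[start - 1:start - 1 + width] = value[:width].ljust(width)
    return "".join(buf)


def main(path: str) -> None:
    """Convert each CSV row of path to a fixed-format line on stdout."""
    with open(path, newline="") as src:
        for row in csv.reader(src):
            sys.stdout.write(to_fixed(row, LAYOUT) + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])  # e.g. python myfilter.py input.txt | armrload sample2.in
```

Writing into a preallocated buffer, rather than concatenating strings, keeps each value at its own start column even when the layout is not listed in column order.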
