Subject: ARMRLOAD
Call: armrload -ht -append -nocan specName
Armrload is a general ARMR data base builder. It accepts input from a file or from standard input. The output is either a single data base or a segmented data base. One may also use Armrload to append new data to an existing data base.
Armrload is typically used to interface ARMR with other systems. One uses Armrload to load data exported from another system.
Argument | Description |
---|---|
-ht | Optional; "halt type." Suppresses the diagnostic messages that Armrload normally prints during a load. |
-append | Optional. Directs Armrload to append to the target data base. By default, Armrload replaces the target data base. |
-nocan | Optional. By default, Armrload removes duplicate records; when -nocan is specified, it retains the redundant records. |
specName | The name of a file in which you have written the processing specs for the current Armrload run. |
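
The default duplicate removal that -nocan switches off can be pictured with a short Python sketch. It is not Armrload's code: the function name `load_records` is hypothetical, it assumes two records are duplicates only when every field value matches, it assumes the first occurrence is the one kept, and it does not model counter fields.

```python
from typing import Hashable, Iterable, List, Tuple

# Hypothetical record type: one tuple of field values per input line.
Record = Tuple[Hashable, ...]


def load_records(records: Iterable[Record], nocan: bool = False) -> List[Record]:
    """Return the records to be written, mimicking the default duplicate removal.

    Assumption: a record is a duplicate only if every field value matches an
    earlier record; the first occurrence is kept and input order is preserved.
    """
    if nocan:
        # -nocan: retain every record, including the redundant ones.
        return list(records)
    seen = set()
    kept: List[Record] = []
    for record in records:
        if record not in seen:
            seen.add(record)
            kept.append(record)
    return kept


rows = [("client.c", 1318), ("config.c", 2240), ("client.c", 1318)]
print(load_records(rows))              # [('client.c', 1318), ('config.c', 2240)]
print(load_records(rows, nocan=True))  # all three rows
```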
The specification file defines how the load process should proceed. Here's a sample specification file that we'll use later:
find.in
--------
source= find.txt
target= find
org_def= find.org
end=
Armrload recognizes the following macro statements:
Statement | Description |
---|---|
source= fn | fn is the name of the source data file, or "stdin" to read from standard input. |
target= fn+ | fn+ is the name of the data base to be created or appended to. Use plus signs '+' to denote segmentation; e.g. if the base name find++ is specified, the output base names will be find01.bas, find02.bas, and so on. Default ARMR naming conventions apply: the data base is assumed to have a file extension of "bas" (file type "BASE" on VM systems), and the associated directory's file extension is assumed to be "dir" (file type "DIRECT" on VM). |
org_def= fn | fn is the name of a source field definitions ("org") file. If an org file is specified, it replaces any loading information specified in the data base directory. |
counter_fields= f1 f2 ... | A list of counter fields, whose values are summed before output. |
segment_size= n | Controls the size of the output segments. By default, Armrload makes the output segments as large as possible; use segment_size= if this is not desired. |
end= | Marks the end of a load specification. You may load any number of data bases: a specification file may contain any number of end= statements, one for each load. |

You may also include any number of comment statements; these start with a ';' (semicolon) in column 1.

Note: A full directory path may be specified for any of the file names.
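
To make the statement syntax concrete, here is a minimal Python sketch that reads a specification file into one dictionary per load and expands a segmented target name. The function names `parse_spec` and `segment_names` are hypothetical. The sketch assumes one statement per line, skips blank lines, and assumes that each plus sign stands for one digit of a segment number counted from 1 (as in find++ giving find01.bas, find02.bas).

```python
from typing import Dict, List


def parse_spec(text: str) -> List[Dict[str, List[str]]]:
    """Split a specification file into one {keyword: values} dict per load.

    Assumptions: one statement per line; blank lines are skipped;
    statements after the last end= are ignored.
    """
    loads: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith(";") or not line.strip():
            continue  # ';' in column 1 marks a comment
        keyword, equals, rest = line.strip().partition("=")
        if not equals:
            raise ValueError(f"not a statement: {line!r}")
        keyword = keyword.strip()
        if keyword == "end":
            loads.append(current)  # end= closes the current load specification
            current = {}
        else:
            current[keyword] = rest.split()  # e.g. counter_fields= f1 f2 ...
    return loads


def segment_names(target: str, count: int) -> List[str]:
    """Expand a target such as "find++" into `count` segment file names.

    Assumption: each '+' stands for one digit of a segment number starting at 1.
    """
    base = target.rstrip("+")
    digits = len(target) - len(base)
    if digits == 0:
        if count != 1:
            raise ValueError("an unsegmented target has exactly one segment")
        return [base + ".bas"]
    if count >= 10 ** digits:
        raise ValueError(f"{count} segments do not fit in {digits} digit(s)")
    return [f"{base}{i:0{digits}d}.bas" for i in range(1, count + 1)]


spec = "; load the find listing\nsource= find.txt\ntarget= find++\norg_def= find.org\nend=\n"
print(parse_spec(spec))
# [{'source': ['find.txt'], 'target': ['find++'], 'org_def': ['find.org']}]
print(segment_names("find++", 3))
# ['find01.bas', 'find02.bas', 'find03.bas']
```

Keeping each statement's values as a list lets multi-valued statements such as counter_fields= share the same code path as single-valued ones.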
The data to be imported via Armrload is assumed to be arranged in fixed columns. Therefore, no delimiters are required between data fields.
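
Because fields are located by column rather than by delimiter, a field whose value contains blanks, such as a date written "Jun  6  1995", stays in one piece. The Python sketch below contrasts column slicing with whitespace splitting. `cut` is a hypothetical helper, and the sample line is built (as an assumption) to the column layout that find.org uses later in this section.

```python
def cut(line: str, start: int, width: int) -> str:
    """Hypothetical helper: `width` characters from 1-based column `start`, blank-padded."""
    return line[start - 1:start - 1 + width].ljust(width)


# A sample line laid out in the columns used by find.org (one blank between fields).
line = (f"{'747814':>6} {'2':>4} {'-rw-rw-rw-':10} {'1':>2} {'lake12':8} "
        f"{'og3860':8} {'1318':>8} {'Jun  6  1995':12} client.c")

print(repr(cut(line, 54, 12)))           # 'Jun  6  1995'  (date kept whole)
print(repr(cut(line, 67, 60).rstrip()))  # 'client.c'
print(line.split()[7:10])                # ['Jun', '6', '1995']  (date split in three)
```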
Below is the input file that will be used in our sample Armrload exercise (some of the spaces were removed):
$ find * -ls >find.txt
$ cat find.txt
512452 1 drwx------ 2 lake12 og3860 512 Mar 5 1998 Mail
747814 2 -rw-rw-rw- 1 lake12 og3860 1318 Jun 6 1995 client.c
747820 3 -rw-rw-rw- 1 lake12 og3860 2240 Jan 19 1995 config.c
747819 1 -rw-r----- 1 lake12 og3860 416 Jul 20 1995 cppcmt.l
747829 1 -rw-rw-rw- 1 lake12 og3860 988 Aug 15 1995 dkpsbold.l
747815 1 -rw-r----- 1 lake12 og3860 358 Jan 4 1995 f2carx.l
740053 0 -rw-r----- 1 lake12 og3860 0 Sep 29 16:33 find.txt
747834 1 -rw-r----- 1 lake12 og3860 110 Dec 6 1995 join.l
747827 7 -rw-r----- 1 lake12 og3860 6610 Aug 18 1997 lex.yy.c
666496 1 drwxrwxr-x 2 lake12 og3860 512 Jul 1 16:43 mem
666609 1 -rw-rw-r-- 1 lake12 og3860 860 Jul 1 16:42 mem/calloc.c
666610 5 -rw-rw-r-- 1 lake12 og3860 4473 Jul 1 16:42 mem/free.c
666611 8 -rw-rw-r-- 1 lake12 og3860 7359 Jul 1 16:42 mem/malloc.c
666652 1 -rw-rw-r-- 1 lake12 og3860 886 Jul 1 16:42 mem/morecore.c
666654 4 -rw-rw-r-- 1 lake12 og3860 3483 Jul 1 16:42 mem/realloc.c
666655 1 -rw-rw-r-- 1 lake12 og3860 963 Jul 1 16:42 mem/valloc.c
666657 3 -rw-rw-r-- 1 lake12 og3860 3068 Jul 1 16:42 mem/malloc.h
666658 0 -rw-r----- 1 lake12 og3860 0 Jul 1 16:43 mem/temp.c
747817 1 -rw-r----- 1 lake12 og3860 81 Mar 19 1996 mytest.c
747812 1 -rw-r--r-- 1 lake12 og3860 260 Sep 7 1994 star.l
747828 1 -rw-r----- 1 lake12 og3860 280 Jul 24 1995 tabs.l
$
The directory file defines the structure of the ARMR data base we will be loading. You define the data base layout yourself by preparing a data base directory file with your favorite text editor.
Here's our sample directory.
find.dir
--------
find
2 T1 path 60 (67 60 E
2 T2 inode 6 ( 1 6 E
4 B K 4 ( 8 4 E
1 T3 type 1 (13 1 E
2 T4 mode 10 (13 10 E
2 B links 2 (24 2 E
2 T5 user 8 (27 8 E
2 T5 group 8 (36 8 E
4 B size 8 (45 8 E
3 B modified 8 (54 12 D
1 B tally 5
The first line of the directory file identifies the file name of the associated ARMR tables file. For our example, the tables file will be named "find".
The remaining directory lines define the data base's fields. Each directory line defines one field.
Four items are required to define each field. One of these items is the field type, which is one of the following:

Type | Values |
---|---|
B for binary integers | whole numbers like 1, 58, -25, 3000 |
R for real | floating point numbers like 1.245, -3.0, 10.5 |
D for date | dates, for example 7/22/93 or July 22, 1993 |
Z for zero filled integers | fixed width numbers like 0123, 0001, 5432 |
T for tabular | character strings |
E for external types | character strings, or small binary objects |
T-types always contain a numeric suffix; the latter denotes the table number to be associated with the field. R-types may have a numeric suffix. If present, it denotes the number of decimals to display.
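
The type-code rules above fit in a few lines of Python. This sketch splits a directory line at the '(' delimiter and decodes the type code; `FieldType`, `parse_type`, and `field_type_of` are hypothetical names, and the sketch relies on the item positions seen in find.dir (type code second, field name third) rather than on a documented grammar.

```python
from typing import NamedTuple, Optional, Tuple

TYPE_LETTERS = ("B", "R", "D", "Z", "T", "E")


class FieldType(NamedTuple):
    letter: str            # B, R, D, Z, T or E
    suffix: Optional[int]  # table number for T; decimals to display for R


def parse_type(code: str) -> FieldType:
    """Decode a type code such as "T1", "B" or "R2"."""
    letter, digits = code[:1], code[1:]
    if letter not in TYPE_LETTERS:
        raise ValueError(f"unknown field type: {code!r}")
    if digits and not digits.isdecimal():
        raise ValueError(f"bad numeric suffix: {code!r}")
    suffix = int(digits) if digits else None
    if letter == "T" and suffix is None:
        raise ValueError("T-types always carry a table number")
    return FieldType(letter, suffix)


def field_type_of(dir_line: str) -> Tuple[str, FieldType]:
    """Return (field name, type) for one find.dir-style field line.

    Assumption: as in find.dir, the type code is the second item and the
    field name the third; anything after '(' is loading information.
    """
    items = dir_line.split("(", 1)[0].split()
    return items[2], parse_type(items[1])


print(field_type_of("2 T1 path 60 (67 60 E"))  # ('path', FieldType(letter='T', suffix=1))
print(field_type_of("1 B tally 5"))            # ('tally', FieldType(letter='B', suffix=None))
print(parse_type("R2"))                        # FieldType(letter='R', suffix=2)
```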
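For example, the "path" field of the find data base is defined by the directory line "2 T1 path 60 (67 60 E": it is a tabular (T) field associated with table 1, and the items after the '(' tell Armrload where to find its source data.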
The directory file may also contain data base loading information. A field's loading information follows the '(' delimiter and consists of three items: the column in which the field's source data starts, the width of the source data in characters, and the conversion type.
Armrload currently recognizes two conversion types, E and D.
"E" conversion is the default. It means that Armrload converts the source character string into an "equivalent" ARMR representation; e.g. if the ARMR field is B-Type, the character string "245" is converted to the binary integer 245.
"D" represents date conversion. The input character string is converted to ARMR date format.
One need not specify loading information for every field. E-Type and T-Type fields without loading information are set to blank. Numeric B-Type and R-Type fields without loading information are set to 0 if they are not counter fields. Counter fields without loading information are initialized to 1.
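
The sketch below illustrates E conversion, D conversion, and the defaults for fields without loading information. It is a stand-in built on several assumptions, not Armrload's implementation: it returns Python values (`int`, `float`, `str`, `datetime.date`) instead of ARMR's internal representations, it understands only the two date forms that appear in the find listing ("Jun  6  1995" and "Sep 29 16:33", the latter needing an assumed year), and because the text does not say what default D-Type and Z-Type fields receive, it returns None for them. All function names are hypothetical.

```python
from datetime import date, datetime
from typing import Optional, Union

Value = Union[int, float, str, date]


def convert_e(text: str, letter: str) -> Value:
    """E conversion: turn the source string into the equivalent value for the field type."""
    s = text.strip()
    if letter in ("B", "Z"):
        return int(s) if s else 0
    if letter == "R":
        return float(s) if s else 0.0
    if letter in ("T", "E"):
        return s  # character strings
    raise ValueError(f"E conversion not sketched for type {letter!r}")


def convert_d(text: str, default_year: int) -> date:
    """D conversion for the two date forms in the find listing (an assumption)."""
    s = " ".join(text.split())  # "Jun  6  1995" -> "Jun 6 1995"
    try:
        return datetime.strptime(s, "%b %d %Y").date()
    except ValueError:
        # Recent files show a time instead of a year, e.g. "Sep 29 16:33".
        return datetime.strptime(f"{s} {default_year}", "%b %d %H:%M %Y").date()


def default_value(letter: str, is_counter: bool) -> Optional[Value]:
    """Value for a field that has no loading information."""
    if letter in ("E", "T"):
        return ""  # blank
    if letter == "B":
        return 1 if is_counter else 0
    if letter == "R":
        return 1.0 if is_counter else 0.0
    return None  # not specified for D and Z fields


print(convert_e("    1318", "B"))           # 1318
print(convert_d("Jun  6  1995", 1998))      # 1995-06-06
print(convert_d("Sep 29 16:33", 1998))      # 1998-09-29
print(default_value("B", is_counter=True))  # 1
```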
Optionally, one may specify the field definitions in a separate file, called an "org" file for short.
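The org file has one entry for each data base field to be loaded. Each entry defines the origin of a particular data field and consists of four items: the field name, the column in which the field's source data starts, the width of the source data in characters, and the conversion type. The programmer should specify the first three items for each entry; the fourth item is optional. Here is the org file for our example: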
find.org
--------
inode 1 6 E
K 8 4 E
type 13 1 E
mode 13 10 E
links 24 2 E
user 27 8 E
group 36 8 E
size 45 8 E
modified 54 12 D
path 67 60 E
The first org file line tells us that the source information for field "inode" starts in column 1 of the input file, that the inode field's data is 6 bytes (characters) wide, and that inode will be stored in a manner "equivalent" to its string representation.
Load fields may be specified in any order.
Several fields may be loaded with the same, or overlapping, source information; for example, as shown above, "type" and "mode" originate from overlapping source fields.
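
A minimal Python sketch of how an org file could drive field extraction, including the overlapping "type" and "mode" fields. `OrgEntry`, `parse_org`, and `extract` are hypothetical names; the sketch assumes entries are whitespace-separated and, following the E default described earlier, treats a missing conversion type as E.

```python
from typing import Dict, Iterable, List, NamedTuple


class OrgEntry(NamedTuple):
    name: str
    start: int       # 1-based starting column in the input line
    width: int       # number of characters
    conversion: str  # "E" or "D"


def parse_org(text: str) -> List[OrgEntry]:
    """Parse org file lines of the form: name start width [conversion]."""
    entries: List[OrgEntry] = []
    for line in text.splitlines():
        items = line.split()
        if not items:
            continue
        if len(items) not in (3, 4):
            raise ValueError(f"expected 3 or 4 items: {line!r}")
        conversion = items[3] if len(items) == 4 else "E"  # assumed default: E
        if conversion not in ("E", "D"):
            raise ValueError(f"unknown conversion type: {conversion!r}")
        entries.append(OrgEntry(items[0], int(items[1]), int(items[2]), conversion))
    return entries


def extract(line: str, entries: Iterable[OrgEntry]) -> Dict[str, str]:
    """Slice each field's source text out of one fixed-column input line."""
    return {e.name: line[e.start - 1:e.start - 1 + e.width] for e in entries}


org = parse_org("inode 1 6 E\ntype 13 1 E\nmode 13 10 E\npath 67 60")
line = (f"{'747814':>6} {'2':>4} {'-rw-rw-rw-':10} {'1':>2} {'lake12':8} "
        f"{'og3860':8} {'1318':>8} {'Jun  6  1995':12} client.c")
print(extract(line, org))
# {'inode': '747814', 'type': '-', 'mode': '-rw-rw-rw-', 'path': 'client.c'}
```

Since each entry is sliced independently, nothing prevents two entries from reading the same columns; "type" simply takes the first character of the ten that "mode" reads.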
Given the four files, find.in, find.txt, find.dir, and find.org, we are ready to run the Armrload module.
armrload find.in
1 data base segment(s) written.
21 lines read.
21 lines written.
Elapsed cpu seconds= 0.001 at 16:15:35
The find data base has been created.
Armrload provides diagnostics about the load process. To do a load without the diagnostics we would enter the command:
armrload -ht find.in
The original find data base is replaced with a new one.
Suppose we create another input file using find on a different directory structure. Assuming this file is also called find.txt, we can append this new data to our find data base using the following command:
armrload -append find.in
1 data base segment(s) written.
302 lines read.
323 lines written.
Elapsed cpu seconds= 0.001 at 16:17:00
302 records are added to the original 21 records in the find data base.
Suppose our data is not in a fixed format. We might write a filter to format the data so that Armrload can use it. Rather than writing a file to disk that we'll throw away later, we might want Armrload to read from standard input instead.
Let's create a load spec file that looks like this:
sample2.in
-----------
source= stdin
target= find
org_def= find.org
end=
The keyword "stdin" tells Armrload to read its input from standard input. When Armrload reads standard input we might want to "pipe" it together with another program like this:
myfilter input.txt | armrload sample2.in
1 data base segment(s) written.
5630 lines read.
5630 lines written.
Elapsed cpu seconds= 0.035 at 10:10:03
The "myfilter" program processes input.txt to create a fixed format version of it. Myfilter writes its output to standard output. When the two are piped together, myfilter's standard output becomes Armrload's standard input. In this example, Armrload reads standard input (myfilter's output) to build the find data base.
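
myfilter is not a real program; it stands for whatever filter your data needs. As a hypothetical example, the Python sketch below assumes input.txt holds one tab-separated record per line with the nine source fields of the find listing (inode, K, mode, links, user, group, size, modified, path) and writes them to standard output in the columns that find.org expects. The layout table and function names are illustrative assumptions.

```python
import sys
from typing import List, TextIO

# Hypothetical layout: (width, right-justify?) for each tab-separated input
# field, in column order: inode, K, mode, links, user, group, size, modified, path.
LAYOUT = [(6, True), (4, True), (10, False), (2, True), (8, False),
          (8, False), (8, True), (12, False), (60, False)]


def to_fixed(fields: List[str]) -> str:
    """Lay the fields out so they start in columns 1, 8, 13, 24, 27, 36, 45, 54 and 67."""
    if len(fields) != len(LAYOUT):
        raise ValueError(f"expected {len(LAYOUT)} fields, got {len(fields)}")
    columns = []
    for text, (width, right) in zip(fields, LAYOUT):
        if len(text) > width:
            raise ValueError(f"{text!r} does not fit in {width} columns")
        columns.append(text.rjust(width) if right else text.ljust(width))
    return " ".join(columns).rstrip()  # one blank between fields; trailing blanks dropped


def main(src: TextIO, out: TextIO) -> None:
    """Convert every non-empty input line and write it to `out`."""
    for raw in src:
        raw = raw.rstrip("\n")
        if raw:
            out.write(to_fixed(raw.split("\t")) + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: myfilter input.txt | armrload sample2.in
    with open(sys.argv[1], encoding="utf-8") as f:
        main(f, sys.stdout)
```

Writing to standard output rather than to a file is what lets the filter feed Armrload directly through the pipe.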