The indexing and navigation of NEWSTAR 's compound data files

Contributed by Johan Hamaker, November 1994
960104: Add loops vs. aggregates and repetitive processing

Contents

Indexed data-file types

The indexing structure described in this document is common to three types of NEWSTAR files. Each of these is a collection of 'elementary' data blocks under a hierarchical index:

This document describes the generic properties of the index mechanism and the versatile mechanisms in the NEWSTAR program-parameter interface through which sets of data blocks can be defined for processing.

The index structure

The data blocks are organised in a hierarchical index. This structure is isomorphous to the familiar directory structure for files, the data blocks taking the place of the individual files. Unlike directories, however, the indices are not names but numbers. As we shall see below, in some cases these numbers are related to some physical parameter, in other cases they are simply administrative sequence numbers.

Just like a full file path is composed of concatenated directory names, a data block is adressed through its compound index, i.e. its concatenated indices:

<index0>.<index1>.<index2>....

At all levels, the first index value is 0. You may find this uncomfortable at first, but you will soon get used to it

The successive indices convey specific information which is different between file types. The following information levels are used:

<grp> group
<obs> observation
<fld> mosaic subfield
<chn> frequency channel
<pol> polarisation
<typ> type
<ifr> interferometer
<seq> sequence number

The compound indices are:

<grp>.<obs>.<fld>.<chn>.<seq> for the .SCN file;
<grp>.<fld>.<chn>.<pol>.<typ>.<seq> for the .WMP file;
<grp>.<fld>.<chn>.<pol>.<ifr>.<seq> for the .NGF file.

More details about the meaning and use of these levels is given in the documents on the individual file types.

Addressing groups of data blocks

The true power of the indexing mechanism lies in its ability to address data blocks in aggregates of widely varying composition, through shorthand notations that one may type in response to a prompt asking for the data blocks to be processed.

Data-block aggregates

Firstly, a compound index may refer to a set or aggregate of data blocks in several ways:

Instead of a single index value, one may define an aggregate through a range with an increment:

<start value>-<end value>:<step>

or abbreviations thereof (with obvious meaning):

<start value>-<end value> <start value>-:<step> <start value>-

One may also use the wildcard character '*', meaning 'all values'. An index level may be omitted (by putting nothing between the dots that delimit it), and even the dots themselves for the trailing values; in all these cases NEWSTAR will substitute wildcards. Thus

3.*.1.*.* 3..1.*.* 3.*.1 3..1

all mean the same thing.

Secondly, a number of sets in any of these forms may be concatenated with commas in between, e.g.

3.1-5:2, 4, 5.1..3-:3

For all practical purposes, the number of data blocks that can be addressed in these ways through a single specification is unlimited.

Looping

Suppose one wants to make 8 maps from 16 visibility sectors and specifies

3.5.0-15.1.0

for the input. NMAP will take this specification to mean that all 16 sectors must be used in the first map and will then stop because the input sectors are exhausted.

What one needs is a mechanism to tell NMAP that it should use the sectors in pairs. One way to do this is to 'manually' execute it 8 times with sector specifications

SCN_SETS = 3.5.0-1.1.0
SCN_SETS = 3.5.2-3.1.0
etc.

LOOPS provide an efficient shorthand to do just that. With a LOOP specification, one directs the program to execute a program a number of times, incrementing the data-block specification after every cycle. The LOOP specification takes the form

<number of cycles>,<compound-index increment per cycle>

In combination with this, the initial set of data blocks is specified.

Thus for the above example one would specify

SCN_LOOPS = 8,0.0.2
SCN_SETS = 3.1.0-1.1.0

Zeros may be omitted in the increment; thus in the above, the shorter notation SCN_LOOPS = 8,..2 is just as good.

Repetitive execution

The LOOPS construct may also be used for executing the same operation more than once in a single run. viz. by specifying an increment of 0. E.g.

SCN_LOOPS = 8,...1, 3,0

will select 8 sector aggregates, processing each 3 times in a row.

Nested loops, aggregate start sets

It is possible to specify more than one loop simultaneously, e.g.

SCN_LOOPS = 64,..1, 8,...1
SCN_SETS = 2.0.0.1.*

for a mosaic of 64 subfields and 8 frequency channels. This will result in 8 maps being made in succession for each of the 64 subfiels; i.e. a following loop is nested inside the ones specified before it.

It more than one set is specified for the start cycle, incrementation is applied to each set. Thus, e.g. , for a specification

SCN_LOOPS = 8,.1
SCN_SETS = 1.0.3.1, 1.1-3:2.0.7

the data blocks used would be

1.0.3.1, 1.1.0.7, 1.3.0.7 for the first cycle;
1.1.3.1, 1.2.0.7, 1.4.0.7 for the second cycle;
1.1.3.1, 1.3.0.7, 1.5.0.7 for the third cycle;

etc.. (This example merely serves to show how the loop mechanism works; one is not likely in reality to want successive cycles with partly overlapping input data blocks.)

Loops versus data-block aggregates

In those operations where each data block is processaed individually, the blocks to be selected may be specified either as an aggregate or through a loop construct. For instance, Self-calibration operates on each data block ('sector') in a .SCN file in isolation; in this case, e.g. the specifications

SCN_SETS = 0.3-6.0.0-8

and
SCN_LOOPS = 4,.1, 9,...1
SCN_SETS = 0.3.0.0

are equivalent. The only reason why one might want to use a LOOPS specification in this case is to control the order in shich the dat blocks are selected.

There are also applications in which data blocks are combined, for instance when a sky map is made using an aggregate of .SCN-file sectors as input. Here the LOOPS construct results in a separate map for each loop cycle, rather than a single map from all input sectors as an aggregate specification would. See above for an example where a number of maps are made each using a sector aggregate as input.

When one is not sure of the existing index ranges

Data files may become so large and complex that one does not remember precisely the ranges of the indices in all the various branches of the index hierarchy. If one doesn't particularly care, one may use wildcards or deliberately specify an index range that is too large: In most cases a NEWSTAR program, finding that a requested data block does not exist, will simply proceed to the next one. Thus, for example, if one specifies 8 loop cycles but there are only data for the first three, the 4th through 8th cycles will be traversed but nothing will be done in them.

Another possibility one has is to reply with 'L' (for Layout) or 'O' (for Overview) to a _SETS or _LOOPS prompt. This will give you information about the composition of your data file.

Absolute indexing

In addition to the compound indices discussed so far, one may also use the absolute index to address a data block. The absolute index is a sequence number that is allocated sequentially to each new data block created; thus the newest data block invariably has the highest absolute index. Like the other indices, the absolute index starts at 0. To address a data block through its absolute index, prefix the number with a hash sign, e.g.

WMP_SETS = #23-27

Absolute indices can be used throughout in place of compound indices. For example, the loop specification

NGF_LOOPS = 3,#1, 4,#3
MGF_SETS = 3.7.5.1.*

is valid (but is may not be very easy to be sure that it does the right thing).

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.


newstar@nfra.nl