Null Key Specifiers

gsherry1
Charter Member
Posts: 173
Joined: Fri Jun 17, 2005 8:31 am
Location: Canada

Null Key Specifiers

Post by gsherry1 »

Does DataStage have any concept of a NULL key, i.e. an equivalent to the '{}' key specifier in Ab Initio?

Suppose I wish to perform a column aggregation function such as max or average over all input rows, where all input rows are treated as one group.

The only way I see to do this in the Aggregator stage is to first add a column with a constant value to all rows, then pass this into the Aggregator stage and specify the generated constant column as the key.

Any other more convenient way to achieve the same effect?
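
Roughly, in osh terms, what I end up doing is something like the sketch below (untested; it assumes the generator operator can append a constant column to the input records via the cycle generation option, and the column name grp and file names are just placeholders):

# Append a constant key to every row, then feed keyed_input.ds to the
# Aggregator stage with grp as the only grouping key.
osh "generator -schema record(grp:int8 {cycle={init=0, incr=0}})
     < input.ds > keyed_input.ds"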

The lack of a key specification in Ab Initio was also useful on joins for producing a cartesian product between two sources (one source usually small). Is there any way to achieve the same in DataStage?
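
(For what it's worth, the only workaround I can think of is the same dummy-key trick on both inputs: give each source an identical constant grp column and join on it. Something along these lines, untested, with the innerjoin usage and file names purely illustrative:

# Every row on both sides carries grp = 0, so a keyed inner join on grp
# degenerates into the full cartesian product of the two inputs.
osh "innerjoin -key grp < left_keyed.ds < right_keyed.ds > cartesian.ds"

Is there anything more direct than that?)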
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
The way you describe is what is commonly followed.
"" or '' can be used for a null check; IsNull() can also be used for the same purpose.
cheers!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"" is not the same as NULL.

"" is legal for a key value.
NULL is not legal for a key value, because a primary key column is necessarily declared to be NOT NULL.

DataStage does not have Ab Initio's {} construct, or anything like it. The usual approach is to use a transformation to change any NULL to a known value (such as 0 or "", depending on the data type); when summing, 0 has no effect. Your workaround is as good as any.

If you want an average or sum based only on the non-NULL values, filter the input to the Aggregator stage so that only non-NULL values are sent into it.
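
For example (a sketch only, using a hypothetical column name amount; check the Modify and Filter stage documentation on your version for the exact specification):

# Convert NULL to 0 so it has no effect on a sum.
osh "modify 'amount = handle_null(amount, 0)' < input.ds > cleaned.ds"

# Or pass only non-NULL rows to the Aggregator so averages ignore NULLs.
osh "filter -where 'amount is not null' < input.ds > nonnull.ds"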
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

The Aggregator stage makes a grouping field mandatory, which means you do have to create a constant-valued column to group on when you want all rows treated as a single group. The Column Generator stage should be the most efficient way to add this column unless you already have a Transformer in your job. The underlying group operator that the Aggregator stage uses will actually accept zero or more grouping key fields. To quote the old Orchestrate Operators Guide:
If you do not specify any grouping keys, all records are members of a single group and statistical values are computed by the operator over the entire input data set.
You could create a new custom parallel stage that calls the group operator directly instead of going through the Aggregator stage shell.
The syntax for the group operator is shown below. Option values you supply are shown in italic typeface. When your value contains a space or a tab character, you must enclose the value in single quotes.
group
[-collation_sequence locale | collation_file_pathname | OFF]
[-method hash | sort]
[-hash -part partitioning_method | -seq]
[-sort unix | syncsort]
[-coll collection_method]
[-key fieldname [fieldtype] [-param param_string ...] [-ci | -cs]]
[-reduce source_field suboptions (shown on option table)]
[-rereduce source_field suboptions (shown on option table)]
[-nul_res]
[-countField count_field_name]
[-weightField weight_field_name]

group -method sort -key gender -key age
-reduce income -min incomeMin -max incomeMax
-mean incomeMean
> outData.ds
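
Going by the quoted statement that zero grouping keys are accepted, dropping the -key options from that example should compute the statistics across the entire input data set (an untested sketch, reusing the same income field names as the documented example):

group
-reduce income -min incomeMin -max incomeMax
-mean incomeMean
> outData.ds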