Null Key Specifiers

gsherry1
Charter Member
Posts: 173
Joined: Fri Jun 17, 2005 8:31 am
Location: Canada

Null Key Specifiers

Post by gsherry1 »

Does DataStage have any concept of a NULL key, i.e. an equivalent to the '{}' key specifier in Ab Initio?

Suppose I wish to perform a column aggregation function such as max or average over all input rows, where all input rows are treated as one group.

The only way I see to do this in the Aggregator stage is to first add a column with a constant value to all rows, then pass this into the Aggregator stage and specify the generated constant column as the key.

Any other more convenient way to achieve the same effect?
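
Roughly, in osh terms, what I end up doing is something like the sketch below (untested; it assumes the generator operator can append a constant column to the input records via the cycle generation option, and the column name grp and file names are just placeholders):

# Append a constant key to every row, then feed keyed_input.ds to the
# Aggregator stage with grp as the only grouping key.
osh "generator -schema record(grp:int8 {cycle={init=0, incr=0}})
     < input.ds > keyed_input.ds"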

The lack of a key specification in Ab Initio was also useful on joins for producing a cartesian product between two sources (one source usually small). Is there any way to achieve the same in DataStage?
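
(For what it's worth, the only workaround I can think of is the same dummy-key trick on both inputs: give each source an identical constant grp column and join on it. Something along these lines, untested, with the innerjoin usage and file names purely illustrative:

# Every row on both sides carries grp = 0, so a keyed inner join on grp
# degenerates into the full cartesian product of the two inputs.
osh "innerjoin -key grp < left_keyed.ds < right_keyed.ds > cartesian.ds"

Is there anything more direct than that?)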
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
The way you describe is what is commonly followed.
"" or '' can be used for a null check; IsNull() can also be used for the same purpose.
cheers!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"" is not the same as NULL.

"" is legal for a key value.
NULL is not legal for a key value, because a primary key column is necessarily declared to be NOT NULL.

DataStage does not have Ab Initio's {} construct, or anything like it. The usual approach is to use a transformation to change any NULL to a known value (such as 0 or "", depending on the data type); when summing, 0 has no effect. Your workaround is as good as any.

If you want an average or sum based only on the non-NULL values, filter the input to the Aggregator stage so that only non-NULL values are sent into it.
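
For example (a sketch only, using a hypothetical column name amount; check the Modify and Filter stage documentation on your version for the exact specification):

# Convert NULL to 0 so it has no effect on a sum.
osh "modify 'amount = handle_null(amount, 0)' < input.ds > cleaned.ds"

# Or pass only non-NULL rows to the Aggregator so averages ignore NULLs.
osh "filter -where 'amount is not null' < input.ds > nonnull.ds"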
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

The Aggregator stage makes a grouping field mandatory, which means you do have to create a constant-valued column to group on when you want all rows treated as a single group. The Column Generator stage should be the most efficient way to add this column unless you already have a Transformer in your job. The underlying group operator that the Aggregator stage uses will actually accept zero or more grouping key fields. To quote the old Orchestrate Operators Guide:
If you do not specify any grouping keys, all records are members of a single group and statistical values are computed by the operator over the entire input data set.
You could create a new custom parallel stage that calls the group operator directly instead of going through the Aggregator stage shell.
The syntax for the group operator is shown below. Option values you supply are shown in italic typeface. When your value contains a space or a tab character, you must enclose the value in single quotes.
group
[-collation_sequence locale | collation_file_pathname | OFF]
[-method hash | sort]
[-hash -part partitioning_method | -seq]
[-sort unix | syncsort]
[-coll collection_method]
[-key fieldname [fieldtype] [-param param_string ...] [-ci | -cs]]
[-reduce source_field suboptions (shown on option table)]
[-rereduce source_field suboptions (shown on option table)]
[-nul_res]
[-countField count_field_name]
[-weightField weight_field_name]

group -method sort -key gender -key age
-reduce income -min incomeMin -max incomeMax
-mean incomeMean
> outData.ds
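
Going by the quoted statement that zero grouping keys are accepted, dropping the -key options from that example should compute the statistics across the entire input data set (an untested sketch, reusing the same income field names as the documented example):

group
-reduce income -min incomeMin -max incomeMax
-mean incomeMean
> outData.ds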