Does DataStage have any concept of a NULL key, i.e. an equivalent to the '{}' key specifier in Ab Initio?
Suppose I wish to perform a column aggregation function such as max or average over all input rows, where all input rows are treated as one group.
The only way I can see to do this in the Aggregator stage is to first add a column with a constant value to every row, then pass the rows into the Aggregator stage and specify that generated constant column as the key.
Any other more convenient way to achieve the same effect?
The lack of a key specification in Ab Initio was also useful on joins, giving a cartesian product between two sources (one source usually small). Is there any way to achieve the same in DataStage?
Null Key Specifiers
Moderators: chulett, rschirm, roy
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
"" is not the same as NULL.
"" is legal for a key value.
NULL is not legal for a key value, because a primary key column is necessarily declared to be NOT NULL.
DataStage does not have Ab Initio's {} construct, or anything like it. Usual approaches are to use a transformation to change any NULL to a known value (such as 0 or "", depending on data type). When summing, 0 has no effect. Your workaround is as good as any.
If you want an average or sum based only on the non-NULL values, filter the input to the Aggregator stage so that only non-NULL values are sent into it.
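At the osh level, one way to turn NULLs into a known value before aggregation is the modify operator's handle_null conversion. A minimal sketch, assuming an input data set inData.ds with a nullable income field (the field and file names here are illustrative):

modify 'income = handle_null(income, 0)' < inData.ds > aggregatorInput.ds

After this conversion, summing income treats the former NULLs as 0, which has no effect on the total; for averages you would still want to filter the NULL rows out beforehand, as described above, since a substituted 0 does change the mean.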
"" is legal for a key value.
NULL is not legal for a key value, because a primary key column is necessarily declared to be NOT NULL.
DataStage does not have Ab Initio's {} construct, or anything like it. Usual approaches are to use a transformation to change any NULL to a known value (such as 0 or "", depending on data type). When summing, 0 has no effect. Your workaround is as good as any.
If you want an average or sum based only on the non-NULL values, filter the input to the Aggregator stage so that only non-NULL values are sent into it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
The Aggregator stage makes a grouping field mandatory, which means you do have to create a column with a constant value in order to treat all rows as a single group. A Column Generator stage should be the most efficient way to add this column, unless you already have a Transformer in your job. The underlying group operator that the Aggregator stage uses will actually accept zero or more grouping key fields. To quote the old Orchestrate Operators Guide:
If you do not specify any grouping keys, all records are members of a single group and statistical values are computed by the operator over the entire input data set.
You could create a new custom parallel stage that calls the group operator directly instead of going through the Aggregator stage shell.
The syntax for the group operator is shown below. Option values you supply are shown in italic typeface. When your value contains a space or a tab character, you must enclose the value in single quotes.
group
[-collation_sequence locale | collation_file_pathname | OFF]
[-method hash | sort]
[-hash -part partitioning_method | -seq]
[-sort unix | syncsort]
[-coll collection_method]
[-key fieldname [fieldtype] [-param param_string ...] [-ci | -cs]]
[-reduce source_field suboptions (shown in the option table)]
[-rereduce source_field suboptions (shown in the option table)]
[-nul_res]
[-countField count_field_name]
[-weightField weight_field_name]
group -method sort -key gender -key age
-reduce income -min incomeMin -max incomeMax
-mean incomeMean
> outData.ds
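Given the guide's statement that zero grouping keys are accepted, aggregating over the entire input data set should then just be a matter of omitting every -key option. A hedged sketch along the lines of the example above (field and file names are illustrative, and this bypasses the Aggregator stage's mandatory grouping field):

group -reduce income -min incomeMin -max incomeMax -mean incomeMean
< inData.ds > outData.ds

With no keys specified, all input records form a single group, which is exactly the effect of Ab Initio's {} key specifier.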
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney