Obtaining Last Non-Null Value Per Column
Posted: Tue Jun 24, 2008 1:16 pm
Hi All.
I am trying to determine the best method for obtaining the last non-null value for each column within a key (or group of keys). Could not find anything by searching the forum.
Here's a data scenario:
<pre>
ROW# C1 (KEY) C2 C3 C4
---- -------- ------ ------ ------
1 1000 ACME <NULL> 999
2 1000 <NULL> RED 111
3 1000 ZEKE BLUE <NULL>
4 1000 PLUTO <NULL> <NULL>
5 1000 <NULL> <NULL> 555
</pre>
DESIRED RESULT:
<pre>
ROW# C1 (KEY) C2 C3 C4
---- -------- ------ ------ ------
1 1000 PLUTO BLUE 555
</pre>
The most obvious place in the current product would seem to be a "Last Not Null" and/or "First Not Null' operation to the Aggregator stage since it operates per column / per key, not per record / per key. But it doesn't exist.
Remove Duplicates - doesn't appear to work because it accepts or rejects an entire record
Aggregator - doesn't have a "Last non-null" or equivalent. MIN and MAX will not work because the columns are not sorted individually
Sort/Unique - like Remove Duplicates, accepts/rejects entire record
Join/Merge with "case" statements to choose columns will not work since the number of records per key is unknown amongst other reasons.
A custom solution using stage variables for every column is a potential solution, but there are many jobs and many columns. I am hoping there is a more efficient, more "out of the box" approach.
Appreciate any assistance!!!
I am trying to determine the best method for obtaining the last non-null value for each column within a key (or group of keys). Could not find anything by searching the forum.
Here's a data scenario:
<pre>
ROW# C1 (KEY) C2 C3 C4
---- -------- ------ ------ ------
1 1000 ACME <NULL> 999
2 1000 <NULL> RED 111
3 1000 ZEKE BLUE <NULL>
4 1000 PLUTO <NULL> <NULL>
5 1000 <NULL> <NULL> 555
</pre>
DESIRED RESULT:
<pre>
ROW# C1 (KEY) C2 C3 C4
---- -------- ------ ------ ------
1 1000 PLUTO BLUE 555
</pre>
The most obvious place in the current product would seem to be a "Last Not Null" and/or "First Not Null' operation to the Aggregator stage since it operates per column / per key, not per record / per key. But it doesn't exist.
Remove Duplicates - doesn't appear to work because it accepts or rejects an entire record
Aggregator - doesn't have a "Last non-null" or equivalent. MIN and MAX will not work because the columns are not sorted individually
Sort/Unique - like Remove Duplicates, accepts/rejects entire record
Join/Merge with "case" statements to choose columns will not work since the number of records per key is unknown amongst other reasons.
A custom solution using stage variables for every column is a potential solution, but there are many jobs and many columns. I am hoping there is a more efficient, more "out of the box" approach.
Appreciate any assistance!!!