How to Catch the last record in Stage Variable

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
somu_june
Premium Member
Premium Member
Posts: 439
Joined: Wed Sep 14, 2005 9:28 am
Location: 36p,reading road

How to Catch the last record in Stage Variable

Post by somu_june »

Hi,

I have input records comming as ND1, ND2 and CNT1 and ND1 contains SN, Name and Address and ND2 contains SN, Birthdate, Hiredate and CNT1 contains SN, CN1 and CN2 . For example if there is a input like this

ND1,123, Raju, 1502
ND2,123,19880606,20070707
CNT1,123,0000,0000
ND1,234,Srin,6666
ND2,234,19890606,20070606
CNT1,345,0000,0000

Where 123, 234 and 345 are SN numbers for same SN numbers I need as one output record

output must look like this

123, Raju, 1502, 19880606, 20070707, 0000,0000
234, Srin, 6666, 19890606, 20070606,........,.......
345,......, ........,...............,...............,00000,.......

where ........ is emptystring

I am able to catch the above two 123 and 234 recors in output but Iam not able to catch the 345 record to the output . Can some body tell me how to achieve this using stage variables


Thanks,
Somaraju.
somaraju
say2prabhu1
Participant
Posts: 27
Joined: Mon Jul 09, 2007 1:06 am

Post by say2prabhu1 »

Before processing to the stage variable stage, sort the incoming records based on your requirement.
For example you can sort based on the key value..
Then declare the four stage variable(ST1,ST2,ST3,ST4) in the transformer stage.Pass the first record into third stage variable,2 nd record into second stage variable and 3rd record into third stage variable.
In each stage variable derivation you have to compare with incoming records.
We can declare the condition in 4th stage variable and pass to the output stage.

ST1 ;- If(key1:key2)=ST3 then 1 else 0
ST2 :- If(key1:key2)=ST3 then 1 else 0
ST3 :- key1:key2
ST4:- If St1=0 then output else ST4
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Tell us how you are deriving your stage variables.

Have you considered other stage types, such as Remove Duplicates (which allows for First or Last of each group)? That way you would need no stage variables at all to fulfil your requirement. Simply group by SN.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
say2prabhu1
Participant
Posts: 27
Joined: Mon Jul 09, 2007 1:06 am

Post by say2prabhu1 »

based on our business requirement we need to process the data like remove duplicates,sort etc.. I am considered the incoming records are not duplicates..
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

What you are doing looks like a pivot - the type of pivot not easily supported by DataStage, especially parallel jobs. I would try this design. 1) Sort stage - sort and partition by your key field (eg. 123). This puts the rows in order and keeps key combinations on the same nodes.
2) Transformer stage - here are where things get complex. You need the following stage variables:
- NewKey, the new primary key.
- KeyChange - sets to true if the incoming key is different to the last key.
- LastKey - the key values from the previous row.
- KeyCount - the number of times the current key has shown up.
- PivotValue_1 to 6. The six output fields for each key combination.
So when a row comes in with a new key value you clear out PivotValue 1 to 6 using a conditional statement in each. When the key remains the same you use KeyCount to put a value from the input record into PivotValue 1 through 6 depending on the key count. The derivations on these fields use KeyChange and KeyCount and the input columns to set, reset or remain as they are.
3) Write all rows out with PivotValue 1 to 6 mapped to six output columns plus the key column, this will give you duplicate output rows.
4) Use a remove duplicates stage and keep the last row - the row with all the pivot values in it, discarding the incomplete rows.

So row 1 out of the transformer will look like this:
ND1,123, Raju, 1502, ..., ..., ..., ...
Row 2 will be:
ND1,123, Raju, 1502, 19880606,20070707, ..., ...
Row 3 will be:
ND1,123, Raju, 1502, 19880606,20070707, 0000,0000
Row 4 will be:
234,Srin,6666,...,...,...,...

The remove duplicates will discard rows 1 and 2 and keep 3 and 4. Only works if you have up to 6 of these pivot fields. If you have a dynamic number of pivot fields you need to look at dynamic delimited stage variables writing to a single output field and it gets trickier.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

As described by the other responses, DataStage is not necessarily the best tool to do this type of thing.

Personally, I like to get my data into at least 1NF before I start processing it with DS - there are other languages that are better at this than DS.

If you do want to use DS, then clearly the logic (as described above) is non-trivial and (to a non-DS expert) non-intuitive.

One way to make it more intuitive - but slower - would be to
- Sort the rows on SN
- Use a Transformer to split into 4 links: ND1 to link1, ND2 to link2, CNT1 to link2, and a distinct list of SNs to link4.
- Use a MERGE stage to join all of these back together using Link4 as the Master Link in the merge.

Code: Select all

                          |----|
FILE ---- SORT ---- XFORM +----+ MERGE ---- FILE
                          |----|
                          |----|
Ross Leishman
Post Reply