How to Catch the last record in Stage Variable

somu_june · Post by **somu_june** » Mon Nov 26, 2007 2:50 pm

Hi,

I have input records comming as ND1, ND2 and CNT1 and ND1 contains SN, Name and Address and ND2 contains SN, Birthdate, Hiredate and CNT1 contains SN, CN1 and CN2 . For example if there is a input like this

ND1,123, Raju, 1502
ND2,123,19880606,20070707
CNT1,123,0000,0000
ND1,234,Srin,6666
ND2,234,19890606,20070606
CNT1,345,0000,0000

Where 123, 234 and 345 are SN numbers for same SN numbers I need as one output record

output must look like this

123, Raju, 1502, 19880606, 20070707, 0000,0000
234, Srin, 6666, 19890606, 20070606,........,.......
345,......, ........,...............,...............,00000,.......

where ........ is emptystring

I am able to catch the above two 123 and 234 recors in output but Iam not able to catch the 345 record to the output . Can some body tell me how to achieve this using stage variables

Thanks,
Somaraju.

say2prabhu1 · Post by **say2prabhu1** » Mon Nov 26, 2007 4:21 pm

Before processing to the stage variable stage, sort the incoming records based on your requirement.
For example you can sort based on the key value..
Then declare the four stage variable(ST1,ST2,ST3,ST4) in the transformer stage.Pass the first record into third stage variable,2 nd record into second stage variable and 3rd record into third stage variable.
In each stage variable derivation you have to compare with incoming records.
We can declare the condition in 4th stage variable and pass to the output stage.

ST1 ;- If(key1:key2)=ST3 then 1 else 0
ST2 :- If(key1:key2)=ST3 then 1 else 0
ST3 :- key1:key2
ST4:- If St1=0 then output else ST4

ray.wurlod · Post by **ray.wurlod** » Mon Nov 26, 2007 5:22 pm

Tell us how you are deriving your stage variables.

Have you considered other stage types, such as Remove Duplicates (which allows for First or Last of each group)? That way you would need no stage variables at all to fulfil your requirement. Simply group by SN.

say2prabhu1 · Post by **say2prabhu1** » Mon Nov 26, 2007 6:01 pm

based on our business requirement we need to process the data like remove duplicates,sort etc.. I am considered the incoming records are not duplicates..

vmcburney · Post by **vmcburney** » Mon Nov 26, 2007 6:24 pm

What you are doing looks like a pivot - the type of pivot not easily supported by DataStage, especially parallel jobs. I would try this design. 1) Sort stage - sort and partition by your key field (eg. 123). This puts the rows in order and keeps key combinations on the same nodes.
2) Transformer stage - here are where things get complex. You need the following stage variables:
- NewKey, the new primary key.
- KeyChange - sets to true if the incoming key is different to the last key.
- LastKey - the key values from the previous row.
- KeyCount - the number of times the current key has shown up.
- PivotValue_1 to 6. The six output fields for each key combination.
So when a row comes in with a new key value you clear out PivotValue 1 to 6 using a conditional statement in each. When the key remains the same you use KeyCount to put a value from the input record into PivotValue 1 through 6 depending on the key count. The derivations on these fields use KeyChange and KeyCount and the input columns to set, reset or remain as they are.
3) Write all rows out with PivotValue 1 to 6 mapped to six output columns plus the key column, this will give you duplicate output rows.
4) Use a remove duplicates stage and keep the last row - the row with all the pivot values in it, discarding the incomplete rows.

So row 1 out of the transformer will look like this:
ND1,123, Raju, 1502, ..., ..., ..., ...
Row 2 will be:
ND1,123, Raju, 1502, 19880606,20070707, ..., ...
Row 3 will be:
ND1,123, Raju, 1502, 19880606,20070707, 0000,0000
Row 4 will be:
234,Srin,6666,...,...,...,...

The remove duplicates will discard rows 1 and 2 and keep 3 and 4. Only works if you have up to 6 of these pivot fields. If you have a dynamic number of pivot fields you need to look at dynamic delimited stage variables writing to a single output field and it gets trickier.

rleishman · Post by **rleishman** » Mon Nov 26, 2007 8:32 pm

As described by the other responses, DataStage is not necessarily the best tool to do this type of thing.

Personally, I like to get my data into at least 1NF before I start processing it with DS - there are other languages that are better at this than DS.

If you do want to use DS, then clearly the logic (as described above) is non-trivial and (to a non-DS expert) non-intuitive.

One way to make it more intuitive - but slower - would be to
- Sort the rows on SN
- Use a Transformer to split into 4 links: ND1 to link1, ND2 to link2, CNT1 to link2, and a distinct list of SNs to link4.
- Use a MERGE stage to join all of these back together using Link4 as the Master Link in the merge.

Code: Select all

                          |----|
FILE ---- SORT ---- XFORM +----+ MERGE ---- FILE
                          |----|
                          |----|