Duplicate row issue in transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tsktsk123
Participant
Posts: 32
Joined: Thu Dec 25, 2003 11:59 am

Duplicate row issue in transformer

Post by tsktsk123 »

Hi,

We have the following requirement.

source table struture

e_no date e_n e_code e_catg e_sal src

1 1-Jan-08 a aa 100 abc
2 2-Jan-08 b bb ab abc
3 3-Jan-08 c bc 200 abc

the lookup table structure is

src variable value s_d e_d

abc e_code zz 1-Dec-08 28-Dec-08
abc e_catg zzz 2-Dec-08
abc e_sal 1000 3-Dec-08
def e_code yy 4-Dec-08
def e_catg yyy 5-Dec-08
def e_sal 2000 6-Dec-08
abc e_code aa 29-Dec-08

We need to pass date from source table to the reference data table where date is between s_d and e_d and need to populate the value from the reference when the source column comes as null i.e. in the above example for e_no the e_catg comes as null so we need to pull value 'ZZ' from reference data table and load to target. Similarly we need to load the values from reference data table for all three columns(e_code,e_catg,e_sal) whenever it comes as null from source.

The join condition used is Date between s_d and e_d and src = src, pull variable_name and value from reference data table.

We have tried the following logic

Lkp table(Oracle)

SRC(Oracle)->Joiner->Filter->Transformer->


Lkp table(Oracle) is used to pull all the records from the lookup table where src = 'abc'

joiner is used to pull the s_d,e_d,variable,value from lookup table using the condition src = src

filter is used to apply condition where date is between s_d and e_d

transformer is used to check if the source is null and to populate the value comes from lookup

for explanation i will consider the first row from source i.e.

e_no date e_n e_code e_catg e_sal src

1 1-Jan-08 a aa 100 abc

the o/p of the join will be

e_no date e_n e_code e_catg e_sal src varia val s_d e_d

1 1-Jan-08 a aa 100 abc e_code zz 01-dec-08 28-dec
1 1-Jan-08 a aa 100 abc e_catg zzz 02-dec-08
1 1-Jan-08 a aa 100 abc e_sal 1000 03-dec-08
1 1-Jan-08 a aa 100 abc e_code zz 29-dec-08

the output of the filter stage will be

1 1-Jan-08 a aa 100 abc e_catg zzz 02-dec-08
1 1-Jan-08 a aa 100 abc e_sal 1000 03-dec-08
1 1-Jan-08 a aa 100 abc e_code zz 29-dec-08

the output of the transformer will be

1 1-Jan-08 a aa zzz 100 abc
1 1-Jan-08 a aa zzz 100 abc
1 1-Jan-08 a aa zzz 100 abc

We like to load only one row out of this 3 row, can someone help us to identify how we load only the last row. The above is the output only for 1 e_no, similarly for each e_no we will get 3 rows, so we need to load only one load to the target.

1) Is there any alternate and efficient way of doing the same thing

2) Is there a way to apply conditions in joiner stage?

3) the output of joiner has mXn records n is the no of variable for each source system(s_s), is there an any other way of doing this?

Appreciate any help on this.

regards,
senthil
tsktsk123
Participant
Posts: 32
Joined: Thu Dec 25, 2003 11:59 am

Post by tsktsk123 »

Hi,

I tried using a remove duplicates stage to pull only the last row, it works, can anyone help me is there any other efficient way to doing this the entire logic?

regards,
senthil
Post Reply