ETL Logic

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
pavan_test
Premium Member
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

ETL Logic

Post by pavan_test »

Hi All,

I have a file 1 column and 3 records.
Input record:

postal code:

77604
60120
77624

expected output: (write 3 records since they are all different)

postal code:

77604
60120
77624


Input record: (1 column, 3 record)

80101
80101
80101

expected output: (write 1 record)

postal code:

80101

I have to compare 77604 <> 60120 <> 77604 then write both the zip codes to output. But 80101 = 80100 then write only 1 record to the output.

Can somoene please let me know how to write this logic in ETL

Thanks
Pavan
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

Pavan,

To me it seems that Remove Duplicate should help.

But, I have a question. What would be the output for the below scenario:

77604
60120
77624
77604
77604
pavan_test
Premium Member
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

ETL Logic

Post by pavan_test »

There are actually 4 columns in the file

Record = 1

ID postal code1 postal code2 postal code3:

101 77604 60120 77624


expected output: (write 3 records to the output since postal code is different in all the 3 records for this ID)

ID postal code:

101 77604
101 60120
101 77624


Record =2:

ID POSTAL Code1 postal code2 postal code3
102 80101 80101 80101



expected output: (write 1 record to the o/p since postal code is same)

Record =3:

ID POSTAL Code1 postal code2 postal code3
103 77562 77504

expected output:

ID postal code: (there is no postal code, the data is empty in the column postal code2)

103 77562
103 77504


To begin with I am sorting the file in ascending order based on "ID".


ID postal code1 postal code2 postal code3:

101 77604 60120 77624
102 80101 80101 80101
103 77562 - 77504

Thanks
Pavan
dvpawankumar
Participant
Posts: 46
Joined: Fri Oct 06, 2006 6:36 am
Location: Tucson

Re: ETL Logic

Post by dvpawankumar »

Hi Pavan,

Need a clarifiacation on your request. For the below given input, what are you expecting as output.

------------------------------
Input
------------------------------
101 77604 60120 77624
102 80101 80101 80101
103 77562 77504 77604

Is your output should be as below:

------------------------------
Output
------------------------------
101 77604
101 60120
101 77624
102 80101
103 77562
103 77504
103 77604

Thanks and Regards,
Pavan.
Thanks and Regards,
Pavan.
dvpawankumar
Participant
Posts: 46
Joined: Fri Oct 06, 2006 6:36 am
Location: Tucson

Re: ETL Logic

Post by dvpawankumar »

Hi Pavan,

As I mentioned above, if your requiremnt is the same, then please follow the below desgin to get the required output.

Stage one
Read your data using appropriate stage.

Stage two
Use Pivot Satage. In the outputs tab, in the column derivation, define two columns. One is ID and the other one is POSTALCODE. The derivations for the columns are as below:
ID = ID
POSTALCODE= postal_code1,postal_code2,postal_code3

Stage Three
Remove duplicates stage. Give both the columns as key columns.

Stage Four
Appropriate Target stage.

Hope this helps you.
Thanks and Regards,
Pavan.
pavan_test
Premium Member
Premium Member
Posts: 263
Joined: Fri Sep 23, 2005 6:49 am

ETL Logic

Post by pavan_test »

-----------------------------
Input
------------------------------
101 77604 60120 77624
102 80101 80101 80101
103 77562 - 77604
104 77562 77562 90210
105 77562 90211 77562
106 77563 90212 90212

the postal code1, postal code2 and postal code3 are nullable columns.

-----------------------------
Desired Output
------------------------------
101 77604
101 60120
101 77624
102 80101
103 77562
103 77604
104 77562
104 90210
105 77562
105 90211
106 77563
106 90212

Thanks
Pavan
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Which doesn't really change the answer you were given. You still need to pivot the data and then remove the duplicates, you just need to add a step to filter out the records with a null postal code between them.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply