ETL Logic

pavan_test · Post by **pavan_test** » Tue Jul 17, 2012 1:05 pm

Hi All,

I have a file 1 column and 3 records.
Input record:

postal code:

77604
60120
77624

expected output: (write 3 records since they are all different)

postal code:

77604
60120
77624

Input record: (1 column, 3 record)

80101
80101
80101

expected output: (write 1 record)

postal code:

80101

I have to compare 77604 <> 60120 <> 77604 then write both the zip codes to output. But 80101 = 80100 then write only 1 record to the output.

Can somoene please let me know how to write this logic in ETL

Thanks
Pavan

ankursaxena.2003 · Post by **ankursaxena.2003** » Tue Jul 17, 2012 2:01 pm

Pavan,

To me it seems that Remove Duplicate should help.

But, I have a question. What would be the output for the below scenario:

77604
60120
77624
77604
77604

pavan_test · Post by **pavan_test** » Tue Jul 17, 2012 2:30 pm

There are actually 4 columns in the file

Record = 1

ID postal code1 postal code2 postal code3:

101 77604 60120 77624

expected output: (write 3 records to the output since postal code is different in all the 3 records for this ID)

ID postal code:

101 77604
101 60120
101 77624

Record =2:

ID POSTAL Code1 postal code2 postal code3
102 80101 80101 80101

expected output: (write 1 record to the o/p since postal code is same)

Record =3:

ID POSTAL Code1 postal code2 postal code3
103 77562 77504

expected output:

ID postal code: (there is no postal code, the data is empty in the column postal code2)

103 77562
103 77504

To begin with I am sorting the file in ascending order based on "ID".

ID postal code1 postal code2 postal code3:

101 77604 60120 77624
102 80101 80101 80101
103 77562 - 77504

Thanks
Pavan

dvpawankumar · Post by **dvpawankumar** » Tue Jul 17, 2012 3:53 pm

Hi Pavan,

Need a clarifiacation on your request. For the below given input, what are you expecting as output.

------------------------------
Input
------------------------------
101 77604 60120 77624
102 80101 80101 80101
103 77562 77504 77604

Is your output should be as below:

------------------------------
Output
------------------------------
101 77604
101 60120
101 77624
102 80101
103 77562
103 77504
103 77604

Thanks and Regards,
Pavan.

dvpawankumar · Post by **dvpawankumar** » Tue Jul 17, 2012 4:34 pm

Hi Pavan,

As I mentioned above, if your requiremnt is the same, then please follow the below desgin to get the required output.

Stage one
Read your data using appropriate stage.

Stage two
Use Pivot Satage. In the outputs tab, in the column derivation, define two columns. One is ID and the other one is POSTALCODE. The derivations for the columns are as below:
ID = ID
POSTALCODE= postal_code1,postal_code2,postal_code3

Stage Three
Remove duplicates stage. Give both the columns as key columns.

Stage Four
Appropriate Target stage.

Hope this helps you.

pavan_test · Post by **pavan_test** » Tue Jul 17, 2012 7:36 pm

-----------------------------
Input
------------------------------
101 77604 60120 77624
102 80101 80101 80101
103 77562 - 77604
104 77562 77562 90210
105 77562 90211 77562
106 77563 90212 90212

the postal code1, postal code2 and postal code3 are nullable columns.

-----------------------------
Desired Output
------------------------------
101 77604
101 60120
101 77624
102 80101
103 77562
103 77604
104 77562
104 90210
105 77562
105 90211
106 77563
106 90212

Thanks
Pavan

chulett · Post by **chulett** » Tue Jul 17, 2012 10:16 pm

Which doesn't really change the answer you were given. You still need to pivot the data and then remove the duplicates, you just need to add a step to filter out the records with a null postal code between them.

DSXchange

ETL Logic

ETL Logic

ETL Logic

Re: ETL Logic

Re: ETL Logic

ETL Logic