Uniqueness of data in a table

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
yaminids
Premium Member
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Uniqueness of data in a table

Post by yaminids »

Hello friends,

I am trying to load unique data into a table. The source file contains duplicate data. The formula for uniqueness is the combination of 3 columns in the source file.
My intension is to load the data from source file into two tables A and B in the target.

I am trying to apply the following rules:

1) All the rows from the source go into table A, but only unique rows must go into table B.
2) For every unique row in table B, I have to generate a sequence number which I have to enter in Table A.

Can anyone help me with this.
Thanks a lot in advance

-Yamini
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Is the data coming from the source file sorted by the three columns that define uniqueness of a row? Could you also please give approx number of records you expect to process using this job.

Regards,
-Sumit
yaminids
Premium Member
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Uniqueness of data in a table

Post by yaminids »

Hello Sumit,

Yes the data in the source is sorted and there are approximately 40 million rows in the source, which I think rules out the use of Hash file stage.

Thanks
-Yamini
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use 3 stage variables in the transformer following the source stage.

1.) newKeyCombination (default 'N')
2.) prevKeyCombination (default '')
3.) seqNo (default 0)

newKeyCombination <- if link.newKey <> prevKeyCombination Or IsNULL(prevKeyCombination) then 'Y' Else 'N'
prevKeyCombination <- link.newKey
seqNo <- if newKeyCombination = 'Y' then seqNo + 1 Else seqNo

For A, move all values

For B, have a constraint saying newKeyCombination = 'Y'
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Re: Uniqueness of data in a table

Post by sumitgulati »

Well since the data is sorted that solves the problem.

Declare four staging variables say UniqueCnt, ColAvar, ColBVar and ColCVar in the same order. Initialize all the stagevariables to 0. Map ColAvar, ColBVar and ColCVar to ColA, ColB and ColC from the source.

In the derivation of UniqueCnt have the following code:
If SourceLink.ColA<>ColAVar OR SourceLink.ColB<>ColBVar OR SourceLink.ColC<>ColCVar Then UniqueCnt+1 Else UniqueCnt
Remember the order in which you define the stage variables is important because they get executed in the same order. Since Stage variable ColAvar, ColBVar and ColCVar are defined below stage variable UniqueCnt they hold the previous rows values when used in the derivation of UniqueCnt
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Addition to the previous reply.

Using the same logic you can define a flag. Use this flag to push the values to Table B only when a new combination comes.

Push all the records to Table A

Use the value of UniqueCnt in both TableA and Table B
Aravind
Participant
Posts: 16
Joined: Mon Dec 27, 2004 4:17 pm

Post by Aravind »

But if the source data contains a row which got loaded to Table B by lletz say a previous days load.. Then I doubt if with the above logic
"For every unique row in table B, I have to generate a sequence number which I have to enter in Table A." this requirement will me met or not. I am sorry if I am wrong
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

You are right Arvind but we have no informaion on
1) the frequency of load
2) if the source file contains new set of data everytime or not

Regards,
-Sumit
yaminids
Premium Member
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Uniqueness of data in a table

Post by yaminids »

Hello all,

Thank you very much for your valuable suggestions. This is a one time load so the question of next load does not arise.
Thank you all once again

-Yamini
clshore
Charter Member
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

Why not let the target DB engine do the heavy lifting?
First load all the rows into table A.
Then select unique from A into B, generating your B sequence number as you go according to the capability of your DB (Oracle sequence, etc.)

Carter
yaminids
Premium Member
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Uniqueness of data in a table

Post by yaminids »

Hello Sumit,

I used the logic you described to define a flag inorder the load the data into table 'B'. The flag is only working when the input columns does not have any null values. Can you explain to me how to deal with null values in this scenario.

Thanks in advance
-Yamini
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Use a IsNULL() function to replace the input link value to some value which will never occur in the input.
sumitgulati
Participant
Posts: 197
Joined: Mon Feb 17, 2003 11:20 pm
Location: India

Post by sumitgulati »

Hi Yamini,

The <> does not work with Null values. You need to do null handling to solve this.

Follow this approach:

Declare seven staging variables say ColA, ColB, ColC, UniqueCnt, ColAvar, ColBVar and ColCVar in the same order. Initialize all the stagevariables to 0. Map ColAvar and ColA to
If IsNull(SourceLink.ColA) Then ' ' Else SourceLink.ColA

Similarly map ColB, ColC, ColBVar and ColCVar.

Now use the following expression in the derivation of UniqueCnt:
If ColA<>ColAVar OR ColB<>ColBVar OR ColC<>ColCVar Then UniqueCnt+1 Else UniqueCnt
This should work for you. Again do not forget the sequence of the sequence variables.

Regards,
-Sumit
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Yamini,

Did you try the solution mentioned by me earlier in this post? I assume that will give you a simple and straightforward solution.
yaminids
Premium Member
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Uniqueness of data in a table

Post by yaminids »

Hello Sumit and Sainath,

Thank you for your help. I got the solution, for which I have been trying for a long time. Sainath, I did try your logic and I think its very easy to implement.

-Yamini
Post Reply