Uniqueness of data in a table
Moderators: chulett, rschirm, roy
Uniqueness of data in a table
Hello friends,
I am trying to load unique data into a table. The source file contains duplicate data. The formula for uniqueness is the combination of 3 columns in the source file.
My intension is to load the data from source file into two tables A and B in the target.
I am trying to apply the following rules:
1) All the rows from the source go into table A, but only unique rows must go into table B.
2) For every unique row in table B, I have to generate a sequence number which I have to enter in Table A.
Can anyone help me with this.
Thanks a lot in advance
-Yamini
I am trying to load unique data into a table. The source file contains duplicate data. The formula for uniqueness is the combination of 3 columns in the source file.
My intension is to load the data from source file into two tables A and B in the target.
I am trying to apply the following rules:
1) All the rows from the source go into table A, but only unique rows must go into table B.
2) For every unique row in table B, I have to generate a sequence number which I have to enter in Table A.
Can anyone help me with this.
Thanks a lot in advance
-Yamini
-
- Participant
- Posts: 197
- Joined: Mon Feb 17, 2003 11:20 pm
- Location: India
Uniqueness of data in a table
Hello Sumit,
Yes the data in the source is sorted and there are approximately 40 million rows in the source, which I think rules out the use of Hash file stage.
Thanks
-Yamini
Yes the data in the source is sorted and there are approximately 40 million rows in the source, which I think rules out the use of Hash file stage.
Thanks
-Yamini
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Use 3 stage variables in the transformer following the source stage.
1.) newKeyCombination (default 'N')
2.) prevKeyCombination (default '')
3.) seqNo (default 0)
newKeyCombination <- if link.newKey <> prevKeyCombination Or IsNULL(prevKeyCombination) then 'Y' Else 'N'
prevKeyCombination <- link.newKey
seqNo <- if newKeyCombination = 'Y' then seqNo + 1 Else seqNo
For A, move all values
For B, have a constraint saying newKeyCombination = 'Y'
1.) newKeyCombination (default 'N')
2.) prevKeyCombination (default '')
3.) seqNo (default 0)
newKeyCombination <- if link.newKey <> prevKeyCombination Or IsNULL(prevKeyCombination) then 'Y' Else 'N'
prevKeyCombination <- link.newKey
seqNo <- if newKeyCombination = 'Y' then seqNo + 1 Else seqNo
For A, move all values
For B, have a constraint saying newKeyCombination = 'Y'
-
- Participant
- Posts: 197
- Joined: Mon Feb 17, 2003 11:20 pm
- Location: India
Re: Uniqueness of data in a table
Well since the data is sorted that solves the problem.
Declare four staging variables say UniqueCnt, ColAvar, ColBVar and ColCVar in the same order. Initialize all the stagevariables to 0. Map ColAvar, ColBVar and ColCVar to ColA, ColB and ColC from the source.
In the derivation of UniqueCnt have the following code:
Declare four staging variables say UniqueCnt, ColAvar, ColBVar and ColCVar in the same order. Initialize all the stagevariables to 0. Map ColAvar, ColBVar and ColCVar to ColA, ColB and ColC from the source.
In the derivation of UniqueCnt have the following code:
Remember the order in which you define the stage variables is important because they get executed in the same order. Since Stage variable ColAvar, ColBVar and ColCVar are defined below stage variable UniqueCnt they hold the previous rows values when used in the derivation of UniqueCntIf SourceLink.ColA<>ColAVar OR SourceLink.ColB<>ColBVar OR SourceLink.ColC<>ColCVar Then UniqueCnt+1 Else UniqueCnt
-
- Participant
- Posts: 197
- Joined: Mon Feb 17, 2003 11:20 pm
- Location: India
But if the source data contains a row which got loaded to Table B by lletz say a previous days load.. Then I doubt if with the above logic
"For every unique row in table B, I have to generate a sequence number which I have to enter in Table A." this requirement will me met or not. I am sorry if I am wrong
"For every unique row in table B, I have to generate a sequence number which I have to enter in Table A." this requirement will me met or not. I am sorry if I am wrong
-
- Participant
- Posts: 197
- Joined: Mon Feb 17, 2003 11:20 pm
- Location: India
Uniqueness of data in a table
Hello all,
Thank you very much for your valuable suggestions. This is a one time load so the question of next load does not arise.
Thank you all once again
-Yamini
Thank you very much for your valuable suggestions. This is a one time load so the question of next load does not arise.
Thank you all once again
-Yamini
Uniqueness of data in a table
Hello Sumit,
I used the logic you described to define a flag inorder the load the data into table 'B'. The flag is only working when the input columns does not have any null values. Can you explain to me how to deal with null values in this scenario.
Thanks in advance
-Yamini
I used the logic you described to define a flag inorder the load the data into table 'B'. The flag is only working when the input columns does not have any null values. Can you explain to me how to deal with null values in this scenario.
Thanks in advance
-Yamini
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
-
- Participant
- Posts: 197
- Joined: Mon Feb 17, 2003 11:20 pm
- Location: India
Hi Yamini,
The <> does not work with Null values. You need to do null handling to solve this.
Follow this approach:
Declare seven staging variables say ColA, ColB, ColC, UniqueCnt, ColAvar, ColBVar and ColCVar in the same order. Initialize all the stagevariables to 0. Map ColAvar and ColA to
Similarly map ColB, ColC, ColBVar and ColCVar.
Now use the following expression in the derivation of UniqueCnt:
Regards,
-Sumit
The <> does not work with Null values. You need to do null handling to solve this.
Follow this approach:
Declare seven staging variables say ColA, ColB, ColC, UniqueCnt, ColAvar, ColBVar and ColCVar in the same order. Initialize all the stagevariables to 0. Map ColAvar and ColA to
If IsNull(SourceLink.ColA) Then ' ' Else SourceLink.ColA
Similarly map ColB, ColC, ColBVar and ColCVar.
Now use the following expression in the derivation of UniqueCnt:
This should work for you. Again do not forget the sequence of the sequence variables.If ColA<>ColAVar OR ColB<>ColBVar OR ColC<>ColCVar Then UniqueCnt+1 Else UniqueCnt
Regards,
-Sumit
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
Uniqueness of data in a table
Hello Sumit and Sainath,
Thank you for your help. I got the solution, for which I have been trying for a long time. Sainath, I did try your logic and I think its very easy to implement.
-Yamini
Thank you for your help. I got the solution, for which I have been trying for a long time. Sainath, I did try your logic and I think its very easy to implement.
-Yamini