I was thinking on the same lines too..To use an aggregator to get the duplicate set and use as lookup. Wanted to check if there is a simpler way as this requirement is very simple..SriKara wrote:I guess the tricky part here is to define that dummy column with a value when there are duplicates on '0001' . May be you can pull out an extra link from source where the non-key column has '0001' and group by on the 'key' column to find the 'counts'. Count more than 1 would tell you that key has duplicates on '0001'. Use the resultant set as a lookup.kris007 wrote:Nope. You have to define your dummy column in a way that you can identify the values you need drop when they have duplicates.
Or is there an easier way!!!
Pick a particular duplicate record
Moderators: chulett, rschirm, roy
In my mind this is still a simple design unless I am not understanding your requirement completely. I looked at your first post and I don't think the example you provided correctly reflects the issue you have. Can you be more specific with your example, lets say with about 4-5 sample records and how exactly you want your output to look like?
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
Sample data with the dummy column defined..Whenver there is 0001 in Field2, Flag=1 Else Flag=0kris007 wrote:In my mind this is still a simple design unless I am not understanding your requirement completely. I looked at your first post and I don't think the example you provided correctly reflects the issue you have. Can you be more specific with your example, lets say with about 4-5 sample records and how exactly you want your output to look like?
Filed1(Key) Field2 Flag
ABC 0001 1
ABC 0002 0
DEF 0003 0
DEF 0004 0
EFG 0001 1
My requirement is to remove the 0001 when it is coming as a dupe on key field Field1.
So output shud be
Filed1(Key) Field2
ABC 0002
DEF 0003
EFG 0001
The desing which I am thinking is
Source---->Transforemer
From transformer, two links.One to aggregator to find the duplicate Field1 list(set1).Second to remove duplicate based on Field1 and Flag (set2).
Then lookup set2 on set1 on Field1.If the lookp matches and Flag=1 thenremove that record..Else pass it on.
Here i will be using aggregator,remove duplicate,lookup,and filter too which I thought is way too much processing for this simple logic.
I dont know how to create psudocode here to make it more clear.
[/img]
Well, that's exactly the design I mentioned would do. In the Transformer Stage create the Dummy column similarly as mentioned earlier.
In the Remove Duplicates Stage, mention Field 1 as the Key on the Stage properties tab. But in the Input >> Partitioning tab after selecting hash partitioning and selecting Perform Sort option select Field 1 and Dummy Column as keys, then on the right hand side, right click on the Dummy Column and select Sort direction as Ascending and usage as only Sorting. Also on the Stage properties tab, make sure you set Duplicate to Retain = First. Now, when you run your job, based upon the examples provided, for Field 1 = ABC you will pick up Field 2 = 0002 because you are sorting your flag in an ascending order and the flag is 0 for this record. For Field 1 = EFG because there is only one record available you will just pass it through and you will get your required output.
This is a very simple design and I have done it in the past multiple times. Let us know how you do.
Code: Select all
If Field2 = '0001' Then 1 Else 0
This is a very simple design and I have done it in the past multiple times. Let us know how you do.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
kris007 wrote:Well, that's exactly the design I mentioned would do. In the Transformer Stage create the Dummy column similarly as mentioned earlier.In the Remove Duplicates Stage, mention Field 1 as the Key on the Stage properties tab. But in the Input >> Partitioning tab after selecting hash partitioning and selecting Perform Sort option select Field 1 and Dummy Column as keys, then on the right hand side, right click on the Dummy Column and select Sort direction as Ascending and usage as only Sorting. Also on the Stage properties tab, make sure you set Duplicate to Retain = First. Now, when you run your job, based upon the examples provided, for Field 1 = ABC you will pick up Field 2 = 0002 because you are sorting your flag in an ascending order and the flag is 0 for this record. For Field 1 = EFG because there is only one record available you will just pass it through and you will get your required output.Code: Select all
If Field2 = '0001' Then 1 Else 0
This is a very simple design and I have done it in the past multiple times. Let us know how you do.
Yes. This should work!. Thanks.
I need to capture the duplicate rows too. SO I think I should use sort stage with keychange.
I think this is taken care by defining the flag column. So in any case the duplicate record with '0001' will be flagged as 1 and will appear as the last row of the group which wil be always eliminated if its a duplicate.SriKara wrote:'0001' was just an example. In his case, there are values greater than and less than that the constant number. So sorting might not actually work for him.
You are right. Thanks.dsusersaj wrote:I think this is taken care by defining the flag column. So in any case the duplicate record with '0001' will be flagged as 1 and will appear as the last row of the group which wil be always eliminated if its a duplicate.SriKara wrote:'0001' was just an example. In his case, there are values greater than and less than that the constant number. So sorting might not actually work for him.
Regards,
SriKara
SriKara
You didn't mention that in your original post. As you mentioned you can do it using the Sort stage with Create Key Change Column set to True and capturing the required records downstream.dsusersaj wrote:I need to capture the duplicate rows too. SO I think I should use sort stage with keychange.
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
Yeah. I just wanted the logic and I knew that to capture dupes, I can use sort stage with keychange option.kris007 wrote:You didn't mention that in your original post. As you mentioned you can do it using the Sort stage with Create Key Change Column set to True and capturing the required records downstream.dsusersaj wrote:I need to capture the duplicate rows too. SO I think I should use sort stage with keychange.
Thank you for your time and effort.