Id   Role      Duty
100  *Athlete  Athlete
100  *Athlete  Running
100  *Athlete  Swimming
If the Ids are equal, then I should check the columns Role and Duty.
The rule: if the Ids are equal, output only the row whose Role and Duty are equal after trimming the * off the Role.
So in this case the first row is the output.
If you're using IS 8.5, you can use the transformer looping functionality to solve this pretty easily.
If you're using 8.0 or 8.1, one option is to use a fork-join structure to add a rows-per-id count to the records, then use a transformer to drop those duplicates.
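For anyone following along, the fork-join idea can be sketched in plain Python (hypothetical field names, not DataStage syntax): one fork counts the rows per Id, and the join attaches that count back onto each row so a downstream transformer can use it.

```python
from collections import Counter

def add_rows_per_id(rows):
    """Fork: count rows per id; join: attach the count back onto each row."""
    counts = Counter(r["id"] for r in rows)
    return [dict(r, rows_per_id=counts[r["id"]]) for r in rows]

# Sample data from the original post (column names are assumed)
rows = [
    {"id": 100, "role": "*Athlete", "duty": "Athlete"},
    {"id": 100, "role": "*Athlete", "duty": "Running"},
    {"id": 100, "role": "*Athlete", "duty": "Swimming"},
]
```

In the real job the fork would be a copy stage feeding an aggregator, joined back to the main link on id.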
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
I think what jwiles is suggesting is: if rows-per-id is 1, pass the row through. When rows-per-id is greater than 1, then in the transformer you apply your logic: trim off the * and compare Role with Duty; if they are equal, pass the row through, otherwise drop it. You can do all of that with stage variables and a constraint.
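That per-row decision is small enough to sketch directly (Python standing in for the transformer constraint, with assumed column names role/duty):

```python
def keep_row(row, rows_per_id):
    """Transformer constraint in miniature: singletons pass through;
    duplicates pass only when role minus the '*' equals duty."""
    if rows_per_id == 1:
        return True
    return row["role"].lstrip("*") == row["duty"]
```

In DataStage the same test would live in a stage variable referenced by the output link constraint.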
My reasoning behind using rows-per-id is based upon your requirements description:
The rule is if ids are equal then output only the row whose role and duty are equal after trimming off the * from the role.
To me, this means that if you have more than one row for an ID, keep only the row(s?) which have equal values for role and duty (minus the '*'). In order to do this, you need to calculate the number of rows present for each ID ("rows-per-id") as I don't see that value in the example data you provided. The logic qt-ky suggests would be appropriate for the transformer.
Another method that might work involves sorting on ID, then re-sorting on ID with a key-change field assigned in the first sort. The following transformer can then use keychange in much the same way as rows-per-id. Either method will provide the required results.
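A rough illustration of the keychange idea (Python, assumed id field; in the real job the flag comes from the Sort stage's Create Key Change Column option rather than hand-rolled code):

```python
def with_keychange(sorted_rows):
    """Tag the first row of each id group with keychange=1, others 0,
    mimicking the Sort stage's key change column (input must be sorted)."""
    out, prev = [], object()   # sentinel: never equals a real id
    for r in sorted_rows:
        out.append(dict(r, keychange=1 if r["id"] != prev else 0))
        prev = r["id"]
    return out
```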
Regards,
- james wiles
There are three conditions I do not see represented in your sample data:
1) Only a single row is present for an ID (i.e. no duplicates on ID)
2) There are duplicates present for an ID, but none that meet the keep criteria you mention
3) There are duplicates present for an ID and multiple rows meet the keep criteria you mention
It may be that none of these conditions exist in your data, but as a seasoned developer and data processor, I can't safely assume that. You don't have those conditions listed in the business rules you have provided so far.
#1 is the situation where I see rows-per-id being used (you have a new ID and there's only one row for it: how do you know that in a pre-8.5 transformer, and what do you do?)
#2 Do you delete all rows, or keep one of them?
#3 Do you keep all rows that meet the criteria, or only one of them?
Regards,
- james wiles
I am lost with the requirement. Also, I am trying to run a query against the DB, but it's taking forever. I need to split this query and design a DS job. I only have some sample data with me. Let me get the actual data from the query first, analyse it, and then post the complete requirement. Sorry if I have wasted your time.
Thanks jwiles. With the suggestions you have given above, I will clarify things for myself and come back here in a while.
Thanks Sura, your understanding of my requirement is right. As you said, it looked simple to me: I can first check the condition to see if Role = Duty and then remove duplicates based on ID downstream. If Role does not match Duty, I need to reject those rows. These are my assumptions from the sample data and requirements I have. I really think something is missing here, which is why I want to get the actual data from the whole query, analyse it, and come back.
Your first requirement could be met by a Remove Duplicates stage preserving the First record from each group, assuming the data are sorted correctly.
You specified removal of the asterisk, but the output in your example preserved the asterisk. Removal is most easily effected by a Convert() function.
If you have to have a Transformer stage, then you can effect the removal of duplicates in that stage also, by setting up stage variables to detect a change in ID - one of these becomes your output constraint.
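A quick sketch of those two points (Python standing in for the Remove Duplicates stage and the Convert() derivation, with assumed field names):

```python
def first_per_group(sorted_rows, key="id"):
    """Keep the first record of each key group, like a Remove Duplicates
    stage set to retain First (input must already be sorted on the key)."""
    seen, out = set(), []
    for r in sorted_rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out

def strip_asterisk(s):
    """Rough analogue of Convert('*', '', link.role) in a derivation."""
    return s.replace("*", "")
```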
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
id   name  bid_no  bname  inum   custname
100  *sam  AC100   sam    64546  Assoc
200  tom   AC345   tom    45645  Build
200  tom   AC345   tom    45666  Build
200  tom   AC345   tom    null   Build
500  jam   AC888   jam    98080  facet
300  can   BC399   ban    52344  counter
300  can   AC377   can    52344  counter
288  *rom  AC623   rom    12345  Adam inc
288  rom   AC623   fom    12345  Adam inc
656  gem   BC143   len    65434  kroger llc
656  gem   BC143   len    65434  kroger
output:
id   name  bid_no  bname  inum   custname
100  *sam  AC100   sam    64546  Assoc
200  tom   AC345   tom    45645  Build
200  tom   AC345   tom    45666  Build
200  tom   AC345   tom    null   Build
500  jam   AC888   jam    98080  facet
300  can   AC377   can    52344  counter
288  *rom  AC623   rom    12345  Adam inc
656  gem   BC143   len    65434  kroger llc
First, look for duplicates in the ids.
CASE 1: I took the records with id=100 and id=500 because each occurs only once with respect to id.
CASE 2: Took all three records with id=200 and bid_no=AC345 because their ids and bid_nos are the same.
CASE 3: Took one record with id=300 because its bid_no starts with AC (not BC).
CASE 4: Took one record with id=288 because its name = bname when compared after trimming the * from name.
CASE 5: Both records with id=656 are the same, so I retained only one.
Please help me with suggestions on how to achieve this in DataStage.
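Not DataStage code, but here is one possible reading of those five cases sketched in Python, mainly to pin down the precedence between them (column names are from the sample data; the ordering of the cases, and what "same" means for id=656, are my assumptions):

```python
# Columns used to decide whether two rows are "the same" (CASE 5):
# everything except custname, since the two id=656 rows differ only there.
KEY = ("id", "name", "bid_no", "bname", "inum")

def pick_rows(group):
    """Apply the CASE 1-5 rules to all rows sharing one id."""
    # CASE 1: a single occurrence of the id passes straight through
    if len(group) == 1:
        return group
    # CASE 5: "same" records are retained only once
    seen, deduped = set(), []
    for r in group:
        k = tuple(r[c] for c in KEY)
        if k not in seen:
            seen.add(k)
            deduped.append(r)
    if len(deduped) == 1:
        return deduped
    # CASE 3: different bid_nos -> keep the one(s) starting with 'AC'
    if len({r["bid_no"] for r in deduped}) > 1:
        return [r for r in deduped if r["bid_no"].startswith("AC")]
    # CASE 4: same bid_no -> keep rows where name minus '*' equals bname,
    # CASE 2: unless every row already matches, in which case keep them all
    matches = [r for r in deduped if r["name"].lstrip("*") == r["bname"]]
    return matches if 0 < len(matches) < len(deduped) else deduped
```

Grouping the sorted input on id (e.g. with itertools.groupby) and applying pick_rows to each group reproduces the sample output above; in a DataStage job the same decisions would live in sorted-input stage variables and an output constraint.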
Sort the data and use stage variables for the different scenarios. Use a variable as a flag too. You can achieve it.
My personal request is to not expect more help from others. Split your requirement into pieces (for example: if there is more than one record and the values differ, then pass it / drop it), apply the filters one by one, and write the results to a file. That will give you more control.
If you can't get it working by yourself, then post which condition you tried to achieve and what happened, so that someone can correct it.
I understood what you need and could post the complete code that would work, but that would not help you!