Hello All, I have a need to look at groups of records and detect which group is missing data that is in other groups. The number of groups is always unknown. See example below. Is this possible?
Thanks in advance for any help.
Input:
GRP_ID, Fld_1
=-=-=-=-=-=-=-
AAAAA, 11111
AAAAA, 22222
AAAAA, 11111
BBBBB, 22222
BBBBB, 33333
BBBBB, 22222
CCCCC, 66666
CCCCC, 66666
CCCCC, 99999
Output:
GRP_ID, MIS_GR_ID, Mis_Data
=-=-=-=-=-=-=-=-=-=-=-=-=-=
AAAAA, BBBBB, 33333
AAAAA, CCCCC, 66666
AAAAA, CCCCC, 99999
BBBBB, AAAAA, 11111
BBBBB, CCCCC, 66666
Etc.
Find groups of recs that is missing data from other groups
Moderators: chulett, rschirm, roy
I admit I had to look at your example for several minutes before I figured out the exact process it was illustrating.
I'd say this would be difficult to do in DataStage. It would be better to use a programming language with loop constructs. The solution requires multiple passes to process the groups and build the various combinations. Not trivial in any event.
I'd say try using another tool... Though if someone else can come up with a reasonable DataStage solution I'd be interested in hearing it.
I'd say this would be difficult to do in DataStage. It would be better to use a programming language with loop constructs. The solution requires multiple passes to process the groups and build the various combinations. Not trivial in any event.
I'd say try using another tool... Though if someone else can come up with a reasonable DataStage solution I'd be interested in hearing it.
Last edited by asorrell on Tue Dec 03, 2013 10:39 am, edited 1 time in total.
Well, you can make it. but be careful, my suggestion contains a Cartesian product, so if the volume is too big, you will explode your time execution.
I will use your example in my demonstration.
You take all the field Fld_1 and make a remove duplicate, and add a join key
you will have:
you take all your groups and remove duplicate, and add a join key:
You make a Cartesian product based on join key
You will have:
Then you make an left join with those data and your source based on group and field1. You will have:
Then you filter the null value for the end result:
I will use your example in my demonstration.
You take all the field Fld_1 and make a remove duplicate, and add a join key
you will have:
Code: Select all
11111;1
22222;1
33333;1
66666;1
99999;1
Code: Select all
AAAAA;1
BBBBB;1
CCCCC;1
You will have:
Code: Select all
AAAAA;11111;1;
AAAAA;22222;1;
AAAAA;33333;1;
AAAAA;66666;1;
AAAAA;99999;1;
BBBBB;11111;1;
BBBBB;22222;1;
BBBBB;33333;1;
BBBBB;66666;1;
BBBBB;99999;1;
CCCCC;11111;1;
CCCCC;22222;1;
CCCCC;33333;1;
CCCCC;66666;1;
CCCCC;99999;1;
Code: Select all
AAAAA;11111;1;AAAAA, 11111
AAAAA;22222;1;AAAAA, 22222
AAAAA;33333;1;null
AAAAA;66666;1;null
AAAAA;99999;1;null
BBBBB;11111;1;null
BBBBB;22222;1;BBBBB, 22222
BBBBB;33333;1;BBBBB, 33333
BBBBB;66666;1;null
BBBBB;99999;1;null
CCCCC;11111;1;null
CCCCC;22222;1;null
CCCCC;33333;1;null
CCCCC;66666;1;CCCCC, 66666
CCCCC;99999;1;CCCCC, 99999
Code: Select all
AAAAA;33333;null
AAAAA;66666;null
AAAAA;99999;null
BBBBB;11111;null
BBBBB;66666;null
BBBBB;99999;null
CCCCC;11111;null
CCCCC;22222;null
CCCCC;33333;null