Find groups of recs that is missing data from other groups

tbtcust · Post by **tbtcust** » Mon Dec 02, 2013 8:13 pm

Hello All, I have a need to look at groups of records and detect which group is missing data that is in other groups. The number of groups is always unknown. See example below. Is this possible?

Thanks in advance for any help.

Input:
GRP_ID, Fld_1
=-=-=-=-=-=-=-
AAAAA, 11111
AAAAA, 22222
AAAAA, 11111

BBBBB, 22222
BBBBB, 33333
BBBBB, 22222

CCCCC, 66666
CCCCC, 66666
CCCCC, 99999

Output:
GRP_ID, MIS_GR_ID, Mis_Data
=-=-=-=-=-=-=-=-=-=-=-=-=-=
AAAAA, BBBBB, 33333
AAAAA, CCCCC, 66666
AAAAA, CCCCC, 99999
BBBBB, AAAAA, 11111
BBBBB, CCCCC, 66666
Etc.

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Mon Dec 02, 2013 8:55 pm

I admit I had to look at your example for several minutes before I figured out the exact process it was illustrating.

I'd say this would be difficult to do in DataStage. It would be better to use a programming language with loop constructs. The solution requires multiple passes to process the groups and build the various combinations. Not trivial in any event.

I'd say try using another tool... Though if someone else can come up with a reasonable DataStage solution I'd be interested in hearing it.

bart12872 · Post by **bart12872** » Tue Dec 03, 2013 10:29 am

Well, you can make it. but be careful, my suggestion contains a Cartesian product, so if the volume is too big, you will explode your time execution.
I will use your example in my demonstration.

You take all the field Fld_1 and make a remove duplicate, and add a join key
you will have:

Code: Select all

you take all your groups and remove duplicate, and add a join key:

Code: Select all

AAAAA;1
BBBBB;1
CCCCC;1

You make a Cartesian product based on join key
You will have:

Code: Select all

AAAAA;11111;1;
AAAAA;22222;1;
AAAAA;33333;1;
AAAAA;66666;1;
AAAAA;99999;1;
BBBBB;11111;1;
BBBBB;22222;1;
BBBBB;33333;1;
BBBBB;66666;1;
BBBBB;99999;1;
CCCCC;11111;1;
CCCCC;22222;1;
CCCCC;33333;1;
CCCCC;66666;1;
CCCCC;99999;1;

Then you make an left join with those data and your source based on group and field1. You will have:

Code: Select all

AAAAA;11111;1;AAAAA, 11111
AAAAA;22222;1;AAAAA, 22222
AAAAA;33333;1;null
AAAAA;66666;1;null
AAAAA;99999;1;null
BBBBB;11111;1;null
BBBBB;22222;1;BBBBB, 22222
BBBBB;33333;1;BBBBB, 33333
BBBBB;66666;1;null
BBBBB;99999;1;null
CCCCC;11111;1;null
CCCCC;22222;1;null
CCCCC;33333;1;null
CCCCC;66666;1;CCCCC, 66666
CCCCC;99999;1;CCCCC, 99999

Then you filter the null value for the end result:

Code: Select all

AAAAA;33333;null
AAAAA;66666;null
AAAAA;99999;null
BBBBB;11111;null
BBBBB;66666;null
BBBBB;99999;null
CCCCC;11111;null
CCCCC;22222;null
CCCCC;33333;null

tbtcust · Post by **tbtcust** » Fri Dec 13, 2013 1:53 am

Thank you bart12872. This seems to work just fine. Thanks again.