Find groups of recs that is missing data from other groups

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tbtcust
Premium Member
Premium Member
Posts: 230
Joined: Tue Mar 04, 2008 9:07 am

Find groups of recs that is missing data from other groups

Post by tbtcust »

Hello All, I have a need to look at groups of records and detect which group is missing data that is in other groups. The number of groups is always unknown. See example below. Is this possible?

Thanks in advance for any help.


Input:
GRP_ID, Fld_1
=-=-=-=-=-=-=-
AAAAA, 11111
AAAAA, 22222
AAAAA, 11111

BBBBB, 22222
BBBBB, 33333
BBBBB, 22222

CCCCC, 66666
CCCCC, 66666
CCCCC, 99999

Output:
GRP_ID, MIS_GR_ID, Mis_Data
=-=-=-=-=-=-=-=-=-=-=-=-=-=
AAAAA, BBBBB, 33333
AAAAA, CCCCC, 66666
AAAAA, CCCCC, 99999
BBBBB, AAAAA, 11111
BBBBB, CCCCC, 66666
Etc.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I admit I had to look at your example for several minutes before I figured out the exact process it was illustrating.

I'd say this would be difficult to do in DataStage. It would be better to use a programming language with loop constructs. The solution requires multiple passes to process the groups and build the various combinations. Not trivial in any event.

I'd say try using another tool... Though if someone else can come up with a reasonable DataStage solution I'd be interested in hearing it.
Last edited by asorrell on Tue Dec 03, 2013 10:39 am, edited 1 time in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

Post by bart12872 »

Well, you can make it. but be careful, my suggestion contains a Cartesian product, so if the volume is too big, you will explode your time execution.
I will use your example in my demonstration.

You take all the field Fld_1 and make a remove duplicate, and add a join key
you will have:

Code: Select all

11111;1
22222;1
33333;1
66666;1
99999;1
you take all your groups and remove duplicate, and add a join key:

Code: Select all

AAAAA;1
BBBBB;1
CCCCC;1
You make a Cartesian product based on join key
You will have:

Code: Select all

AAAAA;11111;1;
AAAAA;22222;1;
AAAAA;33333;1;
AAAAA;66666;1;
AAAAA;99999;1;
BBBBB;11111;1;
BBBBB;22222;1;
BBBBB;33333;1;
BBBBB;66666;1;
BBBBB;99999;1;
CCCCC;11111;1;
CCCCC;22222;1;
CCCCC;33333;1;
CCCCC;66666;1;
CCCCC;99999;1;
Then you make an left join with those data and your source based on group and field1. You will have:

Code: Select all

AAAAA;11111;1;AAAAA, 11111
AAAAA;22222;1;AAAAA, 22222
AAAAA;33333;1;null
AAAAA;66666;1;null
AAAAA;99999;1;null
BBBBB;11111;1;null
BBBBB;22222;1;BBBBB, 22222
BBBBB;33333;1;BBBBB, 33333
BBBBB;66666;1;null
BBBBB;99999;1;null
CCCCC;11111;1;null
CCCCC;22222;1;null
CCCCC;33333;1;null
CCCCC;66666;1;CCCCC, 66666
CCCCC;99999;1;CCCCC, 99999
Then you filter the null value for the end result:

Code: Select all

AAAAA;33333;null
AAAAA;66666;null
AAAAA;99999;null
BBBBB;11111;null
BBBBB;66666;null
BBBBB;99999;null
CCCCC;11111;null
CCCCC;22222;null
CCCCC;33333;null
tbtcust
Premium Member
Premium Member
Posts: 230
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Thank you bart12872. This seems to work just fine. Thanks again.
Post Reply