Required logic for a scenario

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
SagarMelam
Participant
Posts: 26
Joined: Mon Apr 21, 2008 4:03 am
Location: Amalapuram

Required logic for a scenario

Post by SagarMelam »

I have below records coming from source in this scenario

Column 1 Column 2
1 A
1 A
1 B
2 A
2 A
3 Z


For the column1 if it has multiple values it need to be captured seperately.

Output1:

Column 1 Column 2
1 A
1 A
1 B

So if column1 contains multiple values in column2 those should be captured.please let me know solution for this scenario.Appreciate your help
Sagar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... you show "Output1", does that mean there is also an "Output2" in the picture? Do these need to go to one destination and everything else to another or are all you are interested in are the multiples?

Also, why is "1 B" in your output rather than "2 A"? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
SagarMelam
Participant
Posts: 26
Joined: Mon Apr 21, 2008 4:03 am
Location: Amalapuram

Post by SagarMelam »

Chulett,

The rest should be output 2.The logic is that for each value in column1 there should be only one value for column 2.so

1 A
1 A
1 B.

1 shouldnt contain two values,if it has it need to captured as duplicates.

1 A
1 A
1 A

Even though these are duplicates,they are valid.As it contains only one value in column 2
Sagar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, so only one distinct value in column 2 for a value in column 1. Sounds like another 'fork join' scenario where you capture the number of distinct values in column2 per column1 and then decide what to do (i.e. where to go) when that count is 1 or >1.

What is your source?
-craig

"You can never have too many knives" -- Logan Nine Fingers
SagarMelam
Participant
Posts: 26
Joined: Mon Apr 21, 2008 4:03 am
Location: Amalapuram

Post by SagarMelam »

Source is sql server
Sagar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK, so the 'distinct' part could be done with a sql query as well.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I still don't understand the requirement completely. Can you spell it out in words? Could it be as simple as a HAVING COUNT(*) > 1 clause in an SQL statement?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SagarMelam
Participant
Posts: 26
Joined: Mon Apr 21, 2008 4:03 am
Location: Amalapuram

Post by SagarMelam »

Hi ,

The requirement is that 1 value in the column 1 should not contain multiple values in column 2 and similarly 1 value in the column 2 should not contain multiple values in column 1.


Data coming like this from source

Column1 Column 2
1 A
1 A
1 B
2 C
3 C
4 C
5 E
5 E
6 F

All three records should be rejected for below scenario


Column1 Column 2
1 A
1 A
1 B


All three records should be rejected for below scenario too


Column1 Column 2
2 C
3 C
4 C

Valid Scenario:

Column 1 Column 2
5 E
5 E
6 F
Sagar
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I was thinking a "count distinct" but the requirements just got opened up a little bit more than what was originally posted. My answer still stands but it seems more like a "double fork join" if there is such a concept, counts on both sides of the aisle that get taken and then compared after rejoining the main flock.
-craig

"You can never have too many knives" -- Logan Nine Fingers
info_ds
Charter Member
Charter Member
Posts: 88
Joined: Thu Feb 10, 2005 4:36 am
Location: B\'lore

Post by info_ds »

you can try the below query.

select column1,column2,
count(distinct column2 ) over (partition by column1) as cnt
from <<table>>
order by 1,2

and then filter by 'cnt' not equal to 1
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

You could probably do it with 2 sql select statements and a lookup.

SQL Statement 1. Select * from table.

SQL Statement 2:

Code: Select all

SELECT *
FROM
  (SELECT column1,
    column2,
    COUNT(*) over(partition BY column1) col1cnt,
    COUNT(*) over(partition BY column2) col2cnt
  FROM
    (SELECT DISTINCT column1, column2 FROM test1
    )
  )
WHERE col1cnt=1
AND col2cnt  =1;
There are probably better ways to write Statement2 (and I am not sure of the exact syntax for over partition by in SQL Server).

Perform a lookup on the outputs of the 2 statements with key columns as Column1 and Column2. If found then the value is valid, if not found (ie a reject) then it is invalid.
info_ds
Charter Member
Charter Member
Posts: 88
Joined: Thu Feb 10, 2005 4:36 am
Location: B\'lore

Post by info_ds »

Ok..There seema to be a liitle change in requirement.Check this one

select column1,column2 from(
select column1,column2,
count(distinct column2 ) over (partition by column1) as cnt1,
count(distinct column1 ) over (partition by column2) as cnt2
from table)
where cnt1<>1 or cnt2<>1
order by 1,2
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It has changed (well, been clarified) with every reply so far.

:idea: Best to start everything off proper and describe your full requirements in your initial post. That let's us get to the heart of the matter right away rather than all this back and forth, teasing out requirements one by one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply