Required logic for a scenario

SagarMelam · Post by **SagarMelam** » Mon Feb 16, 2015 10:36 am

I have below records coming from source in this scenario

Column 1 Column 2
1 A
1 A
1 B
2 A
2 A
3 Z

For the column1 if it has multiple values it need to be captured seperately.

Output1:

Column 1 Column 2
1 A
1 A
1 B

So if column1 contains multiple values in column2 those should be captured.please let me know solution for this scenario.Appreciate your help

chulett · Post by **chulett** » Mon Feb 16, 2015 10:44 am

Hmmm... you show "Output1", does that mean there is also an "Output2" in the picture? Do these need to go to one destination and everything else to another or are all you are interested in are the multiples?

Also, why is "1 B" in your output rather than "2 A"?

SagarMelam · Post by **SagarMelam** » Mon Feb 16, 2015 11:04 am

Chulett,

The rest should be output 2.The logic is that for each value in column1 there should be only one value for column 2.so

1 A
1 A
1 B.

1 shouldnt contain two values,if it has it need to captured as duplicates.

1 A
1 A
1 A

Even though these are duplicates,they are valid.As it contains only one value in column 2

chulett · Post by **chulett** » Mon Feb 16, 2015 11:27 am

OK, so only one distinct value in column 2 for a value in column 1. Sounds like another 'fork join' scenario where you capture the number of distinct values in column2 per column1 and then decide what to do (i.e. where to go) when that count is 1 or >1.

What is your source?

SagarMelam · Post by **SagarMelam** » Mon Feb 16, 2015 11:58 am

Source is sql server

chulett · Post by **chulett** » Mon Feb 16, 2015 1:17 pm

OK, so the 'distinct' part could be done with a sql query as well.

ray.wurlod · Post by **ray.wurlod** » Mon Feb 16, 2015 4:58 pm

I still don't understand the requirement completely. Can you spell it out in words? Could it be as simple as a HAVING COUNT(*) > 1 clause in an SQL statement?

SagarMelam · Post by **SagarMelam** » Tue Feb 17, 2015 8:33 am

Hi ,

The requirement is that 1 value in the column 1 should not contain multiple values in column 2 and similarly 1 value in the column 2 should not contain multiple values in column 1.

Data coming like this from source

Column1 Column 2
1 A
1 A
1 B
2 C
3 C
4 C
5 E
5 E
6 F

All three records should be rejected for below scenario

Column1 Column 2
1 A
1 A
1 B

All three records should be rejected for below scenario too

Column1 Column 2
2 C
3 C
4 C

Valid Scenario:

Column 1 Column 2
5 E
5 E
6 F

chulett · Post by **chulett** » Tue Feb 17, 2015 8:37 am

I was thinking a "count distinct" but the requirements just got opened up a little bit more than what was originally posted. My answer still stands but it seems more like a "double fork join" if there is such a concept, counts on both sides of the aisle that get taken and then compared after rejoining the main flock.

info_ds · Post by **info_ds** » Tue Feb 17, 2015 8:54 am

you can try the below query.

select column1,column2,
count(distinct column2 ) over (partition by column1) as cnt
from <<table>>
order by 1,2

and then filter by 'cnt' not equal to 1

ShaneMuir · Post by **ShaneMuir** » Tue Feb 17, 2015 8:57 am

You could probably do it with 2 sql select statements and a lookup.

SQL Statement 1. Select * from table.

SQL Statement 2:

Code: Select all

SELECT *
FROM
  (SELECT column1,
    column2,
    COUNT(*) over(partition BY column1) col1cnt,
    COUNT(*) over(partition BY column2) col2cnt
  FROM
    (SELECT DISTINCT column1, column2 FROM test1
    )
  )
WHERE col1cnt=1
AND col2cnt  =1;

There are probably better ways to write Statement2 (and I am not sure of the exact syntax for over partition by in SQL Server).

Perform a lookup on the outputs of the 2 statements with key columns as Column1 and Column2. If found then the value is valid, if not found (ie a reject) then it is invalid.

info_ds · Post by **info_ds** » Tue Feb 17, 2015 9:18 am

Ok..There seema to be a liitle change in requirement.Check this one

select column1,column2 from(
select column1,column2,
count(distinct column2 ) over (partition by column1) as cnt1,
count(distinct column1 ) over (partition by column2) as cnt2
from table)
where cnt1<>1 or cnt2<>1
order by 1,2

chulett · Post by **chulett** » Tue Feb 17, 2015 10:29 am

It has changed (well, been clarified) with every reply so far.

Best to start everything off proper and describe your full requirements in your initial post. That let's us get to the heart of the matter right away rather than all this back and forth, teasing out requirements one by one.