validation on records

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
basu.ds
Participant
Posts: 118
Joined: Tue Feb 06, 2007 12:59 am
Location: Bangalore

validation on records

Post by basu.ds »

Hi Can any one help me that i want to know finding the multiple records match with single reord which is transaction record if matches with multple records in master table how should i find using server joobs.
thanks in advacnce
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If there are multiple matches how can you call it a master table?

Please take the time to write out a specification (in English, not in "DataStage") about what the rules are. What is your source, where is the master table, on what basis do you match your source records, what (precisely) do you mean by a "transaction record"?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
basu.ds
Participant
Posts: 118
Joined: Tue Feb 06, 2007 12:59 am
Location: Bangalore

Post by basu.ds »

i know that is not called masterdata but i want know if one match with multiple records in with particular table how can i found wether particular record match with multiple record in table.assume that my source is CSV file but i want to validate with table.there is a possiblity that duplicates in my table if that is the case how can i found
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There are several de-duplication techinques available. The easiest is probably to run a query against the table

Code: Select all

SELECT keycol,COUNT(*) FROM table GROUP BY keycol HAVING COUNT(*) > 1;
If you now want to compare that against the contents of some CSV file, load the SQL result into a hashed file and compare against that; any lookup that does not fail has duplicates in the target table for that key value.
A slower solution would be to do the lookup directly against the target table.

Code: Select all

SELECT keycol,COUNT(*) FROM table WHERE keycol = ? GROUP BY keycol HAVING COUNT(*) > 1;
or leave out the HAVING clause and test the returned count.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
basu.ds
Participant
Posts: 118
Joined: Tue Feb 06, 2007 12:59 am
Location: Bangalore

Post by basu.ds »

Using data satge i want to reject that data if multiple matc occure
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Post by sachin1 »

bellow job design can help.

OCI stage(1)---->transformer1--->aggregator-------> transformer2---->transformer3--->OCI stage(2).

i aggregation stage you find out duplication like

select deptno,count(deptno) from emp group by deptno.

then in transformer 3 apply condition that count > 1 rows should propagate further, just deptno propagated.

in OCI stage(2) write a user defined sql like below

delete from empsb a where rowid > (select min(rowid) from empsb b where a.deptno = b.deptno and b.deptno = :1)

hope this helps you.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

sachin1, where did you read in the specification from basu.ds that there is a need to delete anything?

"Reject records" is not the same as "delete records".

Rejecting records is simply a matter of having a constraint on a Transformer stage output link that does not allow records to pass if the count is greater than one (using my second method) or if the lookup succeeds (using my first method). If you want to capture the rejected records, simply add a reject-handling output link to the Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
basu.ds
Participant
Posts: 118
Joined: Tue Feb 06, 2007 12:59 am
Location: Bangalore

Post by basu.ds »

hi sachin thanks for response but i want to make a look up with table.
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Post by sachin1 »

sorry for wrong interpretation,

i have taken an input file emp.txt with below records
1,sac,10
2,mark,10
3,pras,20
4,sush,20
6,ritu,30
7,ash,40
8,ddd,50

i have dept table with below records


SQL> select deptno from dept;

DEPTNO
----------
10
20
30
40

SQL>


see that your input record is sorted for lookup.

stage variables are used for counting number greater than 0 for matched records, then in next transformer depending upon count variable you can reject records.
basu.ds
Participant
Posts: 118
Joined: Tue Feb 06, 2007 12:59 am
Location: Bangalore

Post by basu.ds »

no i know this but i want opposite to your answer.like please change your data i mean emp.txt file should be dept table and table should be emp.txt data.if one record match with multiple records in table that record shpuld reject.
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Post by sachin1 »

vice versa works dear !!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Three questions for basu.ds:
1. Please post YOUR input data and lookup table.
2. What output do you require:
(a) if there is no match
(b) if there is exactly one match
(c) if there is more than one match?
3. How have you currently designed this and what's happening in your design?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply