Duplicate records in Source

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Duplicate records in Source

Post by Sreenivasulu »

Hi All,

I am joining 5 tables(using primary key columns of the 5 tables) to get the source data. What are the options you would use within this stage to ensure that this is correctly done i.e (without any duplicate data or cross product). I am presently using
"distinct" to avoid duplicate data but using "distinct" degrades the performance of the query.

Regards
sanjay
Premium Member
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Re: Duplicate records in Source

Post by sanjay »

u can use duplicate stage which will removes dulpicate records instaed of distinct and lookup stage which will have one primary input and mutiple reference input
and also check for null condition if any column has .

Regards
Sanjay


Sreenivasulu wrote:Hi All,

I am joining 5 tables(using primary key columns of the 5 tables) to get the source data. What are the options you would use within this stage to ensure that this is correctly done i.e (without any duplicate data or cross product). I am presently using
"distinct" to avoid duplicate data but using "distinct" degrades the performance of the query.

Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What stage type are you using? Which database?
Are the columns participating in the joins indexed? (If so, then DISTINCT should work very well.)

PS If this really is a question about PX (parallel jobs) can you please post it on the Parallel Forum?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Ray,

This is a general query regardless of type of jobs(server or parallel).
Database used is Oracle 9i.

Sanjay: I do not find a Duplicate Stage in Datastage 7

Regards
sanjay
Premium Member
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

In parallel job
u have remove duplicate stage .

Sanjay
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It is not a general query regardless of engine, because the solution is different depending on whether you're using a server job or a parallel job.

As Sanjay points out (in highly annoying text messaging abbreviations - what does your DataStage documentation look like, Sanjay?!!), there is a Remove Duplicates stage available on the parallel canvas. In server jobs you need a different technique, possibly using stage variables, possibly using a hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Its a server job.
We have to remove duplicates from the source query. Is there a way in which the source stage does not process duplicate records(by using keys in the source stage) instead of using a hashed file with stage variables.

Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your original post specified parallel job. It is this that confused us.

If you're doing inner joins between the five tables based on the primary keys, you should not get duplicates (because primary key columns should have a uniqueness, or at least joint-uniqueness, property).

Can you explain exactly what you want to achieve? Indexing any foreign key columns supporting the joins will improve the performance of the query that is extracting the data.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Its fine. I got a hunch of how to solve this problem
Thanks a lot for the help

Regards
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The idea of this Forum is to be a sharing place. If your hunch proves to be correct, please post the details of your solution so that others may benefit. I'm sure you'll post again if your hunch turns out not to be correct! :lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sreenivasulu
Premium Member
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Sure Ray!!!

I would post the solution as soon i as solve this.

Regards
Post Reply