Distinct values

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Gurudutt
Participant
Posts: 3
Joined: Tue Aug 12, 2003 8:45 am

Distinct values

Post by Gurudutt »

Hi All,

I have a requirement to get only distinct values from the flat file and to load the distinct into Database. The source file contains multiple records.

GD
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There are a few options. You could extract it into a local hash file then put it into the database. The primary key on the hash file should ensure you only get distinct rows added to the database.

If you are using Oracle you could use the option to only add new rows and not update existing rows.

You could follow the advice in Ray's recent email, presort the file by primary key, use stage variables to identify and ignore duplicate rows.


Vincent McBurney
Data Integration Services
www.intramatix.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Most database stage types have "insert new rows only" as one of the options. These will, however, generate warnings that are logged, so ensure that your warning limit is disabled.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
etlandbi
Participant
Posts: 4
Joined: Tue Feb 18, 2003 1:31 pm

Post by etlandbi »

Hi Ray,

To test this i just added a source ODBC stage to pick data of single column which is not unique followd by a pass through transformer stage which iam just using to connect to the target ODBC stage for writing unique values to oracle 8i database.

I see that the target ODBC stage does have the option "Insert new/update existing row " but no "insert new only option". This is in version 6.0. Please let me know if i am missing something.

Thanx

Rahul
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Do you have an Oracle client installed on your datastage server? If so then you should be able to use the Oracle OCI stage instead of the ODBC stage. This will give you more update options and should have faster performance.

There are some configuration instructions in the PDF Upgrade guide and the install readme that will help you get this stage working. There are also quite a few threads in the forum archives on this subject.


Vincent McBurney
Data Integration Services
www.intramatix.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you're using ODBC as source you could, of course, use user-defined SQL including the DISTINCT keyword.
If you have Microsoft Office, you have an ODBC driver for text files; would not like to predict the performance (!) of SELECT DISTINCT though!
I'm very surprised that you say that the 6.0 ODBC stage lacks "insert new rows only" as an update option. Right now I do not have access to a 6.0 system, so cannot check and advise. Somebody else?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Probably because there are other columns that have to be processed but which can't be grouped or otherwise processed by an Aggregator.
Post Reply