Distinct values

Gurudutt · Post by **Gurudutt** » Wed Feb 19, 2003 8:09 pm

Hi All,

I have a requirement to get only distinct values from the flat file and to load the distinct into Database. The source file contains multiple records.

GD

vmcburney · Post by **vmcburney** » Wed Feb 19, 2003 9:51 pm

There are a few options. You could extract it into a local hash file then put it into the database. The primary key on the hash file should ensure you only get distinct rows added to the database.

If you are using Oracle you could use the option to only add new rows and not update existing rows.

You could follow the advice in Ray's recent email, presort the file by primary key, use stage variables to identify and ignore duplicate rows.

Vincent McBurney
Data Integration Services
www.intramatix.com

ray.wurlod · Post by **ray.wurlod** » Thu Feb 20, 2003 1:29 am

Most database stage types have "insert new rows only" as one of the options. These will, however, generate warnings that are logged, so ensure that your warning limit is disabled.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

etlandbi · Post by **etlandbi** » Thu Feb 20, 2003 12:36 pm

Hi Ray,

To test this i just added a source ODBC stage to pick data of single column which is not unique followd by a pass through transformer stage which iam just using to connect to the target ODBC stage for writing unique values to oracle 8i database.

I see that the target ODBC stage does have the option "Insert new/update existing row " but no "insert new only option". This is in version 6.0. Please let me know if i am missing something.

Thanx

Rahul

vmcburney · Post by **vmcburney** » Thu Feb 20, 2003 2:46 pm

Do you have an Oracle client installed on your datastage server? If so then you should be able to use the Oracle OCI stage instead of the ODBC stage. This will give you more update options and should have faster performance.

There are some configuration instructions in the PDF Upgrade guide and the install readme that will help you get this stage working. There are also quite a few threads in the forum archives on this subject.

Vincent McBurney
Data Integration Services
www.intramatix.com

ray.wurlod · Post by **ray.wurlod** » Thu Feb 20, 2003 4:33 pm

If you're using ODBC as source you could, of course, use user-defined SQL including the DISTINCT keyword.
If you have Microsoft Office, you have an ODBC driver for text files; would not like to predict the performance (!) of SELECT DISTINCT though!
I'm very surprised that you say that the 6.0 ODBC stage lacks "insert new rows only" as an update option. Right now I do not have access to a 6.0 system, so cannot check and advise. Somebody else?

ray.wurlod · Post by **ray.wurlod** » Fri Feb 21, 2003 2:50 am

Probably because there are other columns that have to be processed but which can't be grouped or otherwise processed by an Aggregator.