SELECTING ROWS FROM A FILE

yalamanchili · Post by **yalamanchili** » Thu Jun 29, 2006 6:05 am

Hi All,

while loading data from input file to ODBC i want to select rows basing on distinct values from a column.

In oracle data base if we want to select the records from a data base we will use "SELECT DISTINCT COLUMN_NAME FROM TABLE_NAME".

how can i apply this logic while loading data from file to odbc.

Waiting for your reply

sb_akarmarkar · Post by **sb_akarmarkar** » Thu Jun 29, 2006 6:17 am

Use aggregator stage.....

Thanks,
Anupam

kduke · Post by **kduke** » Thu Jun 29, 2006 6:20 am

If you are talking about removing duplicates then you can search for this. There are lots of posts on removing duplicates. You can do it using a hash file or land your data in a staging table and then select distinct.

chulett · Post by **chulett** » Thu Jun 29, 2006 6:22 am

1) Load it into a work table first, then 'select distinct' from there to your target table.

2) Sort your data based on that field, then use stage variables to only pass the row with the first occurance of 'column name'.

3) Load into a hashed file keyed on 'column name' then source to ODBC from there.

4) Use a hashed reference with 'column name' as key. Lookup and also write to it. Miss = record to ODBC and hashed file. Hit = duplicate to pitch.

There's four ways off the top of my head...

yalamanchili · Post by **yalamanchili** » Thu Jun 29, 2006 6:22 am

Hi,

Than You fro your response.Can You Please Expalin to Me in Breif how to use Aggregator Stage.

sb_akarmarkar · Post by **sb_akarmarkar** » Thu Jun 29, 2006 6:26 am

yalamanchili wrote:Hi,

Than You fro your response.Can You Please Expalin to Me in Breif how to use Aggregator Stage.

Allow row pass through aggregator make all column as group by except one column make that as FIRST/LAST/MAX/MIN in stage.....

Thanks,
Anupam

ray.wurlod · Post by **ray.wurlod** » Thu Jun 29, 2006 6:51 am

When reading the file, use sort -u as a filter command in the Sequential File stage.

Or, if you want a really slow solution, use the ODBC driver for text files, and use your SQL statement from an ODBC stage.

DSguru2B · Post by **DSguru2B** » Thu Jun 29, 2006 7:02 am

Or, load the data into a hashed file (in account) with a dummy key thats nothing but a sequential number. Then access that hashed file via universe stage and run the sql query on it.

chulett · Post by **chulett** » Thu Jun 29, 2006 7:04 am

Or... well, never mind. That's about enough different ways for one day.

DSguru2B · Post by **DSguru2B** » Thu Jun 29, 2006 7:26 am

I agree, i guess thats enough options for the OP to pick and choose from.

kumar_s · Post by **kumar_s** » Thu Jun 29, 2006 11:44 am

kduke wrote:If you are talking about removing duplicates then you can search for this. There are lots of posts on removing duplicates. You can do it using a hash file or land your data in a staging table and then select distinct.

Kim has mentioned the option available in both Server and PX. You still left out with manual option. Open in a text editior and pick up the required column.