Page 1 of 1

filter condition

Posted: Fri Jun 05, 2009 12:30 am
by dnat
Hi,

I have a table with around 200 million data. And i have to extract records based on some condtions.

The columns which i am going to include are not indexed in the table. So i thought it would be better to add the conditions in a transformer and filter the data.

It runs in a 6 node configiration.

The job is running extremely slow.

Can anyone suggest which would be the best to use?

should we do it at the database level or use filter or use transformer ??..or any other options

Posted: Fri Jun 05, 2009 12:37 am
by ArndW
Is your database on the same server as DataStage? If not, then it might be best to do it in the DB. Even if both applications are on the same server it can be faster to limit the data at the first (database) level. Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values?

Posted: Fri Jun 05, 2009 1:50 am
by dnat
Thanks for the info..I will try doing it at a db level..

Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..

Posted: Fri Jun 05, 2009 2:18 am
by dnat
Thanks for the info..I will try doing it at a db level..

Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..

Posted: Fri Jun 05, 2009 3:40 am
by miwinter
Whilst this seems an obvious question.... Is adding the predicates to an existing or new index not an option in the DB? By far this is where you should be looking to start.

Does the query on it's own run slowly, when run outside of Datastage? If so, straight off you know where your bottleneck is... and if it's an inefficient query, one guess what it is :)

What DB environment do you have?

Re: filter condition

Posted: Fri Jun 05, 2009 6:36 am
by chulett
dnat wrote:The columns which i am going to include are not indexed in the table. So i thought it would be better to add the conditions in a transformer and filter the data.
Filtering isn't necessarily helped by indexes and depending on how much of the data would be filtered out, doing it in the job could be the worst approach to take. Best to optimize the query and see how well all that works in the database, it's typically the better solution.

Posted: Fri Jun 05, 2009 9:47 am
by anu123
ArndW wrote:Is your database on the same server as DataStage? If not, then it might be best to do it in the DB. Even if both applications are on the same server it can be faster to limit the data at the first (database) level. Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values?
Howmany millions do you expect after applying filters? And hw long it is taking to read the source table?