filter condition

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

filter condition

Post by dnat »

Hi,

I have a table with around 200 million data. And i have to extract records based on some condtions.

The columns which i am going to include are not indexed in the table. So i thought it would be better to add the conditions in a transformer and filter the data.

It runs in a 6 node configiration.

The job is running extremely slow.

Can anyone suggest which would be the best to use?

should we do it at the database level or use filter or use transformer ??..or any other options
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Is your database on the same server as DataStage? If not, then it might be best to do it in the DB. Even if both applications are on the same server it can be faster to limit the data at the first (database) level. Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values?
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Thanks for the info..I will try doing it at a db level..

Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

Post by dnat »

Thanks for the info..I will try doing it at a db level..

Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Whilst this seems an obvious question.... Is adding the predicates to an existing or new index not an option in the DB? By far this is where you should be looking to start.

Does the query on it's own run slowly, when run outside of Datastage? If so, straight off you know where your bottleneck is... and if it's an inefficient query, one guess what it is :)

What DB environment do you have?
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: filter condition

Post by chulett »

dnat wrote:The columns which i am going to include are not indexed in the table. So i thought it would be better to add the conditions in a transformer and filter the data.
Filtering isn't necessarily helped by indexes and depending on how much of the data would be filtered out, doing it in the job could be the worst approach to take. Best to optimize the query and see how well all that works in the database, it's typically the better solution.
-craig

"You can never have too many knives" -- Logan Nine Fingers
anu123
Premium Member
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

ArndW wrote:Is your database on the same server as DataStage? If not, then it might be best to do it in the DB. Even if both applications are on the same server it can be faster to limit the data at the first (database) level. Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values?
Howmany millions do you expect after applying filters? And hw long it is taking to read the source table?
Thank you,
Anu
Post Reply