Tune a query with where clause that has a function

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Tune a query with where clause that has a function

Post by jerome_rajan »

Hi,

Our Db2 connector is extracting data from a database with a query that looks something like

Code: Select all

Select * from <tablename>
WHERE SUBSTR(ACCT_TYPE) IN (a,b,c,d)
AND bal<0;
The table has more than 50 million records. The connector takes a very long time to read this data. In 1.5 hours, it managed to read only 1.5 million rows. Any advise on how to tune this query? We have an index on ACCT_TYPE.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Jerome, try fetching whole data from the source and then use a filter stage and give the constraints in it.
Thanx and Regards,
ETL User
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

Already tried that and the job is processing. Will report once done :)
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What does SUBSTR with only one argument do? Are you forcing implicit CASTs in the IN list? What are a, b, c and d? Is ACCT_TYPE indexed?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jerome_rajan
Premium Member
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway

Post by jerome_rajan »

I meant only to give a generic idea that I'm using a SUBSTR. The actual query consists of SUBSTR(ACCT_TYPE,1,2). Acct Type is a 10 character string. The first 2 characters represent product type which we are trying to filter in the IN list. I don't think we are forcing implicit CASTs since a,b,c,d are strings of length 2. Like I said earlier, ACCT_TYPE is indexed.
Jerome
Data Integration Consultant at AWS
Connect With Me On LinkedIn

Life is really simple, but we insist on making it complicated.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

jerome_rajan wrote:I meant only to give a generic idea that I'm using a SUBSTR.
Just an FYI but that's never a good idea. When people post "generic" examples or "something like" whatever it is that they are actually doing we can spend time chasing our tails until what they are actually doing is clarified.

If DB2 is anything like Oracle, you've invalidated the use of the index by using a function on it. Oracle allows you to build a "Function Based Index" on the table which is a result of (for example) your substring so it can be leveraged by the query. Does DB2 have an equivalent?
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

What percentage of total number of rows satisfy your condition. If the stage is fetching less number of rows then it means there are few records satisfying the condition. If that is not the case why dont you try the negation?
Thanks,
Prasanna
Post Reply