Hi,
I have a table with around 200 million data. And i have to extract records based on some condtions.
The columns which i am going to include are not indexed in the table. So i thought it would be better to add the conditions in a transformer and filter the data.
It runs in a 6 node configiration.
The job is running extremely slow.
Can anyone suggest which would be the best to use?
should we do it at the database level or use filter or use transformer ??..or any other options
filter condition
Moderators: chulett, rschirm, roy
Is your database on the same server as DataStage? If not, then it might be best to do it in the DB. Even if both applications are on the same server it can be faster to limit the data at the first (database) level. Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Thanks for the info..I will try doing it at a db level..
Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..
Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..
Thanks for the info..I will try doing it at a db level..
Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..
Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values? ---I am using non-indexed columns as a part of my where clause. I mean something like col1='B'..here col1 is not an index, hence the query will take much long time..
Whilst this seems an obvious question.... Is adding the predicates to an existing or new index not an option in the DB? By far this is where you should be looking to start.
Does the query on it's own run slowly, when run outside of Datastage? If so, straight off you know where your bottleneck is... and if it's an inefficient query, one guess what it is
What DB environment do you have?
Does the query on it's own run slowly, when run outside of Datastage? If so, straight off you know where your bottleneck is... and if it's an inefficient query, one guess what it is
What DB environment do you have?
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
Re: filter condition
Filtering isn't necessarily helped by indexes and depending on how much of the data would be filtered out, doing it in the job could be the worst approach to take. Best to optimize the query and see how well all that works in the database, it's typically the better solution.dnat wrote:The columns which i am going to include are not indexed in the table. So i thought it would be better to add the conditions in a transformer and filter the data.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Howmany millions do you expect after applying filters? And hw long it is taking to read the source table?ArndW wrote:Is your database on the same server as DataStage? If not, then it might be best to do it in the DB. Even if both applications are on the same server it can be faster to limit the data at the first (database) level. Do you mean that you wish to use non-indexed columns are part of your selection criteria or just as data values?
Thank you,
Anu
Anu