Slow Lookup in Netezza

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
just4geeks
Premium Member
Premium Member
Posts: 644
Joined: Sat Aug 26, 2006 3:59 pm
Location: Mclean, VA

Slow Lookup in Netezza

Post by just4geeks »

I have 3 datasets containing around a 100, 200, 70000 records in each. Each dataset holds a key. The keys are used to lookup a netezza table to pull up a few facts into a Seq file. The table holds 60 million records. The entire job takes around 20 - 25 min. Is there anyway at all to speed up the process? I use a lookup stage to peer into the table and the table is accessed using a ODBC enterprise stage.

Is the ODBC stage responsible for such a slow lookup? We do have a Netezza Enterprise stage but that can only act as a target and not as a source or reference.

I am using a Table Read Method and all options are at default. Will changing Isolation level or Fetch Array Size make a difference? I will try doing that but each run may take upto 25 min.

Any ideas will be appreciated. Thanks for your time.
Attitude is everything....
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: Slow Lookup in Netezza

Post by sud »

Why will the Netezza enterprise stage act only as a target? I dont use netezza now, but as far as I remember, we used to use the netezza enterprise stage as a source. One thing about netezza, it can usually beat tools like datastage in almost all types of transformations, lookup etc so dont apply the general ETL techniques with it. Try to do stuff as much as possible on the database side. For example, in your case in case your datasets' data is coming from the same netezza database, try do all joins in a single SQL and then read the data.

From what it sounds like, you must be doing a cached lookup, hence datastage is reading the whole 60 million records into memory before doing the lookup. Attach the database stage directly to the lookup and choose a sparse lookup.

If even that takes a lot of time, try loading all the three datasets into temporary tables in netezza(which you can drop later) and do the joins in a SQL and then bring the data to datasatge. In case you are about to write to a file, do the whole stuff including export to file within netezza and dont even involve datastage, you will see the lookup and all happening in matter of seconds.
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
Aruna Gutti
Premium Member
Premium Member
Posts: 145
Joined: Fri Sep 21, 2007 9:35 am
Location: Boston

Post by Aruna Gutti »

Netezza Enterprise Stage is just a bulk loader using either 'ET' or 'NZLoad' options. To use Netezza as target I am also using ODBC Enterprise stage. I haven't seen performance issue with Lookup ... but of course I have only around 10 million records on my Netezza table.
just4geeks
Premium Member
Premium Member
Posts: 644
Joined: Sat Aug 26, 2006 3:59 pm
Location: Mclean, VA

Re: Slow Lookup in Netezza

Post by just4geeks »

sud wrote:Why will the Netezza enterprise stage act only as a target?
Following is the error I get when I try to reference Netezza Stage.

Image

I will try the following.
sud wrote: If even that takes a lot of time, try loading all the three datasets into temporary tables in netezza(which you can drop later) and do the joins in a SQL and then bring the data to datasatge.
Attitude is everything....
shawn_ramsey
Participant
Posts: 145
Joined: Fri May 02, 2003 9:59 am
Location: Seattle, Washington. USA

Re: Slow Lookup in Netezza

Post by shawn_ramsey »

Have you looked at what is going on with the Netezza? I have found that it is much faster to bulk load the data into a temporary table and use a SQL statement to join it and pull the data back out for stuff like this.

They have added a bulk unload with the new version of the stage but I am not sure that is going to help since it is only available under 8. I am also working on a request for a Netezza bulk join operator.
Shawn Ramsey

"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
Post Reply