Lookup Stage - w/ I/O

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Lookup Stage - w/ I/O

Post by reachmexyz »

Hello all,

I have a job like Dataset ---> Transformer ---> Lookup ---> Dataset.
The Lookup stage has a reference link with only one record. The source dataset has 50 columns with lots of LongVarChar data types.
When I run the job, it was running at 600 records per second (on a 4-node system).
When I checked the statistics, the job was I/O-limited: 60% of the I/O was consumed by the job. I tried increasing the buffer parameters of the Lookup stage from 3 MB to 6 MB and the buffer free run to 500, but it still didn't work (see the environment-variable sketch below).
I was always under the impression that, for a lookup, if the reference link has more data than fits, the data is moved to disk once the buffer is fully utilized. But in this case even the source data is being pushed to I/O. Will the Lookup stage not take huge source data?
What steps can I take to make sure no I/O takes place, apart from redesigning the job to include fewer columns in the source?
Does the Lookup stage really get affected by the source data too?

Please help me out in this.
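The stage-level buffer settings mentioned above also have project-wide equivalents as environment variables. A minimal sketch, assuming they are exported in the job's environment before the run; the values mirror the 6 MB attempt and are illustrative only, not a recommendation:

Code: Select all

    # Per-buffer memory ceiling; the default is 3145728 bytes (3 MB)
    export APT_BUFFER_MAXIMUM_MEMORY=6291456
    # Fraction of the buffer that may fill before the upstream operator is throttled; default 0.5
    export APT_BUFFER_FREE_RUN=1.0
    # Block size used when a full buffer spills to scratch disk; default 1048576
    export APT_BUFFER_DISK_WRITE_INCREMENT=1048576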
chowdhury99
Participant
Posts: 43
Joined: Thu May 29, 2008 8:41 pm

Re: Lookup Stage - w/ I/O

Post by chowdhury99 »

Removing the transformer stage may increase the rate. Thanks.
reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Re: Lookup Stage - w/ I/O

Post by reachmexyz »

Thanks.

I tried that, but it didn't work. I don't understand why the source data is being pushed onto I/O.
Please reply.
chowdhury99
Participant
Posts: 43
Joined: Thu May 29, 2008 8:41 pm

Re: Lookup Stage - w/ I/O

Post by chowdhury99 »

Since you have only one record on the reference link, you can remove the Lookup stage and use a stage variable in the Transformer instead (a rough sketch follows). Thanks.
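A rough sketch of that approach, assuming the single reference value can be supplied as a job parameter instead of being read through a reference link; the names pRefValue, svRefValue and out.RefValue are hypothetical:

Code: Select all

    Transformer stage variable derivation:
        svRefValue  =  pRefValue          (pRefValue is an assumed job parameter)

    Output column derivation, replacing the lookup return column:
        out.RefValue  =  svRefValue

This removes the Lookup stage entirely, so nothing has to be loaded or buffered for the reference side.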
reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Re: Lookup Stage - w/ I/O

Post by reachmexyz »

The Lookup stage is causing the problem. It is able to take the source records in but unable to send them out, even though there are only a few records on the reference link. Because of this the I/O is increasing. How can I avoid the I/O?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You are not correct. The Lookup stage maps the reference data set into memory before the first row is processed. No subsequent I/O is involved. Look elsewhere for your I/O bottleneck.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Bejeebuz, people - stop quoting entire posts just to add a reply.

Use the perfectly lovely Reply to Topic button down there rather than the seemingly more convenient 'Reply with quote'. Save me from having to come in here and clean things up all the time. Please. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Post by reachmexyz »

Thanks.

I built another job which just reads Dataset ---> Transformer ---> Copy ---> Dataset, no lookups included. Even this job is consuming 50% of the I/O. How can I find out where the I/O is coming from? What other steps can I take to avoid I/O?
I ran the same job again, and this time it completed in 2 seconds at 180,000 rows per second with no I/O. Is the data saved in a cache?
Please let me know how I can remove the I/O and reproduce the issue with the same job - see the sketch below.
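The 2-second rerun almost certainly came from the operating system's file cache rather than from disk, which is why no physical I/O showed up. One way to reproduce the cold-cache behaviour, assuming the engine tier runs on Linux with root access (other platforms need different commands):

Code: Select all

    # Flush dirty pages to disk, then drop the page cache, dentries and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches    # must be run as root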
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use the DataStage Performance Analysis tool. It's on the toolbar of the Designer client.

And please stop baiting Craig - don't use "Reply with Quote" - use "Reply to Topic".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:lol:

More cleanup work for me later.... sigh...
-craig

"You can never have too many knives" -- Logan Nine Fingers
reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Post by reachmexyz »

Thanks Ray

The newly built job is Dataset ---> Transformer ---> Lookup ---> Dataset. The Lookup has three reference links with 5 records on each link. When I run the job, the rate on the Dataset ---> Transformer ---> Lookup links looks like 12,000 rows/sec, which is good, but the rate on the Lookup ---> Dataset link falls to 1,000 rows/sec. Either the Lookup stage is not able to take the records in, or it is unable to write them out fast enough to the final dataset. I checked the Performance Analysis tool and found that heap memory has increased to 3 KB for the Lookup stage; the other stages are working fine. Because of this, lots of I/O is consumed. I couldn't figure out how to reduce the heap memory of the Lookup stage and make my job run faster. Please provide a solution.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

3 KB is nothing - don't try to reduce that! Reference data in a Lookup stage is all loaded into memory, so there will be no I/O involved there. You will need to look elsewhere to find what is demanding I/O resources - try the DataStage Performance Analysis tool.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Post by reachmexyz »

Thanks Ray

I found it. The problem is with the Dataset. The job consumes a lot of I/O when writing to a Dataset. It is fine when reading an existing dataset but consumes a lot of I/O when writing to one. When I replaced the target dataset with a Peek stage or a flat file, everything looked good.

How can I resolve this issue? What measures should I take to make sure writing to the dataset is smooth? Please help me out.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

If you have a lot of VarChar columns with bounded lengths where the actual contents don't consume much of that length (i.e. VarChar(128) where usually only 10 characters are used), then you might save on I/O by making those columns unbounded.
Apart from that, writing to a Dataset is going to do nothing but I/O; the only way to make it do less I/O is to reduce the data volume.
If this is truly a bottleneck, you might be able to distribute the I/O load by splitting your data across directories that are on different controllers or spindles - see the sketch below.
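A minimal configuration-file sketch of that last point, assuming two mount points on separate controllers; the hostname and paths here are hypothetical. Datasets stripe their data files across every resource disk listed for a node:

Code: Select all

    {
        node "node1"
        {
            fastname "etlhost"
            pools ""
            resource disk "/data1/datasets" {pools ""}
            resource disk "/data2/datasets" {pools ""}
            resource scratchdisk "/scratch/ds" {pools ""}
        }
    }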
reachmexyz
Premium Member
Posts: 296
Joined: Sun Nov 16, 2008 7:41 pm

Post by reachmexyz »

Yes, that was the problem.

I have a lot of VarChar columns with 4000 precision. I made them unbounded and the problem is resolved. Thanks for the help. But one question is still lingering in my mind: when I had a VarChar column with precision 4000, the dataset allocated 4000 bytes even though there was not much data. I tested this by checking the size of the dataset file. A VarChar is a variable-length character type - 4000 should be the maximum it allocates, not what it always allocates. Here it behaves just like a CHAR field, padded out to the full width. Why is that? I was under the impression that, irrespective of the precision of a VarChar, bytes would be allocated based on the actual data length.
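The difference shows up directly in the record schema a Dataset carries. A sketch of the two declarations for a single column (the column name is illustrative): the bounded form reserves the full width per record on disk, as observed above, while the unbounded form stores a length prefix plus only the actual data bytes:

Code: Select all

    // Bounded - VarChar(4000); behaves like the fixed-width case described above
    record ( description: string[max=4000]; )

    // Unbounded - VarChar with no length; only the actual data length is written
    record ( description: string; )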