
Lookup Stage - w/ I/O

Posted: Wed Nov 17, 2010 11:26 am
by reachmexyz
Hello all,

I have a job like Dataset ---> Transformer ---> Lookup ---> Dataset.
The Lookup stage has a reference link with only one record. The source dataset has 50 columns, many of them LongVarChar data types.
When I run the job, it runs at about 600 records per second (4-node system).
When I check the statistics, the job is I/O limited: 60% of the I/O is consumed by this job. I tried increasing the Lookup stage's buffer size from 3 MB to 6 MB and the buffer free run to 500, but it still didn't help.
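In case it helps, here are the buffer settings I tried, written as what I assume are the equivalent project environment variables rather than per-stage properties (the values are just what I tried, not recommendations):

    APT_BUFFER_MAXIMUM_MEMORY=6291456         # 6 MB per buffered link (default 3145728 = 3 MB)
    APT_BUFFER_FREE_RUN=5.0                   # the stage property takes a percentage, so 500% = 5.0
    APT_BUFFER_DISK_WRITE_INCREMENT=1048576   # block size used if the buffer spills to scratch disk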
I was always under the impression that, for a lookup, if the reference link has more data than fits in memory, the excess is moved to disk once the buffer is fully utilized. But in this case even the source data is being pushed to I/O. Will the Lookup stage not take large source data?
What steps can I take to make sure no I/O takes place, apart from redesigning the job to include fewer source columns?
Does the Lookup stage really get affected by the source data too?

Please help me out in this.

Re: Lookup Stage - w/ I/O

Posted: Wed Nov 17, 2010 11:38 am
by chowdhury99
Removing the transformer stage may increase the rate. Thanks.

Re: Lookup Stage - w/ I/O

Posted: Wed Nov 17, 2010 12:30 pm
by reachmexyz
Thanks.

I tried that, but it didn't work. I don't understand why the source data is being pushed onto I/O.
Please reply.

Re: Lookup Stage - w/ I/O

Posted: Wed Nov 17, 2010 2:05 pm
by chowdhury99
Since you have only one record on the reference link, you can remove the Lookup and use a stage variable in the Transformer instead (sketch below). Thanks.
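A minimal sketch of that idea, assuming the single reference value can be supplied as a job parameter before the job runs (the names pRefValue, svRefValue and lnk_out.RefCol are hypothetical):

    Job parameter:   pRefValue                       (holds the one reference record's value)
    Stage variable:  svRefValue  =  pRefValue        (effectively constant for every input row)
    Output column:   lnk_out.RefCol  =  svRefValue   (replaces the lookup-supplied column)

This avoids the Lookup stage entirely, at the cost of having to load the reference value into the parameter before the job starts.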

Re: Lookup Stage - w/ I/O

Posted: Wed Nov 17, 2010 6:33 pm
by reachmexyz
The Lookup stage is causing the problem. It is able to take the source records in, but unable to send the records out, even though there are only a few records on the reference link. Because of this, I/O is increasing. How can I avoid the I/O?

Posted: Wed Nov 17, 2010 7:50 pm
by ray.wurlod
You are not correct. The Lookup stage maps the reference data set into memory before the first row is processed. No subsequent I/O is involved. Look elsewhere for your I/O bottleneck.

Posted: Wed Nov 17, 2010 9:43 pm
by chulett
Bejeebuz, people - stop quoting entire posts just to add a reply.

Use the perfectly lovely 'Reply to Topic' button down there rather than the seemingly more convenient 'Reply with quote'. Save me from having to come in here and clean things up all the time. Please.

Posted: Thu Nov 18, 2010 11:21 am
by reachmexyz
Thanks.

I built another job that just reads Dataset ---> Transformer ---> Copy ---> Dataset. No Lookups included. Even this job consumes 50% of the I/O. How can I find where the I/O is happening, and what other steps can I take to avoid it?
I ran the same job again, and this time it completed in 2 seconds at 180,000 rows per second with no I/O. Is the data saved in cache?
Please let me know how I can remove the I/O and reproduce the issue with the same job.
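If the fast second run really was the operating system's file cache at work, I suppose I could flush the cache between runs to get back to the cold-cache behaviour. A sketch for Linux, assuming root access (other platforms differ):

    sync                                # flush dirty pages to disk first
    echo 3 > /proc/sys/vm/drop_caches   # drop the page cache, dentries and inodes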

Posted: Thu Nov 18, 2010 4:51 pm
by ray.wurlod
Use the DataStage Performance Analysis tool. It's on the toolbar of the Designer client.

And please stop baiting Craig - don't use "Reply with Quote" - use "Reply to Topic".

Posted: Thu Nov 18, 2010 7:15 pm
by chulett
More cleanup work for me later.... sigh...

Posted: Mon Nov 22, 2010 4:45 pm
by reachmexyz
Thanks Ray

The newly built job is like Dataset ---> Transformer ---> Lookup ---> Dataset. The Lookup has three reference links with 5 records on each link. When I run the job, the rows/second on the Dataset ---> Transformer ---> Lookup links is about 12,000 rows/sec, which is good, but the rows/second on the Lookup ---> Dataset link falls to 1,000 rows/sec. Either the Lookup stage cannot take the records in, or it cannot write them fast enough to the final dataset. I checked the Performance Analysis tool and found that heap memory has increased to 3 KB for the Lookup stage; the other stages are working fine. Because of this, lots of I/O is consumed. I couldn't figure out how to reduce the heap memory on the Lookup stage and make my job run faster. Please provide a solution.

Posted: Mon Nov 22, 2010 5:37 pm
by ray.wurlod
3 KB is nothing - don't try to reduce that! Reference data in a Lookup stage is all loaded into memory, so there will be no I/O involved there. You will need to look elsewhere to find what is demanding I/O resources - try the DataStage Performance Analysis tool.

Posted: Tue Nov 23, 2010 3:38 pm
by reachmexyz
Thanks Ray

I found it. The problem is with the Dataset. The job consumes a lot of I/O when writing to a dataset. It is fine when reading an existing dataset, but consumes a lot of I/O when writing to one. When I replaced the target dataset with a Peek stage or a flat file, everything looked good.

How can I resolve this issue? What measures should I take to make sure writing to the dataset is smooth? Please help me out.

Posted: Wed Nov 24, 2010 4:23 am
by ArndW
If you have a lot of VarChar columns with bounded lengths where the actual contents don't use much of that length (e.g. VarChar(128) where usually only 10 characters are used), then you might save on I/O by making those columns unbounded.
Apart from that, writing to a Dataset is going to do nothing but I/O; the only way to make it do less I/O is to reduce the data volume.
If this is truly a bottleneck, you might be able to distribute the I/O load by splitting your data across directories that are on different controllers or spindles - see the configuration sketch below.
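As a minimal sketch, that split is expressed in the parallel configuration file (the one pointed to by APT_CONFIG_FILE). The host name and paths here are hypothetical; the point is that each node's resource disk sits on a different controller or spindle:

    {
        node "node1"
        {
            fastname "etlhost"
            pools ""
            resource disk "/disk1/datasets" {pools ""}
            resource scratchdisk "/scratch1" {pools ""}
        }
        node "node2"
        {
            fastname "etlhost"
            pools ""
            resource disk "/disk2/datasets" {pools ""}
            resource scratchdisk "/scratch2" {pools ""}
        }
    }

A dataset written under this configuration places one data file on each node's resource disk, so the write load is spread across both devices.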

Posted: Thu Nov 25, 2010 2:07 pm
by reachmexyz
Yes, that was the problem.

I have a lot of VarChar columns with 4000 precision. I made them unbounded and the problem is resolved. Thanks for the help. But one question is still lingering in my mind: when I had a VarChar(4000) column, the dataset allocated 4000 bytes even though there was not much data. I tested this by checking the size of the dataset file. A VarChar is supposed to be a variable-length character type: 4000 should be the maximum it allocates, not what it always allocates. It is behaving just like a CHAR field, i.e. a Char padded out with empty space. Why is that? I was under the impression that, irrespective of the precision of a VarChar, bytes would be allocated based on the actual data length.
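Rough arithmetic of what that padding cost, taking my 50 columns as VarChar(4000) with about 10 characters actually used (illustrative numbers; the per-string length prefix for unbounded storage is my assumption, not something I measured):

    bounded:    50 x VarChar(4000)   ->  ~50 x 4000     = ~200,000 bytes per record, regardless of content
    unbounded:  50 x ~10 chars used  ->  ~50 x (4 + 10) = ~700 bytes per record

That is roughly 285 times fewer bytes written per record, which lines up with the I/O drop I saw after making the columns unbounded.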