19 Hashed file look up....
Hi all,
We have a job where the data stream goes through a transformer, and that transformer uses 19 hashed files for lookups. 18 of those hashed files share the same metadata: 5 columns, all of them part of the hash key. We compare on 3 of the columns and take the other 2 as lookup output.
I think this logic is making my job run slow, so what solution should I apply?
Ans me thanks.
I have recommended to you so many times to first look at the server resources before stating a job is "slow". Can you PLEASE state your version of Unix? Run prstat, topas, top, or glance and watch your job run. If the CPU for the job is at 100%, your job IS NOT SLOW. You just have a lot of logic, and that requires CPU time. Your next steps to improve performance are to tune the hashed files, then incorporate multiple job instances, partition your data, and use multi-processing techniques.
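One way to put numbers behind that resource check is to poll the job process's CPU usage while it runs. A minimal Python sketch, assuming a POSIX `ps` is available and that you have already found the job's process id (`job_pid` below is a placeholder, not something DataStage provides):

```python
import subprocess

def cpu_percent(pid: int) -> float:
    """Return the %CPU that POSIX `ps` reports for one process."""
    out = subprocess.run(
        ["ps", "-o", "pcpu=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

# While the job runs, poll its uvsh/phantom process (job_pid is a
# placeholder -- find the pid first, e.g. with `ps -ef | grep phantom`):
# print(cpu_percent(job_pid))
```

If that figure sits at the single-CPU ceiling for your box, the job is CPU-bound and the fix is tuning and parallelism, not "the job is slow".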
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Re: 19 Hashed file look up....
swades wrote: Ans me thanks.
No reason to ask for an answer.
Pretty sure "we" all have had this conversation before. Whatever happened to the resource checks Ken asked you to perform?
(ack, too slow. Waves as the Ken-mobile blows right on by)
-craig
"You can never have too many knives" -- Logan Nine Fingers
Go to your DataStage server unix command line. Type in:
Code: Select all
uname -X
Get the number of cpus. Then type in
Code: Select all
prstat -a
Look at the server utilization. Look at the processes running. Look at the top users listed at the bottom. Each DS process can only use 1/NumCPUs of the total. So 4 cpus means a FULL SPEED JOB will show as 25%.
Start your job running. Watch and see what the uvsh or phantom process achieves while your job is running. If other things are running, your ability to reach 25% will be limited.
Your goal is to get every type of job you write to use a full CPU. When you can reach a full CPU, you'll use partitioned parallel multiple instances to then run more job instances to fully use all CPUs.
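The 1/NumCPUs ceiling above is simple arithmetic, sketched here in Python (a sketch only; it applies to tools like `prstat -a` that report each process as a percentage of the whole server, not of one core):

```python
import os

# A single-threaded DataStage process can occupy at most one CPU,
# so on an N-CPU server it tops out at roughly 100/N percent of
# total server CPU in whole-server views such as `prstat -a`.
ncpus = os.cpu_count()
full_speed_pct = 100.0 / ncpus
print(f"{ncpus} CPUs -> a full-speed job shows ~{full_speed_pct:.0f}%")
```

So on a 4-CPU box, 25% is the best a single job instance can do, and anything well under that means the job is waiting on something.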
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
What about YOUR JOB! Does your job show it's using 25%? If it is, then the job is going as fast as ONE CPU can work.
If it isn't, then something is interfering with your job. That could be a lack of CPU time, but since there's 50% free CPU time, it's something else. It could be a disk issue slowing down memory swapping, reference lookups, or writes to disk.
If your source stream is a data file, there's no significant delay in reading the file; if you're writing to a file, there's usually no significant delay in writing. If your source stream is a database, then your job is waiting on the database to send data. Didn't we just cover this in a previous post with you, about writing data to a file instead of sending it directly into intense transformation?
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Supply and demand again.
Manage your expectations.
Try splitting the 19 lookups across, say, four transformer stages rather than have one as the bottleneck.
Next time you run the job, enable statistics on the Job Run Options dialog for that single transformer stage - learn where it is spending most of its time.
Are the hashed files read-cached?
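The splitting suggestion amounts to round-robin distribution of the lookups across stages. A Python sketch with illustrative names only (the actual change is made in the job design, not in code):

```python
# Distribute 19 lookup references across 4 transformer stages so no
# single transformer carries all of them (names are hypothetical).
lookups = [f"hash_lookup_{i}" for i in range(1, 20)]

def split_into_stages(items, n_stages):
    """Round-robin the items into n_stages groups."""
    return [items[i::n_stages] for i in range(n_stages)]

stages = split_into_stages(lookups, 4)
for i, group in enumerate(stages, 1):
    print(f"Transformer {i}: {len(group)} lookups")
```

Each stage then does roughly 5 lookups instead of 19, and the per-stage statistics will show which group of lookups actually costs the time.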
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.