analyze.shm

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Just as a test, try to split the job into two.

You can write to a text file from the Link Collector stage in Job 1.
Use the text file as the source and process the rest of the stages in Job 2.

By doing this, you can identify the area where the problem originates. I believe this involves more than just the way the hashed file is organized.
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Attu,

What is your source? Database/File?
What is the size of your hashed file in question?
You would want to resize the hashed file only if you suspect you are exceeding the 2 GB size limit.
Also I would not preload to memory if the size of the hashed file is very large.

If your source is not a file, you can try what srinagesh is suggesting to help identify the bottleneck.
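If they are pathed hashed files, a rough way to sanity-check them against that limit from the Unix shell (the path below is only an example, not taken from your job):

Code: Select all

# pathed dynamic (Type30) hashed file -- /data/hashed/HF1 is an example path
ls -l /data/hashed/HF1      # DATA.30 and OVER.30 sizes; the 32-bit 2 GB limit applies per file
du -sk /data/hashed/HF1     # total size in KB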
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

[quote="srinagesh"]Just as a test, try to split the job into two.

Code: Select all

yes i splitted the job in two parts, the first part was very fast and it had 2 hashed files doing a lookup, but the second part again slowed down. 
You can write to a text file, from Link collector stage in Job 1.
Use the text file as source and process rest of the stages in Job 2.

Code: Select all

I have done that, the performance of the second part is very poor.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Hi Narasimha,

[quote="narasimha"]What is your source? Database/File?[/quote]
A file with 50 million records.

[quote="narasimha"]What is the size of your hashed file in question?[/quote]

Code: Select all

HF1  32661504 bytes
HF2    700416 bytes
HF3    212992 bytes
HF4  10645504 bytes
HF5   1032192 bytes

[quote="narasimha"]You would want to resize the hashed file only if you suspect you are exceeding the 2 GB size limit.[/quote]
Okay.

[quote="narasimha"]Also I would not preload to memory if the size of the hashed file is very large.[/quote]
I tried that, with no success.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

I would like to run the same job in a different environment; I have already exported the dsx. How do I move the hashed files to the other environment?
Can I just do a Unix copy of the hashed files, or is there an import/export command for moving hashed files to a different server?

Appreciate your responses.

Thanks
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Yes, you will have to copy the hashed files to the new location. There is no import/export utility for this purpose.

From what I see, your hashed files are not very large.
Not sure why the performance is so poor.
I would try using the default options while creating these hashed files and check the performance.
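If any of them are account-based files and you want to recreate one with engine defaults rather than through the stage's Create File options, a minimal sketch (HF1 is just an example name, run from dssh or the Administrator command window):

Code: Select all

CREATE.FILE HF1 DYNAMIC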
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Depends on whether they are "pathed" hashed files or were created in an account. For the former, yes, you can simply copy them over to the new server using your tool of choice. Make sure you get everything, including the hidden file for a dynamic (Type30) hashed file and the "D_" dictionary file if present.

For the latter, you can still copy them, but you will need to handle the VOC record if the hashed files don't already exist on the new server. For that, you'll need to create it manually using SETFILE.
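Something along these lines, purely as a sketch (the paths, the HF4 name and the use of scp are assumptions, not taken from your job):

Code: Select all

# pathed file: copy the whole directory, which also brings along the hidden
# .Type30 file, plus the D_ dictionary file if one exists
scp -rp /old/path/HF4 newserver:/new/path/HF4
scp -rp /old/path/D_HF4 newserver:/new/path/D_HF4

# account-based file: after copying, create the VOC record on the new server
# from the engine shell (dssh) or the Administrator command window
SETFILE /new/path/HF4 HF4 OVERWRITING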
-craig

"You can never have too many knives" -- Logan Nine Fingers
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Thank you guys for the suggestions.
Narasimha: I re-created the hashed files using the default options; I'm still not sure why the performance is so poor. IBM Engineering has not been able to find the root cause of this issue yet :cry:
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Can you graphically outline the second job?

Try to move the hashed file 4 lookup into the first job and check the performance. Move the transformations one by one from Job 2 into Job 1; at some point you will notice a deterioration in Job 1's performance. That is the bottleneck you are looking for.
attu
Participant
Posts: 225
Joined: Sat Oct 23, 2004 8:45 pm
Location: Texas

Post by attu »

Okay, I was able to run the job on a different server and it completed successfully. We did not break it into pieces; we ran it as it was. The throughput was around 1200 rows/sec, better than what we had on the original server (12 rows/sec).

It seems that our original server was overloaded: too many processes were running and consuming a tremendous amount of CPU cycles.
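A quick way to confirm that sort of overload, as a rough sketch with standard Unix tools (nothing DataStage-specific):

Code: Select all

uptime                  # load averages vs. the number of CPUs
vmstat 5 5              # watch the run queue (r) and CPU idle (id) columns
ps -eo pcpu,pid,user,args | sort -rn | head -20    # biggest CPU consumers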

I just want to know: what is the best practice for running jobs that have a Link Collector, multiple hashed file lookups, and lots of Transformer stages?

Thanks for the responses.
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

Best practice? I don't think there's a best practice so much as just plain old common sense... match the workload of the application(s) to the capacity of the hardware.

Mike
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There's a rule of thumb that suggests a maximum of four hashed file lookups per Transformer stage.

Such a job can usually benefit from inter-process row buffering.

I'm not aware of any "best practices" relating to the Link Collector stage apart from: don't use it if you don't need to. For example, you do need it if you're writing to a sequential file, but you don't need it if you're inserting new rows into a database table (and the keys are correctly partitioned).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.