
Posted: Mon Jan 12, 2009 10:41 am
by srinagesh
Just as a test, try to split the job into two.

You can write to a text file from the Link Collector stage in Job 1.
Use the text file as the source and process the rest of the stages in Job 2.

By doing this, you can identify the area where the problem originates. I believe this has to do with more than just the way the hashed file is organized.

Posted: Mon Jan 12, 2009 1:01 pm
by narasimha
Attu,

What is your source? Database/File?
What is the size of your hashed file in question?
You would want to resize the hashed file only if you suspect you are exceeding the 2 GB size limit.
Also I would not preload to memory if the size of the hashed file is very large.

If your source is not a file, you can try what srinagesh is suggesting to help identify the bottleneck.
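
If you want a rough sense of how close you are to that limit, something along these lines should do for pathed dynamic (type 30) hashed files; the /data/hash path and the MyHashFile name below are just placeholders, not your actual names:

Code:

# A dynamic (type 30) hashed file on disk is a directory containing DATA.30 and OVER.30;
# with the default 32-bit structure, each of those must stay below 2 GB.
du -sk /data/hash/MyHashFile      # total size on disk, in KB
ls -l  /data/hash/MyHashFile      # check DATA.30 and OVER.30 individually

# For modulus and load details you could also run ANALYZE.FILE at the engine prompt,
# e.g.: echo "ANALYZE.FILE MyHashFile" | $DSHOME/bin/uvsh   (needs a VOC pointer to the file)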

Posted: Tue Jan 13, 2009 9:54 am
by attu
[quote="srinagesh"]Just as a test, try to split the job into two.

Code: Select all

yes i splitted the job in two parts, the first part was very fast and it had 2 hashed files doing a lookup, but the second part again slowed down. 
You can write to a text file, from Link collector stage in Job 1.
Use the text file as source and process rest of the stages in Job 2.

Code: Select all

I have done that, the performance of the second part is very poor.

Posted: Tue Jan 13, 2009 10:02 am
by attu
Hi Narasimha,

[quote="narasimha"]What is your source? Database/File?[/quote]
A file with 50 million records.

[quote="narasimha"]What is the size of your hashed file in question?[/quote]

Code:

HF1 32661504 bytes
HF2   700416 bytes
HF3   212992 bytes
HF4 10645504 bytes
HF5  1032192 bytes

[quote="narasimha"]You would want to resize the hashed file only if you suspect you are exceeding the 2 GB size limit.[/quote]
Okay.

[quote="narasimha"]Also I would not preload to memory if the size of the hashed file is very large.[/quote]
I tried that, with no success.

Posted: Tue Jan 13, 2009 11:16 am
by attu
I would like to run the same job in a different environment. I have already exported the dsx. How do I move the hashed files to the other environment?
Can I just do a Unix copy of the hashed files, or is there an import/export command for moving hashed files to a different server?

Appreciate your responses.

Thanks

Posted: Tue Jan 13, 2009 11:31 am
by narasimha
Yes, you will have to copy the hashed files to the new location. There is no import/export utility for this purpose.

From what I see, your hashed files are not very large.
Not sure why the performance is so poor.
I would try using the default options while creating these hashed files and check the performance.
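
For what it's worth, the Create File option in the Hashed File stage is, as far as I know, essentially building a CREATE.FILE statement behind the scenes. A sketch of the default versus a pre-sized variant, run from the project directory (the MyHashFile name and the modulus value are just placeholders):

Code:

# Default dynamic (type 30) hashed file, no tuning options:
echo "CREATE.FILE MyHashFile DYNAMIC" | $DSHOME/bin/uvsh

# Pre-sized variant; a larger MINIMUM.MODULUS reduces dynamic splitting during the load:
echo "CREATE.FILE MyHashFile DYNAMIC MINIMUM.MODULUS 10000" | $DSHOME/bin/uvsh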

Posted: Tue Jan 13, 2009 11:46 am
by chulett
It depends on whether they are "pathed" hashed files or were created in an account. For the former, yes, you can simply copy them over to the new server using your tool of choice. Make sure you get everything, including the hidden file for a dynamic (Type30) hashed file and the "D_" dictionary file if present.

For the latter, you can still copy them, but you will need to handle the VOC record if the hashed files don't already exist on the new server. For that, you'll need to create it manually using SETFILE.
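
Something like this, where /data/hash, /path/to/project, MyHashFile and the target host are placeholders rather than your actual names:

Code:

# Pathed case: the hashed file is a directory; a recursive copy should pick up any
# hidden files inside it. Copy the "D_" dictionary file too, if it exists.
scp -rp /data/hash/MyHashFile   target:/data/hash/
scp -p  /data/hash/D_MyHashFile target:/data/hash/

# Account case: after copying the file into the new project directory, create its
# VOC record at the engine prompt (run from that project directory), e.g.:
echo "SETFILE /path/to/project/MyHashFile MyHashFile OVERWRITING" | $DSHOME/bin/uvsh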

Posted: Tue Jan 13, 2009 11:54 am
by attu
Thank you guys for the suggestions.
Narasimha: I re-created the hashed files using the default options; not sure why the performance is poor. IBM Engineering has also not been able to find the root cause of this issue yet :cry:

Posted: Thu Jan 15, 2009 7:22 am
by srinagesh
Can you graphically outline the second job?

Try to move the Hashed File 4 lookup into the first job and check the performance. Move one transformation after another from Job 2 into Job 1, and at some point you will notice a deterioration in the performance of Job 1. That is the bottleneck you are looking for.

Posted: Thu Jan 15, 2009 2:33 pm
by attu
Okay, I was able to run the job on a different server and it completed successfully. We did not break it into pieces and ran it as it was. The throughput was around 1200 rows/sec, much better than what we had on the original server (12 rows/sec).

It seemed that our server was overloaded: too many processes were running, consuming a tremendous amount of CPU cycles.

I just want to know: what is the best practice for running jobs that have a Link Collector, multiple hashed file lookups, and lots of Transformers?

Thanks for the responses.

Posted: Thu Jan 15, 2009 2:42 pm
by Mike
Best practice? I don't think this is so much a best practice as just plain old common sense... match the workload of the application(s) to the capacity of the hardware.

Mike

Posted: Thu Jan 15, 2009 3:02 pm
by ray.wurlod
There's a rule of thumb that suggests a maximum of four hashed file lookups per Transformer stage.

Such a job can usually benefit from inter-process row buffering.

I'm not aware of any "best practices" relating to Link Collector stage apart from don't use it if you don't need to. For example you do need to if you're writing to a sequential file, but you don't need to if you're inserting new rows into a database table (and the keys are correctly partitioned).