Query regarding join stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

tom
Participant
Posts: 46
Joined: Fri Oct 14, 2005 12:38 am

Post by tom »

Hi Dsxians,

I am joining two tables from an Oracle DB using the Join stage; each input link carries 2 million records. The inputs are sorted and partitioned on the key columns. In the job monitor I can see that all source records are read from the input links first, and only then does the join operation start.

Could you please explain why this happens? I need the join to begin as soon as input records reach the Join stage, to improve performance. The same thing happens with the Change Capture stage.

Thanks in advance
tom
Developers corner
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

As you mentioned, "sorted and partitioned on the key columns" is the hint.
Your input sources have 2 million records each, and they need to be arranged for the Join stage according to that condition.
If I gave you a set of 10 numbers and asked you to sort it knowing only the first 5, you wouldn't be able to do it; you need all 10 numbers. It works in much the same fashion.
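
To make the analogy concrete, here is a minimal sketch in plain Python (not DataStage code, and not a description of how the Join stage is implemented internally). It shows that a sort has to consume its whole input before it can emit anything, while a merge join over inputs that are already sorted can emit its first match after reading only a few rows. The names `traced` and `merge_join` are made up for this illustration.

```python
from itertools import groupby
from operator import itemgetter


def traced(rows, log, name):
    """Yield rows unchanged, recording each read in `log` (illustrative helper)."""
    for row in rows:
        log.append(name)
        yield row


def merge_join(left, right, key=itemgetter(0)):
    """Inner-join two iterables that are ALREADY sorted on `key`, lazily."""
    left_groups = groupby(left, key)
    right_groups = groupby(right, key)
    lk, lg = next(left_groups, (None, None))
    rk, rg = next(right_groups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(left_groups, (None, None))
        elif lk > rk:
            rk, rg = next(right_groups, (None, None))
        else:
            right_rows = list(rg)  # buffer only the rows for this one key
            for l in lg:
                for r in right_rows:
                    yield l + r[1:]
            lk, lg = next(left_groups, (None, None))
            rk, rg = next(right_groups, (None, None))


# A sort is blocking: every row is read before sorted() returns.
sort_log = []
sorted_rows = sorted(traced([(3, "c"), (1, "a"), (2, "b")], sort_log, "S"))

# A merge join over pre-sorted inputs streams: the first match appears early.
join_log = []
left = traced([(1, "a"), (2, "b"), (3, "c")], join_log, "L")
right = traced([(1, "x"), (3, "z")], join_log, "R")
joined = merge_join(left, right)
first = next(joined)
reads_before_first_match = len(join_log)
rest = list(joined)
```

In this toy run the sort reads all three of its rows before producing output, whereas the join yields its first match before all five input rows have been read. So if a job monitor shows every source row being read before the join starts, a sort sitting in front of the join is a likely reason.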
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
tom
Participant
Posts: 46
Joined: Fri Oct 14, 2005 12:38 am

Post by tom »

Thanks for your reply, DSguru2B. I understand your point.

Could you please suggest some performance tuning techniques for a Change Capture stage that has around 200 million and 400 million records in the before and after datasets?

Reading these two inputs into the Change Capture stage takes about 4 hours.

The input is a dataset and the reference comes from a Sybase DB. The input is partitioned and sorted on the key columns.

Thanks for your time.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How many nodes are you using? If you do a simple select and load to, say, a flat file, how long does that take? Try monitoring disk contention and CPU usage on your server. Search here for various techniques. kcbland usually mentions a number of utilities in his posts, along with methods for monitoring a job and fine-tuning it.
Check out kcbland's reply in the following post:
viewtopic.php?t=103019&highlight=top%2C+prstat
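
For a rough baseline to compare against the simple select-to-flat-file test, the figures quoted earlier in this thread can be turned into a rows-per-second rate. This is only a sketch, assuming about 200 million plus 400 million rows read in about 4 hours; the function name is made up for illustration.

```python
def rows_per_second(total_rows: int, elapsed_seconds: float) -> float:
    """Average throughput of a run, in rows per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_rows / elapsed_seconds


# Assumed figures taken from the thread: ~200M + ~400M rows in ~4 hours.
current = rows_per_second(200_000_000 + 400_000_000, 4 * 3600)
print(f"{current:,.0f} rows/sec")  # roughly 41,667 rows/sec
```

If the plain extract to a flat file runs at a similar rate, the bottleneck is probably in reading the sources rather than in the Change Capture stage itself.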
jdmiceli
Premium Member
Posts: 309
Joined: Wed Feb 22, 2006 10:03 am
Location: Urbandale, IA

Post by jdmiceli »

Admin - please move this thread to the Parallel forum
Bestest!

John Miceli
System Specialist, MCP, MCDBA
Berkley Technology Services


"Good Morning. This is God. I will be handling all your problems today. I will not need your help. So have a great day!"
tom
Participant
Posts: 46
Joined: Fri Oct 14, 2005 12:38 am

Post by tom »

Thanks, DSguru2B. I will follow up with the thread you provided.