
Performance Issue In Job

Posted: Tue Feb 07, 2006 9:03 am
by Nageshsunkoji
Hi All,

I have a requirement to add a column called Analog to the job flow, and I have two options:
1) The source dataset comes from another job that already has the Analog column, so I can add the column to the input dataset and carry it through the flow all the way to the target stage.
2) I can bring the column into the flow by performing an inner join on the keys just before the target dataset.
Likewise, I have to implement this logic in 10 jobs.

So my question is: which option performs better when the data runs into millions of rows, carrying the metadata through the flow from source to target, or adding an inner join on the keys before the target stage? Please clarify.
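To make sure I am describing the two options clearly, here is a rough sketch in Python (not DataStage, and purely illustrative: the key column cust_id, the sample rows and the doubling rule are all made up) of what the two designs amount to.

[code]
rows = [
    {"cust_id": 1, "amount": 100, "analog": "A"},
    {"cust_id": 2, "amount": 200, "analog": "B"},
]

def stage_transform(row):
    # stand-in for the business logic applied at each intermediate stage
    row = dict(row)
    row["amount"] = row["amount"] * 2
    return row

# Option 1: "analog" rides along through every stage untouched.
option1 = [stage_transform(r) for r in rows]

# Option 2: process the rows without "analog", then inner-join it back
# on the key just before the target.
stripped = [{k: v for k, v in r.items() if k != "analog"} for r in rows]
processed = [stage_transform(r) for r in stripped]
analog_by_key = {r["cust_id"]: r["analog"] for r in rows}
option2 = [
    {**r, "analog": analog_by_key[r["cust_id"]]}
    for r in processed
    if r["cust_id"] in analog_by_key   # inner join: unmatched keys drop out
]

assert option1 == option2   # both designs produce the same rows
[/code]

Both designs end with the same rows; the question is only which one moves less data and adds fewer stages at run time.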

Regards
Nagesh.

Posted: Wed Feb 08, 2006 4:50 am
by richdhan
Hi Nagesh,

Use option 1. Changing the metadata is better than introducing a new stage, even though it will take some time.

HTH
--Rich

Posted: Wed Feb 08, 2006 5:26 am
by kumar_s
It depends on the column you add, and on the number of stages and jobs the column needs to pass through from source to target.
It is always better to reduce the number of columns in the flow to improve performance, but that does not mean you should join at a later stage. A join may be worth it if the Analog column is very wide and the number of jobs and stages it has to pass through is very high.
But I would prefer to avoid unnecessary joins (since you have millions of records) and carry the column along the flow all the way to the target.
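As a rough back-of-envelope only (every number below is an assumption, not a measurement), the extra bytes carried by a narrow column are usually small compared with the data a join has to re-partition and sort on the key:

[code]
rows = 5_000_000        # "millions of records"
analog_bytes = 20       # assumed width of the Analog column
stages = 6              # assumed stages the column passes through per job
jobs = 10

# Option 1: the only extra cost is the additional bytes moved stage to stage.
carry_mb = rows * analog_bytes * stages * jobs / 1e6
print(f"extra data carried through the flows: ~{carry_mb:,.0f} MB")

# Option 2: a join generally has to re-partition and sort both inputs on the
# key, so every row is handled again however narrow the main flow is.
row_bytes = 200         # assumed average row width on the main flow
join_mb = 2 * rows * row_bytes * jobs / 1e6
print(f"data re-shuffled by the joins: ~{join_mb:,.0f} MB")
[/code]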

-Kumar

Posted: Wed Feb 08, 2006 10:22 pm
by Nageshsunkoji
Hi Rich & Kumar,

Thank you for your inputs. I am going with the option of carrying the column from source to target, even though it is time-consuming work, in order to get better performance.

Regards,
Nagesh

Posted: Wed Feb 08, 2006 10:32 pm
by kumar_s
Dragging and dropping the column is a one-time effort at design time, whereas performance has to be considered over the whole lifetime (run time) of the project.

-Kumar