sorting

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

apkselvam
Participant
Posts: 31
Joined: Mon Sep 04, 2006 2:37 am

sorting

Post by apkselvam »

Hi,
1) How does a sorted input file increase performance?
2) If the size of the Master table and update table exceeds the RAM size, what will happen? Does it lead to a page fault error or not?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Assuming that your join/grouping keys are the sorted keys (you did not specify), as soon as a key value change is detected it is known that the previous value will not recur, so any rows with that value can be put onto the output link and their memory freed. The result is less overall demand for memory and faster appearance of rows on the output link.
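
A minimal sketch of that streaming behaviour in plain Python (not DataStage's actual implementation; the sample data and function name are illustrative only):

from itertools import groupby
from operator import itemgetter

def emit_groups_sorted(rows):
    # rows must already be sorted on the key (column 0);
    # only the current group is ever held in memory.
    for key, group in groupby(rows, key=itemgetter(0)):
        # a key change means the group is complete, so it can be
        # written to the output link and its memory released
        yield key, list(group)

sorted_rows = [("A", 1), ("A", 2), ("B", 3), ("C", 4)]
for key, group in emit_groups_sorted(sorted_rows):
    print(key, group)

With unsorted input, every group would have to be buffered until end-of-data before any of it could be released.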

For the second question, research the topic of scratch disk in the manuals or on-line help.

Are these interview questions?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gootala_ravi
Participant
Posts: 7
Joined: Wed Feb 08, 2006 2:34 am

Re: sorting

Post by gootala_ravi »

apkselvam wrote:Hi,
1) How does a sorted input file increase performance?
2) If the size of the Master table and update table exceeds the RAM size, what will happen? Does it lead to a page fault error or not?
1) Ray's explanation is exactly the reason it improves performance.

2) Assuming the join affects the entire table: on the DataStage side, if the data to be updated exceeds the RAM size, the update would take much longer because of the extra I/O, but it would not fail. DataStage would spill the data to (scratch) disk and process it from there, limited only by the disk space available on the server DataStage is running on, beyond which it would fail.

However, if this problem occurs on the database side, the statement would fail. You would be forced to commit every "X" rows (X depending on the buffer size set for the DB and the number of DML operations happening in parallel) so that the update does not fail. I would like someone who has come across such a situation to validate what I have stated here.
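
A rough illustration of the commit-every-X-rows idea (Python with sqlite3 purely for demonstration; the table, batch size and data are hypothetical, and the right value of X depends on the database configuration):

import sqlite3

BATCH_SIZE = 10000  # hypothetical "X"; tune to the DB buffer / undo sizing

conn = sqlite3.connect("example.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS target (id INTEGER PRIMARY KEY, val TEXT)")

rows = ({"id": n, "val": str(n)} for n in range(100000))
for i, row in enumerate(rows, start=1):
    cur.execute("INSERT OR REPLACE INTO target (id, val) VALUES (:id, :val)", row)
    if i % BATCH_SIZE == 0:
        conn.commit()  # commit every X rows so the rollback/undo space never fills

conn.commit()  # final partial batch
conn.close()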