How to process a large sequential file?

balu536
Premium Member
Posts: 103
Joined: Tue Dec 02, 2008 5:01 am

How to process a large sequential file?

Post by balu536 »

Hi everybody,

I have a large sequential file containing 15 million records. We need to insert this data into an Oracle table after performing some transformations. Basically, we are doing ETL transformations in our jobs to load a fact table. Our approach is to first load the sequential file data into a staging table in one job. In the second job we compare the staging table data with the fact data to perform an insert (if no match is found) or an update (if a match is found).

So in the second job we read the staging table data and the fact table data with Oracle stages. Then we use a Join stage, with a dummy column, to determine which records need an update and which need an insert. In the Join stage we compare the data on the unique key columns using a left outer join.

After the Join stage we send the data to a Transformer to apply the business rules. Then we perform the insert (if required) or the update, based on those rules.
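
To make the comparison logic concrete, here is a minimal Python sketch of the left outer join with a dummy column, followed by the insert/update split. It only illustrates the logic, not how the Join stage executes or performs at this volume. The function names, the `fact_dummy` column name, and the sample columns are all made up for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def key_of(row: Row, key_columns: Sequence[str]) -> Tuple[object, ...]:
    """Build the comparison key from the unique key columns."""
    return tuple(row[c] for c in key_columns)


def left_outer_join_with_dummy(
    staging: List[Row], fact: List[Row], key_columns: Sequence[str]
) -> List[Row]:
    """Left outer join staging to fact on the key columns.

    Every staging row is kept. The hypothetical 'fact_dummy' column is 1
    when a matching fact row exists and None (null) when it does not.
    """
    fact_keys = {key_of(r, key_columns) for r in fact}
    joined: List[Row] = []
    for row in staging:
        out = dict(row)
        matched: Optional[int] = 1 if key_of(row, key_columns) in fact_keys else None
        out["fact_dummy"] = matched
        joined.append(out)
    return joined


def split_inserts_updates(joined: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Null dummy column means no match (insert); otherwise update."""
    inserts = [r for r in joined if r["fact_dummy"] is None]
    updates = [r for r in joined if r["fact_dummy"] is not None]
    return inserts, updates


if __name__ == "__main__":
    staging_rows = [
        {"order_id": 1, "amount": 10},
        {"order_id": 2, "amount": 25},
    ]
    fact_rows = [{"order_id": 1, "amount": 5}]
    joined_rows = left_outer_join_with_dummy(staging_rows, fact_rows, ["order_id"])
    to_insert, to_update = split_inserts_updates(joined_rows)
    print("insert:", to_insert)
    print("update:", to_update)
```

In the job itself, the Transformer would take the place of `split_inserts_updates`, routing rows to the insert or update link based on whether the dummy column is null.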

So we would like some clarification on whether this approach is suitable for comparing the 15 million records coming from the staging table with the 15 million plus records coming from the fact table.

Is it suitable to use the Join stage, or is there another stage better suited to comparing such a large volume of data?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

You may find it preferable to use the Slowly Changing Dimension stage.

Or you may like to investigate any of the three change detection stages (Difference, Compare, Change Capture).
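
As a rough illustration of what a change detection step does conceptually, the Python sketch below compares a "before" set (the fact data) with an "after" set (the staging data) on key columns and labels each row. The function name and the string labels are assumptions made for this example. They are not the actual output columns or codes of the Difference, Compare, or Change Capture stages.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def detect_changes(
    before: List[Row],
    after: List[Row],
    key_columns: Sequence[str],
    value_columns: Sequence[str],
) -> List[Tuple[str, Row]]:
    """Label each row as 'insert', 'edit', 'copy', or 'delete'.

    The labels are illustrative only (an assumption for this sketch).
    """
    before_by_key = {tuple(r[c] for c in key_columns): r for r in before}
    seen = set()
    changes: List[Tuple[str, Row]] = []

    for row in after:
        key = tuple(row[c] for c in key_columns)
        seen.add(key)
        old = before_by_key.get(key)
        if old is None:
            changes.append(("insert", row))   # key only in the after set
        elif any(old[c] != row[c] for c in value_columns):
            changes.append(("edit", row))     # key matches, values differ
        else:
            changes.append(("copy", row))     # key and values unchanged

    for key, old in before_by_key.items():
        if key not in seen:
            changes.append(("delete", old))   # key only in the before set

    return changes


if __name__ == "__main__":
    fact_rows = [
        {"order_id": 1, "amount": 5},
        {"order_id": 3, "amount": 7},
    ]
    staging_rows = [
        {"order_id": 1, "amount": 10},
        {"order_id": 2, "amount": 25},
    ]
    for label, row in detect_changes(fact_rows, staging_rows, ["order_id"], ["amount"]):
        print(label, row)
```

For an insert-or-update load, only the rows labeled as inserts and edits would need to reach the target table. Unchanged rows can be dropped before the database write.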
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.