Performance of "Merge" stage
Moderators: chulett, rschirm, roy

I'm trying to merge 2 sequential files on a key X using the "Merge" stage.
The "Merge" stage sounds perfectly matched to what I need to do, but it performs extremely slowly (somewhat like
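For readers outside DataStage, the behaviour being asked about is a key-based join of two sorted sequential files. A minimal sketch in plain Python (file names, column layout, and one-row-per-key matching are all assumptions for illustration, not the Merge stage's actual implementation):

```python
# Hypothetical sketch: inner-join two CSV files that are both sorted
# ascending on the key column, streaming row by row. This is roughly
# the kind of work a Merge stage does; duplicate keys are handled
# naively (one match per key) to keep the example short.
import csv

def merge_on_key(path_a, path_b, key_index=0):
    """Yield combined rows where the key column matches in both files."""
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        ra, rb = csv.reader(fa), csv.reader(fb)
        row_a, row_b = next(ra, None), next(rb, None)
        while row_a and row_b:
            ka, kb = row_a[key_index], row_b[key_index]
            if ka == kb:
                # emit the key once, then the remaining columns of both rows
                yield row_a + row_b[key_index + 1:]
                row_a, row_b = next(ra, None), next(rb, None)
            elif ka < kb:           # keys compared as strings here
                row_a = next(ra, None)
            else:
                row_b = next(rb, None)
```

Because both inputs are already sorted, this streams in a single pass with no temp-file staging, which is exactly the overhead discussed later in the thread.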
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
Why not extract file 2 into file 1, adding the extra column? Then you can process the full file.
OR
Process the data in 2 ETL jobs run in sequence.
OR
Create a job for file 1 with the required transformations and outputs. Convert everything after the input stage into a shared container. Create a second job for file 2 which converts the data into 4 columns, then feeds it to the shared container. You will have two jobs to run, but the transformation and output logic will be shared, making maintenance easier.
Vincent McBurney
Data Integration Services
www.intramatix.com
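The first option, appending file 2's rows onto file 1 with an extra column, can be sketched as follows (file names and the added "source" column are illustrative assumptions):

```python
# Hedged sketch of option 1: concatenate two CSV files into one,
# tagging each row with an extra column saying which file it came
# from, so a single downstream job can process the combined file.
import csv

def combine(path_1, path_2, out_path):
    with open(out_path, "w", newline="") as out:
        w = csv.writer(out)
        for src, path in (("file1", path_1), ("file2", path_2)):
            with open(path, newline="") as f:
                for row in csv.reader(f):
                    w.writerow(row + [src])  # tag each row with its origin
```

This trades a little disk space for having only one processing job to maintain.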
Thanks for the reply.
We have tried using a SELECT with the data in a UniVerse stage, but /tmp fills up before we get the result. (We are using /tmp as the temp directory for DataStage, as I read from uvconfig. Can we specify a temp directory for a particular UniVerse stage instead of changing the DataStage-wide default? Where would the normal practice be to place the DataStage temp directory?)
wmcburney, could you elaborate more on your third method?
We found the error message "Error occured while deleting temporary file" in a job with only a Merge stage (source) and a Sequential File stage (target). We have cleaned the specified temp directory and even changed the job setting to another temp directory, but none of these makes the job run again. Any idea about this?
Thanks a lot.
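As an aside on the temp-directory question: the engine's scratch directory (UVTEMP in uvconfig) is an engine-wide setting, not a per-stage one. The same one-global-knob pattern appears in the Python standard library, where the TMPDIR environment variable steers where temporary files land; this is an analogy only, not DataStage behaviour, and the path below is a hypothetical larger filesystem:

```python
# Illustration of redirecting a global temp directory (Python stdlib,
# not DataStage). TMPDIR is read once and cached, so the cache must be
# cleared for a new value to take effect.
import os
import tempfile

scratch = os.path.expanduser("~/scratch")   # hypothetical bigger filesystem
os.makedirs(scratch, exist_ok=True)
os.environ["TMPDIR"] = scratch
tempfile.tempdir = None    # drop the cached default so TMPDIR is re-read
print(tempfile.gettempdir())
```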
You see one of the drawbacks of the Merge stage, also true of the Aggregator stage: DataStage builds the merge within the temp directory space, which adds a performance overhead. You may find it faster to load file 2 onto file 1.
Getting back to shared containers: if you want to process two files whose fields just don't match up, and the transforms are quite complex with many lookups, you don't want to maintain two jobs that are virtually the same, because that is time-consuming and creates additional risk. A way around this is to put all the transform and output logic into a shared container and hook it into two jobs that handle the reading of the files.
Shared containers are a version 5 and 6 feature. The automatic creation of a shared container is just about the most visually interesting thing you'll ever see DataStage do! You create a full job, select the parts of it you want to turn into a shared container, and generate the container from the menu option. All those items disappear from your job and magically appear in a separate container, leaving a shared container input stage behind.
Next you create your second job, loading in file 2; you add the shared container to the job and link file 2 to the shared container's input fields.
So you have two jobs reading inputs with different file structures, sending the data to the same transform and output stages.
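As a rough analogy in plain Python (not DataStage), a shared container is like factoring the common transform logic into one function that two differently shaped readers feed; all names below are illustrative:

```python
# The "shared container": common transform logic, maintained in one place.
def shared_container(rows):
    return [(key.upper(), amount * 2) for key, amount in rows]

# "Job 1": file 1 already has the right two columns.
def read_file1(raw):
    return [(r[0], r[1]) for r in raw]

# "Job 2": file 2 must be reshaped into the same two columns first.
def read_file2(raw):
    return [(r[2], r[0]) for r in raw]

# Two jobs, one shared transform: only one copy of the logic to maintain.
out1 = shared_container(read_file1([("a", 1), ("b", 2)]))
out2 = shared_container(read_file2([(3, "x", "c")]))
```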
Vincent McBurney
Data Integration Services
www.intramatix.com