Merge multiple files into single data set

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
suneeth
Participant
Posts: 11
Joined: Fri Jun 18, 2004 1:06 am

Merge multiple files into single data set

Post by suneeth »

How would one merge multiple files into single data set with columns from each file becoming a new column in the merged data set?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Lots of ways:

1. Load each dataset into a separate table and use a join with one table as the driver.
2. Load each dataset into a separate table and use a UNION with a group-by with MIN/MAX operations on all columns mapping each table into a discrete set of columns. This gives you a full outer join effect across all four datasets.
3. Load 3 datasets into hash files and use a reference lookup for each dataset with one dataset as the primary input stream.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
suneeth
Participant
Posts: 11
Joined: Fri Jun 18, 2004 1:06 am

Post by suneeth »

Hi Kenneth Bland,
Many thanks for the quicky response.
Mainly I am looking at different approaches in achieving that.
1. Load into tables & join them.
2. Merge stage can be used.
Or do we have any other approaches.

cheers,
suneeth--



kcbland wrote:Lots of ways:

1. Load each dataset into a separate table and use a join with one table as the driver.
2. Load each dataset into a separate table and use a UNION with a group-by with MIN/MAX operations on all columns mapping each table into a discrete set of columns. This gives you a full outer join effect across all four datasets.
3. Load 3 datasets into hash files and use a reference lookup for each dataset with one dataset as the primary input stream.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Ken's probably hit the sack by now but fortunately for you it's a 24hr forum. The merge stage is like a database union statement, it does not merge rows, it merges files. Is this what you are trying to achieve? There is also the link collector stage that does a similar merge/union.

If you want to merge rows from different sources into a single row, in effect performing a join, you need to revisit the approaches outlined by Ken.

If you have multiple files then I would favour putting the smallest files into hash files and then processing the largest file as an input. Use hash file lookups to join the lookup files in a transformer and create a combined output row.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Vincent, the merge stage really does combine 2 files seperate lines to 1 line in your output file and makes a join as you can in a DB (it even has a complete set equal to a full outer join).
but there are some disadvantages to using the merge stage(in no particular order):
1. not user freindly (especially when you need to make changes)
2. I've recently came across a problem in large files (over 10 GB each)
3. it can only merge 2 files per merge stage.
4. it has no input link available, forcing you to make different jobs for dependant merge files.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Thanks Roy, I must be getting my stages mixed up, it does indeed merge the input sequential file rows.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

500+ posts and getting your sig wrong, too! :wink: :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

Hi

Merging two datasets (few columns from 1st DS & few columns from 2nd DS) can be done few ways in PX.

1. Use merge stage
2. Join Stage
3. Use primary stream is the source and secondary source is the lookup.

Hope this would help.

Regards
Saravanan
Post Reply