Join more than two sequential files in server jobs

balaid
Participant
Posts: 7
Joined: Sun Aug 07, 2005 11:51 pm

Join more than two sequential files in server jobs

Post by balaid »

I am new to DataStage and have a problem joining three sequential files.

Can anyone help me?
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You can use the forum's search utility to find your answer.
(Since you mention a server job, I'll move this post to the Server forum.)
Basically you have several options:
1. Invoke an OS command line to concatenate the files, then process the result.
2. Use a filter command to cat (Unix) or type (DOS) the files, and use the output of that command as your stage input (using a Sequential File stage).
3. Read all the files in separate streams in parallel.
4. Use a multiple-instance job and pass the file name as a parameter.

My favorite is option 2.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
We have now seen several options for concatenating files. :!: :!: :!:
balaid,
is that what you want, or do you need help joining three files based on a key?

regards
kumar
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Post by rumu »

Roy, can you give some details on how to use the filter command? I have the same problem: I am concatenating 3 files with the filter commands option checked and using the resulting file as my source stage, but I get the error "cant open <resultingfilename> in dirpath", where dirpath is the directory.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Post your Filter command and what's in your Filename field.
-craig

"You can never have too many knives" -- Logan Nine Fingers
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Funny :shock: typing it in generates errors, so I went the ... way. Here's how to do it step by step:

1. Place a Sequential File stage on your canvas.
2. In the Sequential File stage, on the Stage tab, check the "Stage uses filter commands" check box.
3. Go to the Outputs tab and, in the File name field, browse for one of the files.
4. In the Filter command field (to the right of File name), browse for the /usr/bin/cat command.
5. After the command from step 4, copy and paste the file name you have in the File name field (to the left of the Filter command), followed by any other file names, separated by spaces.
6. Enter your table definition on the Columns tab.
7. Press the View Data button to confirm that the configuration works.
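
For readers who want to see what the steps above amount to outside DataStage, here is a minimal Python sketch of the same idea: several files with an identical layout are streamed one after another, so the consumer sees a single input, much as it would from `cat file1 file2 file3`. The function name `read_concatenated` and the comma delimiter are assumptions made for illustration; they are not part of DataStage.

```python
import csv


def read_concatenated(paths, delimiter=","):
    """Yield rows from each file in turn, as one continuous stream.

    Hypothetical helper: assumes every file shares the same column layout,
    which is also what the cat-based filter approach requires.
    """
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle, delimiter=delimiter):
                yield row
```

Calling `list(read_concatenated(["a.txt", "b.txt", "c.txt"]))` would return the rows of the three files in order, where the file names are placeholders.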

IHTH,
Last edited by roy on Thu Sep 01, 2005 8:28 am, edited 2 times in total.
Roy R.
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

If you always have the same number of input files to merge, you can use the Link Collector stage to merge the individual files into one big file first, and then process that file.

If the number of input files changes dynamically, it is better to run the cat or type command in a before-stage subroutine and then use the merged file.
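
As a rough illustration of the "merge first, then process" idea for a varying number of files, the Python sketch below concatenates every file matching a pattern into one merged file. It only stands in for what a cat or type call in a before-stage subroutine would do; the function name `merge_files` and the use of a wildcard pattern are assumptions, not DataStage features.

```python
import glob
import os
import shutil


def merge_files(pattern, merged_path):
    """Concatenate all files matching `pattern` into `merged_path`.

    Hypothetical helper: files are appended in sorted name order, and the
    merged file itself is skipped in case it matches the pattern.
    Returns the list of source files that were merged.
    """
    target = os.path.abspath(merged_path)
    sources = sorted(
        path for path in glob.glob(pattern)
        if os.path.abspath(path) != target
    )
    with open(merged_path, "wb") as merged:
        for source in sources:
            with open(source, "rb") as handle:
                shutil.copyfileobj(handle, merged)
    return sources
```

This only makes sense when all the input files share one layout, since the merged file is read with a single table definition afterwards.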
Thanks,
Naveen
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Bearing in mind that the file names can be parameterized, the list can be generated dynamically and passed as a parameter to the job.
I do think it should be easier to do and should support wildcards, which in my case it doesn't seem to.
(Feel free to test it yourself; I didn't go past the Sequential File stage and output link this time.)
I do recall it being more friendly in the past :roll: .
Roy R.
balaid
Participant
Posts: 7
Joined: Sun Aug 07, 2005 11:51 pm

Join more than two sequential files in server jobs

Post by balaid »

Naveen,
here is the problem:

I have three table structures, each with different column names. I want to join those three tables/files using some stages. Can I join them using Merge?
The result should be written to a target table/file.


Thanks in advance....
rumu
Participant
Posts: 286
Joined: Mon Jun 06, 2005 4:07 am

Re: Join more than two sequential files in server jobs

Post by rumu »

Hi,
Since your source file structures are different, you can't use the Link Collector stage. If you have a common key in the 3 files, you can use Merge stages in succession (in the first stage join 2 files, then in the next stage join that result with the 3rd one). Or you can use ExecSH in a before-job subroutine, pass it the three file names, and use the resulting file as your source in a Sequential File stage.
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Balaid,

As rumu rightly pointed out, you cannot use any of the techniques mentioned above to merge your source files if they have different structures.
balaid wrote:I have three table structures with different column names each. I want to join those three tables/files using some stages. can i join using merge.
When you say that you want to merge the three files, how do you propose to do it, i.e. based on which key fields? If you give us some more details about the exact keys in those three files, we can help you further.

The Merge stage can be used to merge two files based on a common key or keys.
Thanks,
Naveen
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Re: Join more than two sequential files in server jobs

Post by PhilHibbs »

balaid wrote:I am new to DataStage and have a problem joining three sequential files.

Can anyone help me?
Presuming that you really mean "join" in the SQL sense, then there are at least two approaches.

If your joins are many-to-many, then you have to do this in two steps using the rather horrible Merge stage. Merge two of them, then merge the output of that with the third file. I think this has to be done as two consecutive jobs, but others may correct me on this.
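
To make the two-step idea concrete, here is a small Python sketch of a many-to-many join done pairwise: files 1 and 2 are joined first, and that result is joined with file 3. It is an inner join written for illustration only and does not claim to reproduce the Merge stage's behaviour. The names `merge_on_key` and `join_three` are hypothetical, and the sketch assumes all three inputs carry the key under the same column name.

```python
from collections import defaultdict


def merge_on_key(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, keeping every matching pair."""
    right_by_key = defaultdict(list)
    for row in right_rows:
        right_by_key[row[key]].append(row)

    joined = []
    for left in left_rows:
        # .get avoids creating empty entries for keys with no match
        for right in right_by_key.get(left[key], []):
            joined.append({**left, **right})
    return joined


def join_three(rows1, rows2, rows3, key):
    """Step 1: join files 1 and 2. Step 2: join that result with file 3."""
    first_pass = merge_on_key(rows1, rows2, key)
    return merge_on_key(first_pass, rows3, key)
```

If the key columns are named differently in each file, they would need to be renamed to a common name before a join like this.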

If both of your joins are one-to-many, such as looking up the Vendor and Cost Centre for a Purchase Order, then you can do it with hash files. Use the 'many' file as your primary stream into a Transformer, and feed the other two files into hashes. Use these hashes as lookups in the Transformer. Your job should look something like this:

  1 --> H
         \
          \
  2 -----> T ----> O
          /
         /
  3 --> H
where 2 is the file that contains one or more rows for each row in files 1 and 3, H is a hash file, T is the Transformer and O is the output.
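
The lookup design in the diagram can be sketched in Python with dictionaries standing in for the hash files. This is only an analogy under stated assumptions: the column names (`vendor_id`, `vendor_name`, `cc_id`, `cc_name`) and function names are invented for the example, and a failed lookup simply yields empty values here.

```python
def build_lookup(rows, key):
    """Index reference rows by key, playing the role of a hash file.

    If a key appears more than once, the last row wins.
    """
    return {row[key]: row for row in rows}


def join_with_lookups(order_rows, vendor_rows, cost_centre_rows):
    """Stream the 'many' file and enrich each row from the two lookups."""
    vendors = build_lookup(vendor_rows, "vendor_id")
    cost_centres = build_lookup(cost_centre_rows, "cc_id")

    output = []
    for order in order_rows:
        vendor = vendors.get(order["vendor_id"], {})
        cost_centre = cost_centres.get(order["cc_id"], {})
        output.append({
            **order,
            # None when the lookup finds no matching row
            "vendor_name": vendor.get("vendor_name"),
            "cc_name": cost_centre.get("cc_name"),
        })
    return output
```

Each purchase order row is read once, and both lookups are keyed reads, which is why this shape suits one-to-many joins.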

If you need a 'full outer' join, then again you are back to needing the execrable Merge stage.
Phil Hibbs | Capgemini
Technical Consultant