Join more than two sequential files in server jobs

Posted: Wed Aug 31, 2005 11:30 pm
by balaid
As I am new to DataStage, I have a problem with joining three sequential files.

Can anyone help me?

Posted: Thu Sep 01, 2005 1:25 am
by roy
Hi,
You can use the forum's search utility to find your answer.
(I'll move your post to the server forum, since you noted a server job type.)
Basically you have several options:
1. Invoke an OS command line to concatenate the files, then process the result.
2. Use a filter command to cat (Unix) or type (DOS) the files, and use the output of that command as your stage input (using a Sequential File stage).
3. Read all the files in separate streams in parallel.
4. Use a multiple-instance job, passing the file name as a parameter.

My favorite is option 2.

IHTH,

Posted: Thu Sep 01, 2005 3:18 am
by kumar_s
Hi,
There are several options for concatenating files.
balaid,
Is that what you want, or do you need help joining three files based on a key?

regards
kumar

Posted: Thu Sep 01, 2005 5:00 am
by rumu
Roy, can you give some details on how to use the filter command? I have the same problem: I am concatenating 3 files with the filter command option checked and using the resulting file as my source stage, but I get the error "cant open <resultingfilename> in dirpath", where dirpath is the directory.

Posted: Thu Sep 01, 2005 6:51 am
by chulett
Post your Filter command and what's in your Filename field.

Posted: Thu Sep 01, 2005 8:19 am
by roy
Hi,
Funny :shock: typing it generates errors, so I went the ... way. Here's how to do it step by step:

1. Place a Sequential File stage on your canvas.
2. On the Stage tab of the Sequential File stage, check the "stage uses filter command" check box.
3. Go to the Outputs tab and, in the File name field, browse for one of the files.
4. In the Filter command field (to the right of File name ...), browse for the /usr/bin/cat command.
5. After the command from step 4, add (using copy and paste) the file you have in the File name field (to the left of the Filter command) and any other file names, separated by spaces.
6. Put your table definition on the Columns tab.
7. Press the View Data button to confirm that the configuration works.
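
To make the effect of these steps concrete: with the filter in place, the stage reads the output of cat over all the listed files instead of a single file. A rough Python equivalent is sketched below. The file names, the comma delimiter and the two-column layout are made up for illustration, and it assumes all files share the same column layout, which is what the single table definition in step 6 implies.

```python
import csv
import os
import tempfile

def read_concatenated(paths):
    """Yield rows from each file in turn, like `cat a b c` feeding one reader."""
    for path in paths:
        with open(path, newline="") as handle:
            yield from csv.reader(handle)

# Hypothetical sample files; all three share the same two-column layout.
workdir = tempfile.mkdtemp()
samples = {
    "file1.txt": "1,apple\n",
    "file2.txt": "2,banana\n",
    "file3.txt": "3,cherry\n",
}
paths = []
for name, content in samples.items():
    path = os.path.join(workdir, name)
    with open(path, "w", newline="") as handle:
        handle.write(content)
    paths.append(path)

# One stream of rows, in the order the files were listed.
rows = list(read_concatenated(paths))
print(rows)
```

Note that this is concatenation only: rows are stacked one file after another, and nothing is matched on a key.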

IHTH,

Posted: Thu Sep 01, 2005 8:20 am
by pnchowdary
If you always have the same number of input files to be merged, you can even use the Link Collector stage to first merge all the individual files into one big file and then process that big file.

If the number of input files changes dynamically, then it is better to use the cat or type command in a before-stage subroutine and then use the merged file.

Posted: Thu Sep 01, 2005 8:24 am
by roy
Hi,
Bearing in mind that the file names can be parameterized, the list can be generated dynamically and passed as a parameter to the job.
I do think it should be easier to do and should support wildcards, which in my case it doesn't seem to.
(Feel free to test it yourself; I didn't go past the Sequential File stage and output link this time.)
I do recall it being more friendly in the past :roll: .
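
One way to work around the wildcard limitation described here is to expand the pattern outside the job and pass the resulting list in as the parameter value. The sketch below only builds that space-separated list; the pattern, the file names and the variable `file_list_param` are hypothetical, and how the value is handed to the job is left out.

```python
import glob
import os
import tempfile

def expand_file_list(pattern):
    """Return the paths matching a wildcard pattern, in sorted order."""
    return sorted(glob.glob(pattern))

# Hypothetical input directory with two files that match and one that does not.
workdir = tempfile.mkdtemp()
for name in ("sales_01.txt", "sales_02.txt", "other.txt"):
    open(os.path.join(workdir, name), "w").close()

matches = expand_file_list(os.path.join(workdir, "sales_*.txt"))

# Space-separated list, in the form the cat filter command expects.
file_list_param = " ".join(matches)
print(file_list_param)
```

Paths containing spaces would need quoting before being used this way; the sketch assumes there are none.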

Posted: Sun Sep 04, 2005 10:37 pm
by balaid
naveen,
here is the problem:

I have three table structures, each with different column names. I want to join those three tables/files using some stages. Can I join them using Merge?
The result should be populated to a target table/file.


Thanks in advance....

Posted: Mon Sep 05, 2005 1:47 am
by rumu
Hi,
Since your source files have different structures, you can't use the Link Collector stage. If the 3 files share a common key, you can use Merge stages in succession (in the first stage join 2 files, then in the next stage join the resulting file with the 3rd one). Or you can use ExecSH in a before-job subroutine, passing the three file names, and use the resulting file as your source in a Sequential File stage.
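
The "in succession" idea can be sketched in Python as two passes of the same two-way join. This is only an illustration of the logic, not of the Merge stage itself; the rows, the column names and the shared key `id` are all invented.

```python
def inner_join(left, right, key):
    """Join two lists of dict rows on `key`, keeping only rows that match."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

# Hypothetical rows standing in for the three files, each with different columns.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 1, "item": "pen"}, {"id": 2, "item": "ink"}]
payments = [{"id": 1, "amount": 5}]

# First pass joins two files; second pass joins that result with the third.
first_pass = inner_join(customers, orders, "id")
result = inner_join(first_pass, payments, "id")
print(result)
```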

Posted: Tue Sep 06, 2005 8:19 am
by pnchowdary
Hi Balaid,

As rumu rightly pointed out, you cannot use any of the techniques mentioned above to merge your source files if they have different structures.
balaid wrote:I have three table structures with different column names each. I want to join those three tables/files using some stages. can i join using merge.
When you say that you want to merge the three files, how do you propose to do it, i.e. based on what key fields? If you give us some more details about the exact keys in those three files, we can help you further.

Merge stage can be used to merge two files based on a common key/keys.

Posted: Tue Sep 06, 2005 1:01 pm
by PhilHibbs
Presuming that you really mean "join" in the SQL sense, then there are at least two approaches.

If your joins are many-to-many, then you have to do this in two steps using the rather horrible Merge stage. Merge two of them, then merge the output of that with the third file. I think this has to be done as two consecutive jobs, but others may correct me on this.

If both of your joins are one-to-many, such as looking up the Vendor and Cost Centre for a Purchase Order, then you can do it with hash files. Use the 'many' file as your primary stream into a Transformer, and feed the other two files into hashes. Use these hashes as lookups in the Transformer. Your job should look something like this:

  1 --> H
         \
          \
  2 -----> T ----> O
          /
         /
  3 --> H
where 2 is the file that contains one or more rows for each row in files 1 and 3.
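
As a rough illustration of the lookup design in the diagram, the Python sketch below uses plain dictionaries in place of the two hashes and a generator in place of the Transformer. The purchase order, vendor and cost centre values are invented, and the handling of a missed lookup (here simply `None`) is an assumption of this sketch.

```python
# File 1 loaded into a hash: vendor id -> vendor name (hypothetical data).
vendors = {"V1": "Acme", "V2": "Globex"}

# File 3 loaded into a hash: cost centre id -> description (hypothetical data).
cost_centres = {"C10": "Operations"}

# File 2 is the 'many' file and drives the job as the primary stream.
purchase_orders = [
    ("PO1", "V1", "C10"),
    ("PO2", "V2", "C99"),
]

def transform(orders, vendor_lookup, centre_lookup):
    """Enrich each order row with one keyed lookup per reference file."""
    for po_number, vendor_id, centre_id in orders:
        yield (po_number, vendor_lookup.get(vendor_id), centre_lookup.get(centre_id))

output = list(transform(purchase_orders, vendors, cost_centres))
print(output)
```

Every row of the primary stream produces exactly one output row, which is why this only fits the one-to-many case.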

If you need a 'full outer' join, then again you are back to needing the execrable Merge stage.
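
For comparison, this is what a full outer join has to produce, sketched in Python rather than with the Merge stage. It assumes each key appears at most once per file and that each file contributes a single value column; the data is invented. It joins all three inputs in one pass, whereas the two-step merge described above would reach the same rows in two stages.

```python
def full_outer_join(*tables):
    """Return one row per key found in any table; missing values become None."""
    keys = sorted(set().union(*(table.keys() for table in tables)))
    return [(key, *(table.get(key) for table in tables)) for key in keys]

# Hypothetical files reduced to key -> value mappings.
file_a = {1: "a1", 2: "a2"}
file_b = {2: "b2", 3: "b3"}
file_c = {3: "c3", 4: "c4"}

joined = full_outer_join(file_a, file_b, file_c)
for row in joined:
    print(row)
```

Keys 1 and 4 each exist in only one file, so a hash lookup driven by any single file would drop at least one of them.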