Concatenating datasets

Posted: Tue May 08, 2007 11:01 pm
by Karine
I have a requirement to concatenate multiple data sets into a single data set for downstream processing. Can the data sets be 'cat' together in Unix, or do I have to do it in DataStage? There can be any number of data sets, and I would like to concatenate them based on a file pattern. If it has to be done in DataStage, what would be the most appropriate stage to use?

TIA.

Posted: Tue May 08, 2007 11:20 pm
by swades
The Funnel stage may be appropriate. You can choose a Continuous, Sort or Sequence funnel.

Posted: Wed May 09, 2007 12:20 am
by Karine
Thank you for the prompt response.
Using the Funnel stage would require me to know the number of input data sets beforehand, but I can have any number. My requirement is to concatenate them, whether there is 1 or 1000.

Posted: Wed May 09, 2007 7:11 pm
by ray.wurlod
Just specify Append as the Update Policy property value in the Data Set stage.

Posted: Thu May 10, 2007 4:35 am
by Karine
Ray,
I don't have the update policy property value in my data set stage. Can you explain to me where it can be found?

Maybe I should explain better what I'm trying to achieve. I'm working on a design where I receive files from upstream. There could be any number of these files (1, 10 or 100 per day), and the file names follow a similar pattern. My quandary is whether I should ask for them as sequential files or as data sets. If they are sequential files, I can 'cat' them into a single file and continue my processing in DataStage. Obviously I would prefer to have them as data sets for performance reasons, but I don't know how to concatenate data sets together, inside or outside DataStage. Please help.

Karine
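
For the sequential-file case described above, the 'cat' step could be sketched roughly as follows. This is only a minimal sketch: the function name concat_files, the directory and the output file name are hypothetical, and it assumes plain sequential files. It is not expected to work on data sets, since a data set's .ds file is a descriptor that points at the data files rather than holding the data itself.

```shell
#!/bin/sh
# Concatenate every file in a directory that matches a pattern into one file.
# concat_files and all paths used with it are hypothetical names for illustration.
concat_files() {
    dir=$1       # directory holding the incoming sequential files
    pattern=$2   # file-name pattern, quoted by the caller, e.g. 'test1.*'
    out=$3       # combined output file; its name must not match the pattern

    # Create the output file, or empty it if it already exists.
    : > "$out" || return 1

    # A for loop copes with 1 file or 1000 without building one huge
    # command line; the shell expands the pattern in sorted order.
    for f in "$dir"/$pattern; do
        [ -f "$f" ] || continue   # nothing matched, or not a regular file
        cat "$f" >> "$out"
    done
}
```

Calling it as `concat_files /data/incoming 'test1.*' /data/work/combined.txt` (hypothetical paths) would leave one combined file for a Sequential File stage to read. If no file matches, the output file is simply left empty.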

Posted: Thu May 10, 2007 4:54 pm
by ray.wurlod
Yes you do; it's immediately under the File property. It's a mandatory property, so it will be there. Its only possible values are Overwrite and Append.

Posted: Thu May 10, 2007 6:32 pm
by ravi468
I had the same situation.

We had sequential files, and this worked fine for them. I don't know how it works for data sets.
The file names need to be similar, though.

For example, say test1.A and test1.B are the two files.

In the File property on the Properties tab, give `ls test1*`
and select File Pattern as the Read Method.

This is a Bourne shell command that lists the matching files, so the stage reads them as if they had been 'cat' together.
Try it with data sets.

The output is then the data from test1.A and the data from test1.B.

Hope this helps.