
Splitting One File to Multiple Files

Posted: Tue Dec 19, 2006 11:54 pm
by 120267
Hi,

I want to split the Product Sales file, which contains 100 products. I have to split that file and store it as 100 flat files, with the product name as the file name. We are not supposed to use the loop activity to trigger the same job 100 times. Is there any other way to do it in the same DataStage job?

Posted: Tue Dec 19, 2006 11:58 pm
by ray.wurlod
Heck, yes. A Filter stage or a Transformer stage with 100 output links. Easy.

Posted: Wed Dec 20, 2006 12:00 am
by narasimha
Giving a sample of your Product Sales file would help.
Also give a sample of how your output files should look.

Posted: Wed Dec 20, 2006 12:23 am
by 120267
Ray,

The count of products is not defined; it is dynamic. If we get 60 products, we have to split into 60 files with the product name as the file name. We may get more than 100 products as well. Is there any solution without using a loop?

Posted: Wed Dec 20, 2006 12:41 am
by chulett
You could always code something up in BASIC to read the input and write the output to different filenames... or you could use the Folder stage. :wink:

Read the Server Job Developer's Guide section on it to see how it can dynamically write to many files in a directory.

Posted: Wed Dec 20, 2006 12:44 am
by 120267
narasimha,

It should be like this:

Input File: Product.txt

Product_name Region Level Sales
A Adc 1 2300$
A Adb 1 2300$
A Ad1 1 2300$
A Ad2 1 2300$
B Adc 1 2300$
B Adb 1 2300$
B Ad1 1 2300$
B Ad2 1 2300$
C Adc 1 2300$
C Adb 1 2300$
C Ad1 1 2300$
C Ad2 1 2300$

I want the output as 3 files:

Output Files:

A.txt

A Adc 1 2300$
A Adb 1 2300$
A Ad1 1 2300$
A Ad2 1 2300$

B.txt

B Adc 1 2300$
B Adb 1 2300$
B Ad1 1 2300$
B Ad2 1 2300$

C.txt

C Adc 1 2300$
C Adb 1 2300$
C Ad1 1 2300$
C Ad2 1 2300$

Posted: Wed Dec 20, 2006 12:45 am
by chulett
(pssst... Folder stage)

Posted: Wed Dec 20, 2006 1:47 am
by chulett
Did you look in the Server Job Developer's Guide PDF as I mentioned?

In my 7.5.1A version, Chapter 11 is dedicated to the Folder stage, and the Folder Stage Input Data section tells you what you need to know to use the Folder stage to write multiple files into a directory and how you can control the names of those files.

I'm not about to transcribe that chapter into this forum, so... go give it a read, try to use it in your job and if you've done that and you have some specific questions - come on back with them.

Posted: Wed Dec 20, 2006 3:00 am
by 120267
Great, thanks chulett!

I have tried it and it's working fine, but each file only has the latest record. Is there any property to set to "Append to the File"?

If I give the input file as...

Input File: Product.txt
Product_name Region Level Sales
A Adc 1 2300$
A Adb 1 2300$
A Ad1 1 2300$
A Ad2 1 2300$
B Adc 1 2300$
B Adb 1 2300$
B Ad1 1 2300$
B Ad2 1 2300$
C Adc 1 2300$
C Adb 1 2300$
C Ad1 1 2300$
C Ad2 1 2300$

The resulting 3 files are:

Output Files:

A.txt

A Ad2 1 2300$

B.txt

B Ad2 1 2300$

C.txt

C Ad2 1 2300$

But it should not be like this; each file should contain all of its records.

Posted: Wed Dec 20, 2006 4:22 am
by ray.wurlod
The Sequential File stage does have append and overwrite as write methods. But the question remains: how are you parsing the results transmitted by the Folder stage into separate files?

Posted: Wed Dec 20, 2006 8:39 am
by chulett
120267 wrote:I have tried it. It's working fine. But...
In other words, it's not working fine. :wink:

Let me be the first to admit I've never actually used the Folder stage as a target, never had a need to. But I seem to recall others using it here and reporting success, hence the recommendation.

The docs say the first column must be marked as a Key and contain the filename. Ah... they then go on to say that the remaining columns:
are written to the named file, each column separated by a newline. Data to be written to a directory would normally be delivered in a single column.
And the example shows a single LongVarchar field. So it sounds like to use that stage you'd have to reverse what it does when it reads a file - put everything in one field for each Product. :(

To anyone who has done this in the past - is that correct?

Worst case, you could bone up on the sequential file processing functions that BASIC has (OPENSEQ, CLOSESEQ, etc.) and write some custom job control code to do this... it wouldn't be all *that* hard.

Posted: Wed Dec 20, 2006 9:14 am
by DSguru2B
Let's think outside of DataStage, shall we? You can do this via a Unix script. Here is what I can offer:

Code: Select all

#!/usr/bin/ksh

export filepath=/Data/SFDCDEV/scripts/dsx.txt
export tempFile=/Data/SFDCDEV/scripts/my.tmp
export newFileDir=/Data/SFDCDEV/scripts

# Build a sorted, de-duplicated list of product names (the first field).
# Note: a header row, if present, will get a file of its own too.
awk '{print $1}' "$filepath" | sort -u > "$tempFile"

# Pull each product's rows into its own file, matching only at line start
# so a product name appearing in another column is not picked up.
while read filename
do
  grep -w "^$filename" "$filepath" > "$newFileDir/$filename.txt"
done < "$tempFile"
rm -f "$tempFile"
echo "All done"
Change the variables according to your environment. Basically:
filepath is the text file that you want to manipulate.
tempFile is a temporary file needed for manipulation; it will be deleted at the end of the script.
newFileDir is where you want your new files to be created.

Posted: Wed Dec 20, 2006 9:40 am
by chulett
DSguru2B wrote:Lets think outside of DataStage shall we.
Ha! What do you think this is... AwkXchange? :P

Posted: Wed Dec 20, 2006 9:41 am
by DSguru2B
Ha Ha Ha. Three Ha's from me :wink:
This is more like GettingMyWorkDoneNoMatterWhatXchange. How about that?

Posted: Wed Dec 20, 2006 1:59 pm
by Ultramundane
Awk keeps track of open files.

awk '{ print $0 > ("/outfilepath/" $1 ".txt") }' infile.txt
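Building on that one-liner, here is a minimal sketch of the same approach as a standalone script. It assumes the input has a one-line header like the sample Product.txt in this thread (the header is skipped so it does not get a file of its own), and it calls close() after each write so awk never exhausts open file descriptors when the product count is large. The names product.txt and out/ are just examples, not from the posts above:

```shell
#!/bin/sh
# Sketch only: split a sales file on its first field, one file per product.
# The file name product.txt and directory out/ are illustrative.

# Sample input in the thread's layout (header row plus data rows)
cat > product.txt <<'EOF'
Product_name Region Level Sales
A Adc 1 2300$
A Adb 1 2300$
B Adc 1 2300$
B Adb 1 2300$
EOF

mkdir -p out

# NR > 1 skips the header; close() releases each handle so awk cannot
# run out of file descriptors no matter how many products appear.
awk 'NR > 1 {
    file = "out/" $1 ".txt"
    print $0 >> file
    close(file)
}' product.txt
```

Note that >> appends, so delete any previous output files before re-running the script.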