Dynamic Files Creation

samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Post by samit_9999 »

Hi,

I have a file with data laid out as follows:

Employee Sal
10 100
10 200
10 40
20 400
20 20
20 10
20 100

I want to create files dynamically based on the employee number. In the above case the output should be directed to two different files, one for each employee.
I do not know the employee numbers in advance, so I cannot use constraints to direct the output.

Please let me know how this can be achieved in DataStage. If not in DataStage, can it be done in Unix, or in any other way?

Thanks Much in Advance

Sam
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I know we've had this conversation before here, so a search may turn something up. I'm sure you've already figured out that the Sequential File stage isn't up to this particular task.

One way would be to bone up on your BASIC and write a job control routine to accomplish this. There you can control the naming, closing and opening of your output files based on the data.

Odd thought, but I wonder if the XML Output stage could do this as well? It doesn't need to output XML per se as it allows 'pass through' columns and can automatically switch to a new output filename when the value in a particular column changes. Of course, you wouldn't have the fine grain control over the actual name used, but it wouldn't require any hand coding. I wonder...

I'm sure there are other ways to approach this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Use the power of Unix. Sort the file by the first column and get the unique values of the first column with the uniq command to build the list of file names. Then use grep to pull all the rows for a particular employee. Something like:

Code: Select all

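# Build the list of file names from the first column
sort myfile.txt | uniq | awk '{print $1}' > filenames.txt
# For each name, copy the matching rows into <name>.txt
cat filenames.txt | while read FileNames
do
   cat myfile.txt | grep $FileNames > ${FileNames}.txt
done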
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Post by samit_9999 »

Hi DSGuru!!!

Thanks, it worked great.

There were, however, some minor problems that I would like clarified.

Here is the file myfile.txt I created:
ab cd
10 100
10 50
20 30
10 40
10 50
20 60

srt.sh
sort myfile.txt | uniq | awk '{print $1}' > filenames.txt
cat filenames.txt | while read FileNames
do
cat myfile.txt | grep $FileNames > ${FileNames}.txt
done

I execute it as follows:
srt.sh ab

It creates 4 different files:
10.txt
20.txt
filenames.txt
ab.txt

Ideally filenames.txt should contain only unique values, but it has the following:
10
10
10
20
20
ab

Is there a way to fix this?

Thanks once again!!

Sam
uegodawa
Participant
Posts: 71
Joined: Thu Apr 27, 2006 12:46 pm

Post by uegodawa »

If you change the Unix script as follows, you should not see any duplicates:

sort myfile.txt | awk '{print $1}' | uniq
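
Putting that change into the full script, a minimal sketch might look like the following. It assumes whitespace-separated columns and uses the sample data from the first post without a header line. The awk comparison on the first field is a substitution for grep, not part of the script posted earlier: grep matches a key anywhere on the line, so a key of 10 would also pick up a row such as "20 10" or "20 100".

```shell
#!/bin/sh
# Sample input from the thread (no header line, for simplicity).
cat > myfile.txt <<'EOF'
10 100
10 200
10 40
20 400
20 20
20 10
20 100
EOF

# Extract the first column first, then de-duplicate it.
# sort -u does the same job as sort | uniq here.
awk '{print $1}' myfile.txt | sort -u > filenames.txt

# For each distinct key, copy the rows whose first field equals that key.
while read -r key
do
    awk -v k="$key" '$1 == k' myfile.txt > "${key}.txt"
done < filenames.txt
```

With this input, filenames.txt holds 10 and 20, 10.txt holds the three rows for employee 10, and 20.txt holds the four rows for employee 20.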
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I think uegodawa is right. Try it. I don't have access to Unix at the moment, otherwise I would have tested it out for you.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan

Post by Ultramundane »

If you want all columns.
sort yourfile.txt | awk '{print $0 >$1".txt";}'

If you just want sal in each file
sort yourfile.txt | awk '{print $2 >$1".txt";}'
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan

Post by Ultramundane »

Ultramundane wrote: If you want all columns.
sort yourfile.txt | awk '{print $0 >$1".txt";}'

If you just want sal in each file
sort yourfile.txt | awk '{print $2 >$1".txt";}'
Ultra,

You actually don't need to do the sort. awk keeps track of what files it has opened.

If you want all columns.
cat yourfile.txt | awk '{print $0 >$1".txt";}'

If you just want sal in each file
cat yourfile.txt | awk '{print $2 >$1".txt";}'
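
A slightly more defensive sketch of the same single-pass idea follows. The sample rows are deliberately unsorted, and the use of >> with close() is an addition of mine rather than something from the post above: some awk implementations limit how many files can be open at once, so closing after each write may help when there are many distinct employee numbers. Building the name in a variable also avoids relying on how a given awk parses a concatenation after the redirect.

```shell
#!/bin/sh
# Unsorted sample input: keys 10 and 20 are interleaved.
cat > yourfile.txt <<'EOF'
10 100
20 30
10 40
20 60
EOF

# Remove outputs from any earlier run, because >> appends.
rm -f 10.txt 20.txt

awk '{
    out = $1 ".txt"      # output file named after the first column
    print $0 >> out      # append the whole record
    close(out)           # release the file handle before the next record
}' yourfile.txt
```

Each output file keeps its rows in the order they appeared in the input, so no sort is needed unless sorted output is wanted.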
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Sort is needed for the uniq, and uniq is needed for the file names. And yes, the entire record pertaining to that particular employee will then be loaded into that file, which is taken care of by the grep.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan

Post by Ultramundane »

DSguru2B wrote: Sort is needed for the uniq, and uniq is needed for the file names. And yes, the entire record pertaining to that particular employee will then be loaded into that file, which is taken care of by the grep.
I absolutely agree with you that your algorithm must do this. However, with just awk you don't even need to sort. Just another solution. They both work well.
samit_9999
Participant
Posts: 20
Joined: Thu Oct 06, 2005 12:23 pm

Post by samit_9999 »

Thanks to all you guys for the suggestions.

That did solve my problem, except for one thing: if I have a header in my original file and I want the header to be part of every sub-file that is created, how do I do that?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How many lines does the header occupy?
You can do a head -2, store the result in a file and, right after the sub-files are created, concatenate it onto the front of each of them.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
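
One way to sketch the header step, assuming a one-line header and whitespace-separated columns (the sample rows and the name myfile.txt are taken from earlier in the thread): awk can remember the header and write it at the top of each sub-file the first time a new key appears. This is a variation on the head-and-concatenate suggestion above, not the exact method described there, and it avoids a separate concatenation pass.

```shell
#!/bin/sh
# Sample input with a single header line.
cat > myfile.txt <<'EOF'
Employee Sal
10 100
20 30
10 40
20 60
EOF

awk '
NR == 1 { header = $0; next }     # remember the header; do not treat it as data
{
    out = $1 ".txt"
    if (!(out in seen)) {         # first record for this key
        print header > out        # > truncates once; later prints go to the same open file
        seen[out] = 1
    }
    print $0 > out
}' myfile.txt
```

For a header that spans more than one line, the NR == 1 rule would need to collect the extra lines as well.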