Dynamic Files Creation

samit_9999 · Post by **samit_9999** » Mon Aug 14, 2006 6:28 pm

Hi,

I have a file with data in the following format:

Employee Sal
10 100
10 200
10 40
20 400
20 20
20 10
20 100

I want to create files dynamically based on the employee number. In the above case the output should be directed to two different files, one for each employee.
I do not know the employee numbers in advance, so I cannot use constraints to direct the output.

Please let me know how this can be achieved in DataStage. If not in DataStage, can this be done in Unix or some other way?

Thanks Much in Advance

Sam

chulett · Post by **chulett** » Mon Aug 14, 2006 8:12 pm

I know we've had this conversation before here, so a search may turn something up. I'm sure you've already figured out that the Sequential File stage isn't up to this particular task.

One way would be to bone up on your BASIC and write a job control routine to accomplish this. There you can control the naming, closing and opening of your output files based on the data.

Odd thought, but I wonder if the XML Output stage could do this as well? It doesn't need to output XML per se, as it allows 'pass through' columns and can automatically switch to a new output filename when the value in a particular column changes. Of course, you wouldn't have fine-grained control over the actual name used, but it wouldn't require any hand coding. I wonder...

I'm sure there are other ways to approach this.

DSguru2B · Post by **DSguru2B** » Mon Aug 14, 2006 11:45 pm

Use the power of Unix. Sort the file by the first column and get the unique values of that column with the uniq command; those values become the file names. Then use grep to pull all the rows for a particular employee. Something like:

Code: Select all

sort myfile.txt | uniq | awk '{print $1}' > filenames.txt
cat filenames.txt | while read FileNames
do
   cat myfile.txt | grep $FileNames > ${FileNames}.txt
done

samit_9999 · Post by **samit_9999** » Tue Aug 15, 2006 8:17 am

Hi DSGuru!!!

Thanks, it worked great.

There were, however, some minor problems that I would like clarified.

Here is the file myfile.txt I created:
ab cd
10 100
10 50
20 30
10 40
10 50
20 60

srt.sh
sort myfile.txt | uniq | awk '{print $1}' > filenames.txt
cat filenames.txt | while read FileNames
do
cat myfile.txt | grep $FileNames > ${FileNames}.txt
done

I execute it as follows:
srt.sh ab

It creates 4 different files:
10.txt
20.txt
filenames.txt
ab.txt

filenames.txt should ideally have had unique values, but it has the following values:
10
10
10
20
20
ab

Is there a way to fix this?

Thanks once again!!

Sam

uegodawa · Post by **uegodawa** » Tue Aug 15, 2006 12:01 pm

If you change the Unix script as follows, you should not see any duplicates. uniq compares whole lines, so it has to run after awk has reduced each line to the first column:

sort myfile.txt | awk '{print $1}' | uniq
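For reference, here is a rough sketch of how the whole srt.sh might look with that reordering applied. It is only one possible arrangement, not a tested replacement for the script above. It assumes a one-line header (skipped with tail), uses sort -u in place of sort followed by uniq, and swaps the bare grep for an awk comparison on the first field, because grep 10 would also pick up a row such as "20 10". The scratch directory and the sample rows are there only to make the sketch runnable on its own.

```shell
#!/bin/sh
# Sketch only: work in a scratch directory with the sample data from this thread.
cd "$(mktemp -d)" || exit 1
cat > myfile.txt <<'EOF'
ab cd
10 100
10 50
20 30
10 40
10 50
20 60
EOF

# Skip the header line (assumed to be one line), keep only column 1,
# then sort and de-duplicate. The key is isolated before de-duplication
# because whole-line comparison would treat "10 100" and "10 50" as different.
tail -n +2 myfile.txt | awk '{print $1}' | sort -u > filenames.txt

# Compare the first field against the key instead of grepping the whole line.
while read -r key
do
    awk -v k="$key" '$1 == k' myfile.txt > "${key}.txt"
done < filenames.txt
```

With the sample data this should leave filenames.txt holding just 10 and 20, and no ab.txt.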

DSguru2B · Post by **DSguru2B** » Tue Aug 15, 2006 12:07 pm

I think uegodawa is right. Try it. I don't have access to Unix at the moment, otherwise I would have tested it out for you.

Ultramundane · Post by **Ultramundane** » Tue Aug 15, 2006 12:49 pm

If you want all columns.
sort yourfile.txt | awk '{print $0 >$1".txt";}'

If you just want sal in each file
sort yourfile.txt | awk '{print $2 >$1".txt";}'

Ultramundane · Post by **Ultramundane** » Tue Aug 15, 2006 12:57 pm

Ultramundane wrote: If you want all columns.
sort yourfile.txt | awk '{print $0 >$1".txt";}'

If you just want sal in each file
sort yourfile.txt | awk '{print $2 >$1".txt";}'

Ultra,

You actually don't need to do the sort. awk keeps track of what files it has opened.

If you want all columns.
cat yourfile.txt | awk '{print $0 >$1".txt";}'

If you just want sal in each file
cat yourfile.txt | awk '{print $2 >$1".txt";}'
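A slightly more defensive variant of the same one-liner, as a sketch. The parentheses around the file name are there because an unparenthesized concatenation after > is not parsed the same way by every awk implementation. NR > 1 skips the first line on the assumption that it is a header; drop it if the file has none. The scratch directory and sample rows are only there so the sketch runs on its own.

```shell
#!/bin/sh
# Sketch only: scratch directory plus the sample data from the first post.
cd "$(mktemp -d)" || exit 1
cat > yourfile.txt <<'EOF'
Employee Sal
10 100
10 200
10 40
20 400
20 20
20 10
20 100
EOF

# NR > 1 skips a one-line header (an assumption about the input).
# ($1 ".txt") makes the output file name a single unambiguous expression.
awk 'NR > 1 { print $0 > ($1 ".txt") }' yourfile.txt
```

No sort is involved, so the rows land in each file in their original order.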

DSguru2B · Post by **DSguru2B** » Tue Aug 15, 2006 1:07 pm

Sort is needed for the uniq, and uniq is needed for the file names. And yes, the entire record for a particular employee is then loaded into that employee's file, which is taken care of by the grep.

Ultramundane · Post by **Ultramundane** » Tue Aug 15, 2006 1:50 pm

DSguru2B wrote: Sort is needed for the uniq, and uniq is needed for the file names. And yes, the entire record for a particular employee is then loaded into that employee's file, which is taken care of by the grep.

I absolutely agree with you that your algorithm must do this. However, with just awk you don't even need to sort. Just another solution. They both work well.

samit_9999 · Post by **samit_9999** » Wed Aug 16, 2006 11:45 am

Thanks to all of you for the suggestions.

That did solve my problem. One more thing: if I have a header in my original file and want that header to be part of every sub-file that is created, how do I do that?

DSguru2B · Post by **DSguru2B** » Wed Aug 16, 2006 1:06 pm

How many lines does the header occupy?
If it is two lines, for example, you can do a head -2, store the result in a file and, right after the sub-files are created, concatenate it in front of each of them.
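A minimal sketch of that idea, assuming the header is a single line (use head -n 2 and tail -n +3 for a two-line header). The names header.txt and the intermediate .dat extension are made up for the illustration, and the awk split is just one way to create the sub-files before the header is prepended.

```shell
#!/bin/sh
# Sketch only: scratch directory plus a small sample with a one-line header.
cd "$(mktemp -d)" || exit 1
cat > myfile.txt <<'EOF'
Employee Sal
10 100
20 400
10 40
EOF

# Save the header (assumed to be exactly one line).
head -n 1 myfile.txt > header.txt

# Split the data rows (everything after the header) by employee number.
# The .dat extension is a hypothetical name for the intermediate files.
tail -n +2 myfile.txt | awk '{ print $0 > ($1 ".dat") }'

# Prepend the saved header to each sub-file, then drop the intermediate.
for f in *.dat
do
    cat header.txt "$f" > "${f%.dat}.txt"
    rm "$f"
done
```

Each resulting .txt file then starts with the header line, followed by that employee's rows.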