Need to create files for two column combination

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Need to create files for two column combination

Post by gssr »

I am querying three columns from a table, i have to create a file for every distinct combination of first two column

For eg,
Let the columns be..
col1:col2:col3
23:aa:56325
23:bb:45787
24:aa:54875
24:aa:78755

i need files as,

file1 : file_23_aa
23:aa:56325

file2 : file_23_bb
23:bb:45787

file3 : file_24_aa
24:aa:54875
24:aa:78755


Any solutions!!!!
RAJ
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Being in Unix, you can use shell scripting to perform that.

You can try something like

Code: Select all

awk -F":" '{print > $1_$2}' yourFileName
Or

Code: Select all

for i in `awk -F":" { print "$1:$2" } yourFileName | sort -u`
do
  grep "^$i" yourFileName > $i
done
Note - This is written in notepad on the fly and not been tested. So you may have to test and modify significantly.
In case of second option, you will have to ensure that your OS does not consider the filename as a remote host name.
gssr
Participant
Posts: 243
Joined: Fri Jan 09, 2009 12:51 am
Location: India

Post by gssr »

Is there any way that we can achieve this in DAtastage??
RAJ
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes, you can use a server job with a Folder stage.

You may be able to do something with XML, but that will require a bit more thought.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Isn't your DataStage running on Unix or can't you call shell from your sequence ? :roll:

Do you know the different possible combinations ? If yes, you can do in DataStage.

Also you can do external stage and use the Unix code.

Remember seeing a blog to split files using C code. It is really simple but no time to write now.
hemachandra.m
Participant
Posts: 27
Joined: Wed Jan 03, 2007 1:29 am

Post by hemachandra.m »

Create a jobs as below

Job 1:
1. First get the unique records of the two columns from actual table and have it in two files (COL1.txt) and (COL2.txt)

Job 2:
1. Get the count of the records from (COL1.txt) and (COL2.txt) and load it into COUNT.txt file. Count of (COL1.txt) and (COL2.txt) should be equal.

Job3: (Run Multiple Instance)
1. Follow the below steps for creating this job:
a. Select records from your actual table by using any database stage and having user defined SQL query Like..
Select col1,col2,col3 From xxxxxxx Where Col1= #Param1# And Col2 = #Param2#.
b. Apply your transformation rules by using different stages.
c. Finally you can have SeqFile/Dataset. Create the file Like : File_#Param1#_#Param2#

Note : Param1 and Param2 are job parameters that need to be created in the job properties and have the Unique invocation_Id for each run.

Create Sequence :
1. MainSequence: Call job1 and job2 and Subsequence1

2. SubSequence1 : This sequence will read the value from COUNT.txt file and will be triggered 'n' time.(n = total count from COUNT.txt file).
a. Use Executecommand activity stage to cat the COUNT.txt file (Job2) and get the count (Example..5)
b. Use Variablestage activity to get the count of records (Ex:5) and assign it to one variable.
c. Use Start Loop Activity , start by value of 1, for ending assign the end by value ="Variable Stage Variable"
d. Use Job Activity to Call Subsequence1
e. Use End Loop Activity to End the sequence

3. Subsequence2: This Subsequence1 will read the values from COL1.txt and COL2.txt files and assign it to "Param1" and "Param2".
a. Use Executecommand activity stage to read the values from COL1.txt file (Job1) one by one of every run
b. Use Variablestage activity to get the value of COL1 assign it to variable called var_Col1
Same for COL2 also and Create variable Called var_COL2

c. Use jobactivity stage to call the actual job (Job3) and assign the values for the job parameters to : Param1 = var_Col1 and Param2 = var_Col2.

There are different ways to achieve this as this time I got this idea. If this is not smart enough please excuse.
Hemachandra
Post Reply