
Can we change the parameter value at run time?

Posted: Mon Jan 10, 2011 3:57 am
by Md Dawar Mughni
Hi All,

The requirement is to split the file into multiple files based on:
1. Each file should contain a specific number of records or fewer,
and
2. All records with the same key should be in a single file.

E.g.:
Source file:

col1 col2
1 a
1 b
1 c
2 d
3 k
3 g
4 b
5 x
5 b


If the number of records in each output file should be <= 4,
then the output files would be:

file1:

col1 col2
1 a
1 b
1 c
2 d

file2:

col1 col2
3 k
3 g
4 b

file3:
col1 col2
5 x
5 b

How to implement it in DataStage?

Appreciate the help in advance.

Warm Regards,
-Dawar

Posted: Mon Jan 10, 2011 8:22 am
by chulett
Can you explain, please, how your choice of subject is related to the question that you asked? :?

Posted: Mon Jan 10, 2011 12:20 pm
by dsx999
What should happen if you have more than 4 same key values (as per your example)?

Posted: Mon Jan 10, 2011 8:26 pm
by chulett
I only asked to make sure we're answering the right question, that we know everything that was behind bringing you here. Off the bat, I don't see the connection between them... but maybe that's just me.

Posted: Tue Jan 11, 2011 12:35 am
by Kirtikumar
Can you please explain what happens if the limit is <= 4 records and there are more than 4 records for a particular key. E.g. instead of 3 records for key 1, you have 9 records?

Posted: Tue Jan 11, 2011 2:31 am
by Kirtikumar
Try the below. I think it would work.

Aggregate the data on the key column and get a count per key. So in your case it would be:
1 - 3
2 - 1
3 - 2
4 - 1
5 - 2

Sort this data for the transformer input without partitioning. Then the stage vars should be:
RecordCount = If CurrFile = PrevFile Or PrevFile = file0 Then RecordCount + CurrRecCount Else CurrRecCount
CurrRecCount = incoming record count
PrevFile = CurrFile
CurrFile = If (RecordCount + CurrRecCount) <= 4 Then CurrFile Else CurrFile + 1

It would work as below; the first line with RowNum 0 shows the initial values for the stage vars:

Code:

RowNum   RecordCount       CurrRecCount     PrevFile             CurrFile
0        0                0                 file0                file1
1        0                3                 file1                file1
2        3                1                 file1                file1
3        4                2                 file1                file2
4        2                1                 file2                file2
5        3                2                 file2                file3
Now join this with your original data and you have each record along with the file it should go to.
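
For what it's worth, here is a minimal Python sketch of the same idea: aggregate a count per key, walk the keys in order and bump the file number whenever the next key would push the current file past the limit, then "join" the file number back onto the detail rows. The names (rows, max_records, assign_files) are illustrative assumptions, not anything DataStage provides.

Code:

def assign_files(rows, max_records=4):
    # rows: list of (key, value) tuples, already sorted by key
    counts = {}
    for key, _ in rows:                        # aggregate: record count per key
        counts[key] = counts.get(key, 0) + 1

    file_no, used = 1, 0
    key_to_file = {}
    for key, cnt in counts.items():            # keys in sorted input order
        if used + cnt > max_records:           # next key would overflow this file
            file_no += 1
            used = 0
        key_to_file[key] = file_no
        used += cnt

    # "join" the mapping back onto the original data
    return [(key, val, key_to_file[key]) for key, val in rows]

rows = [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (3, 'k'),
        (3, 'g'), (4, 'b'), (5, 'x'), (5, 'b')]
for key, val, f in assign_files(rows):
    print("file%d: %s %s" % (f, key, val))

Run against the sample data, this tags the rows with file1/file2/file3 exactly as in the original example.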

Posted: Tue Jan 11, 2011 3:43 am
by Md Dawar Mughni
dsx999 wrote:What should happen if you have more than 4 same key values (as per your example)?
The assumption is there would not be more than 4 records with the same key.

Posted: Tue Jan 11, 2011 4:22 am
by Md Dawar Mughni
chulett wrote:I only asked to make sure we're answering the right question, that we know everything that was behind bringing you here. Off the bat, I don't see the connection between them... but maybe that's just me.
The requirement is to split a file into multiple files based on a fixed number of records (which in my example is 4), provided all records with the same key end up in the same file.

If there is any doubt, please let me know and I will clarify more.

Posted: Tue Jan 11, 2011 5:36 am
by Kirtikumar
Whatever I mentioned was to be done in the first job. Then call the second job multiple times, according to the number of rows created by the first job.

During each call, pass the filename from the file created in the first job. In that job, by joining the original file with the file created by the first job, you can get the desired result.
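
To make the second-job step concrete, here is a rough Python equivalent, assuming the first job produced a key-to-file mapping like the one above; write_one_file and the "SplitFile" name pattern are my own illustrative assumptions, not DataStage functionality.

Code:

rows = [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'd'), (3, 'k'),
        (3, 'g'), (4, 'b'), (5, 'x'), (5, 'b')]
key_to_file = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3}     # output of the "first job"

def write_one_file(file_no, path_pattern="SplitFile%d.txt"):
    # Join the original data to the mapping and keep only this call's rows.
    with open(path_pattern % file_no, "w") as out:
        out.write("col1 col2\n")
        for key, val in rows:
            if key_to_file[key] == file_no:
                out.write("%s %s\n" % (key, val))

# One call per target file, mirroring the repeated second-job invocations.
for n in sorted(set(key_to_file.values())):
    write_one_file(n)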

Posted: Tue Jan 11, 2011 7:51 am
by chulett
Md Dawar Mughni wrote:
chulett wrote:I only asked to make sure we're answering the right question, that we know everything that was behind bringing you here. Off the bat, I don't see the connection between them... but maybe that's just me.
Requirement is to slpit a file into multiple files based on a fixed number of records ( That is in my example is 4) and provided all the same keys should be in the same file.

If there is any dould please let me know we clarify more.
Sorry, but this still doesn't do anything to answer my question - what does this requirement (which you are clarifying here) have to do with your subject of "Can we change the parameter value at run time?". However, I'm just going to let that go and stop worrying about it now.

Carry on.

Posted: Tue Jan 11, 2011 10:36 am
by Md Dawar Mughni
Well, let me clarify that.
I was thinking of changing the parameter value at run time:
1. Have two parameters

a. FileName (value as "SplitedFile")
b. Suffix (value as 1)

And use them in the File stage where all the files should get created:
#File_Path/#FileName#Suffix#.txt

2. The value of the parameter Suffix should be changed inside the transformer as per the number of records, e.g. for the first 4 records it will be 1, for the next 5 it will be 2, and so on.

(But I don't know whether we can do it or not, because in the transformer I didn't find anything related.)


Input File Stage ----> Transformer ----> Output File Stage

But I am not very sure how we can do this...

Please assist...

Posted: Tue Jan 11, 2011 11:05 am
by jwiles
You can't do this the way you are currently envisioning for two main reasons:
1) You can't change job parameters on the fly while the job is running (they are resolved at job submission time only)
2) SeqFile doesn't support closing and opening multiple files during a job run.

As the number of files/number of records per file may change from run to run, one potential option is to use a BuildOp, custom operator or external target to handle the file writes. Your transformer could pass the filename as a column. I would envision something like this:

Input File->Transformer->Column Export->[BuildOp or CustomOp or ExtTarget]

The purpose of the Column Export would be to create your final output record and place it in a single column that is passed to the file-handling stage.
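
In case it helps to visualise that last stage, here is a rough Python stand-in for what the custom file-handling step would need to do (a real BuildOp would be C++ using the operator framework); the stdin interface and the pipe delimiter between the filename column and the exported record are assumptions for the sketch, not DataStage behaviour.

Code:

import sys

open_files = {}                              # target filename -> open handle
try:
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        # assumed layout: "<target filename>|<exported record>"
        fname, record = line.split("|", 1)
        if fname not in open_files:          # open each target file only once
            open_files[fname] = open(fname, "a")
        open_files[fname].write(record + "\n")
finally:
    for fh in open_files.values():
        fh.close()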

Regards,