Automation process in Datastage for File Splitting

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
santanu84
Participant
Posts: 14
Joined: Mon May 26, 2008 7:17 am
Location: Kolkata

Automation process in Datastage for File Splitting

Post by santanu84 »

Hi All

I am trying to create one job which will split one incoming xml file into several xml files depending on the record length and I need to do this job in an automated fashion.

Can any body help me in this, how I can achive this?

The scenario is,

Say I have two text files.......... "test1" which have one field RIF as record and "test2" which have three fields RIF, SKU and SELL_YEAR as record.
But the point is we don't know what is the length of these two file, they could be of any length.

Now, depending on that RIF field value I need to join those two files to create one XML output.

Since we do not know the record length of those text files, so after join what could be the length of that output XML "XML_total" file we can not predict.

Now, I will read that output XML file named "XML_total" as input and will split that file into some number of xml files.

The only thing here specified is those splitted output xml files will have atmost 60000 records.

But the problem which I am facing is, since I can not predict what is the length of "XML_total" file, so we can not predict what could be the number of splitted output xml files.

Say if today, "XML_total" file has some 600000 records then each splitted file having 60000, we can create 10 files. But if tomorrow it has 1200000 records then the file number will turn into 20.

In datastage we need to do this in an automated fashion. Where it will look for the record length of input xml and will split the file into certain number of files each having 60000 records.

Can we do it in this fashion in datastage?

If yes, then how?

If no, then is there any other solution available in datastage itself for such scenario?

Thanking You
Hi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Look into the use of the 'Trigger Column' Output mode on the Transformation Settings tab. Each time the value in that column changes, it splits off a new XML file. Arrange so it changes every 60,000 records.
-craig

"You can never have too many knives" -- Logan Nine Fingers
santanu84
Participant
Posts: 14
Joined: Mon May 26, 2008 7:17 am
Location: Kolkata

Post by santanu84 »

Hi

I have tried with the solution but still it is not clear. It seems that XML Input stage does not have this trigger column and output mode......it is present in XML Output stage.

So while designing the stages how and where can I code that 60000 record length to split that file?
Hi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It is present only in the XML Output stage as it is the only one actually producing XML. And recognize the fact that it would be driven by the number of records going into the stage, not any kind of output record length. You seem to be using the two concepts interchangeably... it confuseth me. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
santanu84
Participant
Posts: 14
Joined: Mon May 26, 2008 7:17 am
Location: Kolkata

Post by santanu84 »

Hi

I got your point.......but the thing is how can I incorporate the logic that each file will have 60000 records, can I code it any way in an automated fashion?

Or is there any stage available where I can code this out?


Thanking You

:oops:

:D
Hi
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

By using the Trigger Column as noted. It doesn't need to be used in the XML, just exist on the link feeding the XML Output stage. Then code (in a Transformer, for example) for it to change every 60000 records. Perhaps a mod() or a simple if-then-else bit of logic to increment it based on @OUTROWNUM. The stage will do the rest automatically.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply