To split a Sequential File in to two horizantally

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
raviyn
Participant
Posts: 57
Joined: Mon Dec 16, 2002 6:03 am

Post by raviyn »

U can split on the basis of the @INROWNUM System variable.
In the transformer put constraints on the above variable for the two links.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That assumes, of course, that you know the number of rows coming in. If you don't, you are going to need to do something to capture that number (assuming that it is a variable) and send it in to your job. You could write a job to count the rows and pass that information, suitably divided, to the splitting job. On Unix, you can take advantage of commands launched by Job Control to do "wc -l" on the file to get the number of records. Depending on what needs to happen with the two files, you could script something using both "wc" and "split" and then process the split files with DataStage.

As they say, more than one way to skin a cat.

-craig
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

FWIW, it is also possible, using DataStage BASIC code, to obtain the number of lines in a file opened for sequential access. This might be relevant, for example, on a Windows platform where you don't have access to the wc command.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Aggregate (rule "Last") with output into a hashed file record with key of "X".
Refer to this record in subsequent Transformer stage, in which its output link contstraints are drive by "< (value/2)" and ">= (value/2)"


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Does it HAVE to be first half and second half?
Would odd rows and even rows be an acceptable split?
In this case, you don't need to pre-count the rows, and can use the Mod(@INROWNUM,2) and Not(Mod(@INROWNUM,2)) constraint expressions.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
raviyn
Participant
Posts: 57
Joined: Mon Dec 16, 2002 6:03 am

Post by raviyn »

I think ray has explained the first part of it
What i meant by splitting with the help of @INROWNUM is something like in one link in transformer the constraint would be @INROWNUM < 500 and in another @INROWNUM >=500....Now if plan to split it exactly into two with the no. of records unkown first u need to get the count of records which if u are not comfortable with routines might look in two jobs :
1) in this job u will count the no. of records with the help of aggregator possibly then pass a parameter(VALUE) to next job with this count
2) in the other job ur constraints might be like < Value/2 and >=Value/2 which is as told by ray earlier only.

But if u are just trying to split based on the way the way the i/p sequential file is coming then whatz the second part of the question u are asking?[?]
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It CAN be done all in one job, because DataStage cannot open the output link from the Hashed File stage until its input link is closed (by the fact of receiving an "end of data" token).


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
Post Reply