Segregating Input Based on the Input Column

avenki77
Participant
Posts: 25
Joined: Wed Jul 07, 2004 2:55 pm

Post by avenki77 »

Hi,

I have a requirement like this:

My input is a few million rows. The file sometimes contains only one day's data and at other times two days' data. So, based on the date column in each input row, I want to direct the row to one of my two output links.

I could read the entire file before starting the DS job to find out which dates it contains and pass them to the job as parameters, so that each incoming row can be compared against these job parameters and directed accordingly. But the input file is huge, so I am looking for an alternative to reading the file twice. Is there a way to do this comparison in the Transformer itself?

Can I store data that I read from the file in some kind of global variable (not stage variables, which get overwritten as I read each row)?

Thanks in advance!
Venkatesh
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If you are sure you will have fewer than 3 distinct date values, this solution will work. Initialize 3 stage variables in a Transformer to some known value, such as "-999999". Each stage variable drives the constraint for one output link: have three output links, each with a constraint that compares your date value to that link's stage variable (for example, inlink.date = stagevar1 for output link 1).

The derivation for the first stage variable will be something like "If stagevar1 = "-999999" Then inlink.date Else stagevar1". The second stage variable will be something like "If stagevar2 = "-999999" AND inlink.date <> stagevar1 Then inlink.date Else stagevar2". The third stage variable will be "If stagevar3 = "-999999" AND inlink.date <> stagevar2 AND inlink.date <> stagevar1 Then inlink.date Else stagevar3".

Make sure the stage variables are initialized to "-999999" and that they are defined in this order, since each derivation depends on the stage variables above it.
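
To make the evaluation order easier to follow, here is a rough simulation of that logic outside DataStage, written as a shell function around awk. It is only a sketch, not Transformer code: the function name route_by_stagevars, the fixed-width date position (START, LEN) and the .link1/.link2/.link3 output names are all assumptions made for illustration.

```shell
#!/bin/sh
# route_by_stagevars FILE START LEN   (hypothetical helper, not DataStage code)
# Mimics three stage variables that each "capture" the first unseen date,
# then routes every row to the link whose stage variable matches its date.
route_by_stagevars() {
    awk -v start="$2" -v len="$3" -v base="$1" '
        BEGIN { sv1 = sv2 = sv3 = "-999999" }    # initial sentinel values
        {
            d = substr($0, start, len)           # stands in for inlink.date

            # Derivations, evaluated top to bottom like stage variables.
            if (sv1 == "-999999") sv1 = d
            if (sv2 == "-999999" && d != sv1) sv2 = d
            if (sv3 == "-999999" && d != sv2 && d != sv1) sv3 = d

            # Link constraints: inlink.date = stagevarN.
            if (d == sv1)      print > (base ".link1")
            else if (d == sv2) print > (base ".link2")
            else if (d == sv3) print > (base ".link3")
        }
    ' "$1"
}
```

For example, route_by_stagevars input.txt 1 8 would treat the first 8 characters of each row as the date. The first date seen lands in input.txt.link1, the second distinct date in input.txt.link2, and so on. A fourth distinct date would match no link and be dropped, which is why the limit on distinct dates matters.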
Kenneth Bland

Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

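As you are running on Unix, you can try something like

# sort -u lists each distinct date once (uniq -d would skip a date that occurs on only one row)
for date_val in `cut -cfrom-to filename | sort -u`
do
    grep "$date_val" filename > "filename.$date_val"
done

Obviously you need to replace from and to with the character positions of the date column, and check that the grep matches only the rows you intend, since it will match the date value anywhere in a line.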
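
Since the original concern was reading the file more than once, a single-pass variant may be worth considering. The sketch below assumes the date sits at a fixed character position; the function name split_by_date and its START/LEN arguments are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# split_by_date FILE START LEN   (hypothetical helper)
# Reads FILE once and writes each row to FILE.<date>, where <date> is the
# LEN characters starting at 1-based column START.
split_by_date() {
    awk -v start="$2" -v len="$3" -v base="$1" '
        {
            key = substr($0, start, len)   # date field of this row
            print > (base "." key)         # awk keeps each output file open
        }
    ' "$1"
}
```

Calling split_by_date filename 1 8 would produce one output file per distinct date, named filename.<date>. With only two or three dates in the file, the number of files awk holds open at once should not be a problem.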