
Segregating Input Based on the Input Column

Posted: Fri Jan 28, 2005 2:36 pm
by avenki77
Hi,

I have a requirement like this:

My input is a few million rows. The file sometimes contains only one day's data and at other times it has two days' data. So, based on the date column in each input row, I want to direct the row to one of my two output links.

I could read the entire file before I start the DS job to find out which dates it contains and pass them to the job as parameters, so that I can compare each incoming row against these job parameters and direct it accordingly. But the input file is huge, so I am looking for an alternative that does not involve reading the file twice. Is there any way to do this comparison in the transformer itself?

Can I store the data that I read from the file in some global variables (not stage variables, which get overwritten as I read each row)?

Thanks in advance!
Venkatesh

Posted: Fri Jan 28, 2005 2:47 pm
by kcbland
If you are sure you will have fewer than 3 different date values, this solution will work. Initialize 3 stage variables in a transformer to some known value, like "-999999". Each stage variable will form the constraint for one output link. Have three output links, each with a constraint that compares your date value to that link's stage variable (inlink.date = stagevar1 for outputlink 1).

The derivation for the first stage variable will be something like "If stagevar1="-999999" Then inlink.date Else stagevar1". The second variable will be something like "If stagevar2="-999999" AND inlink.date <> stagevar1 Then inlink.date Else stagevar2". The third variable will be "If stagevar3="-999999" AND inlink.date <> stagevar2 AND inlink.date <> stagevar1 Then inlink.date Else stagevar3".

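Make sure the stage variables are initialized to "-999999" and that they are defined in this order, since each derivation relies on the variables above it having already been evaluated for the current row.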
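
To see why the order matters, the logic can be traced outside DataStage. The following is only a rough sketch in awk, not transformer code: the function name route_rows, the comma-delimited layout, and the date sitting in the first field are all assumptions made for illustration.

```shell
# Sketch only: mimics the three stage variables with awk variables.
# Assumption: the date is the first comma-separated field of each row.
route_rows() {
  awk -F',' '
    BEGIN { sv1 = sv2 = sv3 = "-999999" }
    {
      d = $1
      # Evaluated top to bottom, like stage variables in a transformer.
      if (sv1 == "-999999") sv1 = d
      if (sv2 == "-999999" && d != sv1) sv2 = d
      if (sv3 == "-999999" && d != sv2 && d != sv1) sv3 = d
      # Link constraints: compare the row date with each variable.
      if (d == sv1)      print "link1 " $0
      else if (d == sv2) print "link2 " $0
      else if (d == sv3) print "link3 " $0
    }'
}
```

Piping rows into route_rows prefixes each one with the link it would be sent to. The first date seen claims link1, the next new date claims link2, and so on, without any prior pass over the file.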

Posted: Fri Jan 28, 2005 2:47 pm
by Sainath.Srinivasan
As you are running in Unix, you can try something like

# sort -u lists each distinct date once, even if the file is not sorted by date
for date_val in `cut -cfrom-to filename | sort -u`
do
    grep "$date_val" filename > "filename.$date_val"
done

Obviously, you need to replace the from and to values with the actual character positions of the date column and confirm that the grep runs successfully.
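
If reading the file once per date is a concern, the same split can probably be done in a single pass with awk. This is a minimal sketch, assuming fixed-width rows; the function name split_by_date and its three arguments (file, 1-based start column, date length) are made up for illustration.

```shell
# Sketch only: write each row to <file>.<date> in one pass over the input.
# Assumption: fixed-width rows with the date at a known column position.
split_by_date() {
  # $1 = input file, $2 = start column (1-based), $3 = length of the date
  awk -v start="$2" -v len="$3" -v base="$1" '
    {
      out = base "." substr($0, start, len)  # e.g. filename.20050128
      print > out                            # awk keeps each output file open
    }
  ' "$1"
}
```

For example, split_by_date filename 1 8 would create one file per distinct 8-character date found at the start of each row. With only two or three distinct dates, the number of open output files stays small.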