Remove Header Section (variable num rows)

Posted: Wed Apr 30, 2014 4:21 pm
by eli.nawas_AUS
I would like to read in a file and remove the header section and footer section (beginning and end of data indicated by begin-data row and end-data row). There is nothing in the non-data rows to mark them as non-data. Can this be done in DataStage?

Posted: Wed Apr 30, 2014 4:36 pm
by ray.wurlod
Welcome aboard.

The short answer is yes. Use a stage variable, initialized to @FALSE, to indicate whether or not you are in a detail row.

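Set the stage variable to @TRUE when the previous row contains BEGIN DATA, and set it to @FALSE when the current row contains END DATA. The previous row is preserved in a later stage variable: stage variables are evaluated top to bottom, so a variable further down the list still holds the previous row's value when the flag is derived.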

Constrain your output link so that the value of this stage variable must be @TRUE.

Make sure to run this Transformer stage in sequential mode.
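For readers who want to trace the logic outside DataStage, here is a minimal Python sketch of the approach described above. It is only an illustration, not Transformer syntax: the function name, the marker strings and the "contains" test are assumptions made for the example.

```python
def strip_header_footer(rows, begin_marker="BEGIN DATA", end_marker="END DATA"):
    """Keep only the rows between the begin-data and end-data marker rows."""
    in_detail = False   # plays the role of the stage variable, initialized to @FALSE
    prev_row = None     # plays the role of the later stage variable holding the previous row
    kept = []
    for row in rows:    # rows must arrive in their original order
        # Turn the flag on when the PREVIOUS row was the begin marker.
        if prev_row is not None and begin_marker in prev_row:
            in_detail = True
        # Turn the flag off when the CURRENT row is the end marker.
        if end_marker in row:
            in_detail = False
        # Output link constraint: the flag must be true.
        if in_detail:
            kept.append(row)
        prev_row = row
    return kept


sample = ["hdr1", "hdr2", "BEGIN DATA", "a", "b", "END DATA", "ftr"]
result = strip_header_footer(sample)
```

Because the flag only switches on one row after the begin marker, the marker row itself never passes the constraint.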

Posted: Wed Apr 30, 2014 10:20 pm
by ssnegi
Make the Transformer sequential.

Stage variable: svFlag, initial value 0
Derivation: if input.COL = 'BEGIN DATA' then 1 else if input.COL = 'END DATA' then 2 else if svFlag = 1 then 1 else if svFlag = 2 then 2 else 0

Constraint: svFlag = 1 and input.COL <> 'BEGIN DATA'
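The same derivation and constraint can be walked through in plain Python. This is a rough sketch for checking the logic by hand, not DataStage code; the function name and sample rows are made up for the illustration, and each list element stands in for input.COL.

```python
def filter_with_flag(rows):
    """Mirror the svFlag derivation: 0 = before data, 1 = in data, 2 = after data."""
    sv_flag = 0  # initial value of the stage variable
    kept = []
    for col in rows:
        # Derivation, evaluated once per row, in the same order as the expression.
        if col == "BEGIN DATA":
            sv_flag = 1
        elif col == "END DATA":
            sv_flag = 2
        elif sv_flag == 1:
            sv_flag = 1
        elif sv_flag == 2:
            sv_flag = 2
        else:
            sv_flag = 0
        # Constraint: in the data section, and not the marker row itself.
        if sv_flag == 1 and col != "BEGIN DATA":
            kept.append(col)
    return kept


sample = ["hdr1", "hdr2", "BEGIN DATA", "a", "b", "END DATA", "ftr"]
result = filter_with_flag(sample)
```

Here the flag switches on at the marker row itself, which is why the constraint has to exclude 'BEGIN DATA' explicitly.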

Posted: Thu May 01, 2014 5:01 am
by qt_ky
ssnegi:

0, 1, 2 don't carry any meaning; points lost for lack of readability...

@TRUE and @FALSE can be understood by anyone who touches the job.

Start to think beyond 3 days and beyond yourself for ongoing maintenance.

Posted: Thu May 01, 2014 5:26 am
by ssnegi
You can modify it to use @NULL, @TRUE and @FALSE instead of 0, 1, 2.

Posted: Thu May 01, 2014 10:17 am
by eli.nawas_AUS
I am concerned about bucketing and reordering of rows, since I don't fully understand how DataStage handles these things. What is required to guarantee that the rows coming in from a file (probably BDFS stage, or maybe Sequential File stage) are processed all together in a bucket and in the same (original, not sorted) order? And what kind of pitfalls might show up to cause unexpected behavior?

Posted: Thu May 01, 2014 1:54 pm
by qt_ky
Ray's earlier suggestion keeps the order.
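ray.wurlod wrote: Make sure to run this Transformer stage in sequential mode.
When a parallel job runs on multiple nodes, the data gets partitioned (your "buckets") across the nodes, and the original row order is not guaranteed. Each partition also keeps its own copy of the stage variables, so a flag set on one node is not seen by the others.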
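A toy simulation may make the pitfall easier to see. It assumes round-robin partitioning over two nodes, which is only one possibility (the real partitioning method depends on the job design), and every name in it is invented for the illustration.

```python
def keep_detail_rows(rows):
    """Flag-based filter; each call has its own private flag, like one node's Transformer."""
    in_detail = False
    kept = []
    for row in rows:
        if row == "BEGIN DATA":
            in_detail = True
        elif row == "END DATA":
            in_detail = False
        elif in_detail:
            kept.append(row)
    return kept


def round_robin(rows, nodes):
    """Deal rows out to the nodes in turn (an assumed partitioning method)."""
    return [rows[i::nodes] for i in range(nodes)]


rows = ["hdr1", "hdr2", "BEGIN DATA", "a", "b", "c", "END DATA", "ftr"]

# Sequential mode: one instance sees every row in the original order.
sequential = keep_detail_rows(rows)

# Two nodes: each instance sees only its own partition and its own flag.
partitions = round_robin(rows, 2)
parallel = [row for part in partitions for row in keep_detail_rows(part)]
```

In this run the node that never receives the BEGIN DATA row drops all of its detail rows, so the two-node result is missing data that the sequential run keeps.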

Posted: Thu May 01, 2014 2:06 pm
by chulett
You could also look into using a Sort stage set to whatever the "Don't sort, already sorted" option is actually called in order to preserve the order. Or perhaps a stable sort. Or using that APT variable that shuts off sort insertions...

Or just use a Server job if nothing about this really needs to be a Parallel job. :wink: