Remove Header Section (variable num rows)

eli.nawas_AUS · Post by **eli.nawas_AUS** » Wed Apr 30, 2014 4:21 pm

I would like to read in a file and remove the header section and footer section (beginning and end of data indicated by begin-data row and end-data row). There is nothing in the non-data rows to mark them as non-data. Can this be done in DataStage?

ray.wurlod · Post by **ray.wurlod** » Wed Apr 30, 2014 4:36 pm

Welcome aboard.

The short answer is Yes. Use a stage variable, initialized to @FALSE, to indicate whether you are in a detail row or not.

Set the stage variable to @TRUE when the previous row (preserved in a later stage variable) contains BEGIN DATA, and set it to @FALSE when the current row contains END DATA.

Constrain your output link so that the value of this stage variable must be @TRUE.

Make sure to run this Transformer stage in sequential mode.
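For readers who want to see the row-by-row logic outside DataStage, here is a minimal Python sketch of the approach described above. It is only a simulation, not Transformer syntax: the names `detail_rows`, `in_detail` and `prev_row` are hypothetical, and the marker strings are assumed to appear literally in the begin-data and end-data rows.

```python
from typing import Iterable, Iterator

BEGIN_MARKER = "BEGIN DATA"  # assumed marker text, as used in this thread
END_MARKER = "END DATA"      # assumed marker text, as used in this thread


def detail_rows(rows: Iterable[str]) -> Iterator[str]:
    """Yield only the rows strictly between the begin and end marker rows."""
    in_detail = False  # stands in for the stage variable, initialized to @FALSE
    prev_row = ""      # stands in for the later stage variable holding the previous row
    for row in rows:
        # Switch on when the PREVIOUS row was the begin marker.
        if BEGIN_MARKER in prev_row:
            in_detail = True
        # Switch off when the CURRENT row is the end marker.
        if END_MARKER in row:
            in_detail = False
        # Stands in for the output link constraint.
        if in_detail:
            yield row
        # Updated last, like a stage variable evaluated after the flag.
        prev_row = row
```

For example, `list(detail_rows(["hdr", "BEGIN DATA", "a", "b", "END DATA", "ftr"]))` gives `["a", "b"]`. Evaluating the end-marker test after the begin-marker test means a file with no detail rows at all produces no output.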

ssnegi · Post by **ssnegi** » Wed Apr 30, 2014 10:20 pm

Make the Transformer sequential.

Stage variable: svFlag, initial value 0

Derivation : if input.COL = 'BEGIN DATA' then 1 else if input.COL = 'END DATA' then 2 else if svFlag = 1 then 1 else if svFlag = 2 then 2 else 0

Constraint : svFlag = 1 and input.COL <> 'BEGIN DATA'
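The derivation and constraint above can be traced with a small Python simulation. This is only a sketch of the same logic, not DataStage code: `filter_with_flag` and `sv_flag` are hypothetical names, and each list element stands in for the value of `input.COL` on one row.

```python
from typing import List


def filter_with_flag(rows: List[str]) -> List[str]:
    """Simulate the svFlag derivation and the output constraint row by row."""
    sv_flag = 0  # initial value 0: still in the header section
    output = []
    for col in rows:
        # Derivation: 1 once BEGIN DATA is seen, 2 once END DATA is seen,
        # otherwise the flag keeps whatever value it already had.
        if col == "BEGIN DATA":
            sv_flag = 1
        elif col == "END DATA":
            sv_flag = 2
        # Constraint: pass rows while the flag is 1, excluding the marker row itself.
        if sv_flag == 1 and col != "BEGIN DATA":
            output.append(col)
    return output
```

With `["hdr", "BEGIN DATA", "a", "b", "END DATA", "ftr"]` as input this returns `["a", "b"]`. Note that this version compares the whole column for equality, so it assumes the marker rows contain exactly the marker text.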

qt_ky · Post by **qt_ky** » Thu May 01, 2014 5:01 am

ssnegi:

0, 1, 2 don't carry any meaning; points lost for lack of readability...

@TRUE and @FALSE can be understood by anyone who touches the job.

Start to think beyond 3 days and beyond yourself for ongoing maintenance.

ssnegi · Post by **ssnegi** » Thu May 01, 2014 5:26 am

You can modify it to use @NULL, @TRUE and @FALSE instead of 0, 1, 2.

eli.nawas_AUS · Post by **eli.nawas_AUS** » Thu May 01, 2014 10:17 am

I am concerned about bucketing and reordering of rows, since I don't fully understand how DataStage handles these things. What is required to guarantee that the rows coming in from a file (probably BDFS stage, or maybe Sequential File stage) are processed all together in a bucket and in the same (original, not sorted) order? And what kind of pitfalls might show up to cause unexpected behavior?

qt_ky · Post by **qt_ky** » Thu May 01, 2014 1:54 pm

Ray's earlier suggestion keeps the order.

ray.wurlod wrote: Make sure to run this Transformer stage in sequential mode.

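When a parallel job runs on multiple nodes, the data gets partitioned (your "buckets") and the original row order is not guaranteed across partitions. Each partition also keeps its own copy of the stage variables, so the flag logic only works if every row passes through a single instance of the Transformer.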
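To make the pitfall concrete, here is a rough Python simulation of what can go wrong. It is not DataStage code: `keep_between_markers` and `round_robin` are hypothetical helpers, and round-robin distribution over two partitions is only an assumed example of how rows might be spread; the real behavior depends on the job's partitioning settings.

```python
from typing import List


def keep_between_markers(rows: List[str]) -> List[str]:
    """Flag-based filter, as one Transformer instance would apply it to its own rows."""
    in_detail = False
    output = []
    for row in rows:
        if row == "END DATA":
            in_detail = False
        if in_detail:
            output.append(row)
        if row == "BEGIN DATA":
            in_detail = True
    return output


def round_robin(rows: List[str], partitions: int) -> List[List[str]]:
    """Deal rows out to partitions in turn (an assumed partitioning method)."""
    return [rows[i::partitions] for i in range(partitions)]


rows = ["hdr", "BEGIN DATA", "a", "b", "c", "END DATA", "ftr"]

# Sequential mode: one instance sees every row in the original order.
sequential_result = keep_between_markers(rows)

# Two partitions: each instance keeps its own flag and sees only its share of rows.
parallel_result = [keep_between_markers(p) for p in round_robin(rows, 2)]
```

In this example the sequential run keeps `a`, `b` and `c`, while the partitioned run keeps only `b`: the first partition never sees the BEGIN DATA row, so its flag never switches on.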

chulett · Post by **chulett** » Thu May 01, 2014 2:06 pm

You could also look into using a Sort stage set to whatever the "Don't sort, already sorted" option is actually called in order to preserve the order. Or perhaps a stable sort. Or using that APT variable that shuts off sort insertions...

Or just use a Server job if nothing about this really needs to be a Parallel job.