Remove Header Section (variable num rows)
Posted: Wed Apr 30, 2014 4:21 pm
by eli.nawas_AUS
I would like to read in a file and remove the header section and footer section (beginning and end of data indicated by begin-data row and end-data row). There is nothing in the non-data rows to mark them as non-data. Can this be done in DataStage?
Posted: Wed Apr 30, 2014 4:36 pm
by ray.wurlod
Welcome aboard.
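The short answer is yes. Use a stage variable, initialized to @FALSE, to indicate whether you are in a detail row or not.
Set the stage variable to @TRUE when the previous row contains BEGIN DATA, and set it to @FALSE when the current row contains END DATA. To preserve the previous row, use a stage variable that sits later in the list: stage variables are evaluated top to bottom, so it still holds the prior row's value when the flag is derived.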
Constrain your output link so that the value of this stage variable must be @TRUE.
Make sure to run this Transformer stage in sequential mode.
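For readers who find it easier to follow outside the Transformer editor, the logic described above can be sketched in Python. This is only an illustration, not DataStage code: the function name `strip_header_footer`, the variable names, and the marker strings are assumptions made for the example.

```python
def strip_header_footer(rows, begin="BEGIN DATA", end="END DATA"):
    """Keep only the rows between the begin and end marker rows."""
    in_detail = False   # plays the role of the stage variable, initially false
    prev_row = None     # plays the role of the "later" variable holding the prior row
    kept = []
    for row in rows:
        # Turn the flag on when the PREVIOUS row was the begin marker.
        if prev_row is not None and begin in prev_row:
            in_detail = True
        # Turn the flag off when the CURRENT row is the end marker.
        if end in row:
            in_detail = False
        # Output constraint: pass the row only while the flag is true.
        if in_detail:
            kept.append(row)
        prev_row = row
    return kept


sample = ["hdr 1", "hdr 2", "BEGIN DATA", "a", "b", "END DATA", "ftr"]
result = strip_header_footer(sample)
```

Because the flag is switched on one row late and switched off on the end marker itself, neither marker row reaches the output.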
Posted: Wed Apr 30, 2014 10:20 pm
by ssnegi
Make the Transformer sequential.
Stage Variable: svFlag, Initial Value 0
Derivation:
    if input.COL = 'BEGIN DATA' then 1
    else if input.COL = 'END DATA' then 2
    else if svFlag = 1 then 1
    else if svFlag = 2 then 2
    else 0
Constraint: svFlag = 1 and input.COL <> 'BEGIN DATA'
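The same three-state flag can be traced in Python to check the behavior row by row. This is a sketch only, not DataStage syntax; `filter_rows` and `sv_flag` are names made up for the example, and the final `else` branches of the derivation are modelled as "keep the previous value".

```python
def filter_rows(values):
    """Mimic the svFlag derivation and the output constraint."""
    sv_flag = 0  # 0 = before the data, 1 = inside the data, 2 = after the data
    kept = []
    for col in values:
        if col == "BEGIN DATA":
            sv_flag = 1
        elif col == "END DATA":
            sv_flag = 2
        # Otherwise sv_flag keeps whatever value it had on the previous row.

        # Constraint: inside the data section, and not the begin marker itself.
        if sv_flag == 1 and col != "BEGIN DATA":
            kept.append(col)
    return kept


rows = ["hdr", "BEGIN DATA", "r1", "r2", "END DATA", "ftr"]
detail = filter_rows(rows)
```

The extra test on the begin marker is needed here because the flag flips on the marker row itself rather than on the row after it.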
Posted: Thu May 01, 2014 5:01 am
by qt_ky
ssnegi:
0, 1, 2 don't carry any meaning; points lost for lack of readability...
@TRUE and @FALSE can be understood by anyone who touches the job.
Start to think beyond 3 days and beyond yourself for ongoing maintenance.
Posted: Thu May 01, 2014 5:26 am
by ssnegi
You can modify it to use @NULL, @TRUE and @FALSE instead of 0, 1, 2.
Posted: Thu May 01, 2014 10:17 am
by eli.nawas_AUS
I am concerned about bucketing and reordering of rows, since I don't fully understand how DataStage handles these things. What is required to guarantee that the rows coming in from a file (probably BDFS stage, or maybe Sequential File stage) are processed all together in a bucket and in the same (original, not sorted) order? And what kind of pitfalls might show up to cause unexpected behavior?
Posted: Thu May 01, 2014 1:54 pm
by qt_ky
Ray's earlier suggestion keeps the order.
ray.wurlod wrote: Make sure to run this Transformer stage in sequential mode.
When a parallel job runs on multiple nodes, the data gets partitioned (split into "buckets", one per node), and the original row order is not guaranteed across partitions.
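To see why partitioning matters for this kind of flag logic, here is a toy Python model. It assumes a simple round-robin split across two nodes, which is a simplification; the actual partitioning method depends on how the job is designed. The helper names `round_robin` and `keep_detail` are invented for the example.

```python
def round_robin(rows, nodes):
    """Deal rows out to `nodes` partitions in turn (a simplified model)."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def keep_detail(rows):
    """Flag-based filter: keep rows strictly between the two marker rows."""
    in_detail = False
    kept = []
    for row in rows:
        if row == "END DATA":
            in_detail = False
        if in_detail:
            kept.append(row)
        if row == "BEGIN DATA":
            in_detail = True
    return kept


rows = ["hdr1", "hdr2", "BEGIN DATA", "a", "b", "c", "END DATA", "ftr"]

# One stream, original order: the flag sees every row.
sequential = keep_detail(rows)

# Two partitions, each with its own copy of the flag.
parallel = [r for part in round_robin(rows, 2) for r in keep_detail(part)]
```

In this model the partition that never receives the BEGIN DATA row never switches its flag on, so its detail rows are silently dropped. Running the Transformer sequentially avoids the problem because a single flag sees the whole file in order.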
Posted: Thu May 01, 2014 2:06 pm
by chulett
You could also look into using a Sort stage set to whatever the "Don't sort, already sorted" option is actually called in order to preserve the order. Or perhaps a stable sort. Or using that APT variable that shuts off sort insertions...
Or just use a Server job if nothing about this really needs to be a Parallel job.