Remove Header Section (variable num rows)

eli.nawas_AUS
Premium Member
Posts: 39
Joined: Tue Apr 15, 2014 9:14 am

Post by eli.nawas_AUS »

I would like to read in a file and remove the header section and footer section. The beginning and end of the data are indicated by a begin-data row and an end-data row, but there is nothing in the non-data rows themselves to mark them as non-data. Can this be done in DataStage?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard.

The short answer is yes. Use a stage variable, initialized to @FALSE, to indicate whether or not you are in a detail row.

Set the stage variable to @TRUE when the previous row (preserved in a later stage variable) contains BEGIN DATA, and set it to @FALSE when the current row contains END DATA.

Constrain your output link so that the value of this stage variable must be @TRUE.

Make sure to run this Transformer stage in sequential mode.
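
For anyone who finds the stage-variable description hard to picture, here is a rough Python simulation of the logic above. It is only a sketch of the row-by-row behaviour, not DataStage code: the function name `strip_header_footer`, the variable names, and the exact marker strings are assumptions made for illustration.

```python
def strip_header_footer(rows, begin_marker="BEGIN DATA", end_marker="END DATA"):
    """Keep only the rows between the begin-data row and the end-data row."""
    in_detail = False  # plays the role of the stage variable, initially @FALSE
    prev_row = ""      # plays the role of the later stage variable holding the previous row
    kept = []
    for row in rows:
        if end_marker in row:
            # The current row is the end-data row: leave the detail section.
            in_detail = False
        elif begin_marker in prev_row:
            # The previous row was the begin-data row: enter the detail section.
            in_detail = True
        if in_detail:
            # Equivalent of the output link constraint.
            kept.append(row)
        # Updated last, so the next iteration sees this row as "previous".
        prev_row = row
    return kept


rows = ["junk header", "more junk", "BEGIN DATA", "a,1", "b,2", "END DATA", "trailer"]
print(strip_header_footer(rows))
```

Because the flag only turns true on the row after the begin-data row, neither marker row reaches the output. The loop reads the rows one at a time in file order, which is what running the Transformer in sequential mode gives you.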
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

Make the transformer sequential.
Stage Variable
svFlag Initial Value 0
Derivation : if input.COL = 'BEGIN DATA' then 1 else if input.COL = 'END DATA' then 2 else if svFlag = 1 then 1 else if svFlag = 2 then 2 else 0

Constraint : svFlag = 1 and input.COL <> 'BEGIN DATA'
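
The same derivation and constraint, mimicked in Python to make the state transitions easy to trace. This is only an illustration of the expression above, assuming the marker rows match 'BEGIN DATA' and 'END DATA' exactly; the function name `filter_detail_rows` is made up for the sketch.

```python
def filter_detail_rows(values):
    """Apply the svFlag derivation and the constraint to each input value in order."""
    sv_flag = 0  # initial value: 0 = before the begin-data row
    kept = []
    for col in values:
        # Derivation: the markers switch state, any other row keeps the current state.
        if col == "BEGIN DATA":
            sv_flag = 1  # inside the data section
        elif col == "END DATA":
            sv_flag = 2  # past the data section
        # Constraint: svFlag = 1 and input.COL <> 'BEGIN DATA'
        if sv_flag == 1 and col != "BEGIN DATA":
            kept.append(col)
    return kept


print(filter_detail_rows(["h", "BEGIN DATA", "x", "y", "END DATA", "t"]))
```

The extra test in the constraint is needed because the flag already becomes 1 on the begin-data row itself. The end-data row is dropped without a similar test because it sets the flag to 2 before the constraint is evaluated.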
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

ssnegi:

0, 1, 2 don't carry any meaning; points lost for lack of readability...

@TRUE and @FALSE can be understood by anyone who touches the job.

Start to think beyond 3 days and beyond yourself for ongoing maintenance.
Choose a job you love, and you will never have to work a day in your life. - Confucius
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

You can modify it to use @NULL, @TRUE and @FALSE instead of 0, 1, 2.
eli.nawas_AUS
Premium Member
Posts: 39
Joined: Tue Apr 15, 2014 9:14 am

Post by eli.nawas_AUS »

I am concerned about bucketing and reordering of rows, since I don't fully understand how DataStage handles these things. What is required to guarantee that the rows coming in from a file (probably BDFS stage, or maybe Sequential File stage) are processed all together in a bucket and in the same (original, not sorted) order? And what kind of pitfalls might show up to cause unexpected behavior?
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Ray's earlier suggestion keeps the order.
ray.wurlod wrote: Make sure to run this Transformer stage in sequential mode.
When a parallel job runs on multiple nodes, the data gets partitioned (your "buckets") across those nodes, and the original row order is not guaranteed.
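
To see why that matters for a flag-based filter, here is a toy Python model. It is not how DataStage actually partitions data; it simply assumes a round-robin split over two nodes, with each node running the same filter on its own share of the rows. The function names and sample rows are made up for the illustration.

```python
def keep_detail_rows(rows):
    """Flag-based filter: keep rows strictly between the marker rows."""
    in_detail = False
    kept = []
    for row in rows:
        if row == "BEGIN DATA":
            in_detail = True
        elif row == "END DATA":
            in_detail = False
        elif in_detail:
            kept.append(row)
    return kept


def round_robin(rows, nodes):
    """Toy partitioner: row i goes to node i % nodes."""
    return [rows[i::nodes] for i in range(nodes)]


rows = ["hdr1", "hdr2", "BEGIN DATA", "a", "b", "c", "END DATA", "ftr"]

sequential = keep_detail_rows(rows)
parallel = [keep_detail_rows(part) for part in round_robin(rows, 2)]

print(sequential)  # ['a', 'b', 'c']
print(parallel)    # [['b'], []]
```

In this model the second node never sees the begin-data row, so its flag never turns on and rows "a" and "c" are lost. A single sequential pass over the rows in file order avoids the problem.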
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could also look into using a Sort stage with the sort key mode set to the "don't sort, already sorted" option (I believe it is labelled "Don't Sort (Previously Sorted)") in order to preserve the order. Or perhaps a stable sort. Or using the APT variable that shuts off sort insertions (APT_NO_SORT_INSERTION, if memory serves)...

Or just use a Server job if nothing about this really needs to be a Parallel job. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers