Remove Header Section (variable num rows)
Moderators: chulett, rschirm, roy
- Premium Member
- Posts: 39
- Joined: Tue Apr 15, 2014 9:14 am
I would like to read in a file and remove the header section and footer section (beginning and end of data indicated by begin-data row and end-data row). There is nothing in the non-data rows to mark them as non-data. Can this be done in DataStage?
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard.
The short answer is Yes. Use a stage variable to indicate whether you are in a detail row or not, initialized to @FALSE.
Set the stage variable to @TRUE when the previous row (preserved in a later stage variable) contains BEGIN DATA, and set it to @FALSE when the current row contains END DATA.
Constrain your output link so that the value of this stage variable must be @TRUE.
Make sure to run this Transformer stage in sequential mode.
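For anyone wanting to verify the logic before building the job, the stage-variable technique above can be sketched in Python. This is only an illustration, not DataStage code; the `BEGIN DATA` / `END DATA` marker strings and the function name are placeholders standing in for whatever your file actually uses:

```python
def strip_header_footer(rows):
    """Keep only the rows between the BEGIN DATA and END DATA markers.

    Mirrors the Transformer stage-variable logic: in_data starts at
    @FALSE, flips to @TRUE after the BEGIN DATA row, and back to
    @FALSE on the END DATA row.
    """
    in_data = False          # stage variable, initialized to @FALSE
    for row in rows:
        if "END DATA" in row:
            in_data = False  # current row contains END DATA
        if in_data:
            yield row        # output-link constraint: in_data = @TRUE
        if "BEGIN DATA" in row:
            in_data = True   # previous row contained BEGIN DATA

rows = ["header 1", "header 2", "BEGIN DATA", "detail 1",
        "detail 2", "END DATA", "footer"]
print(list(strip_header_footer(rows)))  # ['detail 1', 'detail 2']
```

Note the order of the checks: testing for `END DATA` before emitting, and for `BEGIN DATA` after, is what makes the marker rows themselves fall outside the output.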
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ssnegi:
0, 1, 2 don't carry any meaning; points lost for lack of readability...
@TRUE and @FALSE can be understood by anyone who touches the job.
Start to think beyond 3 days and beyond yourself for ongoing maintenance.
Choose a job you love, and you will never have to work a day in your life. - Confucius
- Premium Member
- Posts: 39
- Joined: Tue Apr 15, 2014 9:14 am
I am concerned about bucketing and reordering of rows, since I don't fully understand how DataStage handles these things. What is required to guarantee that the rows coming in from a file (probably BDFS stage, or maybe Sequential File stage) are processed all together in a bucket and in the same (original, not sorted) order? And what kind of pitfalls might show up to cause unexpected behavior?
ray.wurlod wrote: Make sure to run this Transformer stage in sequential mode.
When a parallel job runs on multiple nodes, the data gets partitioned (into "buckets") and row order is not guaranteed. Ray's earlier suggestion keeps the order.
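To see why sequential mode matters, here is a hypothetical two-node round-robin partitioning sketched in Python (marker strings and the round-robin split are illustrative assumptions, not actual DataStage behavior): the marker rows and detail rows land in different partitions, so filtering each partition independently loses data.

```python
def filter_partition(part):
    """The per-row marker logic, applied to one partition in isolation."""
    in_data, out = False, []
    for row in part:
        if "END DATA" in row:
            in_data = False
        if in_data:
            out.append(row)
        if "BEGIN DATA" in row:
            in_data = True
    return out

rows = ["header", "BEGIN DATA", "d1", "d2", "d3", "END DATA"]

# Simulated two-node round-robin partitioning
parts = [rows[0::2], rows[1::2]]
# parts[0] = ['header', 'd1', 'd3']  -> never sees BEGIN DATA, emits nothing
# parts[1] = ['BEGIN DATA', 'd2', 'END DATA'] -> emits only 'd2'
print([filter_partition(p) for p in parts])  # [[], ['d2']]
```

Run sequentially over all rows the same logic yields `d1`, `d2`, `d3`; split across partitions it yields only `d2`, which is why the Transformer (or an upstream constraint) must see the file as a single ordered stream.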
You could also look into using a Sort stage set to whatever the "Don't sort, already sorted" option is actually called in order to preserve the order. Or perhaps a stable sort. Or using that APT variable that shuts off sort insertions...
Or just use a Server job if nothing about this really needs to be a Parallel job. :wink:
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers