Read sequentially

ajay.vaidyanathan
Participant
Posts: 53
Joined: Fri Apr 18, 2008 8:13 am
Location: United States

Post by ajay.vaidyanathan »

Hi,

My requirement is to read the lines of data sequentially from the source file.
My source file is structured as follows:

STARTMSG..........<msg_id1>
1.-----------
2.-----------
3.-----------
4.-----------
5.-----------
STARTMSG..........<msg_id2>
6.-----------
7.-----------
8.-----------
9.-----------
10.----------

The data lines following a STARTMSG tag belong to that particular <msg_id>. I have to read the data so that I do not pick up data lines from some other <msg_id> (which would happen if the file were read in parallel).
That is, I want to read lines 1 through 5 contiguously so that I can relate them to <msg_id1>, without skipping between lines.

I'm reading this file using a sequential file stage with default sequential mode.

I want to ensure that the data is only ever read sequentially and is not read in parallel, even if I use a multi-node configuration file.

Can you confirm that the file will always be read sequentially?

Note: Using a one-node configuration is not feasible, since further processing in this job involves about 5 million records, which need to be handled using a multi-node configuration.
Regards
Ajay
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

You should be more worried about partitioning. The read itself is sequential, but further downstream the data will be partitioned if you are running on multiple nodes.
"given enough eyeballs, all bugs are shallow" - Eric S. Raymond
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

YOU are in control. If you need everything to run in sequential mode, you can specify this in a number of ways. The default is otherwise, so you will need to take some action.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsusr
Premium Member
Posts: 104
Joined: Sat Sep 03, 2005 11:30 pm

Post by dsusr »

Either run the job in sequential mode, or insert a Transformer after the Sequential File stage to copy the msg_id onto each of the message's data lines; you can then partition all the data on msg_id.
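
To make the second option concrete, here is a rough Python sketch of the logic only. It is not DataStage code, and the names `tag_lines`, `partition_by_msg_id` and `START_RE` are made up for illustration. It assumes a header line starts with STARTMSG and ends with the message id in angle brackets, as in the sample above. The tagging step must run sequentially because it carries the last seen msg_id forward; once every line carries its own msg_id, hash partitioning on that key keeps each message's lines together.

```python
import re
import zlib
from collections import defaultdict

# Assumed header layout, based on the sample in the question:
# "STARTMSG" followed by filler and the id in angle brackets.
START_RE = re.compile(r"^STARTMSG.*<(?P<msg_id>[^>]+)>\s*$")


def tag_lines(lines):
    """Sequential pass: attach the most recent msg_id to every data line."""
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = START_RE.match(line)
        if match:
            # New message starts; remember its id for the lines that follow.
            current_id = match.group("msg_id")
            continue
        if current_id is None:
            # Data before the first STARTMSG has no owner; skipped here.
            continue
        yield current_id, line


def partition_by_msg_id(tagged_rows, num_partitions):
    """Hash-partition tagged rows so one msg_id always lands in one partition."""
    partitions = defaultdict(list)
    for msg_id, line in tagged_rows:
        # crc32 gives a hash that is stable across runs, unlike built-in hash().
        index = zlib.crc32(msg_id.encode("utf-8")) % num_partitions
        partitions[index].append((msg_id, line))
    return partitions


if __name__ == "__main__":
    sample = [
        "STARTMSG..........<msg_id1>",
        "1.-----------",
        "2.-----------",
        "STARTMSG..........<msg_id2>",
        "6.-----------",
    ]
    for index, rows in sorted(partition_by_msg_id(tag_lines(sample), 4).items()):
        print(index, rows)
```

The point of the sketch is the ordering of the two steps: the carry-forward of msg_id depends on row order, so it has to happen before any repartitioning; after that, the rows are self-describing and can be spread across nodes.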