Read sequentially

ajay.vaidyanathan · Post by **ajay.vaidyanathan** » Tue Sep 09, 2008 5:22 am

Hi,

My requirement is to read the lines of data sequentially from the source file.
My source structure looks like the following:

STARTMSG..........<msg_id1>
1.-----------
2.-----------
3.-----------
4.-----------
5.-----------
STARTMSG..........<msg_id2>
6.-----------
7.-----------
8.-----------
9.-----------
10.----------

The data lines following a STARTMSG tag belong to that particular <msg_id>. I have to read the data so that I do not fetch a data line from some other <msg_id> (which would happen with parallelism),
i.e. I want to read lines 1 through 5 contiguously to relate them to <msg_id1>, without skipping between lines.

I'm reading this file using a Sequential File stage in its default sequential mode.

I want to ensure that the data is only ever read sequentially and never in parallel, even if I use a multi-node configuration file.

Can you confirm that the file will always be read sequentially?

Note: Using a one-node configuration is not feasible, since further processing in this job involves about 5 million records, which need to be processed using a multi-node configuration.

mahadev.v · Post by **mahadev.v** » Tue Sep 09, 2008 5:29 am

You should be more worried about partitioning. The read itself is sequential, but further downstream the data will be partitioned if you are running on multiple nodes.
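To illustrate the concern, here is a rough Python sketch (not DataStage code) of what could happen to the rows once they leave the sequential read. It assumes the downstream stage distributes rows round-robin across two nodes; the `round_robin` helper and the two-node setup are hypothetical, and the actual partitioning method depends on how the job is configured.

```python
def round_robin(rows, num_nodes):
    """Deal rows out to nodes in turn, the way a round-robin partitioner would."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


# A shortened version of the file layout from the first post.
rows = [
    "STARTMSG..........<msg_id1>",
    "1.-----------",
    "2.-----------",
    "STARTMSG..........<msg_id2>",
    "6.-----------",
    "7.-----------",
]

nodes = round_robin(rows, 2)
for n, node_rows in enumerate(nodes):
    print(f"node {n}: {node_rows}")
```

In this sketch, line 1 lands on a node that never sees the `<msg_id1>` header, and line 6 lands on a node that never sees the `<msg_id2>` header, even though the file itself was read strictly in order.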

ray.wurlod · Post by **ray.wurlod** » Tue Sep 09, 2008 6:05 am

YOU are in control. If you need everything to run in sequential mode, you can specify this in a number of ways. The default is otherwise, so you will need to take some action.

dsusr · Post by **dsusr** » Tue Sep 09, 2008 8:25 am

Either run the job in sequential mode, or insert a Transformer after the Sequential File stage to carry the msg_id onto each line of its message; you can then partition all the data on msg_id.
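The logic of the second option can be sketched in Python as follows. This is only an illustration of the idea, not DataStage code: `tag_lines` and `hash_partition` are hypothetical names, the msg_id is assumed to be whatever follows the run of dots after STARTMSG, and CRC32 merely stands in for whatever hash the real partitioner uses.

```python
from collections import defaultdict
from zlib import crc32


def tag_lines(lines, marker="STARTMSG"):
    """Read lines strictly in order and yield (msg_id, data_line) pairs."""
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(marker):
            # Assumed layout: the msg_id follows the dots after the marker.
            current_id = line[len(marker):].lstrip(".")
            continue
        if current_id is None:
            raise ValueError("data line found before any STARTMSG header")
        yield current_id, line


def hash_partition(tagged_rows, num_partitions):
    """Send every row with the same msg_id to the same partition."""
    partitions = defaultdict(list)
    for msg_id, line in tagged_rows:
        key = crc32(msg_id.encode("utf-8")) % num_partitions
        partitions[key].append((msg_id, line))
    return partitions


sample = [
    "STARTMSG..........msg_id1\n",
    "1.-----------\n",
    "2.-----------\n",
    "STARTMSG..........msg_id2\n",
    "6.-----------\n",
]

tagged = list(tag_lines(sample))
partitions = hash_partition(tagged, 4)
```

Because each data line carries its own msg_id after the tagging step, it no longer matters which node a line ends up on: partitioning on msg_id keeps all lines of one message together, so the rest of the job can run on a multi-node configuration.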