Reading Sequential File in Parallel

gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Reading Sequential File in Parallel

Post by gagan8877 »

I am trying to read a big flat file that takes 20 minutes to open in Notepad++.

My question is about the "No. of Readers per Node" vs. "Read from Multiple Nodes" properties. From other posts I gathered:

1. Sequential files can only be read sequentially, i.e. 1 file per node.
2. The "Read from Multiple Nodes" property works only when multiple files are read.
3. A single file can be read in parallel with "No. of Readers per Node" set to greater than 1, but only 1 node can read it.

The Parallel Job Developer's Guide says that this can be done only for fixed-length files, but in one of the posts someone experimented with delimited files and achieved a gain in performance.
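
To illustrate why the fixed-length versus delimited distinction matters when one file is split among several readers, here is a minimal Python sketch. It is not how DataStage works internally; the function names `split_on_record_boundaries` and `read_range` are made up for this illustration, and it assumes newline-terminated records. With fixed-length records a reader's start offset is plain arithmetic, whereas with delimited records each nominal cut point has to be moved forward to the next record delimiter, which is what the sketch does:

```python
import os
import tempfile


def split_on_record_boundaries(path, readers):
    """Return one (start, end) byte range per reader, each cut on a record boundary."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, readers):
            # Nominal cut: an even split by bytes, which may land mid-record.
            f.seek(size * i // readers)
            # Skip the rest of the record we landed in (or one whole record if
            # we landed exactly on a boundary); the next record starts at tell().
            f.readline()
            cuts.append(f.tell())
    cuts.append(size)
    return list(zip(cuts, cuts[1:]))


def read_range(path, start, end):
    """Read the bytes of one reader's section."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


if __name__ == "__main__":
    # Hypothetical demo data: delimited rows of varying length.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for i in range(20):
            tmp.write(("id%d,%s\n" % (i, "x" * (i % 5))).encode())
    for start, end in split_on_record_boundaries(tmp.name, 3):
        print(start, end, read_range(tmp.name, start, end).count(b"\n"), "records")
    os.remove(tmp.name)
```

The extra scan at every cut point is small, but it only works if the delimiter cannot appear inside a field (for example inside quoted text), which may be one reason the guide restricts the feature to fixed-length files.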

The two properties are also mutually exclusive. So if only one can be set at a time, how can we read multiple files using multiple nodes and have multiple readers on each node, as some of the posts say? Or did I misunderstand?

Only one can happen at a time: either read multiple files with multiple nodes with one reader each, or read one file with one node and multiple readers.

The Parallel Guide has a diagram on page 5-31 that actually contradicts the above: one file is being read by multiple nodes with one reader each.

I would like to know which one is correct. Is the above statement true or false?

Can someone please explain which ones are correct?

1. One file per node - True/False

2. Only one can happen at a time - either read multiple files with multiple nodes with one reader each, or read one file with one node and multiple readers. - True/False

3. Is the diagram wrong?

Thanks
Gary
"A journey of a thousand miles, begins with one step"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. False.
2. False.
3. No.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Confusion Rising....

Post by gagan8877 »

Thanks Ray - I am actually more confused now.

The posts that I read earlier were opposite of this:

viewtopic.php?t=104589
viewtopic.php?t=120434

Or maybe I misunderstood. :?

Do you mean that:

1. If I specify multiple sequential files and number of readers > 1, the parallel engine will automatically read 1 file per node with > 1 readers on each node? Y/N

2. If I specify a single file and set "Read Multiple Nodes" to true, all configured nodes will read that 1 file with a single reader each? Y/N

3. If I specify a single file and "number of readers" > 1, then it will be read by one node with > 1 readers? Y/N

Thanks again.
Gary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. Maybe. It depends where the files are and how many of them there are.

2. There is no "read multiple nodes" - there is "multiple readers per node". In this case, if you specify N readers per node for one sequential file, only one node gets used, and each reader on that node reads 1/N of the lines in the file.

3. See answer to 2.
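
As a rough illustration of N readers each taking 1/N of the records of one file, here is a minimal Python sketch. It is only a model of the idea, not the engine's implementation: it assumes fixed-length records, it assumes each reader gets a contiguous slice (the thread does not say how the lines are divided), threads stand in for the readers on a single node, and `RECORD_LEN`, `reader_ranges`, `read_slice` and `read_with_readers` are names invented for the example.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

RECORD_LEN = 16  # assumed fixed record length in bytes, including the newline


def reader_ranges(total_records, readers):
    """Split record indexes into `readers` contiguous (first, last) slices."""
    base, extra = divmod(total_records, readers)
    ranges, first = [], 0
    for r in range(readers):
        count = base + (1 if r < extra else 0)  # spread the remainder
        ranges.append((first, first + count))
        first += count
    return ranges


def read_slice(path, first, last, record_len=RECORD_LEN):
    """One reader: seek straight to its first record and read its share."""
    with open(path, "rb") as f:
        f.seek(first * record_len)
        data = f.read((last - first) * record_len)
    return [data[i:i + record_len] for i in range(0, len(data), record_len)]


def read_with_readers(path, readers, record_len=RECORD_LEN):
    """Run all readers concurrently and return the records in file order."""
    total = os.path.getsize(path) // record_len
    with ThreadPoolExecutor(max_workers=readers) as pool:
        parts = pool.map(
            lambda rng: read_slice(path, rng[0], rng[1], record_len),
            reader_ranges(total, readers),
        )
        return [rec for part in parts for rec in part]


if __name__ == "__main__":
    # Hypothetical demo file: ten 16-byte records.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for i in range(10):
            tmp.write(("%015d\n" % i).encode())
    print(reader_ranges(10, 4))
    print(len(read_with_readers(tmp.name, 4)), "records read")
    os.remove(tmp.name)
```

The sketch stitches the slices back together in file order for simplicity; in a real parallel job the readers' output should not be assumed to arrive in the original record order.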
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Post by gagan8877 »

Hi Ray

In DS version 7.5x2 there is a property in the Sequential File stage called "Read from Multiple Nodes" (Output page -> Properties -> Options; its explanation reads: "Set to Yes to read the file in sections from multiple nodes"). If I choose Yes for this property, the "Multiple Readers per node" property disappears. That's what I meant to ask in question 2.
Gary
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Ah, read FROM multiple nodes.

In that case the answer to your question 2 is "yes", provided that all the nodes can see the large file.
gagan8877
Premium Member
Posts: 77
Joined: Mon Jun 19, 2006 1:30 pm

Post by gagan8877 »

Thanks Ray - u rock :)
Gary