Number of readers per node --

Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Number of readers per node --

Post by Satwika »

Hi everyone,

I have a parallel job designed as below:

File1 --> RemoveDuplicates --> JOIN1 --> Transformer --> JOIN2 --> Output

File2 --> RemoveDuplicates --> JOIN1
File3 --> RemoveDuplicates ----------------------------> JOIN2
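
In plain terms, each file is deduplicated and the streams are then combined by two inner joins, with a transform step in between. Here is a minimal sketch in plain Python (not DataStage code; the key columns and transform step are illustrative assumptions only):

```python
# Minimal sketch (plain Python, not DataStage) of the dataflow above.
# Key column names and the transform step are illustrative assumptions.

def dedupe(rows, key_cols):
    # Keep the first row seen per distinct key, like Remove Duplicates.
    seen, out = set(), []
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def inner_join(left, right, key_cols):
    # Index the right input by join key, then emit one merged row per match.
    index = {}
    for r in right:
        index.setdefault(tuple(r[c] for c in key_cols), []).append(r)
    return [{**l, **r} for l in left
            for r in index.get(tuple(l[c] for c in key_cols), [])]

# JOIN1 combines File1 and File2; a transform step feeds JOIN2 against File3:
# stage1 = inner_join(dedupe(file1, KEYS), dedupe(file2, KEYS), JOIN_KEYS)
# output = inner_join(transform(stage1), dedupe(file3, KEYS), JOIN_KEYS)
```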

Case 1 :

File1, File2, and File3 are read normally (the 'Number of Readers Per Node' property is not enabled).

The job produced 1,000 output records.

-------------------------------------------------------------
Case 2:
The 'Number of Readers Per Node' property is enabled for all three files (File1, File2, File3).
The job produced 5,000 output records.

The two jobs are identical except for this file property.
Can anyone please tell me why the number of output records differs?

Thanks & Regards
Satwika
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

How many parallel nodes are running when you execute this job, and what partitioning algorithm have you chosen between your files and the Remove Duplicates stages? This might be the root of your problem.
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

We are running on a two-node configuration, using hash partitioning with the internal sort option in the Remove Duplicates stage and 'Auto' partitioning in both joins (JOIN1, JOIN2).

The configuration and logic above are the same in the two jobs (cases) mentioned in the question.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What column are you hashing on and what column(s) are you using for your join?
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

Hi ArndW

Remove Duplicates stage - hash partitioning with the internal sort option - on the columns below:

Col1
Col2
Col3
Col4

Performing an inner join on the columns below:

Col1
Col2
Col3
Col5
Col6

That is, the hash is performed on 4 key columns, and the join on 5 columns, of which 3 are common, as shown above.
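
To make the mismatch concrete, here is a hedged sketch in plain Python (not DataStage internals): because Col4 is part of the hash key but not the join key, two rows that agree on every join column can land on different nodes and never meet in a node-local join.

```python
# Hypothetical sketch: the inputs are hash-partitioned on Col1..Col4, but the
# join key is Col1, Col2, Col3, Col5, Col6. Col4 is not a join column, so two
# rows that should join can carry different Col4 values, hash to different
# nodes, and never meet in a node-local inner join.
def node_for(row, hash_cols, num_nodes=2):
    return hash(tuple(row[c] for c in hash_cols)) % num_nodes

left  = {"Col1": "A", "Col2": 1, "Col3": "X", "Col4": 10, "Col5": 5, "Col6": 6}
right = {"Col1": "A", "Col2": 1, "Col3": "X", "Col4": 99, "Col5": 5, "Col6": 6}

hash_cols = ["Col1", "Col2", "Col3", "Col4"]
print(node_for(left, hash_cols), node_for(right, hash_cols))
# The rows agree on all five join columns, yet may print different node
# numbers because Col4 differs; the join then silently drops the match.
```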
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Try the hash on just the first 1, 2, or 3 columns (it must be identical for the Left and Right links).
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

I have tried hash partitioning on the common columns (Col1, Col2, Col3) in the Join stage... :( It did not help.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Both input links must have identical hash columns and ordering, and both must be sorted on the join key.

To keep things simple, hash both input links to the join on "Col1" and then sort both on Col1, Col2, Col3, Col5 and Col6.

Alternatively, run the job with a 1-node configuration and see if the problems persist. If they do, you have a problem with your sorting; if they don't, your issue is with partitioning.
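
As an illustration only, here is a plain-Python sketch of that suggestion (not DataStage code; names are assumptions): hash both inputs on Col1 so matching rows co-locate, then sort each partition on the full join key for a merge-style join.

```python
# Sketch of the suggestion above: hash both join inputs on the same leading
# column (Col1) so matching rows land on the same node, then sort each
# partition on the full join key so a per-node join sees every match.
from itertools import groupby

JOIN_KEY = ["Col1", "Col2", "Col3", "Col5", "Col6"]

def keyed(row):
    return tuple(row[c] for c in JOIN_KEY)

def partition(rows, num_nodes=2):
    # Hash on Col1 only; Col1 is part of the join key, so matches co-locate.
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row["Col1"]) % num_nodes].append(row)
    return parts

def node_local_inner_join(left_part, right_part):
    left_part.sort(key=keyed)            # per-partition sort on the join key
    right_part.sort(key=keyed)
    index = {k: list(g) for k, g in groupby(right_part, key=keyed)}
    return [(l, r) for l in left_part for r in index.get(keyed(l), [])]

# Usage: left_parts, right_parts = partition(left_rows), partition(right_rows)
#        matches = [p for i in range(2)
#                   for p in node_local_inner_join(left_parts[i], right_parts[i])]
```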
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

Thanks ArndW

File3 is 6 GB in size. Could that be causing the problem? I ask because with a small file (prepared test data), and without any other modifications, the join output comes out correctly.

(File1: 2 GB, File2: 2 GB, File3: 6 GB)
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

What is the maximum number of readers per node that can be declared in a job? Is there any limit?
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

Does anyone know about this issue? :?:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There is no limit. The GUI may limit the number you can enter.

However, there are stupid values (for example, 4,000,000 readers for 2,000,000 rows).

Think about the architecture. The Sequential File stage uses the STREAMS I/O module under the covers, so even with only one reader per node you are going to be reading data at a pretty fast rate.

More than one reader per node will not generally help, except for very, very large numbers of rows. Typically the consumer stages in the job, not the Sequential File stage, will limit the speed at which rows can be processed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
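
As a rough illustration of what multiple readers per node do conceptually, here is a plain-Python sketch (this is not the STREAMS I/O implementation; the splitting scheme is an assumption for illustration):

```python
# Conceptual sketch: with N readers per node, the Sequential File stage can
# divide a file into N roughly equal byte ranges and scan them in parallel.
# Past the point where downstream stages can consume rows, extra readers
# buy nothing, which is the point made above.
def reader_ranges(file_size, readers_per_node):
    chunk = file_size // readers_per_node
    return [(i * chunk,
             file_size if i == readers_per_node - 1 else (i + 1) * chunk)
            for i in range(readers_per_node)]

print(reader_ranges(6 * 1024**3, 4))   # a 6 GB file split across 4 readers
```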
Satwika
Participant
Posts: 45
Joined: Mon Jan 02, 2012 11:29 pm

Post by Satwika »

Thanks Ray, I understand, but my basic problem is still not solved.
Has anyone faced this type of issue? :oops:
Please refer to my earlier posts.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Run your job in a 1-node configuration and see if the error remains. If it is still there with 1-node then your partitioning is not at the root of the problem.