Method of partitioning

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Random just means that no effort is made to keep rows in a certain order. A single-node system will still process rows serially. You need to make sure you're using a multiple-node configuration, as well as sufficient data to show a random dispersal during partitioning.
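To illustrate the point, here is a toy Python sketch (not DataStage code, just a model of a random partitioner) showing why a single-node configuration can never exhibit dispersal, while a four-node one spreads rows across partitions:

```python
import random
from collections import Counter

def random_partition(rows, num_nodes, seed=0):
    """Assign each row to a node at random, as a 'random' partitioner would."""
    rng = random.Random(seed)
    return [rng.randrange(num_nodes) for _ in rows]

rows = range(1000)

# With one node, every row lands on node 0: rows stay serial, no dispersal.
print(Counter(random_partition(rows, 1)))   # Counter({0: 1000})

# With four nodes, the rows spread roughly evenly across the partitions.
print(Counter(random_partition(rows, 4)))
```

The `random_partition` helper and the seed are assumptions for the sake of a reproducible sketch; the real partitioner runs inside the engine.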
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
mikegohl
Premium Member
Posts: 97
Joined: Fri Jun 13, 2003 12:50 pm
Location: Chicago
Contact:

Post by mikegohl »

Isn't the sequential file read in order if you do not use multiple readers?
Michael Gohl
igorbmartins
Participant
Posts: 161
Joined: Mon Mar 17, 2008 10:33 am

Post by igorbmartins »

Kenneth Bland, I edited the file /home/dsadm/Ascential/DataStage/Configurations/default.apt.

When I opened this file for the first time, it had the following content:

{
node "node1"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}

Then I added 3 more nodes, all inside the one outer brace pair, after which the file looked like this:
{
node "node1"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
node "node2"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
node "node3"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
node "node4"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}

After amending that file I restarted the server and ran the jobs, but noticed that nothing had changed; the outcome remained the same.

Igor Bastos Martins
http://www.oportunidadesembi.com.br
Last edited by igorbmartins on Sun Jul 20, 2008 8:59 am, edited 1 time in total.
kcbland

Post by kcbland »

Good, now get more test data, maybe a million rows.
Kenneth Bland

mikegohl

Post by mikegohl »

Even with more rows, I think you will get the same behavior. The file is read in order, and sent down the random partitions. The auto collector eagerly reads any row that is ready from any partition. Since they are read in order, they will be ready in order.
Michael Gohl
kcbland

Post by kcbland »

mikegohl wrote:Since they are read in order, they will be ready in order.
If you have 4 independent processes handling rows, couldn't you conceivably have one of those processes "randomly" get more rows than another? If that same process just so happens to get fewer CPU cycles than another, couldn't it fall a little behind in getting its rows out? Wouldn't that then show up as rows out of order? Conceivably, the more rows you process, the more likely this is to happen.
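A toy Python model of this argument (the per-row costs are assumptions, not real scheduler behavior) shows how one partition that gets fewer CPU cycles pushes rows out of order through an eager collector:

```python
def collect_eagerly(num_rows, num_nodes, cost_per_row):
    """Deal rows round-robin to nodes, then let an eager collector
    emit each row as soon as its node has finished with it."""
    finished = []
    for row in range(num_rows):
        node = row % num_nodes
        position = row // num_nodes          # rows this node handled before this one
        done_at = (position + 1) * cost_per_row[node]
        finished.append((done_at, row))
    finished.sort()                          # collector takes whichever row is ready first
    return [row for _, row in finished]

# Equal speeds on all four nodes: rows come back in their original order.
print(collect_eagerly(12, 4, [1.0, 1.0, 1.0, 1.0]))

# Node 3 starved of CPU cycles: its rows fall behind and the order breaks.
print(collect_eagerly(12, 4, [1.0, 1.0, 1.0, 2.5]))
```

The model is deliberately crude, but it captures the claim: identical costs preserve order, unequal costs do not.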
Kenneth Bland

mikegohl

Post by mikegohl »

Ok, I agree. Without doing some kind of transformation, I doubt that the performance would be much different on any of the nodes.
Michael Gohl
kcbland

Post by kcbland »

mikegohl wrote:Ok, I agree. Without doing some kind of transformation, I doubt that the performance would be much different on any of the nodes.
Absolutely. If there were transformations occurring on a node, one node could get a row that takes longer to derive, and a row on another node could conceivably race ahead. That would hold even in a case where all processes were started at exactly the same time and got exactly the same CPU cycles allocated in exactly the same manner. But the operating system isn't that considerate: multiple independent processes get CPU time allocated in an ever-escalating fashion. As processes need more time, the OS will allocate higher priorities and more time. That it does this in an unequal fashion isn't evident until more data flows through.

Random doesn't mean different. You can randomly get rows sent to the same node, just as a hash method can hash everything to the same node if a poor partition key is chosen; the same goes for range. Only round-robin guarantees that ALL rows DON'T go to the same node.
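A minimal Python sketch of that contrast (again a model, not DataStage code; the constant key "SAME" is a made-up example of a poor partition key):

```python
from collections import Counter

def round_robin(num_rows, num_nodes):
    """Deal rows to nodes in strict rotation: counts differ by at most one."""
    return Counter(row % num_nodes for row in range(num_rows))

def hash_partition(keys, num_nodes):
    """Pick a node by hashing each row's partition key."""
    return Counter(hash(key) % num_nodes for key in keys)

# Round-robin spreads 1000 rows over 4 nodes as exactly 250 each, always.
print(round_robin(1000, 4))

# A poor partition key (the same value on every row) hashes every row to a
# single node: one node gets all 1000 rows, the other three get none.
print(hash_partition(["SAME"] * 1000, 4))
```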

But that is still all a separate discussion from forcing a randomizing effect.
Kenneth Bland

igorbmartins

Post by igorbmartins »

Friends, I created a file containing 1,000,000 lines and the problem continues. In the Director I saw this message in the log:

*******************************************************************************

Transformer_1: Input dataset 0 has a partitioning method other than entire specified; disabling memory sharing.

*******************************************************************************


Do you know what is wrong?
Last edited by igorbmartins on Fri Mar 28, 2008 11:45 am, edited 1 time in total.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Did you try searching on the error message :?:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Writing a sequential file can, by its nature, use only one node.

How many nodes are mentioned in your configuration file (whose pathname is set by APT_CONFIG_FILE environment variable)?
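One quick way to answer that question is to count the node declarations in the configuration file. The following is a hedged Python sketch, not a DataStage utility, and the two-node sample text is hypothetical:

```python
import re

def count_nodes(config_text):
    """Count logical node declarations in an APT configuration file."""
    return len(re.findall(r'\bnode\s+"', config_text))

# Hypothetical two-node configuration, in the same shape as default.apt.
sample = '''{
node "node1" { fastname "dsee.estudo" pools "" }
node "node2" { fastname "dsee.estudo" pools "" }
}'''

print(count_nodes(sample))  # 2
```

In practice you would feed it the file named by APT_CONFIG_FILE, e.g. `count_nodes(open(os.environ["APT_CONFIG_FILE"]).read())`.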
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
igorbmartins

Post by igorbmartins »

Friends, I am posting the job log here to help.

DataStage Report - Summary Log for job: Round_Robin_Peek
Produced on: 28/03/2008 14:55:34
Project: local Host system: 192.168.193.1
Items: 1 - 25
Sorted on: Date Sorter

Occurred: 16:42:17 On date: 27/03/2005 Type: Reset
Event: Log cleared by user

Occurred: 16:42:53 On date: 27/03/2005 Type: Control
Event: Starting Job Round_Robin_Peek. (...)

Occurred: 16:42:53 On date: 27/03/2005 Type: Info
Event: Environment variable settings: (...)

Occurred: 16:42:53 On date: 27/03/2005 Type: Info
Event: Parallel job initiated

Occurred: 16:42:54 On date: 27/03/2005 Type: Info
Event: main_program: Ascential DataStage(tm) Enterprise Edition 7.5.1A (...)

Occurred: 16:42:54 On date: 27/03/2005 Type: Info
Event: main_program: orchgeneral: loaded (...)

Occurred: 16:42:54 On date: 27/03/2005 Type: Info
Event: main_program: APT configuration file: /home/dsadm/Ascential/DataStage/Configurations/default.apt (...)

Occurred: 16:42:55 On date: 27/03/2005 Type: Warning
Event: Peek_11: When checking operator: Operator of type "APT_PeekOperator": will partition despite the (...)

Occurred: 16:42:55 On date: 27/03/2005 Type: Warning
Event: Sequential_File_2: When checking operator: A sequential operator cannot preserve the partitioning (...)

Occurred: 16:42:56 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 10 percent.

Occurred: 16:42:56 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 20 percent.

Occurred: 16:42:56 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 30 percent.

Occurred: 16:42:57 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 40 percent.

Occurred: 16:42:57 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 50 percent.

Occurred: 16:42:58 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 60 percent.

Occurred: 16:42:58 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 70 percent.

Occurred: 16:42:58 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 80 percent.

Occurred: 16:42:59 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 90 percent.

Occurred: 16:43:01 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Import complete; 1000000 records imported successfully, 0 rejected.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: Peek_11,0: CODIGO: 0103479. COD2: 1. DSC:666666666 (...)

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: Sequential_File_2,0: Export complete; 1000000 records exported successfully, 0 rejected.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: main_program: Step execution finished with status = OK.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: main_program: Startup time, 0:01; production run time, 0:11.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: Parallel job reports successful completion

Occurred: 16:43:06 On date: 27/03/2005 Type: Control
Event: Finished Job Round_Robin_Peek.

End of report.
ray.wurlod

Post by ray.wurlod »

So what's the problem? All the rows were exported successfully.

The "will partition despite..." and "cannot preserve..." messages have been discussed at length on the forum, a simple search will find them.