
Posted: Wed Mar 26, 2008 2:06 pm
by kcbland
Random just means that no effort is made to keep rows in a certain order. A single-node system will still process rows serially. You need to make sure you're using a multi-node configuration, as well as enough data to show a random dispersal during partitioning.

Posted: Wed Mar 26, 2008 2:17 pm
by mikegohl
Isn't the sequential file read in order if you do not use multiple readers?

Posted: Wed Mar 26, 2008 2:24 pm
by igorbmartins
Kenneth Bland, I edited the file /home/dsadm/Ascential/DataStage/Configurations/default.apt

When I opened this file for the first time, it had the following content:
{
node "node1"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}

Then I added 3 more nodes; the file then looked like this:
{
node "node1"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}
{
node "node2"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}
{
node "node3"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}
{
node "node4"
{
fastname "dsee.estudo"
pools ""
resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
}
}

After amending that file I restarted the server, ran the jobs, and noticed that nothing had changed; the outcome remained the same.
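As a side note, in the parallel configuration files I have seen, all node definitions sit inside a single outer brace pair, rather than one complete brace pair per node as in the file quoted above. A sketch of a 4-node default.apt in that shape (host name and paths copied from your file) would be:

```
{
	node "node1"
	{
		fastname "dsee.estudo"
		pools ""
		resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
	}
	node "node2"
	{
		fastname "dsee.estudo"
		pools ""
		resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
	}
	node "node3"
	{
		fastname "dsee.estudo"
		pools ""
		resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
	}
	node "node4"
	{
		fastname "dsee.estudo"
		pools ""
		resource disk "/home/dsadm/Ascential/DataStage/Datasets" {pools ""}
		resource scratchdisk "/home/dsadm/Ascential/DataStage/Scratch" {pools ""}
	}
}
```

If only the first top-level block is parsed, the engine would still see a single node, which would explain why nothing changed. You can check which configuration the job actually used in the "APT configuration file" line of the job log.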

Igor Bastos Martins
http://www.oportunidadesembi.com.br

Posted: Wed Mar 26, 2008 2:53 pm
by kcbland
Good, now get more test data, maybe a million rows.

Posted: Wed Mar 26, 2008 2:58 pm
by mikegohl
Even with more rows, I think you will get the same behavior. The file is read in order, and sent down the random partitions. The auto collector eagerly reads any row that is ready from any partition. Since they are read in order, they will be ready in order.

Posted: Wed Mar 26, 2008 3:26 pm
by kcbland
mikegohl wrote:Since they are read in order, they will be ready in order.
If you have 4 independent processes handling rows, couldn't you conceivably have one of those processes "randomly" get more rows than another? If that same process happens to get fewer CPU cycles than another, couldn't it fall behind a little in getting its rows out? Wouldn't that then show up as rows out of order? Conceivably, the more rows you process, the more likely this is to happen.

Posted: Wed Mar 26, 2008 3:41 pm
by mikegohl
Ok, I agree. Without doing some kind of transformation, I doubt that the performance would be much different on any of the nodes.

Posted: Wed Mar 26, 2008 3:59 pm
by kcbland
mikegohl wrote:Ok, I agree. Without doing some kind of transformation, I doubt that the performance would be much different on any of the nodes.
Absolutely. If there were transformations occuring on a node, one node could get a row that takes longer to derive and another row on another node could conceivable race ahead. That would be in a case were all processes were started at exactly the same time and get exactly the same cpu cycles allocated in exactly the same manner. But, the operating system isn't that considerate and multiple independent processes get cpu time allocated in an ever escalating fashion. As processes need more time the OS will allocate higher priorities and more time. That it would do this in an unequal fashion isn't evident until more data flows thru.

Random doesn't mean different. You can randomly get rows sent to the same node, just as a hashed method can hash everything to the same node if a poor partition key is chosen; the same goes for range. Only round-robin guarantees that ALL rows DON'T go to the same node.
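A quick Python sketch (again, not DataStage code) of the three methods being contrasted here, using a deliberately poor partition key whose value is constant across all rows:

```python
# Compare how random, hash, and round-robin partitioning spread 1000 rows
# across 4 nodes when every row carries the same (poor) partition key.
import random
from collections import Counter

NODES = 4
rows = [{"key": "SAME", "val": i} for i in range(1000)]  # constant key

random_part = Counter(random.randrange(NODES) for _ in rows)
hash_part = Counter(hash(r["key"]) % NODES for r in rows)
rr_part = Counter(i % NODES for i, _ in enumerate(rows))

# Hash with a constant key sends every row to one node (which node varies
# per run); round-robin always spreads them evenly; random spreads them
# unevenly but across all nodes with high probability.
print(dict(hash_part))  # a single node holds all 1000 rows
print(dict(rr_part))    # {0: 250, 1: 250, 2: 250, 3: 250}
```

Only the round-robin counts are guaranteed equal; the random counts merely tend toward equal as the row count grows.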

But that is still all a separate discussion from forcing a randomizing effect.

Posted: Thu Mar 27, 2008 7:45 am
by igorbmartins
Friends, I created a file containing 1,000,000 lines and the problem continues. In the Director I saw this message in the log:

*******************************************************************************

Transformer_1: Input dataset 0 has a partitioning method other than entire specified; disabling memory sharing.

*******************************************************************************


Do you know what is wrong?

Posted: Thu Mar 27, 2008 8:02 am
by DSguru2B
Did you try searching on the error message?

Posted: Thu Mar 27, 2008 5:31 pm
by ray.wurlod
Writing a sequential file necessarily can only use one node.

How many nodes are mentioned in your configuration file (whose pathname is set by APT_CONFIG_FILE environment variable)?

Posted: Fri Mar 28, 2008 12:04 pm
by igorbmartins
Friends, I am posting the job log here to help.

DataStage Report - Summary Log for job: Round_Robin_Peek
Produced on: 28/03/2008 14:55:34
Project: local Host system: 192.168.193.1
Items: 1 - 25
Sorted on: Date Sorter

Occurred: 16:42:17 On date: 27/03/2005 Type: Reset
Event: Log cleared by user

Occurred: 16:42:53 On date: 27/03/2005 Type: Control
Event: Starting Job Round_Robin_Peek. (...)

Occurred: 16:42:53 On date: 27/03/2005 Type: Info
Event: Environment variable settings: (...)

Occurred: 16:42:53 On date: 27/03/2005 Type: Info
Event: Parallel job initiated

Occurred: 16:42:54 On date: 27/03/2005 Type: Info
Event: main_program: Ascential DataStage(tm) Enterprise Edition 7.5.1A (...)

Occurred: 16:42:54 On date: 27/03/2005 Type: Info
Event: main_program: orchgeneral: loaded (...)

Occurred: 16:42:54 On date: 27/03/2005 Type: Info
Event: main_program: APT configuration file: /home/dsadm/Ascential/DataStage/Configurations/default.apt (...)

Occurred: 16:42:55 On date: 27/03/2005 Type: Warning
Event: Peek_11: When checking operator: Operator of type "APT_PeekOperator": will partition despite the (...)

Occurred: 16:42:55 On date: 27/03/2005 Type: Warning
Event: Sequential_File_2: When checking operator: A sequential operator cannot preserve the partitioning (...)

Occurred: 16:42:56 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 10 percent.

Occurred: 16:42:56 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 20 percent.

Occurred: 16:42:56 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 30 percent.

Occurred: 16:42:57 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 40 percent.

Occurred: 16:42:57 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 50 percent.

Occurred: 16:42:58 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 60 percent.

Occurred: 16:42:58 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 70 percent.

Occurred: 16:42:58 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 80 percent.

Occurred: 16:42:59 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Progress: 90 percent.

Occurred: 16:43:01 On date: 27/03/2005 Type: Info
Event: ArqOri,0: Import complete; 1000000 records imported successfully, 0 rejected.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: Peek_11,0: CODIGO: 0103479. COD2: 1. DSC:666666666 (...)

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: Sequential_File_2,0: Export complete; 1000000 records exported successfully, 0 rejected.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: main_program: Step execution finished with status = OK.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: main_program: Startup time, 0:01; production run time, 0:11.

Occurred: 16:43:06 On date: 27/03/2005 Type: Info
Event: Parallel job reports successful completion

Occurred: 16:43:06 On date: 27/03/2005 Type: Control
Event: Finished Job Round_Robin_Peek.

End of report.

Posted: Fri Mar 28, 2008 5:02 pm
by ray.wurlod
So what's the problem? All the rows were exported successfully.

The "will partition despite..." and "cannot preserve..." messages have been discussed at length on the forum, a simple search will find them.