Problem with sorting 14M records
Hi All,
I am using DataStage 6.0 with Parallel Extender. I have around 14M records to transform. The stages involved are, in order: File Set, Transformer, Sort, Lookup and Sequential File. According to the log, once the job has read around 40% of the data, the Sort stage fails with the error "Scratch space full". Any idea how to handle this problem?
Thanks in advance.
Regards
Pinkesh
-
- Premium Member
- Posts: 385
- Joined: Tue Oct 07, 2003 4:55 am
Hi,
There are two places you should look at: the sort temp directory (the one you configured in the Sort stage properties) and the UV temp dir from your DSENV file.
One more piece of advice: instead of using the Sort stage, let Unix sort your file (using syncsort or whatever). If you are reading from a sequential file, use sort in the Filter command.
HTH,
Amos
Problem with sorting 14M records
"Scratch space full" means that the physical disk your scratchdisk entry points to in the config file is filling up. You need to point your scratchdisk to a physical disk with more space.
- BP
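A quick way to confirm this before rerunning the job is to look at the free space on the file system behind the scratch directory. The sketch below is only an illustration: `free_kb` is a made-up helper name, `SCRATCH_DIR` is a hypothetical variable, and `/tmp` is a stand-in default; substitute the scratchdisk path from your own config file.

```shell
#!/bin/sh
# Print the free space (in KB) on the file system that holds a directory.
# df -P forces the one-line-per-file-system POSIX layout, so the 4th column
# of the 2nd line is always the "Available" figure.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# SCRATCH_DIR is a hypothetical variable; /tmp is only a placeholder default.
scratch_dir="${SCRATCH_DIR:-/tmp}"
echo "Free space under $scratch_dir: $(free_kb "$scratch_dir") KB"
```

Comparing that figure with the size of the data being sorted gives a rough idea of whether the sort can spill to that disk at all.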
-
- Participant
- Posts: 133
- Joined: Wed Mar 05, 2003 4:19 pm
- Location: Lima - Peru. Sudamerica
Re: Problem with sorting 14M records
Hi Pinkesh
You can use the Unix sort; it is more efficient than the DataStage sort ...
In the Unix command you can redirect the sort's temporary work area with the -T option:
> sort -T /home/temp ......
:D
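For anyone who wants to try the -T option in isolation, here is a small self-contained sketch. The pipe delimiter, the sort key, the file names and the `sort_with_tmpdir` helper are all assumptions made for the example, not details from the original job.

```shell
#!/bin/sh
# Sort a pipe-delimited file on its first field, writing sort's temporary
# spill files to a directory of our choosing instead of the default.
# Arguments (all hypothetical): input file, output file, temp directory.
sort_with_tmpdir() {
    sort -T "$3" -t '|' -k 1,1 -o "$2" "$1"
}

# Demo on a throwaway directory so nothing outside it is touched.
work=$(mktemp -d)
printf '%s\n' '3|carol' '1|alice' '2|bob' > "$work/in.txt"
sort_with_tmpdir "$work/in.txt" "$work/out.txt" "$work"
cat "$work/out.txt"
rm -rf "$work"
```

On a real 14M-row file the directory given to -T needs enough free space for the spill files, which is the same constraint the Sort stage runs into.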
Regards,
Miguel Seclén
Lima - Peru
Re: Problem with sorting 14M records
*sigh*
This is a PX issue. Server solutions do not resolve PX problems.
Now back to "Scratch space full".
You do know that you can point to multiple locations for the same node in your configuration file?
Are you even aware of your own configuration file? You should have something like this:
Code:
{
    node "node1"
    {
        fastname "mybigcomputer"
        pools ""
        resource disk "/mountA/Dataset" {pools ""}
        resource scratchdisk "/mountA/Scratch" {pools ""}
    }
}
You can do something like this:
Code:
{
    node "node1"
    {
        fastname "mybigcomputer"
        pools ""
        resource disk "/mountA/Dataset" {pools ""}
        resource disk "/mountB/Dataset" {pools ""}
        resource scratchdisk "/mountA/Scratch" {pools ""}
        resource scratchdisk "/mountB/Scratch" {pools ""}
    }
}
This file also controls how DataStage runs things in parallel. The more nodes there are, the more processes DataStage starts for the same stages. Set correctly, your job will FLY.
Doing MPP? This is the same file you have to tweak. Doing SMP? Same file.
See page 10-1, "The Parallel Extender Configuration File", in the DataStage Manager Guide online documentation, which should be included with your DataStage Client installation.
As for Sort efficiency: 7.0.1 made a major improvement in performance for this, along with Lookup and other issues. By designing your job to sort at the Unix prompt, you limit yourself to one CPU, and you also prevent yourself from taking advantage of the new version when you upgrade.
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
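To see which scratch locations a configuration file actually declares, the scratchdisk lines can be pulled out with awk. This is a minimal sketch, assuming one resource per line as in the examples in this thread and paths without spaces; `list_scratchdisks` is a made-up helper name, and the sample file is written to a temporary location purely for the demo.

```shell
#!/bin/sh
# List every scratchdisk path declared in a Parallel Extender config file.
# Matches lines of the form:  resource scratchdisk "/path" {pools ""}
# Assumes one resource per line and no spaces inside the path.
list_scratchdisks() {
    awk '$1 == "resource" && $2 == "scratchdisk" { gsub(/"/, "", $3); print $3 }' "$1"
}

# Demo against a throwaway copy of a two-scratchdisk configuration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
    node "node1"
    {
        fastname "mybigcomputer"
        pools ""
        resource disk "/mountA/Dataset" {pools ""}
        resource scratchdisk "/mountA/Scratch" {pools ""}
        resource scratchdisk "/mountB/Scratch" {pools ""}
    }
}
EOF
list_scratchdisks "$cfg"
rm -f "$cfg"
```

Against a live installation the argument would normally be the file named by the APT_CONFIG_FILE environment variable, for example `list_scratchdisks "$APT_CONFIG_FILE"`.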
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Agree totally with *sigh*
Folks, this is why new posts require you to indicate whether the post is about server, parallel or mainframe. Please heed what's there!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.