Problems with scratch space while using JOIN stage in DS 8.5

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
kwak
Participant
Posts: 2
Joined: Fri Nov 18, 2011 5:09 am

Problems with scratch space while using JOIN stage in DS 8.5

Post by kwak »

Hi, to everyone!!!
I am newer in Dastage.

How to config scratch space when trying to join 2 table with big amount of data. Each table has more then 200 million records in it.

Config file:

Code: Select all

{
	node "node1"
	{
		fastname "srv-etl-dev"
		pools ""
		resource disk "C:/IBM/InformationServer/Server/Datasets/node1" {pools ""}
		resource scratchdisk "C:/IBM/InformationServer/Server/Scratch/node1" {pools ""}
	}
	node "node2"
	{
		fastname "srv-etl-dev"
		pools ""
		resource disk "C:/IBM/InformationServer/Server/Datasets/node2" {pools ""}
		resource scratchdisk "C:/IBM/InformationServer/Server/Scratch/node2" {pools ""}
	}
}
Configuration:
resource disk size : 500 gb
OS: Windows 2003 SP2 x64
Datastage version: 8.5
Job type: parallel

Using current configuration system has not got enought free space for tsort files in scratch directory.

Is it possible somehow to remove tsort operator?

Thank you!!!
Best regards, Kwak
manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Re: Problems with scratch space while using JOIN stage in DS

Post by manoj_23sakthi »

Hi,
You have to increase the scratch disk space..
Else You can achive the same join condition in database Query...
Before that try to remove unwanted datasets in unix path..
Thanks
Manoj kumar
kwak
Participant
Posts: 2
Joined: Fri Nov 18, 2011 5:09 am

Re: Problems with scratch space while using JOIN stage in DS

Post by kwak »

Hi,
Thank you for response!

And if there is no possibility to increase scratch space is there any other cases to obtain current issue? Can we somehow to configure job and tell to JOIN stage not to use tsort operator, because actually scratch directory contains tsort files.
Achiving join condition in data base query is possible when the tables from one db schema. In my situation tables from one schema, so i did it in one query like a work around, but when schemas are different then it does not work if we have not dblinks, synonyms, ...
Best regards, Kwak
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The Join stage requires sorted data, so you can either let the job do the sorting for you or you can ensure the data arrives sorted. Since you don't seem to have the space to do the former, let the original database sources do the sorting by ordering the output properly. Then you can put a Sort stage before the Join and tell it Don't Sort, Already Sorted so the job knows you've pre-sorted the data.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply