Page 1 of 1

Problems with scratch space while using JOIN stage in DS 8.5

Posted: Fri Nov 18, 2011 5:35 am
by kwak
Hi, to everyone!!!
I am newer in Dastage.

How to config scratch space when trying to join 2 table with big amount of data. Each table has more then 200 million records in it.

Config file:

Code: Select all

{
	node "node1"
	{
		fastname "srv-etl-dev"
		pools ""
		resource disk "C:/IBM/InformationServer/Server/Datasets/node1" {pools ""}
		resource scratchdisk "C:/IBM/InformationServer/Server/Scratch/node1" {pools ""}
	}
	node "node2"
	{
		fastname "srv-etl-dev"
		pools ""
		resource disk "C:/IBM/InformationServer/Server/Datasets/node2" {pools ""}
		resource scratchdisk "C:/IBM/InformationServer/Server/Scratch/node2" {pools ""}
	}
}
Configuration:
resource disk size : 500 gb
OS: Windows 2003 SP2 x64
Datastage version: 8.5
Job type: parallel

Using current configuration system has not got enought free space for tsort files in scratch directory.

Is it possible somehow to remove tsort operator?

Thank you!!!

Re: Problems with scratch space while using JOIN stage in DS

Posted: Fri Nov 18, 2011 5:54 am
by manoj_23sakthi
Hi,
You have to increase the scratch disk space..
Else You can achive the same join condition in database Query...
Before that try to remove unwanted datasets in unix path..
Thanks
Manoj kumar

Re: Problems with scratch space while using JOIN stage in DS

Posted: Fri Nov 18, 2011 6:24 am
by kwak
Hi,
Thank you for response!

And if there is no possibility to increase scratch space is there any other cases to obtain current issue? Can we somehow to configure job and tell to JOIN stage not to use tsort operator, because actually scratch directory contains tsort files.
Achiving join condition in data base query is possible when the tables from one db schema. In my situation tables from one schema, so i did it in one query like a work around, but when schemas are different then it does not work if we have not dblinks, synonyms, ...

Posted: Fri Nov 18, 2011 8:20 am
by chulett
The Join stage requires sorted data, so you can either let the job do the sorting for you or you can ensure the data arrives sorted. Since you don't seem to have the space to do the former, let the original database sources do the sorting by ordering the output properly. Then you can put a Sort stage before the Join and tell it Don't Sort, Already Sorted so the job knows you've pre-sorted the data.