No records returned from Join & Sort stages


sohasaid
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

No records returned from Join & Sort stages

Post by sohasaid »

Hi,

I've run into strange behaviour in one of my jobs. Attached is the job design: it extracts around 100 million records from one source and around 75 million records from another, then performs an inner join on them.

I was using Hash partitioning on the join keys and the 'Stable Sort' option in the Join stage itself, but no data was coming out of the Join stage. I then replaced the stable sort option with a standalone Sort stage to improve the sorting before the join, but now no data is coming out of the Sort stage either (as you can see in the job design image).

I know that the common factor between the Join and Sort stages is their use of scratch files. The job did throw a scratch file error earlier, but that went away after I increased the scratch file space.

Job Design Image: [job design screenshot attached]

Appreciate your support.
Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What were the error messages in the job log? It may simply be that the job aborted before any rows were emitted by the Sort stage or Join stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sohasaid
Premium Member
Posts: 115
Joined: Tue May 20, 2008 3:02 am
Location: Cairo, Egypt

Post by sohasaid »

Hi Ray,

The error message was:
buffer(0),9: APT_BufferOperator: Add block to queue failed. This means that your buffer file systems all ran out of file space, or that some other system error occurred. Please ensure that you have sufficient scratchdisks in either the default or "buffer" pools on all nodes in your configuration file.
I've cleaned up some files in the scratch directory to free extra space, and the error has disappeared.
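(Side note for anyone hitting the same message: the "buffer" pool it refers to is declared per node in the APT configuration file. A minimal sketch of one node's entry, with a purely illustrative host name and paths, might look like:

    {
        node "node1"
        {
            fastname "etlserver"
            pools ""
            resource disk "/dsdata/resource" { pools "" }
            resource scratchdisk "/dsdata/scratch" { pools "" }
            resource scratchdisk "/dsdata/scratch_buffer" { pools "buffer" }
        }
    }

Scratchdisk entries in the default pool "" and/or a "buffer" pool on every node are what the APT_BufferOperator message is asking for.)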

Currently there's no error message coming from this job. It extracts all the data from the source systems without any database error, then keeps running without returning any records from the Sort stages.

Note: in the snapshot, the links are red because I stopped the job manually, not because of a specific error message.

Thanks
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

When the job was stopped, what was the memory utilization and/or CPU usage of the PIDs associated with it? I suspect the job was still sorting data when it was stopped. This job has multiple sort steps and a large number of records to sort, so it will take a long time to run.
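(One quick way to check this while the job is running, assuming a Unix engine tier where the parallel section leaders and players show up as osh processes, is something like the following; the exact column keywords depend on your platform's ps:

    ps -eo pid,ppid,pcpu,pmem,etime,args | grep '[o]sh'

If the osh processes are steadily consuming CPU and their elapsed time keeps growing, the job is most likely still sorting rather than hung.)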

If possible, sort the data in the database before extracting it, and also set APT_SORT_INSERTION_CHECK_ONLY to true in the job properties to prevent DataStage from automatically inserting a sort (a rough sketch follows below). If it is not possible to sort in the database, then see the following guidelines to tune the job.
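(As a rough illustration of that first suggestion, with table and column names entirely made up: if both inputs join on CUST_ID, each extract SQL could return its rows already ordered on that key, e.g.

    SELECT cust_id, txn_date, amount
    FROM   sales_txn
    ORDER  BY cust_id

and $APT_SORT_INSERTION_CHECK_ONLY could be added as a job parameter set to True, so that any sorts DataStage inserts only verify the incoming order instead of re-sorting the full 100 million and 75 million rows. The key order in the ORDER BY must match the Join keys exactly for this to be safe.)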