Error when using Dataset that is being appended to

droberts
Premium Member
Posts: 38
Joined: Wed Apr 01, 2009 4:34 am
Location: UK

Post by droberts »

Hi,

I am creating a dataset using OVERWRITE in one job and, in a successor job, APPENDING to the same dataset. I have RCP turned off and the metadata definitions are identical (a colleague and I have both checked). The schema definitions in DS management also match.

I run the sequence of Job 1 and Job 2 twice for separate business areas, so 2 datasets are created to be merged by Job 3:

Job 1 (creates dataset OVERWRITE)
|
Job 2 (appends to same dataset, schema / metadata identical to Job 1)
|
Job 3 (merges the 2 identically formatted datasets produced by Job 1 and Job 2 into 1 using a Funnel stage. RCP is switched on so the job can be re-used, as we have 14 different subject areas to do this for. No metadata is included within the column definitions; I have checked.)


Job 3: the APPENDED dataset is then used as input to the Funnel stage job, where the 2 identically defined datasets are merged into 1. I am running the jobs from a sequence and using RCP to perform the merge.

I am doing this for 14 subject areas.
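
For readers who want the intended flow spelled out, here is a minimal Python sketch of the three jobs. It is not DataStage code: the `datasets` dictionary, the function names, and the `area_a` / `area_b` labels are all hypothetical stand-ins, and it models only what the design is meant to do, not the engine's behaviour.

```python
from typing import Dict, List

Record = Dict[str, str]

# Hypothetical in-memory stand-in for persistent datasets, keyed by name.
datasets: Dict[str, List[Record]] = {}


def job1_overwrite(name: str, records: List[Record]) -> None:
    """Job 1: create the dataset, replacing anything already there."""
    datasets[name] = list(records)


def job2_append(name: str, records: List[Record]) -> None:
    """Job 2: add records to the dataset that Job 1 created."""
    datasets[name].extend(records)


def job3_funnel(names: List[str]) -> List[Record]:
    """Job 3: merge identically structured datasets into one."""
    merged: List[Record] = []
    for name in names:
        merged.extend(datasets[name])
    return merged


# One Job 1 -> Job 2 run per business area, then a single funnel.
for area in ("area_a", "area_b"):
    job1_overwrite(area, [{"id": f"{area}-1"}])
    job2_append(area, [{"id": f"{area}-2"}])

result = job3_funnel(["area_a", "area_b"])
```

In this toy model the funnel simply returns all four records. The sketch says nothing about why the real Funnel job hangs on appended datasets.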

6 of them work fine, where I am not APPENDING: for datasets that are just written using OVERWRITE with no need to APPEND (so no Job 2), the 3rd job, which uses RCP to funnel the 2 datasets into 1, works fine.

However, if the input datasets of Job 3 are APPENDED ones, the RCP Funnel job just hangs after reading a small number of records (a couple of hundred).

I tried re-running the hanging job after re-running Job 1 (OVERWRITE) and omitting Job 2, and it works fine, so it is definitely something to do with APPENDing to the dataset.

1) I have checked the metadata and the schema definitions within dataset management and they are identical, both between business areas and between the OVERWRITE and APPEND jobs (Job 1 and Job 2).
2) There are no error messages in the log. The last log entry shows:

main_program: Starting step execution


Has anyone else come across this issue? I can get around it by merging 2 datasets instead of using OVERWRITE and APPEND, but using APPEND seems so basic with regard to a DS that I can't understand why it is causing an issue.

Many Thanks,

Daren
droberts
Premium Member
Posts: 38
Joined: Wed Apr 01, 2009 4:34 am
Location: UK

Post by droberts »

In addition to the above, I have amended Job 2 to write to a new dataset, separate from the one Job 1 writes, and then amended Job 3 to funnel these using RCP, and it works OK.

This then proves that the metadata / schema from Job 1 and Job 2 are identical.

The APPEND of the DS is obviously causing some sort of issue, but I kinda knew that ;)

Just wanted to add that :)

Daren
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

Daren,

Not sure why you require 3 jobs. You can have 1 job with the append option, run as multiple instances, since all the data ends up in 1 dataset in the 3rd job.

Thanks
Sanjay
droberts
Premium Member
Posts: 38
Joined: Wed Apr 01, 2009 4:34 am
Location: UK

Post by droberts »

The data is populated into the identical target structure from separate sources, hence 2 separate jobs. You also cannot use a Dataset as input and Output in the same job, so that's why the funnel job is a 3rd (as well as it being generic and using RCP).
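
As a rough illustration of the restriction I mean (reading and writing the same dataset in a single job), here is a small Python sketch that flags any dataset path appearing on both sides of a job. The helper `shared_datasets` and the `.ds` paths are hypothetical; this is only a way to state the rule, not something DataStage provides.

```python
import os
from typing import Iterable, Set


def shared_datasets(inputs: Iterable[str], outputs: Iterable[str]) -> Set[str]:
    """Return dataset paths that are used as both an input and an output."""
    # Normalise paths so "/data/./x.ds" and "/data/x.ds" compare equal.
    norm_in = {os.path.normpath(p) for p in inputs}
    norm_out = {os.path.normpath(p) for p in outputs}
    return norm_in & norm_out


# Hypothetical job definition: one dataset is both read and written.
clash = shared_datasets(
    inputs=["/data/area_a.ds", "/data/area_b.ds"],
    outputs=["/data/./area_a.ds"],
)
```

An empty result means no dataset is shared between the job's inputs and outputs, which is the situation the separate funnel job is designed to guarantee.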
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

Daren,

I really don't understand the statement "You also cannot use a Dataset as input and Output in the same job".

You can use a Dataset as input and output in the same job.

Regards,
Sanjay

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

He means the same Data Set.
-craig

"You can never have too many knives" -- Logan Nine Fingers
droberts
Premium Member
Posts: 38
Joined: Wed Apr 01, 2009 4:34 am
Location: UK

Post by droberts »

chulett wrote: He means the same Data Set.
Thank you :D