Error when using Dataset that is being appended to

Posted: Thu May 21, 2009 6:35 am
by droberts
Hi,

I am creating a dataset using OVERWRITE in one job and, in a successor job, APPENDING to the same dataset. I have RCP turned off and the metadata definitions are identical (a colleague and I have checked). The schema definitions in Data Set Management also match.

I run the sequence of Job 1 and Job 2 twice, once for each of two separate business areas, so 2 datasets are created to be merged by Job 3:

Job 1 (creates the dataset using OVERWRITE)
|
Job 2 (appends to the same dataset; schema / metadata identical to Job 1)
|
Job 3 (merges the 2 identically formatted datasets produced by Jobs 1 and 2 into 1 using a Funnel stage. RCP is switched on so the job can be re-used, as we have 14 different subject areas to do this for. No metadata is included within the column definitions; I have checked.)


Job 3 - the APPENDED dataset is then used as input to a Funnel stage job, where the 2 identically defined datasets are merged into 1. This all runs from a sequence and uses RCP.

I am doing this for 14 subject areas.

6 work fine where I am not APPENDING - for the subject areas whose datasets are just written using OVERWRITE with no need to APPEND (so no Job 2), the 3rd job using RCP to funnel the 2 datasets into 1 works fine.

However, if the input datasets to Job 3 are APPENDED ones, the RCP Funnel job just hangs after reading a small number of records (a couple of hundred).

I tried re-running the hanging job after re-running Job 1 (OVERWRITE) and omitting Job 2, and it works fine, so it is definitely something to do with APPENDing to the dataset.

1) I have checked the metadata and the schema definitions within Data Set Management and they are identical, both between business areas and between the OVERWRITE and APPEND jobs (Job 1 and Job 2) - see the command-line checks sketched after the log entry below.
2) There are no error messages in the log. The last log entry shows:

main_program: Starting step execution
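
(For reference, the same sort of check can also be done from the command line on the engine tier - this is just a sketch, assuming orchadmin, dsrecords and dsjob are on the PATH and APT_CONFIG_FILE points at the parallel configuration file; the dataset, project and job names below are only examples, and exact options vary by version:)

# show the dataset header and record schema as the engine sees it
orchadmin describe /data/ds/subject_area_a.ds

# count the records currently in the dataset
dsrecords /data/ds/subject_area_a.ds

# summarise the log entries for the hanging Funnel job
dsjob -logsum MyProject FunnelJob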


Has anyone else come across this issue? I can get around it by writing 2 separate datasets and merging them instead of using OVERWRITE and APPEND, but APPEND seems such a basic Data Set operation that I can't understand why it is causing an issue.

Many Thanks,

Daren

Posted: Thu May 21, 2009 7:12 am
by droberts
In addition to the above, I have amended Job 2 to write to a new (separate) dataset from the one created by Job 1, and amended Job 3 to funnel the two using RCP, and it works OK.

This proves that the metadata / schema from Job 1 and Job 2 are identical.

The APPEND to the dataset is obviously causing some sort of issue, but I kinda knew that ;)

Just wanted to add that :)

Daren

Posted: Thu May 21, 2009 7:37 am
by sanjay
Daren,

Not sure why you require 3 jobs. You could have 1 job with the Append option, run as multiple instances, since all the data ends up in 1 dataset for the 3rd job anyway.
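
For example, something like this from the command line (only a sketch - the project, job, invocation and parameter names are made up, and it assumes the job is compiled as multi-instance):

# run two invocations of the same multi-instance job, both appending to the one dataset
dsjob -run -param SubjectArea=AREA_A MyProject AppendJob.AREA_A
dsjob -run -param SubjectArea=AREA_B MyProject AppendJob.AREA_B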

Thanks
Sanjay

Posted: Thu May 21, 2009 7:51 am
by droberts
The data is populated into the identical target structure from separate sources, hence the 2 separate jobs. You also cannot use a Dataset as input and Output in the same job, so that's why the Funnel job is a 3rd one (as well as it being generic and using RCP).

Posted: Thu May 21, 2009 11:54 am
by sanjay
Daren,

I really don't understand the statement "You also cannot use a Dataset as input and Output in the same job".

You can use a Data Set as input and output in the same job.

Regards,
Sanjay


Posted: Thu May 21, 2009 12:06 pm
by chulett
He means the same Data Set.

Posted: Thu May 21, 2009 2:25 pm
by droberts
chulett wrote: He means the same Data Set.
Thank you :D