
Disable Link in runtime

Posted: Wed Apr 30, 2014 10:15 am
by patelamit009
Hi All,

I got stuck on one of my job designs and had to split the job in two to make it work. The initial design was as below.

Seqfile
|
|
\/
Transformer ----> Dataset
|
|
\/
Teradata Connector

Dataset - Overwrite
Teradata stage - Table is recreated
RCP is enabled in all stages.

In the Transformer stage I set a flag that directs the records to either the dataset or the Teradata table. In any given run, the records travel down only one of the two links.

If the records are going to the Teradata table, I'd like to disable the dataset link, because I do not want the dataset to be created with 0 records. Similarly, when the records are written to the dataset, I do not want the Teradata link to be enabled, and the table should not be recreated.

For now I have split the job into two. But I'd like to know if the above is achievable.

Note: I could not find any related post in the forum. Please point me to one if it exists.

Thanks

Posted: Wed Apr 30, 2014 11:04 am
by chulett
Cannot be done in a single job.

Posted: Wed Apr 30, 2014 7:41 pm
by ssnegi
You can create a file at runtime depending on the constraint in the Transformer. This file can contain a flag for the table or dataset with 0 records. Then, in the after-job subroutine, you can read this file from Unix and drop the table or remove the dataset depending on the flag. You can delete the flag file as well, so that there is zero footprint.
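
As a rough illustration of the flag-file idea, here is a minimal Python sketch of the decision a cleanup script might make. Everything in it is an assumption made for the example: the flag file is taken to hold a single word naming the target that received 0 records ("TABLE" or "DATASET"), and the function name and returned action names are hypothetical. The actual drop or remove commands are left to the caller.

```python
import os


def cleanup_action(flag_path):
    """Read the flag file, delete it, and return the cleanup to perform.

    Assumption: the file contains one word naming the empty target.
    """
    with open(flag_path) as fh:
        flag = fh.read().strip().upper()

    if flag == "DATASET":
        # The dataset received no records, so it is the leftover to remove.
        action = "remove_dataset"
    elif flag == "TABLE":
        # The table received no records, so it is the leftover to drop.
        action = "drop_table"
    else:
        # Leave the flag file in place so the bad value can be inspected.
        raise ValueError(f"unexpected flag value: {flag!r}")

    # Delete the flag file so that nothing is left behind.
    os.remove(flag_path)
    return action
```

Note that this only cleans up after the fact. Both targets are still created (or recreated) while the job runs.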

Posted: Wed Apr 30, 2014 8:50 pm
by qt_ky
The process(es) for each stage will start up and run in parallel, whether or not a given link receives data or is constraint-disabled to not receive any data.

Splitting the job into two is a good solution. It leaves you with two clear job designs.

Posted: Wed Apr 30, 2014 9:29 pm
by chulett
Exactly. The only way to ensure that no dataset is created for zero records is to not run the dataset population job. Likewise, to ensure the table is not "recreated" for nothing you cannot run the job with that table as its target from an empty source. All of that happens before any data flows through.

Run the appropriate load job for the desired target. As more general advice, if empty targets are an issue for you, check your source first and run the processing job only when there is data to process.
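
A wrapper along these lines could make that check before anything is run. This is only a sketch under stated assumptions: the function name, the project name and the two job names are hypothetical, "has data" is taken to mean simply a non-empty file (a header-only file would still count as data), and the function only builds the dsjob command line rather than executing it.

```python
import os


def build_dsjob_command(source_path, target, project, jobs):
    """Return the dsjob argument list to run, or None if the source is empty.

    jobs maps a target name (e.g. "dataset" or "table") to the name of the
    job that loads that target. All names are supplied by the caller.
    """
    # Skip the run entirely when there is nothing to process, so that
    # neither target is created or recreated for an empty source.
    if not os.path.isfile(source_path) or os.path.getsize(source_path) == 0:
        return None

    job = jobs[target]
    # -jobstatus makes dsjob wait for the job and report its finishing status.
    return ["dsjob", "-run", "-jobstatus", project, job]
```

For example, with the hypothetical mapping {"dataset": "LoadDataset", "table": "LoadTeradata"}, the returned list can be passed to subprocess.run, and a None result means no job is started at all.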