Logging lookup failures and aborting a job


Post by evee1 »

My job is performing a lookup (logically, it can be done via a Lookup, Join, or Merge stage; there are no constraints here). If the lookup fails it is considered a fatal error and the batch that this job is part of should stop. However, at the same time I would like to log the offending records (i.e. those for which the lookup failed) into some reject table.

If I use a Lookup stage with a reject link then the failed records are sent to that link, while the others flow through the job and are written to the database. I don't want that, primarily because it is a full dimension load and dropping records would cause unnecessary closure of valid dimension records.
On the other hand, any option that causes the job to abort (like "Fail" as the Lookup Failure condition) means the offending records never reach the reject table (or, if I choose a file, it ends up empty).
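To make the requirement concrete, this is the behaviour I am after, written as a rough Python sketch rather than DataStage (all names are made up): log every failed lookup, and if there was at least one failure, abort before anything reaches the target.

Code:

def load_dimension(source_rows, ref_keys, log_reject, write_target):
    # Stage rows instead of writing them straight to the target.
    staged, rejects = [], []
    for row in source_rows:
        if row["ref_key"] in ref_keys:
            staged.append(row)            # lookup succeeded
        else:
            rejects.append(row)           # lookup failed
    # Always log the offending records.
    for row in rejects:
        log_reject(row)
    # A single failure is fatal for the whole batch.
    if rejects:
        raise RuntimeError(f"{len(rejects)} lookup failures - aborting the batch")
    # Only load when every lookup succeeded.
    for row in staged:
        write_target(row)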

My latest solution is to divide the job into two (sketched below):
Job1 - does the join, and afterwards a Transformer sends all the records for which the lookup failed down a link that logs them into the reject table. All the records are also sent to a dataset.
Job2 - reads the records from the dataset and aborts on the first record whose values indicate that the lookup failed for it.
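In plain Python the split looks roughly like this (the file layout and the lookup_failed flag are invented for the example; the real jobs would use a parallel dataset):

Code:

import csv
import sys

def job1(source_rows, ref_keys, dataset_path, log_reject):
    """Join source to reference, log the misses, stage everything to a dataset."""
    with open(dataset_path, "w", newline="") as ds:
        writer = csv.writer(ds)
        for row in source_rows:
            lookup_failed = row["ref_key"] not in ref_keys
            if lookup_failed:
                log_reject(row)                       # reject table
            writer.writerow([row["id"], row["ref_key"], int(lookup_failed)])

def job2(dataset_path):
    """Abort on the first staged record whose lookup failed."""
    with open(dataset_path, newline="") as ds:
        for rec_id, ref_key, lookup_failed in csv.reader(ds):
            if int(lookup_failed):
                sys.exit(f"Record {rec_id}: lookup failed - aborting the batch")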

But I don't like this very much. Does anyone have other suggestions that would not involve splitting the job?
Post by BI-RMA »

Splitting the job is definitely good practice and is very much advised in the Redbook "InfoSphere DataStage Parallel Framework Standard Practices".

It is simply the best way of keeping DataStage from writing false records to the database, and it removes the need to clean up after a job failure. However, if you do not implement a process that keeps DataStage from starting the job that loads the data into the database whenever there are rejects, dividing the job does not really make much sense.
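For that gating process, something as simple as the controller logic below would do. This is only a sketch: run_datastage_job() is a hypothetical wrapper (for example around the dsjob command line), not an existing API.

Code:

import sys

def run_datastage_job(name: str) -> bool:
    """Hypothetical wrapper: launch the named job and report a clean finish."""
    raise NotImplementedError

def main():
    # Job1 stages the data, logs rejects and aborts when there are any.
    if not run_datastage_job("Job1_StageAndLogRejects"):
        sys.exit("Job1 failed or found rejects - not starting the load job")
    # Only now is the target database touched.
    run_datastage_job("Job2_LoadTarget")

if __name__ == "__main__":
    main()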

I would consider sending the rejected rows through an Aggregator to count them, and then performing a lookup of that count against the value 0 from a Row Generator stage (with the Lookup Failure condition set to "Fail").

This will cause the job to abort after all non-rejected rows have been processed and written to an intermediary dataset. If you put this check into a shared container, you can reuse it very easily.
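In Python terms the check is nothing more than this (the real implementation is an Aggregator feeding a Lookup whose reference link is a single row with the value 0 from a Row Generator):

Code:

def reject_check(reject_rows):
    # Aggregator: count the rejected rows.
    reject_count = sum(1 for _ in reject_rows)
    # Row Generator: a single reference row containing the value 0.
    allowed_counts = {0}
    # Lookup with the failure condition set to "Fail":
    # any non-zero count has no match, so the job aborts.
    if reject_count not in allowed_counts:
        raise RuntimeError(f"{reject_count} rejected rows - aborting")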
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon
evee1
Premium Member
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Post by evee1 »

Thanks for your reply.
BI-RMA wrote: Splitting the job is definitely good practice and is very much advised in the Redbook "InfoSphere DataStage Parallel Framework Standard Practices".
I agree, however this should rather be done when the job is quite large or for reusability reasons. The Redbook mentions 50 as the approximate number of stages at which a job split should be considered. In my case the job is quite small and I'd rather keep the processing from source to target in one job. Of course, some of the stages handling the reject processing can be moved to a shared container to be reused.
BI-RMA wrote: I would consider sending the rejected rows through an Aggregator to count them, and then performing a lookup of that count against the value 0 from a Row Generator stage (with the Lookup Failure condition set to "Fail").
I have tried that, but it still somehow manages to insert some rows (9 out of 23) into the database before it aborts.
My job design is as follows. Is this what you had in mind?

Code:

                  Ref Table
                      |
                      |
                      V
Source Table ---> Lkp Stage ---------> ...... -------> Target Table
                   with Reject
                   Constraint
                      |
                      |
                      | reject link
                      |
                      V
                   Transformer             Row Generator
                    |      |               with 0 value
                    |      |                   | 
                    |      |                   | 
                    |      |                   |
                    V      V                   V
               Dataset    Aggregator -----> Lkp Stage -------> Copy (Sink)
                                            with Fail
                                            Constraint
Could these inserts be related to the fact that I am running the job in parallel on four nodes?
The target table is an Oracle Connector, with Insert write mode and Transaction Record count set to 0.
BI-RMA wrote: This will cause the job to abort after all non-rejected rows have been processed and written to an intermediary dataset. If you put this check into a shared container, you can reuse it very easily.
Yes, this is happening, i.e. the rejected rows do get written into the dataset.
I can even make them write into the reject database table (an Oracle Connector with Transaction Record count set to 1), but I'm not sure whether that is just circumstantial, especially with such a low number of reject records.
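For what it's worth, my (quite possibly wrong) mental model of why a few rows land anyway is that every node runs its own player with its own Oracle session, and each session commits when it reaches end of data, independently of the reject check. Roughly, in Python:

Code:

class PartitionWriter:
    """One player per node, each with its own database session (my assumption)."""

    def __init__(self, name):
        self.name = name
        self.pending = []
        self.committed = []

    def write(self, row):
        self.pending.append(row)

    def finish(self):
        # The player reaches end of data and its session commits its own work.
        self.committed.extend(self.pending)
        self.pending.clear()

def rows_in_table(writers, finished_before_abort):
    # Players that finished before the reject check aborted the job have
    # already committed; the others lose their uncommitted rows on abort.
    for w in writers:
        if w.name in finished_before_abort:
            w.finish()
    return sum(len(w.committed) for w in writers)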
Post by evee1 »

I have tried that, but it still somehow manages to insert some rows (9 out of 23) into the database before it aborts.
Actually this seems to be a common problem. I found many posts on this forum where the target commit is only wanted when the job finishes successfully, but the records are inserted into the target table anyway, regardless of the transaction size being set to 0.

I have also experimented with very high values of $APT_ORAUPSERT_COMMIT_TIME_INTERVAL (10000000) and
$APT_ORAUPSERT_COMMIT_ROW_INTERVAL (7200), as suggested in other posts on this forum, but it didn't help either :evil:.
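The way I understand those two variables (illustration only, and my understanding may be off): a commit fires as soon as either threshold is reached, and the session commits whatever is left when it hits end of data, so huge values only postpone the intermediate commits, they don't make the final commit conditional on the job finishing cleanly. Something like:

Code:

import time

def upsert_writer(rows, send, commit, row_interval=7200, time_interval=10_000_000):
    # Sketch of a row/time commit interval, not the actual connector code.
    uncommitted = 0
    last_commit = time.monotonic()
    for row in rows:
        send(row)
        uncommitted += 1
        if uncommitted >= row_interval or time.monotonic() - last_commit >= time_interval:
            commit()
            uncommitted = 0
            last_commit = time.monotonic()
    # End of data: the remaining rows are committed regardless of what
    # happens elsewhere in the job.
    commit()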
Post by evee1 »

Hmm, according to this thread my job may be correctly committing the records into the target database.

I was assuming (probably incorrectly) that the write to the target and the lookup stage after the Aggregator that causes the job to fail both belong to one transaction. But that might not be the case, and setting the transaction size to 0 doesn't help. Am I correct?

Can someone shed some light on how the transaction boundaries are created in the job?
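To make the question concrete, the behaviour I think I am seeing is that the target connector and the reject-check branch run as independent units with their own sessions, so failing the check does not roll back what the target session has already committed. Pure speculation on my part, but something like:

Code:

class Session:
    """A database session that commits only its own work."""

    def __init__(self):
        self.table = []
        self.uncommitted = []

    def insert(self, row):
        self.uncommitted.append(row)

    def commit(self):
        self.table.extend(self.uncommitted)
        self.uncommitted.clear()

def run_job(rows, rejects):
    target = Session()
    for row in rows:
        target.insert(row)
    target.commit()              # the target branch finishes and commits first
    if rejects:                  # the reject-check lookup then fails the job...
        raise RuntimeError("lookup against 0 failed - job aborted")
    # ...but the commit above has already happened in its own session.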