Incorrect delete record count in job log

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Incorrect delete record count in job log

Post by saqib »

Hi,

I am working on an upgrade project from 8.5 to 9.1.2 version of DataStage.

job design is as below:
Oracle_connector1 --> transformer --> Oracle_connector2

Source and target table is same, lets call it table1.

Job description/purpose:
Based on run-time parameters, identify ID's from source table (table1) which needs to be deleted --> this part is achieved in source SQL (Oracle_connector1 )
Delete these records from table1 --> done in target SQL (Oracle_connector2)

Job run in 8.5:
- 1 ID identified from source SQL for deletion.
- There are 4 records of this ID present in table1.
- Thus all 4 records deleted from table1. This we verified by querying on table1.
- Job log shows 4 records deleted.

After the job run from 8.5 we reloaded the 4 deleted records in table1 (this is done because we do not have separate database for 9.1.2)

Job run in 9.1.2:
- 1 ID identified from source SQL for deletion.
- There are 4 records of this ID present in table1.
- Thus all 4 records deleted from table1. This we verified by querying on table1.
- Job log shows 1 record deleted.

As you can see, in both cases 4 records were actually deleted from table1 but job log shows different count.

Can anyone please help me in understanding this behavior in DataStage 9.1.2?

Do let me know if more information is required.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The language of the report could be improved. It is advising that one row was sent to the database server, and that the operation to be performed was DELETE. (The assumption is that the Key here is a unique primary key which, in your case, it is not.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Run this with a single node config file and see what the results are.
Mamu Kim
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Post by saqib »

Tried below things:
1. Ran the job in 9.1.2 on single node
2. Ran the job with target connector stage in sequential mode

No luck with above two trials. Same result.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Clarify something for us so we stop making assumptions about exactly what it is you are doing. I understand your source and target are the same table. What I'm wondering is...

One "ID" needs to be deleted and there are four records associated with this ID. Does the job send 1 record to the target and the target deletes 4 records from the one DML operation?

Or are 4 records selected from the source and each one is individually sent to and deleted from the target?
-craig

"You can never have too many knives" -- Logan Nine Fingers
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Post by saqib »

chulett wrote:Does the job send 1 record to the target and the target deletes 4 records from the one DML operation?
Yes, job sends 1 record to target. The source query is grouping data on the ID field and calculating the count. Output of my source query are two columns
E.g.
ID, count
1,4

Where count is the number of times ID is repeating in the table.
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Post by saqib »

Tried one more thing with this job on 9.1.2
Identified 2 ID's from table1 which are present more than once and then modified my source SQL to return only these two ID's. So now the output from my source query is like this

ID,count
159,2
160,2

With this, job sends 2 records to DB for deletion and based on these ID's 4 records got deleted from table1 (as expected)
But again, job log shows 2 records deleted.

Wondering if it is something to do with the way DB communicates with DataStage for these stats and how would it be different in DS 8.5 and 9.1.2.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Technically 8.5 was broken and it has been corrected in 9.1. I know that's not the answer you are looking for but DataStage has always (as in from the earliest versions) reported the number of DML operations sent to the database. Meaning if you send one delete DML it logs it as one 'record' going to the target regardless of how many records it actually deletes as part of the request being fulfilled. This is what Ray was saying earlier.

One thing about Informatica that I appreciate it is shows 'applied' rows and 'affected' rows at the target. So yours would show 1 applied and 4 affected. Or I have large updates running today where 1 applied can affect millions of records and it captures that count.
-craig

"You can never have too many knives" -- Logan Nine Fingers
AshishDevassy
Participant
Posts: 32
Joined: Fri Nov 07, 2014 11:18 am

Post by AshishDevassy »

are you certain its not running on more than one node ?
Check with the DBA to see the queries executed by the datastage user ID.
that's wierd ?
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Post by saqib »

chulett: Agreed. In 8.5, the performance statistics of this job showed 1 record being sent to DB (on the link between transformer and target oracle connector).
Applied and affected.. thats cool. Hope IBM incorporates somthing like this in DataStage.
Any idea on how would DB communicate the "affected" stats back to DatStage? like will it be through drivers or reading of log file or something else?
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Post by saqib »

AshishDevassy: I ran the job using a single node config file. Is there anything else I can try to run the job on single node?
Regarding the queries executed by DS user ID, in both the job runs (8.5 and 9.1.2) 4 records are deleted from table1. This has been verified by running queries before and after job runs.
Problem is the count which is displayed in job log. 8.5 job log shows the actual (affected) number of records deleted from table whereas 9.1 job log (says number of records deleted: 1) shows the count as number of records sent to DB (applied).
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

My understanding has always been that it is carried as part of the SQLCA or SQL Communication Area. Not really sure how (or if) one could get access to that in DataStage. Perhaps others do. Last time I recall actually taking advantage of that was way back in my Informix days.

ps. Don't really see how number of nodes has any bearing on this conversation. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
saqib
Participant
Posts: 14
Joined: Mon Jul 27, 2009 12:01 pm
Location: mumbai

Post by saqib »

Thanks for the SQLCA reference. Will check that out.
Post Reply