Inserts more records than the source has

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Abraham
Participant
Posts: 4
Joined: Wed Mar 26, 2014 2:22 pm
Location: México


Post by Abraham »

Hi,

I hope someone can help me.

I have a parallel job that extracts from an Oracle table and inserts into a Netezza table.
If I put the query

Select count(*) from (
select field1,field2...fieldn from schema.table
)

in the Oracle Connector stage and click "View Data", the result is over 11 million.
But if I execute the job using

select field1, field2... fieldn from schema.table

it inserts over 21 million rows into the Netezza table.

I looked at the Director log and it doesn't have any warnings.
It looks as if the table had more records:
"Number of rows fetched on the current node: 21138554. "
"Number of rows inserted: 21,138,554"

I asked the DBA, and schema.table has over 11 million rows, just as the count shows.

I've tried both the Oracle Connector and the Oracle Enterprise stage, with the same results.

Help me, please!!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard.

What is the row count reported from the Oracle Connector stage (in the job log)?
Are you really selecting from the correct Oracle schema and table in your DataStage job? Double check, then triple check.

How many nodes are there in your parallel execution configuration file?
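
For reference, the file in question is the one pointed to by the $APT_CONFIG_FILE environment variable. A minimal two-node configuration looks something like this (the hostname and resource paths below are illustrative, not your actual values):

Code: Select all

{
	node "node1"
	{
		fastname "etlserver"
		pools ""
		resource disk "/data/datasets" {pools ""}
		resource scratchdisk "/data/scratch" {pools ""}
	}
	node "node2"
	{
		fastname "etlserver"
		pools ""
		resource disk "/data/datasets" {pools ""}
		resource scratchdisk "/data/scratch" {pools ""}
	}
}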
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Abraham
Participant
Posts: 4
Joined: Wed Mar 26, 2014 2:22 pm
Location: México

Post by Abraham »

Hi Ray!
Thanks for your answer,

Yes... it's the same DB, schema, and table.
The log reports 21138554 records, and the same number is inserted into the Netezza target.

Here are some of the log lines:
The connector was configured to run in parallel mode on 2 nodes, but the partitioned reads were not enabled. The connector will run in sequential mode.
Source: Number of rows fetched on the current node: 21138554.
Target: Number of rows inserted: 21,138,554


Could the reason be that it is configured to run in parallel on 2 nodes?
Where can I configure that?

Regards
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Abraham wrote:The connector was configured to run in parallel mode on 2 nodes, but the partitioned reads were not enabled. The connector will run in sequential mode.
Source: Number of rows fetched on the current node: 21138554.
Target: Number of rows inserted: 21,138,554
Hi,
Your source connector stage is running in sequential mode, so running the job on 2 nodes is not the issue.
Please make sure that you are writing the correct select query. Can you post your select query and the query you are using to count the number of rows?
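
You could also check whether the extra rows in the target are exact duplicates. Something along these lines against the Netezza table would show that (the table and column names are illustrative):

Code: Select all

select field1, field2, count(*) as cnt
from schema.target_table
group by field1, field2
having count(*) > 1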
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

You can create a test job with an Oracle Connector stage as the source and a Peek stage as the target.

Write the select query below in the connector stage:

Code: Select all

select count(*) as ROW_CNT from ( 
select field1,field2...fieldn from schema.table 
)


In the Columns tab, define ROW_CNT as BigInt and check the result.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

Are you sure that you are connecting to the same database from the SQL client and from DataStage?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What count do you get running Test #3 or Test #5 through a non-DataStage client, such as TOAD?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thompsonp
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Can you also ask the DBA to check what SQL is actually being run by the database, to ensure your query is not being rewritten by the optimiser, a trigger, a policy, etc.?
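
If the DBA (or you, given the privileges) wants to see the statements Oracle actually parsed, something along these lines against V$SQL would show them (the schema name and filter text are illustrative):

Code: Select all

select sql_id, executions, sql_fulltext
from v$sql
where parsing_schema_name = 'SCHEMA'
and lower(sql_text) like 'select field1%'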
Abraham
Participant
Posts: 4
Joined: Wed Mar 26, 2014 2:22 pm
Location: México

Post by Abraham »

Hi, we finally got the solution... it was a bug in the database.
Now my job loads the correct number of records.

Thanks for your suggestions