
Inserts more records than the source has

Posted: Wed Mar 26, 2014 4:40 pm
by Abraham
Hi,

I hope someone can help me.

I have a parallel job that extracts from an Oracle table and inserts into a Netezza table.
If I put the query

Select count(*) from (
select field1,field2...fieldn from schema.table
)

and then click "View Data" in the Oracle Connector stage, the result is over 11 million.
But if I execute the job using

select field1, field2... fieldn from schema.table

it inserts over 21 million rows into the Netezza table.

I looked at the Director log and it doesn't have any warnings.
It looks as if the table had more records:
"Number of rows fetched on the current node: 21138554."
"Number of rows inserted: 21,138,554"

I asked the DBA, and schema.table has over 11 million rows, as the count shows.

I've tried both the Oracle Connector and the Oracle Enterprise stage, with the same results.

Help me, please!!

Posted: Wed Mar 26, 2014 6:43 pm
by ray.wurlod
Welcome aboard.

What is the row count reported from the Oracle Connector stage (in the job log)?
Are you really selecting from the correct Oracle schema and table in your DataStage job? Double check, then triple check.

How many nodes are there in your parallel execution configuration file?
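(For reference, the file meant here is the one pointed to by the $APT_CONFIG_FILE environment variable. A minimal single-node configuration looks roughly like this; the hostname and resource paths below are placeholders:)

Code: Select all

{
    node "node1"
    {
        fastname "etl_server"
        pools ""
        resource disk "/opt/IBM/InformationServer/Server/Datasets" { pools "" }
        resource scratchdisk "/opt/IBM/InformationServer/Server/Scratch" { pools "" }
    }
}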

Posted: Wed Mar 26, 2014 7:03 pm
by Abraham
Hi Ray!
Thanks for your answer.

Yes, it's the same DB, schema, and table.
The log reports 21138554 records, and that is the same number inserted into the target Netezza table.

Here are some of the log lines:
The connector was configured to run in parallel mode on 2 nodes, but the partitioned reads were not enabled. The connector will run in sequential mode.
Source: Number of rows fetched on the current node: 21138554.
Target: Number of rows inserted: 21,138,554


Could the reason be that it is configured to run in parallel on 2 nodes?
Where can I configure that?

Regards

Posted: Thu Mar 27, 2014 4:40 am
by prasson_ibm
Abraham wrote:The connector was configured to run in parallel mode on 2 nodes, but the partitioned reads were not enabled. The connector will run in sequential mode.
Source: Number of rows fetched on the current node: 21138554.
Target: Number of rows inserted: 21,138,554
Hi,
Your source connector stage is running in sequential mode, so running the job on 2 nodes is not an issue.
Please make sure that you are writing the correct select query. Can you post your select query and the query you are using to count the number of rows?
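If the target really holds about twice the source rows, it is also worth checking on the Netezza side whether the extra rows are exact duplicates. Something along these lines should work; the column and table names below are placeholders:

Code: Select all

select field1, field2, count(*) as cnt
from target_schema.target_table
group by field1, field2
having count(*) > 1;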

Posted: Thu Mar 27, 2014 4:41 am
by prasson_ibm
You can create a test job with an Oracle Connector stage as the source and a Peek stage as the target.

Write the select query below in the Connector stage:

Code: Select all

select count(*) as ROW_CNT from ( 
select field1,field2...fieldn from schema.table 
)


In the Columns tab, define ROW_CNT as BigInt and check the result.

Posted: Thu Mar 27, 2014 6:50 am
by priyadarshikunal
Are you sure that you are connecting to the same database from your SQL client and from DataStage?
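One way to be sure is to run the same identity query from both connections and compare the results. The SYS_CONTEXT calls below are standard Oracle:

Code: Select all

select sys_context('USERENV', 'DB_NAME') as db_name,
       sys_context('USERENV', 'INSTANCE_NAME') as instance_name,
       sys_context('USERENV', 'CURRENT_SCHEMA') as current_schema
from dual;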

Posted: Thu Mar 27, 2014 1:35 pm
by ray.wurlod
What count do you get running Test #3 or Test #5 through a non-DataStage client, such as TOAD?

Posted: Fri Mar 28, 2014 2:44 am
by thompsonp
Can you also ask the DBA to check what SQL is actually being run by the database, to ensure your query is not being rewritten by the optimiser or by some trigger, policy, etc.?
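If the DBA has access to V$SQL, a query along these lines should show what DataStage actually submitted (the LIKE filter is only an example; adjust it to match your table):

Code: Select all

select sql_id, executions, sql_text
from v$sql
where upper(sql_text) like '%SCHEMA.TABLE%'
order by last_active_time desc;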

Posted: Mon Apr 14, 2014 10:11 am
by Abraham
Hi, finally we got the solution... it was a bug in the database.
Now my job loads the correct number of records.

Thanks for your suggestions