Discrepancy In row counts

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

raj_konig
Participant
Posts: 67
Joined: Thu Dec 22, 2005 12:27 am

Discrepancy In row counts

Post by raj_konig »

Folks,

While loading data from source to target, I see a discrepancy in row counts.

The number of records shown by the Designer after running the job is sometimes different from what I get when I query the database.

Mine is a direct source ---> transformer ---> target job.

The number of records in the table is less than the number of records shown by the Designer. This happens only sometimes.

Can anyone guess the reason behind this?

rajesh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

No need to guess - some of the rows are being rejected by the database.

Depending upon the database and stage type you are using, you have various ways of capturing these rejects. Please look up the appropriate stage type in your online documentation and read up on how to implement error handling and logging.
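As a quick sanity check while you set that up, you could compare the link count the job reports against what actually landed in the database - a minimal sketch, assuming a hypothetical target_table as the table being loaded:

-- compare this count with the row count shown on the target link in Designer/Director
SELECT COUNT(*) AS rows_in_target
FROM   target_table;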
sun rays
Charter Member
Posts: 57
Joined: Wed Jun 08, 2005 3:35 pm
Location: Denver, CO

Re: Discrepancy In row counts

Post by sun rays »

ArndW wrote: No need to guess - some of the rows are being rejected by the database.

Depending upon the database and stage type you are using, you have various ways of capturing these rejects. Please look up the appropriate stage type in your online documentation and read up on how to implement error handling and logging.
Duplicates could be a possibility!

Rajesh,
Could you please tell us what update action you are using -
is it "insert only", or inserts and updates?

If you don't see any warnings logged, then no records are being rejected. The only possibility I can see is that there might be some duplicates in your source data.
Even that situation should log warnings with a unique constraint error if you are using the insert only option.
But if you are using insert or update, then you won't see any warnings.
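One quick way to test the duplicates theory is to compare the total row count against the distinct key count in the source data - a minimal sketch, assuming the source has been staged into a hypothetical source_stage table keyed on a hypothetical key_col:

-- if these two numbers differ, duplicates in the source could explain the missing rows
SELECT COUNT(*)                AS source_rows,
       COUNT(DISTINCT key_col) AS distinct_keys
FROM   source_stage;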
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Re: Discrepancy In row counts

Post by DeepakCorning »

In addition to what sun says, I guess you will be able to see the duplicate warnings if you reduce your commit size.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

sun rays: Thanks - I didn't even think of the duplicates; I just jumped to the assumption of rejected rows!
raj_konig
Participant
Posts: 67
Joined: Thu Dec 22, 2005 12:27 am

Post by raj_konig »

It's just a simple insert; I am not even applying any constraints. I have data in a flat file that I am trying to insert into the database. I don't have any duplicates in my source.

If the records are being rejected, then this is because of some violation like a unique key violation or something. In that case my job should fail, but it's a success.

My job once failed because of this error. After taking only the unique records, I am able to run the job successfully, but I am still missing some records.

Moreover, if I try to rerun the same job (truncate/insert), the number of rows being inserted into the target table varies from one load to another, and the database count also varies.

Here my source file is an .mdb file. Is this source file causing any confusion?

thanks,
rajesh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you write a quickie job that does a lookup on your table for all the rows, you can get a list of records that weren't inserted (the lookup will fail for those rows). Can you discern any pattern in those records? If you re-run the whole thing, do the same records fail to get inserted again? What happens if you do a manual SQL insert for those rows?
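The same idea can be done as a rough SQL check, assuming the source rows have been staged into a hypothetical source_stage table and the target is a hypothetical target_table, both keyed on a hypothetical key_col:

-- list staged source rows that never made it into the target
SELECT s.*
FROM   source_stage s
WHERE  NOT EXISTS (
         SELECT 1
         FROM   target_table t
         WHERE  t.key_col = s.key_col
       );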
DeepakCorning
Premium Member
Posts: 503
Joined: Wed Jun 29, 2005 8:14 am

Post by DeepakCorning »

What is the update action you are using? I think sun asked this but I am still not clear on it.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

How do you know that you don't have duplicate records in your source? A duplicate record in your source would be one that violates a unique constraint on the target table.

If you have the ability to create a table, you could make a copy of the target table, changing your unique index to a nonunique index. Insert the records and then check your table to see if there are duplicates based upon the index.

select column1FromIndex, column2FromIndex, count(1)
from copy_of_target_table  -- the copy you created with the nonunique index
group by column1FromIndex, column2FromIndex
having count(1) > 1

That should tell you. Rows could also be rejected because your source has null values, or because you are inserting a date in the wrong format. There are quite a few reasons why a row would be rejected, and duplicates are just one of them.
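A null in a mandatory column is easy to check for as well - a minimal sketch, again assuming a hypothetical source_stage table and key_col:

-- rows with a null key would be rejected if the target column is NOT NULL
SELECT COUNT(*) AS rows_with_null_key
FROM   source_stage
WHERE  key_col IS NULL;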
raj_konig
Participant
Posts: 67
Joined: Thu Dec 22, 2005 12:27 am

Post by raj_konig »

I do check that, ArndW.

I am just inserting the records: "Insert rows without clearing".

But I am dropping and creating the table every time I run the job.

rajesh
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Looking closer at your original post, you described your job as simple:
Mine is a direct source---> transformer----> target JOB.
I have found that one of the more difficult parts of any ETL project is not the logic behind the ETL (although with the PX engine it can become really complex), but rather the time you need to spend working with the source data - profiling it and cleansing it. I now assume data is dirty rather than clean, because I rarely see clean source data.

Have you taken the time to profile the data? What is in each of the columns? Is there an unusual value there that is going to cause the target database to puke when you try to cram it in?
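A very rough way to start profiling, assuming the file has been staged into a hypothetical source_stage table and looking at one hypothetical column some_col at a time:

-- basic per-column profile: null count, distinct count, min and max
SELECT COUNT(*)                 AS total_rows,
       COUNT(some_col)          AS non_null_rows,
       COUNT(DISTINCT some_col) AS distinct_values,
       MIN(some_col)            AS min_value,
       MAX(some_col)            AS max_value
FROM   source_stage;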
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

raj_konig wrote: If the records are being rejected, then this is because of some violation like a unique key violation or something. In that case my job should fail, but it's a success.
I'm curious how you are determining this. Hopefully not by simply checking that all the links are green when you run the job from the Designer. :? This is a common mistake I find people making, hence the question.

From the Director - What is in the log? Does the job finish with any warnings?
-craig

"You can never have too many knives" -- Logan Nine Fingers