Discrepancy In row counts
Moderators: chulett, rschirm, roy
Discrepancy In row counts
Folks,
While loading data from source to target, I see a discrepancy in row counts.
The number of records shown by the Designer after running the job is sometimes different from what I get when I query the database.
Mine is a direct source ---> transformer ---> target job.
The number of records in the table is less than the number shown by the Designer, and this happens only sometimes.
Can anyone guess the reason behind this?
rajesh
No need to guess - some of the rows are being rejected by the database.
Depending upon the database and stage type you are using you have various ways of capturing these rejects. Please look up the appropriate stage type in your online documentation and read up on how to implement error handling and logging.
Re: Discrepancy In row counts
Duplicates could be a possibility!
Rajesh,
Could you please tell us what update action you are using:
is it "insert only" or is it inserts and updates?
If you don't see any warnings logged, then no records are being rejected. The only possibility I can see is that there might be some duplicates in your source data.
Even that situation should log warnings with a unique constraint error if you are using the insert-only option.
But if you are using insert-or-update, then you won't see any warnings.
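The behaviour described above can be sketched outside of DataStage, using SQLite as a stand-in for the target database (the table and column names here are made up for the demo): with insert-only, a duplicate key raises an error that would surface as a warning; with insert-or-update, the duplicate silently updates the existing row and the row count simply comes up short.

```python
# Sketch (not DataStage): why insert-or-update hides duplicates while
# insert-only surfaces them. Table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")

source_rows = [(1, "a"), (2, "b"), (2, "b-dup"), (3, "c")]  # 4 rows, one duplicate key

# Insert-only: the duplicate raises a unique-constraint error (DataStage
# would log this as a warning and reject the row).
rejected = 0
for row in source_rows:
    try:
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected += 1
print(rejected)                                                    # 1
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])   # 3

# Insert-or-update (upsert): the duplicate silently updates the existing
# row, so no error and no warning -- the target row count is just lower
# than the source row count.
conn.execute("DELETE FROM target")
for row in source_rows:
    conn.execute("INSERT INTO target VALUES (?, ?) "
                 "ON CONFLICT(id) DO UPDATE SET val = excluded.val", row)
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])   # 3, no error
```

Either way only 3 of the 4 source rows end up in the table; the difference is whether anything tells you about it.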
-
- Premium Member
- Posts: 503
- Joined: Wed Jun 29, 2005 8:14 am
Re: Discrepancy In row counts
In addition to what sun rays says, I guess you will be able to see the duplicate warnings if you reduce your commit size.
sun rays : Thanks - I didn't even think of the duplicates; I just jumped to the assumption of rejected rows!
It's just a simple insert; I am not even applying any constraints. I have data in a flat file that I am trying to insert into the database, and I don't have any duplicates in my source.
If the records are being rejected, then it is because of some violation like a unique key violation or something, and in that case my job should fail. But it succeeds.
My job once failed because of this error. After taking only the unique records, I am able to run the job successfully, but I am still missing some records.
Moreover, if I try to rerun the same job (truncate/insert), the number of rows being inserted into the target table varies from one load to another, and the database count also varies.
Here my source file is an .mdb file. Is this source file causing any confusion?
thanks,
rajesh
If you write a quickie job that does a lookup on your table for all the rows, you can get a list of records that weren't inserted (the lookup will fail for those rows). Can you discern any pattern in those records? If you re-run the whole thing, do the same records fail to get inserted again? What happens if you do a manual SQL insert for those rows?
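The same lookup idea can be sketched in plain SQL, here run through Python's sqlite3 with made-up table names: LEFT JOIN a staged copy of the source against the target and keep the rows with no match; those are the records that never made it in.

```python
# Sketch: find source rows missing from the target via an anti-join.
# SQLite stands in for the real database; table names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_stage (id INTEGER, val TEXT)")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO source_stage VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(1, "a"), (3, "c")])  # rows 2 and 4 were "lost" in the load

missing = conn.execute("""
    SELECT s.id, s.val
    FROM source_stage s
    LEFT JOIN target t ON t.id = s.id
    WHERE t.id IS NULL
    ORDER BY s.id
""").fetchall()
print(missing)   # [(2, 'b'), (4, 'd')] -- inspect these rows for a pattern
```

Once you have the missing rows isolated, try inserting one of them by hand; the database's error message will usually name the violated constraint directly.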
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
How do you know that you don't have duplicate records in your source? A duplicate record in your source would be defined by violating a unique constraint on the target table.
If you have the ability to create a table, you could make a copy of the target table, changing your unique index to a non-unique index. Insert the records and then check your table to see if there are duplicates based upon the index:
select column1FromIndex, column2FromIndex, count(1)
from copy_of_target_table
group by column1FromIndex, column2FromIndex
having count(1) > 1
That should tell you. Your rows could also be rejected because your source has null values, or because you are inserting a date in the wrong format. There are quite a few reasons why a row would be rejected, and duplicates are just one of them.
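Those other rejection causes can be sketched just as easily, again with SQLite standing in for the real database (the table and its constraints are hypothetical): a NULL in a NOT NULL column and a wrongly formatted date are rejected exactly as quietly as a duplicate key.

```python
# Sketch: rejections that have nothing to do with duplicates.
# SQLite stands in for the target database; names are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE target (
                    id     INTEGER PRIMARY KEY,
                    name   TEXT NOT NULL,
                    loaded TEXT CHECK (loaded IS strftime('%Y-%m-%d', loaded))
                )""")

rows = [
    (1, "ok",  "2005-10-21"),   # clean row
    (2, None,  "2005-10-21"),   # NULL in a NOT NULL column
    (3, "bad", "21/10/2005"),   # date in the wrong format
]

errors = []
for row in rows:
    try:
        conn.execute("INSERT INTO target VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        errors.append((row[0], str(exc)))

print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 1 row landed
for rid, msg in errors:
    print(rid, msg)  # the error text names the violated constraint
```

Two of three source rows vanish, and unless something captures those errors the load still looks green.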
Keith Williams
keith@peacefieldinc.com
Looking closer at your original post, you described your job as simple:
Mine is a direct source---> transformer----> target JOB.
Have you taken the time to profile the data? What is in each of the columns? Is there an unusual value in there that is going to cause the target database to puke when you try to cram it in?
I have found that one of the more difficult parts of any ETL project is not the logic behind the ETL (although with the PX engine it can become really complex), but rather the time you need to spend working with the source data, profiling it and cleansing it. I now assume source data is dirty rather than clean, because I rarely see clean source data.
Keith Williams
keith@peacefieldinc.com
raj_konig wrote: if the records are being rejected then it is because of some violation like a unique key violation or something, and in that case my job should fail. But it succeeds.
I'm curious how you are determining this. Hopefully not by simply checking that all the links are green when you run the job from the Designer. This is a common mistake I find people making, hence the question.
From the Director - What is in the log? Does the job finish with any warnings?
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers