Query reg. Update records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Query reg. Update records

Post by chandra.shekhar@tcs.com »

Hi,
Plz answer my queries (though they are sillY :D )

1. DB2 Connector is my target stage, so whats the difference between "Insert then Update" and "Update then Insert" mode?
performance wise, logic wise etc.

2. One of my friend told me to keep "Record Count" = 1(i.e. commit after every record) while updating in the target?
Thanx and Regards,
ETL User
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Recordcount=1 will not miss any updated records during the job abort.
Since it's getting committed after each and every update.
Thanks
pandeeswaran
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Insert then update tries to insert then, if that fails because the row exist, issues an update. Update then insert tries to update then, if that fails because the row does not exist, issues an insert. Choose the one that best meets your requirements - if there will be more inserts than updates prefer the former; if there will be more updates than inserts choose the latter. Making a wrong choice will carry a performance penalty.

Record Count governs commit behaviour. Ask your friend why that particular suggestion was made. It's not always going to be the best choice. Perhaps your friend has only a little knowledge. Does your friend also recommend always running parallel jobs in sequential mode?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

@Ray,
Yes I am having more updates than my inserts. Thats why I chose Update then Insert mode. But the issue is in both modes I get different no. of records in my target table. Performance is not an issue because in either mode my job finishes in a minute.
That's why my friend suggested to limit Record Count to 1. This will ensure that no inserts will be skipped.
Right now I have kept both Record Count and Array Size = 2000.
Please suggest that why some of my inserts are getting skiiped.
Thanx and Regards,
ETL User
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

I am curious why you are getting different counts update then insert mode and
Insert then update mode.
Both cases the count should be the same
pandeeswaran
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

@Pandeesh
I am also trying to find the same thing, but no success till now.
Even my job is not complex too. A simple one-one mapping and finally writing in upsert mode.
But I am getting different counts in target. :(
Thanx and Regards,
ETL User
pandeesh
Premium Member
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

Just create 2 tables with minimum no of records so that we can get some idea.
Or else just explain how you are testing.
Thanks
pandeeswaran
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

@Pandeesh

I am just checking the count in both source and target table.
Like count(distinct primary key) in Src should be equal to count(*) in Tgt.
Also in my job I am upserting my Tgt table with the latest record i.e. I can have multiple records of my PK from thr Src(no constraints in Src table) but the latest record of the PK(based upon a column seq_no) should go in the Tgt. To achieve this I am using a Sort stage and then a Remove Duplicate Stage.
Thanx and Regards,
ETL User
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

pandeesh wrote:I am curious why you are getting different counts update then insert mode and Insert then update mode.
Both cases the count should be the same
Not true. The outcome of each is entirely dependant on the unique constraint that you may (or may not) on the record. Understand that the first action must fail for the second action to fire and the way inserts fail is if there is a unique constraint being violated.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

I got the solution,
Sorry guys, the issue was due to incorrrect data.
Thanx and Regards,
ETL User
Post Reply