
Load the same Teradata table twice in a single job

Posted: Tue Feb 15, 2011 9:00 am
by Vidyut
Hi

Is there any way I can load the same Teradata table twice in a single job?
The reason I need this is that I want to load duplicates into the table, and if I use MLoad for all the records it affects performance.
So what I am trying to do is load the rest of the data using the Teradata Enterprise stage and the duplicates using Teradata MLoad.

Any suggestions?

Thanks in Advance

Posted: Tue Feb 15, 2011 9:06 am
by jwiles
Is there a constraint or other restriction which keeps you from loading all of the records using TD Enterprise? If the records are in separate streams and you must load the duplicates after the rest, one method could use a sequence funnel into a single TD Enterprise stage.

Or, you could use a sequence job as your single job point of view ;)

Regards,

Posted: Tue Feb 15, 2011 9:10 am
by Vidyut
Actually, I had the data in a single stream, but the Teradata Enterprise stage silently drops all the duplicates.
That's why I separated the data into two streams: one for the faster load (around 95% of the data) and one for the slower load (the duplicate records).

Posted: Tue Feb 15, 2011 9:18 am
by Vidyut
One more issue I forgot to mention: I want to avoid creating a separate job for loading the duplicates, because I have hundreds of jobs and sequencers already designed with the same issue, so adding a new job is the last option.

Thanks

Posted: Tue Feb 15, 2011 10:15 am
by ShaneMuir
Do you actually need to load the duplicates? Or are they really updates?

Is there a reject link on the Teradata stage? If so, it might be possible to reject out to a Lookup stage and join your updates at that point, before updating those records via another Teradata stage.

All the inserts should still happen; there will be no rejects off the Teradata stage. On the Lookup, set the "records not found" condition to Reject, and then use the records off the Lookup's reject link to perform the update. The Lookup should act as a holding point until all the inserts have finished.

Posted: Tue Feb 15, 2011 10:23 am
by jwiles
So the question comes back to: does the table being loaded have a unique constraint of some sort (unique primary key, etc.)? TD Enterprise will be restricted by whatever constraints are in effect at the time of the load.

The Teradata Connector stage has an option to allow duplicates to be loaded.

Read the Connectivity Guide for Teradata databases. It's a PDF file included with the product and should be available through the Information Center installed on the Services Tier server, and possibly on the client workstation as well.
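
If you're not sure, a quick check is to look at the table definition on the Teradata side. A minimal sketch, assuming a hypothetical table name mydb.mytable:

    SHOW TABLE mydb.mytable;    /* returns the full CREATE TABLE DDL, including SET vs. MULTISET */
    HELP INDEX mydb.mytable;    /* lists each index and whether it is unique */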

Regards,

Posted: Tue Feb 15, 2011 11:20 am
by Vidyut
Thanks for the replies :)

I have millions of records, among which there are a few hundred duplicates (complete duplicates, i.e., all columns are the same).
1) The issue with the TD Enterprise stage is that it silently drops the duplicates.
2) If I use TD MLoad, performance is a big issue for me.
3) If I use the TD Connector with FastLoad, it again drops the duplicates silently; otherwise, performance is the issue.

Thanks

Posted: Tue Feb 15, 2011 11:51 am
by jwiles
And you have yet to provide an answer to my question

Posted: Tue Feb 15, 2011 11:54 am
by Vidyut
Nope... no key defined.

Posted: Tue Feb 15, 2011 12:42 pm
by cwong
My suggestion is that you Google and read up on "MULTISET" tables in Teradata.
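
In short, a SET table (the default in Teradata session mode) silently discards fully duplicate rows, while a MULTISET table accepts them as long as no unique index is defined. A minimal sketch, using hypothetical table and column names:

    /* SET table: fully duplicate rows are silently discarded on load */
    CREATE SET TABLE mydb.mytable_set (
        col1 INTEGER,
        col2 VARCHAR(50)
    ) PRIMARY INDEX (col1);

    /* MULTISET table: fully duplicate rows are allowed,
       provided there is no unique primary or secondary index */
    CREATE MULTISET TABLE mydb.mytable_multi (
        col1 INTEGER,
        col2 VARCHAR(50)
    ) PRIMARY INDEX (col1);

Note that FastLoad itself discards duplicate rows even into a MULTISET table (and the Enterprise stage load is FastLoad-based), which may explain the silent drops you are seeing.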

Posted: Tue Feb 15, 2011 1:06 pm
by jwiles
Ok. It may be a limitation of the TD Enterprise stage. I don't see an option for allowing duplicate rows listed in the connectivity guide section on TD Enterprise. Maybe someone else on the forum has more information...I'm pretty much at my TD knowledge limit here :(

Regards,

Posted: Wed Feb 16, 2011 2:02 am
by Vidyut
OK, jwiles...
Thanks, all, for your inputs.
Still trying to figure out a way.

cwong: We can insert using MLoad, but in my case performance is a very big factor. So what I am wondering is whether it is possible to load the same table twice in a single job using two different stages. Thanks.

If anyone has any more suggestions, please post.

Thanks

Posted: Thu Feb 17, 2011 2:33 am
by mansoor_nb
As you said, the duplicate records are very minimal (in the hundreds). Identify these duplicate records and separate them from the main stream. Using link ordering, execute the link which has only the duplicate records first, and then the other one. Something like below:

Source ------> Identify Unique/Duplicates --- Link 1 -----> Load Duplicates
                           |
                           | Link 2
                           |
                           v
                  Load Unique records


By doing this there is a chance of getting DB deadlocks, which you will need to handle.

Or, the other option is to create a flat file with all the duplicate records placed after the unique records, and then write a BTEQ script to load the data into the TD environment.
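
A minimal BTEQ sketch of that option, assuming a hypothetical comma-delimited file, logon, and table (the target table would need to be MULTISET for the duplicates to survive):

    .LOGON tdpid/myuser,mypassword

    /* Import the sorted flat file: unique rows first, duplicates at the end */
    .IMPORT VARTEXT ',' FILE = /data/mytable_sorted.csv
    .REPEAT *
    USING (col1 VARCHAR(10), col2 VARCHAR(50))
    INSERT INTO mydb.mytable (col1, col2)
    VALUES (:col1, :col2);

    .LOGOFF
    .QUIT

Unlike FastLoad, plain BTEQ inserts do not discard duplicate rows, but they are row-at-a-time SQL, so this is the slower path for millions of records.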

Posted: Thu Feb 17, 2011 2:45 am
by Vidyut
Hi mansoor

The first option creates a deadlock and I couldn't find a way out... that's why I posted this :)

Can u please explain a little abt the second option...Thanks in advance

Posted: Thu Feb 17, 2011 4:32 am
by ray.wurlod
Not "u", but "you". Not "abt", but "about".

Ur was a city in ancient Babylonia. Abt was an engineer who invented a particular style of cogged railway for steep ascents and descents.