Load same teradata table 2 times in a single job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Load same teradata table 2 times in a single job

Post by Vidyut »

Hi

Is there any way that i can load a same teradata table 2 times in a single job....
The reason i need this is that i want to load duplicates in the table and if I use Mload for all the records than it effect the performance...
So what m trying to do is load the remaining data using the Teradata Enterprise Stage and duplicates using Teradata Mload....

Any Suggestion

Thanks in Advance
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Is there a constraint or other restriction which keeps you from loading all of the records using TD Enterprise? If the records are in separate streams and you must load the duplicates after the rest, one method could use a sequence funnel into a single TD Enterprise stage.

Or, you could use a sequence job as your single job point of view ;)

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Actually i was having data in a single stream only....but Teradata Enterprise Stage Silently drop all the duplicates....
Thats y i separated the data into two different streams....one for faster load (around 95% of data) and one for slower load(duplicate records)
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

One more issue I forgot to mention is that I want to avoid to create one different job for loading the duplicates bcoz I have hundreds of jobs & sequencers already designed having the same issue, so adding a new job is the last option..

Thanks
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Do you actually need to load the duplicates? Or are they actually updates?

Is there a reject link on the teradata stage? If so it might be possible to reject out to a lookup stage and join your updates at that point, before updating those records via another teradata stage.

All the inserts should still happen, there will be no rejects off the teradata stage. On the lookup set the records not found condition to reject, and then using the records off the lookups reject link to perform the update. The lookup should act as a holding point until such time as all the inserts have finished.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

So the question comes back to: Does the table being loaded have a unique constraint of some sort (unique primary key, etc.)? TD Enterprise will be restricted to what constraints are in effect at the time of the load.

The TeraData Connector stage has an option to allow duplicates to be loaded.

Read the Connectivity Guide for TeraData databases. It's a PDF file included with the product and should be available through the Information Center installed on the Services Tier server and possibly on the client workstation as well.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Thanks for the replies :)

I have millions of records and in which thr r some duplicates(few hundred)(complete duplicates i.e, all columns r same)
1) Issue with TD Enterprise Stage is it silently drops the duplicates
2) If i use TD Mload the performance is a big issue for me
3) If I use TD connector then again if i use fastload the it drops the duplicates silently or else the performance is the issue if use it otherwise

Thanks
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

And you have yet to provide an answer to my question
- james wiles


All generalizations are false, including this one - Mark Twain.
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Nopes...no key defined
cwong
Charter Member
Charter Member
Posts: 26
Joined: Tue Apr 30, 2002 8:02 am
Location: Canada

Post by cwong »

Suggestion is that you may google to read up about "Multiset" table in teradata.
cwong
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Ok. It may be a limitation of the TD Enterprise stage. I don't see an option for allowing duplicate rows listed in the connectivity guide section on TD Enterprise. Maybe someone else on the forum has more information...I'm pretty much at my TD knowledge limit here :(

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Ok jwiles...
thanks all for ur inputs....
still trying to figure out a way

Cwong: We can insert using Mload but in my case the performance is a very big factor.....So what I am wondering if it is possible to load a same table 2 times in a single job using two different stages....Thanks

If any one has any more suggestion please post

Thanks
mansoor_nb
Participant
Posts: 48
Joined: Wed Jun 01, 2005 7:10 am

Post by mansoor_nb »

As you said the duplicate records are very minimal (in hundereds), identify these duplicate records and seperate it with the main stream. Using the link ordering execute the link which has only duplicate records and then the other one. Something like below.

Source ------> Identify Unique/Duplicates --- Link 1 -----> Load Duplicate
|
| Link 2
|
Load Unique records


By doing this, there are chances of getting a DB deadlocks, you need to handle the deadlocks

Or the other option is to have the file created and to have all the duplicate records at the end of the unique records. Write a BTEQ query to load the data into the TD environment.
Vidyut
Participant
Posts: 24
Joined: Wed Oct 13, 2010 12:45 am

Post by Vidyut »

Hi mansoor

The first option creates deadlock and I couldnt find a way out....thats y i posted this :)

Can u please explain a little abt the second option...Thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not "u", but "you". Not "abt", but "about".

U was a city in ancient Babylonia. Abt was an engineer who invented a particular style of cogged railway for steep ascents and descents.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply