Teradata Connector stage Performance Query

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Teradata Connector stage Performance Query

Post by ravindras83 »

Hi,
I have a job that loads about 11 million records into a Teradata target table. I am using the Teradata Connector stage as both source and target. The job looks like this:

Teradata Connector (Source) -----> Transformer -----> Teradata Connector (Target)

The target table structure is as follows:
Keycol1,
Keycol2,
Keycol3,
Keycol4,
Amount,
count,
date,
code1,
code2,
acctg_period
Primary index on Keycol1, Keycol2, Keycol3, Keycol4 (non-partitioned).
Range partitioned by acctg_period.

There are no other indexes.

The source uses a query that inner joins a fact table (55 million records) with a dimension table (14 million records); the result is then left outer joined with two views (which together contain about 45K records). The result set is grouped on the four key columns. There is a simple CASE statement in the SELECT clause to derive one column. There are no transformations except null handling.

The target table should receive 11 million records. There are no duplicates coming from the source query.

When run from Teradata SQL Assistant, the query takes about 28 minutes to fetch the data.

Right now the job is taking 1 hour 40 minutes to load the 11 million rows. I am using Array size = 4000 and Record count (commit interval) = 40000.
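As a rough sanity check on those settings, the arithmetic they imply can be sketched as follows (an editorial illustration; the connector's actual buffering behavior may differ):

```python
# Batch arithmetic implied by the settings quoted above.
total_rows = 11_000_000
array_size = 4_000        # rows sent to Teradata per round trip
commit_interval = 40_000  # rows per transaction commit

round_trips = total_rows // array_size             # buffer sends
commits = total_rows // commit_interval            # transaction commits
arrays_per_commit = commit_interval // array_size  # sends between commits

print(round_trips, commits, arrays_per_commit)  # → 2750 275 10
```

So the load as configured performs 2750 buffer sends and only 275 commits, i.e. commit frequency is unlikely to be the bottleneck here.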

The whole process could be done in a SQL script, but for certain reasons it has to be done via a DataStage job (changes are expected that cannot be done in SQL).

I am not sure how long loading this volume should take.

Could you tell me whether this is a reasonable load time?
If not, kindly suggest any changes that may improve performance.


I have also run the job with the Teradata Multiload stage (FastLoad option), but it takes about 5 hours for the same volume.
I kept the default limit settings in the stage.
I have never used the Teradata stages before, so I do not know whether I am doing anything wrong.

Any help is greatly appreciated.

Thanks
Ravindra
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Is it the same Teradata Server or a second box?

If it's the same box, can you do the work via a stored procedure and simply invoke that?


It would be a shame to have to extract 11M rows if you didn't need to.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Replace the target stage temporarily with a Peek stage to isolate where the slowness may be.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks Ray
I ran the job with a Peek stage as the target and it finishes in 20 minutes for the same volume.
The job also runs fine on a single month's data (populating 1 million records), in about 7 minutes.
It only slows down when I run it against the entire data set.
ray.wurlod wrote:Replace the target stage temporarily with a Peek stage to isolate where the slowness may be. ...
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks Paul.
It is the same server; in fact, it is the same DB schema as well.

But I have to change the job afterwards to include some external sources. More importantly, I do not have enough expertise to write a stored procedure.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You've got a Teradata DBA in the house right?
cppwiz
Participant
Posts: 135
Joined: Tue Sep 04, 2007 11:27 am

Post by cppwiz »

Have you tried increasing the number of sessions that your Teradata connection is using? Try increasing it to one third of the number of AMPs. For example, on a system with 72 AMPs you would start with 24 sessions. This is the Minimum Sessions setting in the stage, and it increases the parallelism of the load.
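That rule of thumb can be sketched as follows (an editorial illustration of the advice above; the one-third starting point is a tuning heuristic, not a fixed Teradata requirement):

```python
def starting_sessions(amp_count: int) -> int:
    """Rule-of-thumb starting value for the Minimum Sessions setting:
    one third of the system's AMP count, but never less than 1."""
    return max(1, amp_count // 3)

print(starting_sessions(72))  # → 24, matching the example above
```

From this starting value you would then tune up or down based on observed load times and what the DBA allows.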
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks for your help, cppwiz. I made the changes to Min Sessions and Max Sessions and the job now runs considerably faster, in about 35 minutes. I think I will accept that, considering it is a one-time load. Thanks again.
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

PaulVL wrote:You've got a Teradata DBA in the house right?
Yes Paul, I understand and agree with your point. For the moment, with the changes cppwiz suggested, the job is running in a reasonable time. There is still some time before this goes to QA and PROD. I will get hold of the DBA (which is going to be difficult) and try that approach too.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

cppwiz wrote:Have you tried increasing the number of sessions that your Teradata connection is using? Try increasing to one third of the number of AMP's. For example, if you have a system with 72 AMP's, you would start with 24 sessions. This is the Minimum Sessions setting in the stage. This will increase parallelism of the load.
Hi,

Can you help me find out how many AMPs I have?

Thanks
pandeeswaran
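
For reference, one common way to check the AMP count on a Teradata system is the HASHAMP function (this query is a standard Teradata idiom; it does not appear in the thread itself):

```sql
-- HASHAMP() with no argument returns the highest AMP number,
-- so adding 1 gives the total number of AMPs on the system.
SELECT HASHAMP() + 1 AS amp_count;
```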
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

ravindras83 wrote:Thanks for your help cppwiz. I did the changes for the Min Sessions and Max Sessions and now the job runs considerably quickly in about 35 minutes. I think I will take that considering it is only one time load. Thanks Again.
Hi, may I know what node configuration your job is running on?
What values did you specify for the Min and Max Sessions properties?

Thanks
pandeeswaran
Post Reply