Teradata Connector stage Performance Query

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Teradata Connector stage Performance Query

Post by ravindras83 »

Hi,
I have a job that loads about 11 million records into a Teradata target table. I am using the Teradata Connector stage as both source and target. The job looks like this:

Teradata Connector (Source) -----> Transformer -----> Teradata Connector (Target)

The target table structure is as follows:
Keycol1,
Keycol2,
Keycol3,
Keycol4,
Amount,
count,
date,
code1,
code2,
acctg_period
Primary index on Keycol1, Keycol2, Keycol3, Keycol4 (non-partitioned).
Range partitioned by acctg_period.

There are no other indexes.

The source uses a query that inner joins a fact table (55 million records) with a dimension table (14 million records); the result is then left outer joined with two views (which together contain about 45K records). The result set is grouped on the four key columns. There is a simple CASE statement in the SELECT clause to derive one column. There are no transformations except null handling.

The target table should receive 11 million records. There are no duplicates coming from the source query.

When run from Teradata SQL Assistant, the query takes about 28 minutes to fetch the data.

Right now the job is taking 1 hour 40 minutes to load the 11 million rows. I am using Array size = 4000 and Record count (commit interval) = 40000.
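As a rough sanity check on those settings, the arithmetic they imply can be sketched as follows (an editorial illustration; the connector's actual buffering behavior may differ):

```python
# Batch arithmetic implied by the settings quoted above.
total_rows = 11_000_000
array_size = 4_000        # rows sent to Teradata per round trip
commit_interval = 40_000  # rows per transaction commit

round_trips = total_rows // array_size             # buffer sends
commits = total_rows // commit_interval            # transaction commits
arrays_per_commit = commit_interval // array_size  # sends between commits

print(round_trips, commits, arrays_per_commit)  # → 2750 275 10
```

So the load as configured performs 2750 buffer sends and only 275 commits, i.e. commit frequency is unlikely to be the bottleneck here.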

The whole process could be done in a SQL script, but for certain reasons it has to be done via a DataStage job (changes are expected that cannot be done in SQL).

I am not sure how long loading this volume should take.

Could you tell me whether this is a reasonable load time?
If not, kindly suggest any changes that may improve performance.


I have also run the job with the Teradata Multiload stage (FastLoad option), but it takes about 5 hours for the same volume.
I kept the default limit settings in the stage.
I have never used the Teradata stages before, so I do not know whether I am doing anything wrong.

Any help is greatly appreciated.

Thanks
Ravindra
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Is it the same Teradata Server or a second box?

If it's the same box, can you do the work via a stored procedure and simply invoke that?


It would be a shame to have to extract 11M rows if you didn't need to.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Replace the target stage temporarily with a Peek stage to isolate where the slowness may be.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks Ray
I ran the job with a Peek stage as the target and it finishes in 20 minutes for the same volume.
The job also runs fine on a single month's data (populating 1 million records), in about 7 minutes.
It only slows down when I run it against the entire data set.
ray.wurlod wrote:Replace the target stage temporarily with a Peek stage to isolate where the slowness may be. ...
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks Paul.
It is the same server; in fact, it is the same DB schema as well.

But I have to change the job afterwards to include some external sources. More importantly, I do not have enough expertise to write a stored procedure.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

You've got a Teradata DBA in the house right?
cppwiz
Participant
Posts: 135
Joined: Tue Sep 04, 2007 11:27 am

Post by cppwiz »

Have you tried increasing the number of sessions that your Teradata connection is using? Try increasing it to one third of the number of AMPs. For example, on a system with 72 AMPs you would start with 24 sessions. This is the Minimum Sessions setting in the stage, and it increases the parallelism of the load.
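That rule of thumb can be sketched as follows (an editorial illustration of the advice above; the one-third starting point is a tuning heuristic, not a fixed Teradata requirement):

```python
def starting_sessions(amp_count: int) -> int:
    """Rule-of-thumb starting value for the Minimum Sessions setting:
    one third of the system's AMP count, but never less than 1."""
    return max(1, amp_count // 3)

print(starting_sessions(72))  # → 24, matching the example above
```

From this starting value you would then tune up or down based on observed load times and what the DBA allows.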
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

Thanks for your help, cppwiz. I made the changes to Min Sessions and Max Sessions and the job now runs considerably faster, in about 35 minutes. I think I will accept that, considering it is a one-time load. Thanks again.
ravindras83
Participant
Posts: 15
Joined: Tue Sep 22, 2009 5:54 am

Post by ravindras83 »

PaulVL wrote:You've got a Teradata DBA in the house right?
Yes Paul, I understand and agree with your point. For the moment, with the changes cppwiz suggested, the job is running in a reasonable time. There is still some time before this goes to QA and PROD. I will get hold of the DBA (which is going to be difficult) and try that approach too.
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

cppwiz wrote:Have you tried increasing the number of sessions that your Teradata connection is using? Try increasing to one third of the number of AMP's. For example, if you have a system with 72 AMP's, you would start with 24 sessions. This is the Minimum Sessions setting in the stage. This will increase parallelism of the load.
Hi,

Can you help me find out how many AMPs I have?

Thanks
pandeeswaran
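
For reference, one common way to check the AMP count on a Teradata system is the HASHAMP function (this query is a standard Teradata idiom; it does not appear in the thread itself):

```sql
-- HASHAMP() with no argument returns the highest AMP number,
-- so adding 1 gives the total number of AMPs on the system.
SELECT HASHAMP() + 1 AS amp_count;
```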
pandeesh
Premium Member
Posts: 1399
Joined: Sun Oct 24, 2010 5:15 am
Location: CHENNAI, TAMIL NADU

Post by pandeesh »

ravindras83 wrote:Thanks for your help cppwiz. I did the changes for the Min Sessions and Max Sessions and now the job runs considerably quickly in about 35 minutes. I think I will take that considering it is only one time load. Thanks Again.
Hi, may I know what node configuration your job is running on?
What values did you specify for the Min and Max Sessions properties?

Thanks
pandeeswaran
Post Reply