Teradata Connector stage Performance Query
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 15
- Joined: Tue Sep 22, 2009 5:54 am
Teradata Connector stage Performance Query
Hi
I have a job that loads about 11 million records into a Teradata target table. I am using the Teradata Connector stage as both source and target. The job looks like this:
Teradata Connector (Source) -----> Transformer -----> Teradata Connector (Target)
Target table structure is as below
Keycol1,
Keycol2,
Keycol3,
Keycol4,
Amount,
count,
date,
code1,
code2,
acctg_period
Primary index on Keycol1, Keycol2, Keycol3, Keycol4 (non-partitioned).
Range partitioned by acctg_period.
There are no other indexes.
The source query inner joins a fact table (55 million records) with a dimension table (14 million); the result is then left outer joined with two views (which together contain about 45K records). The result set is grouped on the four key columns. There is a simple CASE statement in the SELECT clause to derive a column. There are no transformations except null handling.
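For concreteness, the shape of such a source query can be sketched as below; every table, view, and column name here is invented for illustration and does not come from the actual job:

```sql
-- Hypothetical sketch of the source query shape (all names invented):
SELECT f.keycol1, f.keycol2, f.keycol3, f.keycol4,
       SUM(f.amount) AS amount,
       COUNT(*)      AS row_count,
       -- the "simple CASE statement" deriving one column:
       MAX(CASE WHEN d.code_type = 'X' THEN 'A' ELSE 'B' END) AS code1
FROM fact_table f                                   -- ~55 million rows
JOIN dim_table  d   ON f.dim_key  = d.dim_key       -- ~14 million rows
LEFT JOIN view_one v1 ON f.keycol1 = v1.keycol1     -- two views,
LEFT JOIN view_two v2 ON f.keycol2 = v2.keycol2     -- together ~45K rows
GROUP BY f.keycol1, f.keycol2, f.keycol3, f.keycol4;
```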
The target table should be loaded with 11 million records. There are no duplicates coming from the source query.
When run from Teradata SQL Assistant, the query takes about 28 minutes to fetch the data.
Right now the job takes 1 hour 40 minutes to load the 11 million rows. I am using Array Size = 4000 and Record Count (commit interval) = 40000.
The whole process could be done in a SQL script itself, but for certain reasons it has to be done as a DataStage job (changes are expected that cannot be done from SQL).
I am not sure how much time it should take to load this volume.
Could you tell me whether this is a reasonable elapsed time?
If not, kindly suggest any changes that may improve performance.
I have also run the job with the Teradata Multiload stage (FastLoad option), but it takes about 5 hours for the same volume.
I kept the default options for limit settings in the stage.
I have never used Teradata stages before and so do not know if I am doing anything wrong.
Any help is greatly appreciated.
Thanks
Ravindra
Thanks Ray
I ran the job with a Peek stage as output, and the job runs in 20 minutes for the same volume.
The job also runs fine on only one month's data (populating 1 million records), in about 7 minutes.
It only takes this long when I run it on the entire data.
ray.wurlod wrote: Replace the target stage temporarily with a Peek stage to isolate where the slowness may be. ...
Have you tried increasing the number of sessions that your Teradata connection is using? Try increasing to one third of the number of AMPs. For example, if you have a system with 72 AMPs, you would start with 24 sessions. This is the Minimum Sessions setting in the stage. This will increase the parallelism of the load.
PaulVL wrote: You've got a Teradata DBA in the house, right?
Yes Paul, I understand and agree with your point. For the moment, with the changes cppwiz suggested, the job is running in a reasonable time. There is still some time before this goes to QA and PROD. I will get hold of the DBA (which is going to be difficult) and try that approach too.
cppwiz wrote: Have you tried increasing the number of sessions that your Teradata connection is using? Try increasing to one third of the number of AMPs. For example, if you have a system with 72 AMPs, you would start with 24 sessions. This is the Minimum Sessions setting in the stage. This will increase the parallelism of the load.
Hi,
Can you help me: how can I find how many AMPs I have?
Thanks
pandeeswaran
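One commonly cited way to find the AMP count on a Teradata system is the HASHAMP function; this sketch assumes a standard Teradata release and is worth confirming with your DBA:

```sql
-- HASHAMP() with no argument returns the highest AMP number,
-- so adding 1 gives the total number of AMPs on the system.
SELECT HASHAMP() + 1 AS amp_count;

-- Applying the one-third rule of thumb from cppwiz's post:
SELECT (HASHAMP() + 1) / 3 AS suggested_min_sessions;
```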
ravindras83 wrote: Thanks for your help, cppwiz. I made the changes to Min Sessions and Max Sessions and now the job runs considerably faster, in about 35 minutes. I think I will take that, considering it is only a one-time load. Thanks again.
Hi,
May I know on how many nodes your job is running?
What values have you specified in the Min and Max Sessions properties?
Thanks
pandeeswaran