Job Optimization

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Job Optimization

Post by abhilashnair »

I have a job which reads from a complex flat file and loads to an empty Teradata table. I am using the Teradata connector in Bulk mode (Load driver). The table is a multiset table and allows duplicates. It also has a primary index.

The incoming file is a fixed-width EBCDIC file with a row width of 7000 bytes and 1600 fields. There are no transformations; it is a straight load to the table. Currently the job takes 3 hours for 4 million rows.

Should I be content with the performance or can it be improved?
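
For reference, a quick back-of-the-envelope check of the throughput implied by the figures above (a sketch; GB here means 10^9 bytes):

Code:

# Implied throughput of the original job, using the figures from the post above.
row_bytes = 7000      # fixed-width EBCDIC row
rows = 4_000_000      # rows loaded
hours = 3             # observed run time

gb = row_bytes * rows / 1e9  # total data volume in GB
print(f"{gb:.0f} GB in {hours} h = {gb / hours:.1f} GB/hour")
# -> 28 GB in 3 h = 9.3 GB/hour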
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

There isn't any fixed criterion for whether a throughput of 8-9 GB per hour is good or bad. It depends on the available resources and server configuration. To be content with the performance, you should review the design and the options selected to make sure you are utilizing the resources well and not adding any unnecessary overhead.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

That sounds slow. As a test, you could break a copy of the job into two parts and measure each part separately: 1) CFF to Data Set and 2) Data Set to Teradata. My guess would be that part 1 is really fast and part 2 is really slow, in which case talking with a DBA may be in order.
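
A minimal sketch of how the two halves of such a split test could be timed from the command line, assuming the copy has been saved as two jobs (the project and job names here are made up; dsjob reports the finished job's status through its exit code):

Code:

# Hypothetical harness: run and time each half of the split job via the dsjob CLI.
import subprocess
import time

PROJECT = "DEV_PROJECT"  # assumption: your DataStage project name
JOBS = ["CFF_to_DataSet", "DataSet_to_Teradata"]  # assumed names for the two halves

for job in JOBS:
    start = time.time()
    # -jobstatus makes dsjob wait for the job to finish;
    # its exit code is then the job's status code.
    result = subprocess.run(["dsjob", "-run", "-jobstatus", PROJECT, job])
    elapsed = time.time() - start
    print(f"{job}: {elapsed / 60:.1f} minutes (dsjob status {result.returncode})")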
Choose a job you love, and you will never have to work a day in your life. - Confucius
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

That may sound slow, but I have seen cases where the development environment is very low spec and/or a lot of projects run on the same server, utilizing the same database, and everything is very slow. Performance is not a criterion for testing, only the correctness of the data. Also, the performance of the dev environment should not be used as a benchmark.

In my environment, loading to Oracle from CFF with almost 1900 columns is a matter of minutes, not hours. But you need to find out whether yours is a genuine configuration/design issue or a resource issue.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I made the following changes to the job:

Source Sequential File stage ---> "Read from multiple nodes" set to "Yes"

The Teradata connector was running in sequential mode, so I changed the "Parallel synchronization" setting to "Yes".

Also, the buffering mode was initially "Default" across all stages, which I changed to "Buffer".

These changes resulted in the job taking one-third of the time of the previous run.
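
Extending the earlier back-of-the-envelope numbers, that reduction works out roughly as follows (a sketch, assuming the same 4 million rows and row width):

Code:

# Throughput after the tuning changes: same volume, one third of the run time.
gb = 7000 * 4_000_000 / 1e9   # total data volume, as before (28 GB)
old_hours, new_hours = 3, 1   # 3 hours before, ~1 hour after
print(f"before: {gb / old_hours:.1f} GB/hour, after: {gb / new_hours:.1f} GB/hour")
# -> before: 9.3 GB/hour, after: 28.0 GB/hour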