Job Optimization

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Job Optimization

Post by abhilashnair »

I have a job which reads from a complex flat file and loads to an empty Teradata table. I am using the Teradata connector in Bulk mode (Load driver). The table is a multiset table and allows duplicates. It also has a primary index.

The incoming file is a fixed-width EBCDIC file with a row width of 7000 bytes and 1600 fields. There are no transformations; it is a straight load to the table. Currently the job takes 3 hours for 4 million rows.

Should I be content with the performance or can it be improved?
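
For reference, a quick back-of-the-envelope check of the throughput implied by the figures above (a sketch; GB here means 10^9 bytes):

Code:

# Implied throughput of the original job, using the figures from the post above.
row_bytes = 7000      # fixed-width EBCDIC row
rows = 4_000_000      # rows loaded
hours = 3             # observed run time

gb = row_bytes * rows / 1e9  # total data volume in GB
print(f"{gb:.0f} GB in {hours} h = {gb / hours:.1f} GB/hour")
# -> 28 GB in 3 h = 9.3 GB/hour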
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

There isn't any fixed criterion for whether a throughput of 8-9 GB per hour is good or bad. It depends on the available resources and server configuration. To be content with the performance, you should review the design and the options selected to make sure you are utilizing the resources well and not adding any unnecessary overhead.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

That sounds slow. As a test, you could break a copy of the job into two parts and measure each part separately: 1) CFF to Data Set and 2) Data Set to Teradata. My guess would be that part 1 is really fast and part 2 is really slow, in which case talking with a DBA may be in order.
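
A minimal sketch of how the two halves of such a split test could be timed from the command line, assuming the copy has been saved as two jobs (the project and job names here are made up; dsjob reports the finished job's status through its exit code):

Code:

# Hypothetical harness: run and time each half of the split job via the dsjob CLI.
import subprocess
import time

PROJECT = "DEV_PROJECT"  # assumption: your DataStage project name
JOBS = ["CFF_to_DataSet", "DataSet_to_Teradata"]  # assumed names for the two halves

for job in JOBS:
    start = time.time()
    # -jobstatus makes dsjob wait for the job to finish;
    # its exit code is then the job's status code.
    result = subprocess.run(["dsjob", "-run", "-jobstatus", PROJECT, job])
    elapsed = time.time() - start
    print(f"{job}: {elapsed / 60:.1f} minutes (dsjob status {result.returncode})")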
Choose a job you love, and you will never have to work a day in your life. - Confucius
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

That may sound slow, but I have seen cases where the development environment is very low spec and/or a lot of projects run on the same server, utilizing the same database, and everything is very slow. Performance is not a criterion for testing, only the correctness of the data. Also, the performance of the dev environment should not be used as a benchmark.

In my environment, loading to Oracle from CFF with almost 1900 columns is a matter of minutes, not hours. But you need to find out whether yours is a genuine configuration/design issue or a resource issue.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

I made the following changes to the job:

Source Sequential File stage ---> "Read from multiple nodes" set to "Yes"

The Teradata connector was running in sequential mode, so I changed the "Parallel synchronization" setting to "Yes".

Also, the buffering mode was initially "Default" across all stages, which I changed to "Buffer".

These changes resulted in the job taking one-third of the time of the previous run.
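
Extending the earlier back-of-the-envelope numbers, that reduction works out roughly as follows (a sketch, assuming the same 4 million rows and row width):

Code:

# Throughput after the tuning changes: same volume, one third of the run time.
gb = 7000 * 4_000_000 / 1e9   # total data volume, as before (28 GB)
old_hours, new_hours = 3, 1   # 3 hours before, ~1 hour after
print(f"before: {gb / old_hours:.1f} GB/hour, after: {gb / new_hours:.1f} GB/hour")
# -> before: 9.3 GB/hour, after: 28.0 GB/hour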