Performance Impact on using ODBC to extract and load

nikhil_bhasin · Post by **nikhil_bhasin** » Wed Aug 29, 2012 4:54 am

Hi All,

We are planning to use ODBC connector stage for extraction and loading of the data from vertica database. We are using DS version 8.7.
My concern is that we will be extracting huge volume of data ~270 Million rows with around 30-40 columns, so performance of jobs would take a lot of hit. Can anyone suggest, how can improve the performance of the jobs for such a volume of data?

ray.wurlod · Post by **ray.wurlod** » Wed Aug 29, 2012 5:37 am

More partitions, provided that your current configuration is leaving spare capacity on the server engine(s).

PaulVL · Post by **PaulVL** » Wed Aug 29, 2012 8:30 am

Never heard of Vertica database.

On average, ODBC is a turtle, but if you parallelize the load, you can have a hurd of turtles moving your data.

Some large volume loads could be sped up if you FTP a load ready file to the target system and use existing bulk load tools native to that DBMS platform. It will complicate your job, but "might" improve the speed.

kwwilliams · Post by **kwwilliams** » Tue Sep 04, 2012 9:16 pm

Vertica is a columnar database which lends itself well to performing analytics, but its strength is not in loading and extracting large volumes of data. The database is going to work hard at reconstructing the rows that it has stored in columnar format (each unique value only stored one time physically with pointers being utilized to represent rows). With that in mind, I believe that the database is going to be your bottleneck. If I were you I would parallelize my query by using the ODBC Connector's built in partition read method.

Have fun experimenting with speed. Try the extracts without partition reads and then with the partition reads in place using modulus and minimum/maximum range.