Performance Impact on using ODBC to extract and load

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
nikhil_bhasin
Participant
Posts: 50
Joined: Tue Jan 19, 2010 4:14 am

Performance Impact on using ODBC to extract and load

Post by nikhil_bhasin »

Hi All,

We are planning to use ODBC connector stage for extraction and loading of the data from vertica database. We are using DS version 8.7.
My concern is that we will be extracting huge volume of data ~270 Million rows with around 30-40 columns, so performance of jobs would take a lot of hit. Can anyone suggest, how can improve the performance of the jobs for such a volume of data?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

More partitions, provided that your current configuration is leaving spare capacity on the server engine(s).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Never heard of Vertica database.

On average, ODBC is a turtle, but if you parallelize the load, you can have a hurd of turtles moving your data.

Some large volume loads could be sped up if you FTP a load ready file to the target system and use existing bulk load tools native to that DBMS platform. It will complicate your job, but "might" improve the speed.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Vertica is a columnar database which lends itself well to performing analytics, but its strength is not in loading and extracting large volumes of data. The database is going to work hard at reconstructing the rows that it has stored in columnar format (each unique value only stored one time physically with pointers being utilized to represent rows). With that in mind, I believe that the database is going to be your bottleneck. If I were you I would parallelize my query by using the ODBC Connector's built in partition read method.

Have fun experimenting with speed. Try the extracts without partition reads and then with the partition reads in place using modulus and minimum/maximum range.
Post Reply