Long running SQL server bulk load

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
JDionne
Participant
Posts: 342
Joined: Wed Aug 27, 2003 1:06 pm

Long running SQL server bulk load

Post by JDionne »

I have a job that bulk loads to a SQL server. The flat file is 8 million + rows. If i do this through SQL server it takes 20 mins. In DS its taken 14 Hours!!!! Why is it taking soo long to do this?
Jim
Sure I need help....But who dosent?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Because the DS bulk loader stages have performance issues in that they re-verify the data content while spooling the load file. The stages are meant to make life simple for those not experienced with bulk loaders.

If you have the skill to insure a proper load file, then skip using the Stage and just write out a sequential file. Then, initiate the bulk load from a script and be done with it.

Your runtime comparisons are right in line with what one would expect. You're discovering that the tool, in its efforts to be all encompassing and do all things, sacrifices performance in tragic ways sometimes.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
shawn_ramsey
Participant
Posts: 145
Joined: Fri May 02, 2003 9:59 am
Location: Seattle, Washington. USA

Post by shawn_ramsey »

Ken "right in line" give me a break. Are you seriously suggesting that DataStage would actually only bulk load the data at 185 rows a second as apposed to the native bulk loader 6000+ rows a second? There is obviously something else going on here since we are loading some pretty wide tables at 2600 rows a second and some narrower ones near 9000 rows a second.

First of all there are two bulk load stages for SQL Server and you should use the SQL Bulk and not the BCP. I would also look at the parameters that you are running with and play around with the commit points.
Shawn Ramsey

"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Shawn, let's be quite clear to everyone out there that DataStage does not bulk load data. Its bulk loader stages either invoke the respective database bulk loader command to perform the load, or write the necessary control files to use with the command line bulk loader.

In either case, DataStage spools the output link data into a sequential file that will be used by the command line bulk loader invocation operation. It is in the act of spooling that data that DataStage logic inserts itself and puts tremendous drag on that flow. This is where data is re-typed, scaled, in some cases the dates are messed with (internalized), etc. This drag is so significant that Ascential Technical Support has for years recommended to people to actually spool their data to another sequential file, and limit the spooling to the bulk loader stage to just one row. This way, you get the control file generated, but the data is in another file. You then do an after-job edit operation to the control file to point to the full data file, rather than the 1-row datafile generated by the bulk loader stage.

I made no assumptions regarding which stage was being used. I simply was pointing out that as a developer gets more experienced using DataStage and their familiarity with bulk loading increases, you will become dissatisfied with the things DataStage does and you will, in my experience, end up just spooling the data yourself and writing your own bulk loader scripts and directly invoke the process so that you can handle all the erroring, logging, and reporting yourself.

From a performance standpoint, when someone says that using DataStage jobs generic design just like the salesman showed them are 10-20X slower than if they did some manual intervention at key points and just blasted the data themselves, I say "Right on!". There are times to step outside the tool and do what seems right, as opposed to always staying within the tool. Hope this clarifies my point.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

And this is called "programming around the DataStage limitations" instead of "programming with DataStage".

And the more you work with the DataStage the more you realize there's an awful lot of things to work around. After a few years you sometimes just wonder why you even bother programming it via DataStage.

We can do it probably cheaper, faster (in design and execution) and more reliable with a good Perl framework. It's just that Ascential is so good in locking in customers :twisted:

The ranting for this evening is finished. :D

Ogmios
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

ogmios wrote: After a few years you sometimes just wonder why you even bother programming it via DataStage.
Now I did not even come close to suggesting this. I think a perspective must be kept and that not everything has to be inside the tool. Your toolbelt should have as many tools in it and your judgement should be exercised when you decide which tool(s) are utilized. You should emphasize the strengths of every tool in your belt, and accommodate the weaknesses.

However, as I've said in the past...The worst programmers can make the best tool look like crap. The worst tool can look great in the hands of a skilled programmer/architect.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

ROFL :D, you've got to admit that in some areas DataStage itself is not of the best quality.

I once overheard someone describe DataStage as an "overgrown PacBase for SQL with featuritis of which not all features work as they should".

But now we're getting very off-topic :wink: . E.g. About the bulk loaders, what prohibits Ascential to make the bulk-loader stages more as they should be? But they would rather implement some high tech new feature they can show off on their flyers.

Ogmios
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I shall reserve my comments for private communications. However, your term featuritis is what I call "shiny objects". They dazzle the eye and distract the beholder. Wow! Auto-column derivation wizard! Cool!
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply