FAST LOAD

bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

FAST LOAD

Post by bapajju »

Hello All,

Has anyone used the FastLoad utility with DataStage? Kindly let me know how we go about it if we want to do a FastLoad instead of a MultiLoad.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Hi,
We are using both with DS in a Windows environment. What kind of help do you need with these?
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

After talking with several Teradata DBAs (from NCR), the consensus was to do all MultiLoad/FastLoad activities outside of DataStage.

While the plugins (especially the API) work adequately, they do not substitute for the real thang.

Data volume considerations are also important, as is table locking. All FastLoad, MultiLoad and FastExport activities lock a table with a MultiLoad lock, which denies all access to the table.
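
For illustration only, a bare-bones standalone FastLoad script runs along these lines. The database, table, logon and file names below are made up, and remember FastLoad will only load an empty table:

#!/bin/sh
# Rough sketch only -- tdpid, users, tables and paths are illustrative.
# FastLoad takes its own lock on the target table for the duration.
fastload <<'EOF'
SESSIONS 4;
ERRLIMIT 25;
LOGON tdpid/etl_user,etl_password;

BEGIN LOADING stage_db.customer
    ERRORFILES stage_db.customer_err1, stage_db.customer_err2
    CHECKPOINT 100000;

SET RECORD VARTEXT "|";

DEFINE
    cust_id   (VARCHAR(10)),
    cust_name (VARCHAR(50))
FILE = /data/export/customer.dat;

INSERT INTO stage_db.customer (cust_id, cust_name)
VALUES (:cust_id, :cust_name);

END LOADING;
LOGOFF;
EOF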

We found that, due to the start/stop overhead of MultiLoad/FastLoad (20 seconds regardless of data volume), any upsert of fewer than 1000 records was best handled by the API, but that was over a dedicated gigabit link on Unix boxes.

If you have some good Teradata DBAs on tap, talk to them. Most of them don't bite (well, there was this guy Mark with a wicked sense of humour, but he's "mostly Harmless"). Get advice on the best way to configure the system. Don't let them bully you, but they can help.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

aartlett,
We are using DS to run MLoads and FastExports for Teradata. The approach we have adopted is to export the data into a file instead of using named pipes, and we are facing no problems, as the behaviour is the same as using the TD utilities directly. The only problem I see in running the utilities outside DS is that if you want to integrate DS with MetaStage, it will have no clue what is going on, and more DBA intervention would be required in case of job failure.
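
For concreteness, a rough sketch of that export-then-MultiLoad pattern, reading the flat file that DS has landed instead of a named pipe. All names, the delimiter and the upsert DML are illustrative:

#!/bin/sh
# Sketch only -- assumes the DataStage job has already written the export file.
mload <<'EOF'
.LOGTABLE stage_db.customer_ml_log;
.LOGON tdpid/etl_user,etl_password;

.BEGIN IMPORT MLOAD
    TABLES stage_db.customer
    WORKTABLES stage_db.customer_wt
    ERRORTABLES stage_db.customer_et stage_db.customer_uv;

.LAYOUT cust_layout;
.FIELD cust_id   * VARCHAR(10);
.FIELD cust_name * VARCHAR(50);

/* Upsert: update if the key exists, insert if it is missing */
.DML LABEL upsert_cust DO INSERT FOR MISSING UPDATE ROWS;
UPDATE stage_db.customer
   SET cust_name = :cust_name
 WHERE cust_id   = :cust_id;
INSERT INTO stage_db.customer (cust_id, cust_name)
VALUES (:cust_id, :cust_name);

.IMPORT INFILE /data/export/customer.dat
    FORMAT VARTEXT '|'
    LAYOUT cust_layout
    APPLY upsert_cust;

.END MLOAD;
.LOGOFF;
EOF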

Correct me if i am wrong.
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Affan,
There are no right and wrong ways here. We developed an architecture that reflected the best practices from the DBAs and the knowledgeable people here. Experience showed up some problems with M/Load, F/Export etc. that we wanted to get around. The table locks applied by these utilities were one of them, as well as the locks left behind if the jobs failed.
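
For reference, cleaning up after a failed MultiLoad usually comes down to something like the sketch below. The names are illustrative; agree the procedure with your DBAs first, because RELEASE MLOAD can leave the target in an inconsistent state if the load was mid-flight:

#!/bin/sh
# Sketch only -- drop the work/error tables left by the failed run,
# then release the MultiLoad lock still held on the target table.
bteq <<'EOF'
.LOGON tdpid/etl_user,etl_password;
DROP TABLE stage_db.customer_wt;
DROP TABLE stage_db.customer_et;
DROP TABLE stage_db.customer_uv;
RELEASE MLOAD stage_db.customer;
.LOGOFF;
.QUIT;
EOF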

If you are integrating D/Stage and M/Stage, then your approach is certainly a viable method.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Andrew,
If there are issues, can you share your best practices? It will help us a lot.
Regards,
Affan
"Questioning is Half Knowledge"
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Sorry, I meant: if there are NO issues, then can you please share your best practices?
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Affan,
I was trying to get across that there are no Best (True?) practices, only ones that work in certain circumstances that people engineer.

The environment was an ETL box with 4 processors and a Teradata box with 4 nodes. After a bit (well, a lot) of pushing, we arranged a dedicated gigabit link between the two boxes via a switch, as the backbone of the LAN was decidedly overcrowded. This got us an automatic 10x on throughput. Never neglect your infrastructure :).

We decided to use hash files for lookups where the input file was less than 10% of the table being looked up, i.e. we dumped the lookup table to a hash file via the API. This was also done for the Lookup stages of PX jobs. Hash files were loaded into cache, and we set the cache high enough for 99% of the hash files to fit. I have used this approach on other systems and it has always been good.

For loading tables we worked out (through a scientific method called trial and error) that approximately 1000 records would upsert into our bigger tables in 20 seconds. As M/Load has a 20-second overhead (and then runs like a streak of lightning), we decided on upsert for <1000 records and M/Load for >1000. A certain amount of leeway was allowed, but not much, as upserting could run to an expensive exercise.

MultiLoads were done by scripts that were initiated by the shell script that ran our DataStage jobs, not as an after-job routine.
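
As a sketch of how that driver script can hang together: the project, job, file and script names are hypothetical, the 1000-record cut-over is the figure above, and dsjob exit-code conventions vary by release, so check yours:

#!/bin/sh
# Sketch of the driver: run the DataStage extract, then choose the load
# path by record count. All names are illustrative.

DATA=/data/export/customer.dat
THRESHOLD=1000

# -jobstatus makes dsjob wait and return an exit code derived from the
# job's finishing status (0 taken as clean here; verify on your release).
dsjob -run -jobstatus dstage_proj job_extract_customer || exit 1

ROWS=`wc -l < $DATA`   # one record per line in the export file

if [ "$ROWS" -lt "$THRESHOLD" ]; then
    # Small delta: the API upsert job avoids M/Load's ~20s start-up cost.
    dsjob -run -jobstatus dstage_proj job_upsert_customer_api || exit 1
else
    # Big delta: M/Load's overhead pays for itself past the cut-over.
    mload < /scripts/customer_upsert.ml || exit 1
fi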

This may not work for you and your levels will be different on different hardware, but it made for a simple robust system.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Andrew,
Thanks a lot for sharing the experience. We are running into the same kind of problems, and this will help us a lot, as we have the same HW/SW config except that we are running DS on Windows 2000 Server.

Any HW/Infrastructure Recommendations you would like to give?
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

[quote="maffan76"]Any HW/Infrastructure Recommendations you would like to give?[/quote]

Only what I said before: a high-speed, dedicated connection between your ETL and Teradata boxes.

If you are using W2K, make sure you have plenty of RAM. If using new boxes, make sure your Intels are hyperthreaded and your disks are as fast as you can get. These things make a difference.

Also, with the restrictions that Windoze imposes (compared to Unix or Linux), you'll need to be careful not to overload your machines.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Thanks Guys. It's really helpful.