FAST LOAD

bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

FAST LOAD

Post by bapajju »

Hello All,

Has anyone used the FastLoad utility with DataStage? Kindly let me know how we go about it if we want to do a FastLoad instead of a MultiLoad.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Hi,
We are using both with DS in a Windows environment. What kind of help do you need with these?
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

After talking with several Teradata DBAs (from NCR), the consensus was to do all MultiLoad/FastLoad activities outside of DataStage.

While the plugins (especially the API) work adequately, they do not substitute for the real thang.

Data volume considerations are also important, as is table locking. All FastLoad, MultiLoad and FastExport activities lock a table with a MultiLoad lock, which denies all access to the table.
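
For illustration only, a bare-bones standalone FastLoad script runs along these lines. The database, table, logon and file names below are made up, and remember FastLoad will only load an empty table:

#!/bin/sh
# Rough sketch only -- tdpid, users, tables and paths are illustrative.
# FastLoad takes its own lock on the target table for the duration.
fastload <<'EOF'
SESSIONS 4;
ERRLIMIT 25;
LOGON tdpid/etl_user,etl_password;

BEGIN LOADING stage_db.customer
    ERRORFILES stage_db.customer_err1, stage_db.customer_err2
    CHECKPOINT 100000;

SET RECORD VARTEXT "|";

DEFINE
    cust_id   (VARCHAR(10)),
    cust_name (VARCHAR(50))
FILE = /data/export/customer.dat;

INSERT INTO stage_db.customer (cust_id, cust_name)
VALUES (:cust_id, :cust_name);

END LOADING;
LOGOFF;
EOF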

We found that, due to the start/stop overhead of MultiLoad/FastLoad (20 seconds regardless of data volume), any upsert of fewer than 1000 records was best handled by the API, but that was over a dedicated gigabit link on Unix boxes.

If you have some good Teradata DBAs on tap, talk to them. Most of them don't bite (well, there was this guy Mark with a wicked sense of humour, but he's "mostly Harmless"). Get advice on the best way to configure the system. Don't let them bully you, but they can help.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

aartlett,
We are using DS to run MLoads and FastExports for Teradata. The approach we have adopted is to export the data into a file instead of using named pipes, and we are facing no problems, as the behaviour is the same as using the TD utilities directly. The only problem I see in running the utilities outside DS is that if you want to integrate DS with MetaStage, it will have no clue what is going on, and more DBA intervention would be required in case of job failure.
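
For concreteness, a rough sketch of that export-then-MultiLoad pattern, reading the flat file that DS has landed instead of a named pipe. All names, the delimiter and the upsert DML are illustrative:

#!/bin/sh
# Sketch only -- assumes the DataStage job has already written the export file.
mload <<'EOF'
.LOGTABLE stage_db.customer_ml_log;
.LOGON tdpid/etl_user,etl_password;

.BEGIN IMPORT MLOAD
    TABLES stage_db.customer
    WORKTABLES stage_db.customer_wt
    ERRORTABLES stage_db.customer_et stage_db.customer_uv;

.LAYOUT cust_layout;
.FIELD cust_id   * VARCHAR(10);
.FIELD cust_name * VARCHAR(50);

/* Upsert: update if the key exists, insert if it is missing */
.DML LABEL upsert_cust DO INSERT FOR MISSING UPDATE ROWS;
UPDATE stage_db.customer
   SET cust_name = :cust_name
 WHERE cust_id   = :cust_id;
INSERT INTO stage_db.customer (cust_id, cust_name)
VALUES (:cust_id, :cust_name);

.IMPORT INFILE /data/export/customer.dat
    FORMAT VARTEXT '|'
    LAYOUT cust_layout
    APPLY upsert_cust;

.END MLOAD;
.LOGOFF;
EOF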

Correct me if i am wrong.
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Affan,
There are no right and wrong ways here. We developed an architecture that reflected the best practices from the DBAs and the knowledgeable people here. Experience showed up some problems with M/Load, F/Export etc. that we wanted to get around. The table locks applied by these utilities were one of them, as well as the locks left behind if the jobs failed.
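
For reference, cleaning up after a failed MultiLoad usually comes down to something like the sketch below. The names are illustrative; agree the procedure with your DBAs first, because RELEASE MLOAD can leave the target in an inconsistent state if the load was mid-flight:

#!/bin/sh
# Sketch only -- drop the work/error tables left by the failed run,
# then release the MultiLoad lock still held on the target table.
bteq <<'EOF'
.LOGON tdpid/etl_user,etl_password;
DROP TABLE stage_db.customer_wt;
DROP TABLE stage_db.customer_et;
DROP TABLE stage_db.customer_uv;
RELEASE MLOAD stage_db.customer;
.LOGOFF;
.QUIT;
EOF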

If you are integrating D/Stage and M/Stage, then your approach is certainly a viable method.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Andrew,
If there are issues, can you share your best practices? It will help us a lot.
Regards,
Affan
"Questioning is Half Knowledge"
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Sorry, I meant: if there are NO issues, then can you please share your best practices?
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Affan,
I was trying to get across that there are no Best (True?) practices, only ones that work in certain circumstances that people engineer.

The environment was an ETL box with 4 processors and a Teradata box with 4 nodes. After a bit (well, a lot) of pushing, we arranged a dedicated gigabit link between the two boxes via a switch, as the backbone of the LAN was decidedly overcrowded. This got us an automatic 10x on throughput. Never neglect your infrastructure :).

We decided to use hash files for lookups where the input file was less than 10% of the table being looked up, i.e. we dumped the lookup table to a hash file via the API. This was also done for the Lookup stages of PX jobs. Hash files were loaded into cache, and we set the cache high enough for 99% of the hash files to fit. I have used this approach on other systems and it has always been good.

For loading tables we worked out (through a scientific method called trial and error) that approximately 1000 records would upsert into our bigger tables in 20 seconds. As M/Load has a 20-second overhead (and then runs like a streak of lightning), we decided on upsert for <1000 records and M/Load for >1000. A certain amount of leeway was allowed, but not much, as upserting could run to an expensive exercise.

MultiLoads were done by scripts that were initiated by the shell script that ran our DataStage jobs, not as an after-job routine.
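
As a sketch of how that driver script can hang together: the project, job, file and script names are hypothetical, the 1000-record cut-over is the figure above, and dsjob exit-code conventions vary by release, so check yours:

#!/bin/sh
# Sketch of the driver: run the DataStage extract, then choose the load
# path by record count. All names are illustrative.

DATA=/data/export/customer.dat
THRESHOLD=1000

# -jobstatus makes dsjob wait and return an exit code derived from the
# job's finishing status (0 taken as clean here; verify on your release).
dsjob -run -jobstatus dstage_proj job_extract_customer || exit 1

ROWS=`wc -l < $DATA`   # one record per line in the export file

if [ "$ROWS" -lt "$THRESHOLD" ]; then
    # Small delta: the API upsert job avoids M/Load's ~20s start-up cost.
    dsjob -run -jobstatus dstage_proj job_upsert_customer_api || exit 1
else
    # Big delta: M/Load's overhead pays for itself past the cut-over.
    mload < /scripts/customer_upsert.ml || exit 1
fi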

This may not work for you and your levels will be different on different hardware, but it made for a simple robust system.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
maffan76
Participant
Posts: 110
Joined: Tue Aug 23, 2005 5:27 am

Post by maffan76 »

Andrew,
Thanks a lot for sharing the experience. We are running into the same kind of problems, and this will help us a lot, as we have the same HW/SW config except that we are running DS on Windows 2000 Server.

Any HW/Infrastructure Recommendations you would like to give?
Regards,
Affan
"Questioning is Half Knowledge"
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

[quote="maffan76"]Any HW/Infrastructure Recommendations you would like to give?[/quote]

Only what I said before: a high-speed, dedicated connection between your ETL and Teradata boxes.

If you are using W2K, make sure you have plenty of RAM. If using new boxes, make sure your Intels are hyperthreaded and your disks are as fast as you can get. These things make a difference.

Also, with the restrictions that Windoze imposes (compared to Unix or Linux), you'll need to be careful not to overload your machines.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
bapajju
Participant
Posts: 82
Joined: Wed Nov 19, 2003 11:58 pm

Post by bapajju »

Thanks Guys. It's really helpful.