Page 1 of 1

Teradata load timings...

Posted: Wed Dec 01, 2004 3:21 pm
by lamphic
We are a divivsion of an Insurance company that is moving from UDB DB2 on UNIX to Teradata. This is because 2 Data Warehouse groups for 2 divisions are merging technologies. The other division has had Teradata but no Datastage. Our divsion has had Datastage for 4 years. The way that the other division has loaded data to the Teradata server is via MLoads, Bteqs, and Fast Loads running on the mainframe.

The other division is looking at the functionality of using DataStage to load directly to tables on Teradata, however they are thinking that FTPing data from the Unix box to a mainframe and then loading that data to Teradata from the mainframe is faster than having Datastage directly load the data to Teradata using Datastage's plug-ins.

Does anyone have experience with loading data via Datastage from a Unix box to a Teradata box? Does anyone have any thoughts that I might be able to use in a Proof of Concept I am working on?

We do not have parallel extender but could possibly install it if it would help increase the timing on these data loads to Teradata tables. A problem that we are facing on the Teradata server is that it will only allow 6 Multiloads to run at a time and we only have a certain window to run all of the multiloads we need to complete. That window is full, so we need to decrease the time it takes to run the loads.

Thank you

Posted: Wed Dec 01, 2004 4:07 pm
by T42
I do not know if EE have any direct loading mechanism for Teradata (I haven't used that database yet), so I can't really help you on this. (Basically, I do not know whether they install an EE client on the Teradata box itself, or use some global API call to it from the EE box.)

I would definitely recommend that you contact Ascential to get some numbers on this information. They do heavy research on this subject and should be able to provide you with some details.

Posted: Thu Dec 02, 2004 4:30 am
by l_homme_de_rio
PX and TD are great working together !!! it's probably the best configuration you can have. Performances are really impressive ! Obviously you will need some hints around PX/TD like timestamp support or # of multiload sessions.
Bteq (TD "PL/Sql") is really a nightmare to maintain.
PX is really THE "must have" conf within a TD environment
If you want more details ask me
All the best

Posted: Mon Dec 06, 2004 1:26 am
by tombastian
Hi,
Is Parallel Extender part of DataStage Enterprise Edition or Is it a seperate product?

Posted: Mon Dec 06, 2004 2:16 pm
by ray.wurlod
Parallel Extender is a part of DataStage Enterprise Edition. :D

Posted: Thu Dec 09, 2004 6:22 pm
by aartlett
Just got onto this thread, hope I'm not too late :).

We are using teradata backend (NCR unix) from a solaris box. We mix multiload (mload) and direct data/stage writes depending on volume.

Multiload has a 20 second startup time, if the time to load the number of records is < 20 secionds on direct load, then direct load is obviously the way to go. if > 20 seconds then mload is the way. We found our breakeven point to be 1000 input records for upsert handling (select, insert else update).

We are going across a dedicated GB link between the datastage box and the teradata box. I'm not sure why you would upload to a mainframe to load to teradata as I thought teradata only ran on win32 and unix.

Talk to the NCR/Teradata DBA's, find a good friendly one who won't BS you.

Another thing, as you are running to a distributed database (across a network) is to try and get a dedicated link, GB ifpossible, and create lookup sets if volumes aren't to high. We are creating hash lookups (server) and lookup files (PX) when the ratio of avg input records:target lookup table is 1:10 or less as this greatly speeds up the lookup process. We create the lookups in a seprate job run in the sequencer. Use T/D fastexport for PX, make sure the utility is loaded first (had THAT problem on a deployment weekend).

Any problems or further discussion: Feel free to ask


Andrew