We are a division of an insurance company that is moving from UDB DB2 on UNIX to Teradata, because two data warehouse groups for two divisions are merging technologies. The other division has had Teradata but no DataStage; our division has had DataStage for 4 years. The way the other division has loaded data to the Teradata server is via MLoads, BTEQs, and FastLoads running on the mainframe.
The other division is looking at the functionality of using DataStage to load directly to tables on Teradata; however, they think that FTPing data from the UNIX box to a mainframe and then loading that data to Teradata from the mainframe is faster than having DataStage load the data directly using its Teradata plug-ins.
Does anyone have experience loading data via DataStage from a UNIX box to a Teradata box? Does anyone have any thoughts that I might be able to use in a proof of concept I am working on?
We do not have Parallel Extender but could possibly install it if it would help reduce the time of these data loads to Teradata tables. A problem we are facing on the Teradata server is that it will only allow 6 MultiLoads to run at a time, and we only have a certain window in which to run all of the MultiLoads we need to complete. That window is full, so we need to decrease the time it takes to run the loads.
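For the proof of concept, one way to reason about that 6-MultiLoad ceiling is to throttle job launches with a semaphore and watch peak concurrency. This is a minimal sketch, assuming a stand-in `run_mload` function in place of your real MultiLoad invocations; the limit of 6 comes from the server restriction described above:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_MLOADS = 6                      # Teradata server's concurrent MultiLoad limit
slots = threading.Semaphore(MAX_MLOADS)
lock = threading.Lock()
active = 0
peak = 0

def run_mload(table):
    """Stand-in (hypothetical) for launching one MultiLoad job."""
    global active, peak
    with slots:                     # blocks while 6 jobs are already running
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)            # pretend the load takes some time
        with lock:
            active -= 1

with ThreadPoolExecutor(max_workers=20) as pool:
    for i in range(20):
        pool.submit(run_mload, "table_%d" % i)

print(peak)  # never exceeds 6
```

Queuing the remaining jobs this way keeps the window fully used without tripping the server's concurrent-load limit.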
Thank you
Teradata load timings...
Specialist
Allied Information Systems
Data Warehouse Group
1100 Locust St Dept 0900
Des Moines, IA 50391-0900
phone (515) 508-4463
I do not know if EE has any direct loading mechanism for Teradata (I haven't used that database yet), so I can't really help you on this. (Basically, I do not know whether you install an EE client on the Teradata box itself, or use some global API call to it from the EE box.)
I would definitely recommend that you contact Ascential to get some numbers on this information. They do heavy research on this subject and should be able to provide you with some details.
PX and TD work great together! It's probably the best configuration you can have, and performance is really impressive. Obviously you will need some hints around PX/TD, like timestamp support or the number of MultiLoad sessions.
BTEQ (TD's "PL/SQL") is really a nightmare to maintain.
PX is really THE "must have" configuration within a TD environment.
If you want more details, ask me.
All the best
Just got onto this thread, hope I'm not too late :).
We are using a Teradata backend (NCR UNIX) from a Solaris box. We mix MultiLoad (mload) and direct DataStage writes depending on volume.
MultiLoad has a 20-second startup time; if the time to load the number of records via direct load is < 20 seconds, then direct load is obviously the way to go, and if > 20 seconds then mload is the way. We found our break-even point to be about 1000 input records for upsert handling (select, then insert else update).
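The break-even reasoning above can be written down as a small calculation. The 20-second startup figure is from the post; the per-row throughput numbers are assumptions chosen for illustration (they happen to put the break-even near 1000 rows), not measurements:

```python
MLOAD_STARTUP_S = 20.0       # fixed MultiLoad startup overhead (from the post)
DIRECT_ROWS_PER_S = 50.0     # assumed direct-write throughput, rows/second
MLOAD_ROWS_PER_S = 5000.0    # assumed MultiLoad throughput once running

def load_seconds(rows, use_mload):
    """Estimated wall-clock time to load `rows` rows via each path."""
    if use_mload:
        return MLOAD_STARTUP_S + rows / MLOAD_ROWS_PER_S
    return rows / DIRECT_ROWS_PER_S

def pick_loader(rows):
    """Choose whichever path finishes first for this row count."""
    if load_seconds(rows, True) < load_seconds(rows, False):
        return "mload"
    return "direct"

print(pick_loader(500))    # small batch: direct write wins
print(pick_loader(10000))  # large batch: MultiLoad wins
```

Measuring your own direct-write and mload throughputs and plugging them into a table like this would give a concrete per-table loading policy for the proof of concept.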
We are going across a dedicated gigabit link between the DataStage box and the Teradata box. I'm not sure why you would upload to a mainframe to load to Teradata, as I thought Teradata only ran on Win32 and UNIX.
Talk to the NCR/Teradata DBAs; find a good friendly one who won't BS you.
Another thing, since you are running against a distributed database (across a network): try to get a dedicated link, gigabit if possible, and create lookup sets if volumes aren't too high. We create hashed lookups (Server) and lookup files (PX) when the ratio of average input records to the target lookup table is 1:10 or less, as this greatly speeds up the lookup process. We create the lookups in a separate job run from the sequencer. Use Teradata FastExport for PX, and make sure the utility is installed first (we had THAT problem on a deployment weekend).
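The 1:10 rule of thumb above is easy to encode as a quick check. The threshold here is the poster's heuristic, not a Teradata or DataStage constant, and the function name is made up for illustration:

```python
def build_local_lookup(avg_input_rows, lookup_table_rows, max_ratio=0.1):
    """Return True when a local hashed lookup / lookup file set is worth
    building: input rows are at most 1/10 of the target lookup table
    (the 1:10 heuristic from the post)."""
    if lookup_table_rows <= 0:
        return False
    return avg_input_rows / lookup_table_rows <= max_ratio

print(build_local_lookup(1_000, 50_000))   # 1:50 ratio  -> True
print(build_local_lookup(30_000, 50_000))  # 3:5 ratio   -> False
```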
Any problems or further discussion: Feel free to ask
Andrew