Teradata Enterprise Stage Read
Moderators: chulett, rschirm, roy
Teradata Enterprise Stage Read
I have a job that reads 1.4 million rows into 81 columns, 1000 bytes per row from teradata. It is a simple SELECT from a single table. The db is on a separate machine within a 100 ft of the DS machine. The db has 248 amps. The job reads with 16 players, 15 sessions per player.
The output is to 16 Sequential files, one per node, defined as /dev/null.
Teradata Enterprise Stage --> Sequential stage
The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.
What would cause this job to be such a dog? Network? Teradata?
Is there something within DataStage that I can do to prove where the bottleneck is?
The output is to 16 Sequential files, one per node, defined as /dev/null.
Teradata Enterprise Stage --> Sequential stage
The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.
What would cause this job to be such a dog? Network? Teradata?
Is there something within DataStage that I can do to prove where the bottleneck is?
What kind of a selection/query are you doing? Can you duplicate the query in another tool of choice to see if the performance is as abyssmal?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
lol - those words are as close to heresy as you can get in this forum.Krazykoolrohit wrote:eliminating DS should do the trick.
To the original question - can you query on the Teradata server itself to elminate the network from the equation?
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
Nervezzo - I don't see one answer set from FastExport. I see many processes, all delivering data. In my case, I see 16 Teradata Enterprise Stage UNIX processes, all with 15 sessions per player. This means 240 sessions on the Teradata side returning into 16 nodes. It is my belief that Teradata IS returning rows in parallel. To specifically answer your question: I don't think it is CPU or I/O bound on the ETL Server therefore I have defined as many nodes as I can closely fit to the number of amps (248). I did do some tests with 4 nodes and 62 sessions per player. Also with 10 and 20 sessions per player. The timing was about the same. BTW - 10 sessionsperplayer will fire up 24 Teradata Enterprise Stage UNIX processes (20/12) regardless of the nodes defined in the Configuration file as per the documentation.
I figured I would try to overwhelm the ETL Server by defining a 16 node Config.apt but it easily handled 16 nodes. I could define 248 nodes for that matter but that's a lot of cutting and pasting and at this point that ain't the problem.
I figured I would try to overwhelm the ETL Server by defining a 16 node Config.apt but it easily handled 16 nodes. I could define 248 nodes for that matter but that's a lot of cutting and pasting and at this point that ain't the problem.
There is no where clause, it's a dump of a single table. I will verify that the table doesn't have issues by itself.Krazykoolrohit wrote:Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed
726 rows/sec from anywhere going straight to the bit bucket is acceptable performance for an EE job? Can this be?
-
- Charter Member
- Posts: 560
- Joined: Wed Jul 13, 2005 5:36 am
- Location: Ohio
I said I will be happy. Our DBA is a big idiot.. all i can get is a meagre 200trobinson wrote:There is no where clause, it's a dump of a single table. I will verify that the table doesn't have issues by itself.Krazykoolrohit wrote:Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed
726 rows/sec from anywhere going straight to the bit bucket is acceptable performance for an EE job? Can this be?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I missed the 16 nodes, 15 sessions per player bit.
Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.
If you exceed this number, then many of your connections may simply be waiting for others to be freed.
For example, with 4 TPAs and 4 AMPs per TPA, the maximum sessions is 16. Try setting SessionsPerPlayer to 1 in this scenario. 16 x 1 = 16, exactly the right number.
For the non-Teradata folks, the acronyms are:
TPA = Trusted Parallel Application node
AMP = Access Module Processor
Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.
If you exceed this number, then many of your connections may simply be waiting for others to be freed.
For example, with 4 TPAs and 4 AMPs per TPA, the maximum sessions is 16. Try setting SessionsPerPlayer to 1 in this scenario. 16 x 1 = 16, exactly the right number.
For the non-Teradata folks, the acronyms are:
TPA = Trusted Parallel Application node
AMP = Access Module Processor
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Teradata Enterprise Stage Read
Always good practise to have double number of requestedsessions per node in config file and leve 2 sessions per player is good
so if you have 2x2 config file set your requestedsessions to 4 and optionally you can set sessionsperplayer to 2 as its defaulted to 2 (always)
you can set these on enterprise stage
when you do this you will get maximum throughput. in your case the reason for taking more time is because your job is repartitioning the data its reading.
-pratap
so if you have 2x2 config file set your requestedsessions to 4 and optionally you can set sessionsperplayer to 2 as its defaulted to 2 (always)
you can set these on enterprise stage
when you do this you will get maximum throughput. in your case the reason for taking more time is because your job is repartitioning the data its reading.
-pratap
trobinson wrote:I have a job that reads 1.4 million rows into 81 columns, 1000 bytes per row from teradata. It is a simple SELECT from a single table. The db is on a separate machine within a 100 ft of the DS machine. The db has 248 amps. The job reads with 16 players, 15 sessions per player.
The output is to 16 Sequential files, one per node, defined as /dev/null.
Teradata Enterprise Stage --> Sequential stage
The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.
What would cause this job to be such a dog? Network? Teradata?
Is there something within DataStage that I can do to prove where the bottleneck is?
Thanks & Regards
Pratap Bhimireddy
Pratap Bhimireddy
It certainly appears able to. I see rows returned for all 16 nodes with minimal skew in the same amount of time per node.ray.wurlod wrote:
Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.
I am marking this resolved! The problem was with the network interface card. They turned on the gigabit interface card on the ETL Server and the job went from 29 minutes 36 seconds to 1 minute and 55 seconds.