Teradata Enterprise Stage Read

trobinson · Post by **trobinson** » Wed Jan 31, 2007 10:56 am

I have a job that reads 1.4 million rows into 81 columns, 1000 bytes per row from teradata. It is a simple SELECT from a single table. The db is on a separate machine within a 100 ft of the DS machine. The db has 248 amps. The job reads with 16 players, 15 sessions per player.
The output is to 16 Sequential files, one per node, defined as /dev/null.

Teradata Enterprise Stage --> Sequential stage

The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.

What would cause this job to be such a dog? Network? Teradata?

Is there something within DataStage that I can do to prove where the bottleneck is?

ArndW · Post by **ArndW** » Wed Jan 31, 2007 11:18 am

What kind of a selection/query are you doing? Can you duplicate the query in another tool of choice to see if the performance is as abyssmal?

Krazykoolrohit · Post by **Krazykoolrohit** » Wed Jan 31, 2007 11:21 am

Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed

nrevezzo · Post by **nrevezzo** » Wed Jan 31, 2007 12:01 pm

Why do you have 16 output files?
The Teradata Enterprise stage invokes FastExport which produces 1 answer set. All the parallel processing is performed on the Teradata Server not in DataStage.
For 1.4 M rows I've seen better throughput using the Teradata API stage.

trobinson · Post by **trobinson** » Wed Jan 31, 2007 12:05 pm

Good idea.
bteq from the command line is equally slow. So from this I think we've eliminated DataStage. Must be network or teradata itself.

Could it be anything else?

Krazykoolrohit · Post by **Krazykoolrohit** » Wed Jan 31, 2007 12:06 pm

eliminating DS should do the trick.

ArndW · Post by **ArndW** » Wed Jan 31, 2007 12:10 pm

Krazykoolrohit wrote:eliminating DS should do the trick.

lol - those words are as close to heresy as you can get in this forum.

To the original question - can you query on the Teradata server itself to elminate the network from the equation?

trobinson · Post by **trobinson** » Wed Jan 31, 2007 12:23 pm

Nervezzo - I don't see one answer set from FastExport. I see many processes, all delivering data. In my case, I see 16 Teradata Enterprise Stage UNIX processes, all with 15 sessions per player. This means 240 sessions on the Teradata side returning into 16 nodes. It is my belief that Teradata IS returning rows in parallel. To specifically answer your question: I don't think it is CPU or I/O bound on the ETL Server therefore I have defined as many nodes as I can closely fit to the number of amps (248). I did do some tests with 4 nodes and 62 sessions per player. Also with 10 and 20 sessions per player. The timing was about the same. BTW - 10 sessionsperplayer will fire up 24 Teradata Enterprise Stage UNIX processes (20/12) regardless of the nodes defined in the Configuration file as per the documentation.

I figured I would try to overwhelm the ETL Server by defining a 16 node Config.apt but it easily handled 16 nodes. I could define 248 nodes for that matter but that's a lot of cutting and pasting and at this point that ain't the problem.

trobinson · Post by **trobinson** » Wed Jan 31, 2007 12:33 pm

Krazykoolrohit wrote:Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed

There is no where clause, it's a dump of a single table. I will verify that the table doesn't have issues by itself.

726 rows/sec from anywhere going straight to the bit bucket is acceptable performance for an EE job? Can this be?

Krazykoolrohit · Post by **Krazykoolrohit** » Wed Jan 31, 2007 1:56 pm

trobinson wrote:
Krazykoolrohit wrote:Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed
There is no where clause, it's a dump of a single table. I will verify that the table doesn't have issues by itself.

726 rows/sec from anywhere going straight to the bit bucket is acceptable performance for an EE job? Can this be?

I said I will be happy. Our DBA is a big idiot.. all i can get is a meagre 200

ray.wurlod · Post by **ray.wurlod** » Wed Jan 31, 2007 4:16 pm

I missed the 16 nodes, 15 sessions per player bit.

Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.

If you exceed this number, then many of your connections may simply be waiting for others to be freed.

For example, with 4 TPAs and 4 AMPs per TPA, the maximum sessions is 16. Try setting SessionsPerPlayer to 1 in this scenario. 16 x 1 = 16, exactly the right number.

For the non-Teradata folks, the acronyms are:
TPA = Trusted Parallel Application node
AMP = Access Module Processor

pathmaker · Post by **pathmaker** » Sat Feb 03, 2007 8:01 am

Always good practise to have double number of requestedsessions per node in config file and leve 2 sessions per player is good

so if you have 2x2 config file set your requestedsessions to 4 and optionally you can set sessionsperplayer to 2 as its defaulted to 2 (always)

you can set these on enterprise stage

when you do this you will get maximum throughput. in your case the reason for taking more time is because your job is repartitioning the data its reading.

-pratap

trobinson wrote:I have a job that reads 1.4 million rows into 81 columns, 1000 bytes per row from teradata. It is a simple SELECT from a single table. The db is on a separate machine within a 100 ft of the DS machine. The db has 248 amps. The job reads with 16 players, 15 sessions per player.
The output is to 16 Sequential files, one per node, defined as /dev/null.

Teradata Enterprise Stage --> Sequential stage

The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.

What would cause this job to be such a dog? Network? Teradata?

Is there something within DataStage that I can do to prove where the bottleneck is?

trobinson · Post by **trobinson** » Mon Feb 05, 2007 11:25 am

ray.wurlod wrote:
Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.

It certainly appears able to. I see rows returned for all 16 nodes with minimal skew in the same amount of time per node.

I am marking this resolved! The problem was with the network interface card. They turned on the gigabit interface card on the ETL Server and the job went from 29 minutes 36 seconds to 1 minute and 55 seconds.

DSXchange

Teradata Enterprise Stage Read

Teradata Enterprise Stage Read

Re: Teradata Enterprise Stage Read