Teradata Enterprise Stage Read

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Teradata Enterprise Stage Read

Post by trobinson »

I have a job that reads 1.4 million rows into 81 columns, 1000 bytes per row from teradata. It is a simple SELECT from a single table. The db is on a separate machine within a 100 ft of the DS machine. The db has 248 amps. The job reads with 16 players, 15 sessions per player.
The output is to 16 Sequential files, one per node, defined as /dev/null.

Teradata Enterprise Stage --> Sequential stage

The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.

What would cause this job to be such a dog? Network? Teradata?

Is there something within DataStage that I can do to prove where the bottleneck is?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

What kind of a selection/query are you doing? Can you duplicate the query in another tool of choice to see if the performance is as abyssmal?
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed
nrevezzo
Participant
Posts: 15
Joined: Mon Sep 08, 2003 2:36 pm

Post by nrevezzo »

Why do you have 16 output files?
The Teradata Enterprise stage invokes FastExport which produces 1 answer set. All the parallel processing is performed on the Teradata Server not in DataStage.
For 1.4 M rows I've seen better throughput using the Teradata API stage.
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

Good idea.
bteq from the command line is equally slow. So from this I think we've eliminated DataStage. Must be network or teradata itself.

Could it be anything else?
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

eliminating DS should do the trick.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Krazykoolrohit wrote:eliminating DS should do the trick.
lol - those words are as close to heresy as you can get in this forum.

To the original question - can you query on the Teradata server itself to elminate the network from the equation?
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

Nervezzo - I don't see one answer set from FastExport. I see many processes, all delivering data. In my case, I see 16 Teradata Enterprise Stage UNIX processes, all with 15 sessions per player. This means 240 sessions on the Teradata side returning into 16 nodes. It is my belief that Teradata IS returning rows in parallel. To specifically answer your question: I don't think it is CPU or I/O bound on the ETL Server therefore I have defined as many nodes as I can closely fit to the number of amps (248). I did do some tests with 4 nodes and 62 sessions per player. Also with 10 and 20 sessions per player. The timing was about the same. BTW - 10 sessionsperplayer will fire up 24 Teradata Enterprise Stage UNIX processes (20/12) regardless of the nodes defined in the Configuration file as per the documentation.

I figured I would try to overwhelm the ETL Server by defining a 16 node Config.apt but it easily handled 16 nodes. I could define 248 nodes for that matter but that's a lot of cutting and pasting and at this point that ain't the problem.
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

Krazykoolrohit wrote:Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed
There is no where clause, it's a dump of a single table. I will verify that the table doesn't have issues by itself.

726 rows/sec from anywhere going straight to the bit bucket is acceptable performance for an EE job? Can this be?
Krazykoolrohit
Charter Member
Charter Member
Posts: 560
Joined: Wed Jul 13, 2005 5:36 am
Location: Ohio

Post by Krazykoolrohit »

trobinson wrote:
Krazykoolrohit wrote:Are the indexes properly setup?
BTW how much speed you want to attain with it? for me its a pretty good speed
There is no where clause, it's a dump of a single table. I will verify that the table doesn't have issues by itself.

726 rows/sec from anywhere going straight to the bit bucket is acceptable performance for an EE job? Can this be?
I said I will be happy. Our DBA is a big idiot.. all i can get is a meagre 200
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I missed the 16 nodes, 15 sessions per player bit.

Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.

If you exceed this number, then many of your connections may simply be waiting for others to be freed.

For example, with 4 TPAs and 4 AMPs per TPA, the maximum sessions is 16. Try setting SessionsPerPlayer to 1 in this scenario. 16 x 1 = 16, exactly the right number.

For the non-Teradata folks, the acronyms are:
TPA = Trusted Parallel Application node
AMP = Access Module Processor
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pathmaker
Premium Member
Premium Member
Posts: 14
Joined: Wed Dec 07, 2005 7:37 pm

Re: Teradata Enterprise Stage Read

Post by pathmaker »

Always good practise to have double number of requestedsessions per node in config file and leve 2 sessions per player is good

so if you have 2x2 config file set your requestedsessions to 4 and optionally you can set sessionsperplayer to 2 as its defaulted to 2 (always)

you can set these on enterprise stage

when you do this you will get maximum throughput. in your case the reason for taking more time is because your job is repartitioning the data its reading.

-pratap
trobinson wrote:I have a job that reads 1.4 million rows into 81 columns, 1000 bytes per row from teradata. It is a simple SELECT from a single table. The db is on a separate machine within a 100 ft of the DS machine. The db has 248 amps. The job reads with 16 players, 15 sessions per player.
The output is to 16 Sequential files, one per node, defined as /dev/null.

Teradata Enterprise Stage --> Sequential stage

The job takes 30 minutes and attains a whopping 726 rows/sec.
The DS job is neither disk I/O or CPU bound as far as I can determine.

What would cause this job to be such a dog? Network? Teradata?

Is there something within DataStage that I can do to prove where the bottleneck is?
Thanks & Regards
Pratap Bhimireddy
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

ray.wurlod wrote:
Can your Teradata installation accept (16 x 15) sessions? Maximum sessions is number of TPAs multiplied by number of AMPs per TPA.
It certainly appears able to. I see rows returned for all 16 nodes with minimal skew in the same amount of time per node.

I am marking this resolved! The problem was with the network interface card. They turned on the gigabit interface card on the ETL Server and the job went from 29 minutes 36 seconds to 1 minute and 55 seconds.
Post Reply