Does the link collector option improve performance?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Does the link collector option improve performance?

Post by ak77 »

Hi everybody,

Guess it's a better idea to become a charter member, as most of the replies don't make sense without that.

I am reading from a join of two tables. This data is looked up against a hashed file. I am using a Link Partitioner and a Link Collector. Finally I am updating a table.

Do the Link Partitioner and Link Collector improve the performance in this situation?
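As a rough analogy only (plain Python, not DataStage, and every name below is made up for illustration), a Link Partitioner/Collector pair is like fanning one row stream out across parallel workers and then merging the results back into a single stream:

# Rough analogy (plain Python, not DataStage): a Link Partitioner splits the
# row stream across parallel paths and a Link Collector merges them back into
# one stream. All names here are hypothetical.
from multiprocessing import Pool

def transform(row):
    # Stand-in for the per-row lookup/derivation done on each partitioned link.
    key, value = row
    return key, value.upper()

if __name__ == "__main__":
    rows = [(i, f"value{i}") for i in range(20)]   # the input stream to be "partitioned"
    with Pool(processes=4) as pool:                # four parallel paths
        collected = pool.map(transform, rows)      # the "collector" recombines them
    print(collected[:5])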

My transaction handling size is 5000

Before this I had four links updating the table with the same transaction size, and the job took about 7 hours to process 3 million records.
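(For rough context, 3,000,000 rows in about 7 hours works out to roughly 3,000,000 / (7 × 3600) ≈ 120 rows per second overall, or around 30 rows per second per update link.)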

So I am using this link collector option, but the job seems to have hung.
The number of records processed shown in the Monitor and the Designer has stayed the same.
One more thing: the performance statistics in the Designer and the Monitor are not the same.

Thanks for the reply
Kishan
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Link partitioners and collectors usually aren't a method for performance improvement, but rather a way of dealing with varying logic paths that need to separate and recombine.

Consider breaking your design down into Extraction-, Transform-, and Load-specific jobs. You'll then see how long spooling the source data takes, how long applying the business rules takes, and how long the pure inserts and updates take. Consider separating inserts from updates so that you can potentially take advantage of bulk loading.
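Purely as an illustration of that staged approach (plain Python rather than DataStage jobs, with hypothetical file names and a stand-in for the actual database calls), the idea is to land data to flat files between a dedicated extract, a dedicated transform, and a dedicated load:

# Illustrative sketch only -- not DataStage code. It mimics the idea of
# landing data to flat files between dedicated extract, transform and load
# steps so each step can run flat out. All names and paths are hypothetical.
import csv

def extract(db_rows, path):
    # Dedicated extract: just spool source rows to a flat file.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(db_rows)

def transform(in_path, out_path, lookup):
    # Dedicated transform: no database round trips, only file I/O and an
    # in-memory lookup (the analogue of a hashed-file reference lookup).
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for key, value in csv.reader(src):
            writer.writerow([key, value, lookup.get(key, "UNKNOWN")])

def load(in_path, apply_batch, batch_size=5000):
    # Dedicated load: feed the database in steady batches (the callback
    # stands in for the actual insert/update or bulk-load step).
    batch = []
    with open(in_path, newline="") as src:
        for row in csv.reader(src):
            batch.append(row)
            if len(batch) >= batch_size:
                apply_batch(batch)
                batch = []
    if batch:
        apply_batch(batch)

if __name__ == "__main__":
    extract([("1", "a"), ("2", "b")], "extract.csv")
    transform("extract.csv", "transformed.csv", lookup={"1": "ref-1"})
    load("transformed.csv", apply_batch=lambda rows: print(len(rows), "rows loaded"))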
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

I am not able to read the rest of the message.
I don't know when my charter account will be created.

I did a ps -ef | grep jobname:

ps -ef|grep jobname
ak77 106358 379828 0 11:31:01 - 0:03 phantom DSD.StageRun jobname. jobname.Transformer_106 2 0/50
ak77 198758 1088320 0 11:31:01 - 0:04 phantom DSD.StageRun jobname. jobname.Link_Collector_129 2 0/50
ak77 242974 379828 0 11:31:01 - 0:03 phantom DSD.StageRun jobname. jobname.Transformer_107 2 0/50
ak77 378982 379828 0 11:31:01 - 0:03 phantom DSD.StageRun jobname. jobname.Transformer_108 2 0/50
ak77 379828 1086594 0 11:31:00 - 0:03 phantom DSD.StageRun jobname. jobname.lpr 2 0/50
ak77 381382 8178 1 13:55:43 pts/3 0:00 grep jobname
ak77 1086594 383472 0 11:30:34 - 0:05 phantom DSD.RUN jobname 0/50 SrcDB=DBName SrcUsrID=User SrcPswd=L<;@@KV1<9C>06EID9JE2K96BL< TempFileDirectory=/etl_data/tmp
ak77 1088320 379828 0 11:31:00 - 0:03 phantom DSD.StageRun jobname. jobname.tnxfrmJoinTables 2 0/50

What does this mean?

The performance statistics in the Director and the stats in the Monitor have been the same for the past 2 hours. Is the job still running?

Kishan
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You can click on the "continue trial" text in the graphic to read the full message.

Your extract of the "ps -ef" output doesn't show anything in particular; it shows some DataStage processes running, but nothing that has accumulated much CPU time. Can you have your DBA check your connection to the database to see if it is still active?
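One rough way to check whether those phantom processes are still doing work (an assumption on my part, relying only on a POSIX ps rather than anything DataStage-specific) is to sample a process's accumulated CPU time twice and see whether it grows:

# Rough sketch: sample the accumulated CPU time of a process twice and see
# whether it grows. Uses the POSIX "ps -o time= -p PID" format; the PID below
# is just an example taken from the listing above (the Link_Collector process).
import subprocess
import time

def cpu_time(pid):
    out = subprocess.run(["ps", "-o", "time=", "-p", str(pid)],
                         capture_output=True, text=True)
    return out.stdout.strip()   # e.g. "0:04" -- empty if the PID is gone

pid = 198758
before = cpu_time(pid)
time.sleep(60)
after = cpu_time(pid)
print(f"CPU time before: {before!r}, after: {after!r}")
# If the value never changes over several minutes, the process is most likely
# blocked (for example, waiting on the database), not crunching rows.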
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Thank you very much

I will try something like that: write to a sequential file and do the update in a different job.

Kishan
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Thanks a lot

I did this in two steps: first, selecting from the table and writing to a sequential file; second, reading the sequential file and updating the table.

This has really improved the performance

But my boss has asked me how a direct update can take more time than writing to a file and then updating.

I am not sure what to say, but the job is running faster.
Yes, we need to clean up the files, and extra space is required.

Well thanks again

Kishan
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

When a query is actively spooling data, it wants to send the data as fast as possible; but if the receiving process is busy doing something else, the query slows down because it no longer has an urgent need to be doing something.

When the transformation-only job is also doing database handshaking for lookups and inserts, it can't work fast enough because it's waiting on database and network I/O, so it also slows down because the OS sees it doesn't need urgent attention.

During loading, if the delivery of rows for insert or update is not consistent because the originating query or transformation is busy, the database again loses the "urgency" to devote attention to the loading process.

Dedicated extraction allows the process to concentrate on spooling data. Dedicated transformation, without network/database interference, allows the process to fly. Dedicated loading allows the database to consume source rows and apply them in a consistently fed pattern, keeping the attention of the OS for the duration of the process and thus getting higher priority.
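A toy illustration of that effect (all numbers invented): suppose the transform logic needs 1 ms of CPU per row and each database round trip costs 4 ms. Interleaved in one process, every row costs about 5 ms, roughly 200 rows per second at best, and the bursty feed also tends to slow the database side below even that. Split into stages, the transform-only job streams rows between flat files at something close to its CPU-bound 1,000 rows per second, and the load-only job keeps the database continuously fed, so each stage runs much nearer its own best rate even though the work happens in separate passes.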

It actually works, folks: smaller, modular jobs can be faster overall.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle