about usage of Link partitioners

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

pavanns
Participant
Posts: 27
Joined: Wed Sep 28, 2005 8:00 pm
Location: ca

Post by pavanns »

Hi
Can link partitioning be done in server jobs, or only in parallel jobs? I am just trying to practise with Link Partitioners and Link Collectors. When I try this, all the output from the Link Partitioner goes into one of the three Transformers that I have linked to it; the other two Transformers receive no input rows. Why is this happening? Am I correct in joining three Transformers in parallel to a Link Partitioner? Please throw some light on this.
pavan
pavanns
Participant
Posts: 27
Joined: Wed Sep 28, 2005 8:00 pm
Location: ca

Post by pavanns »

Let me add to the above query: what kind of partitioning is best, performance-wise, in this stage?
pavan
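
For readers following the question above, here is a minimal Python sketch of how row-based and key-based partitioning spread rows across three output links. It is a generic illustration, not DataStage code and not a description of the Link Partitioner's internals; the function names, the `id` and `region` columns and the sample rows are all hypothetical. It shows one way (an assumption, not a diagnosis of the job above) that every row can end up on a single link: a key-based scheme applied to a key whose value never varies.

```python
from collections import Counter
import zlib


def round_robin(rows, n_links):
    """Assign row i to link i mod n_links, regardless of row content."""
    return [i % n_links for i, _ in enumerate(rows)]


def modulus(rows, key, n_links):
    """Assign each row to link (integer key value) mod n_links."""
    return [int(row[key]) % n_links for row in rows]


def hashed(rows, key, n_links):
    """Assign each row to a link chosen by a hash of the key value."""
    return [zlib.crc32(str(row[key]).encode()) % n_links for row in rows]


# Hypothetical sample data: "id" varies per row, "region" is constant.
rows = [{"id": i, "region": "CA"} for i in range(9)]

print("round robin       :", Counter(round_robin(rows, 3)))
print("modulus on id     :", Counter(modulus(rows, "id", 3)))
print("hash on region    :", Counter(hashed(rows, "region", 3)))
```

In this sketch the first two schemes give each link three rows, while hashing on the constant `region` column sends all nine rows to the same link and leaves the other two empty.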
pavanns
Participant
Posts: 27
Joined: Wed Sep 28, 2005 8:00 pm
Location: ca

Post by pavanns »

Please help me.
pavan
trokosz
Premium Member
Posts: 188
Joined: Thu Sep 16, 2004 6:38 pm
Contact:

Post by trokosz »

Yes, server jobs have the Link Partitioner and Link Collector stages. But check out the IPC stage as well.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I do not like to use Link Partitioners to achieve partition parallelism (see Chapter 2 of Parallel Job Developer's Guide) in server jobs. A job that uses a link partitioner presumably splits one stream of data into several streams for parallel processing. And a link collector has to gather them all back together into a single stream for writing.

Instead I would prefer to use a multi-instance job. In this way I am not bottlenecked on readers or writers. If necessary (for summarising across the entire set, for example), I might direct the various jobs' outputs into another job, perhaps using named pipes or some other IPC mechanism, in which that summarising could occur. But I'd probably use intermediate text files, cat them all together, and use that as the single input for the final job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
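
The "intermediate text files, cat them all together" step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the file names (`output_0.txt`, `combined.txt`), the three instances and the `concatenate` helper are hypothetical, and in practice a plain `cat` on the command line does the same job.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate(part_files, target):
    """Append each part file to target in order, like `cat part1 part2 > target`."""
    with open(target, "wb") as out:
        for part in part_files:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


# Simulate three job instances, each writing its own intermediate text file.
workdir = Path(tempfile.mkdtemp())
parts = []
for instance in range(3):
    part = workdir / f"output_{instance}.txt"
    part.write_text(f"row from instance {instance}\n")
    parts.append(part)

# Combine them into the single input for the final (summarising) job.
combined = workdir / "combined.txt"
concatenate(parts, combined)
print(combined.read_text(), end="")
```

Because each instance writes to its own file, no instance waits on another writer; the only serial step is the final concatenation.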
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Ray,

Just last week I came across a scenario where using the link partitioner made more sense than doing a multi-instance job.

The data flow came from a database table and some complex transformations were done to the data before writing it to a staging hashed file. The throughput was (I'll use rows/second to give scaling) about 400 rows/s on a large SMP machine.

It turns out that the database access was a full table select, and because the database was on a remote machine with fixed bandwidth, it made no sense to split that into separate queries. But after some tuning of the complex transformations and using a link partitioner to split the data into 4 parallel links, the performance went up to 5000 rows/s.

In this case the process was 100% CPU bound, so splitting the computations across several processes balanced the load so that the job was using all of its I/O potential without having to wait for a single process's CPU. I think this is one of the few types of cases where I see link partitioning to be advantageous.
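
The shape of that design (one reader that cannot usefully be split, feeding several CPU-bound workers) can be sketched in Python with a process pool. This is an analogy only, not DataStage code: `read_source`, `expensive_transform`, the row counts and the worker count of 4 are hypothetical stand-ins, and `imap_unordered` hands rows to whichever worker is free rather than following a fixed partitioning algorithm.

```python
from multiprocessing import Pool


def read_source(n_rows):
    """Stand-in for the single full-table select: one reader, one stream."""
    yield from range(n_rows)


def expensive_transform(row):
    """Stand-in for a CPU-heavy derivation; purely illustrative arithmetic."""
    total = 0
    for i in range(5_000):
        total = (total + row * i) % 1_000_003
    return row, total


def run(n_rows, n_workers):
    """Feed one input stream to n_workers processes and collect the results."""
    with Pool(processes=n_workers) as pool:
        # The single reader stays serial; only the transformation fans out.
        return list(
            pool.imap_unordered(expensive_transform, read_source(n_rows), chunksize=50)
        )


if __name__ == "__main__":
    results = run(200, 4)
    print(len(results), "rows transformed by 4 worker processes")
```

The split only pays off when the transformation, not the read or the write, is the bottleneck, which is the situation described in this post.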
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'll always keep an open mind (which is different from a hole in the head). Sure there will be exceptions to every "rule", and Arnd has highlighted a good one. I probably would have gone the same way there, or I may have split the task to load into a single local source (text file would do, and fixed-width is best) and run multi-instance jobs using that as source, each instance processing a different subset of rows.