Passing multiple groups of recs to a shared container

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

tbtcust
Premium Member
Posts: 230
Joined: Tue Mar 04, 2008 9:07 am

Passing multiple groups of recs to a shared container

Post by tbtcust »

Hello All,

- How can I pass multiple groups of records to a shared container for processing at the group level?

- Is there a way to have a shared container wait/pause while a group of records is being processed?


I need to pass groups of records to a shared container to be transformed as groups, because there are inter-dependencies among the records within each group.

I have worked out how to pass the groups into the shared container. When I send a single group through the shared container it works fine, but when I send multiple groups, the groups get mixed together.

Thanks in advance for any help.
bart12872
Participant
Posts: 82
Joined: Fri Jan 19, 2007 5:38 pm

Post by bart12872 »

Well, a shared container should be seen as a simple DataStage job. A shared container just lets you factor out processing that will be reused in multiple jobs.

Said another way, a shared container can't do more than a job can do.

You can't use a shared container to perform a process that you couldn't do in a job.

In your case, I don't understand your requirement, with group-level processing, records in each group, and dependencies.
Can you give an example?
tbtcust
Premium Member
Posts: 230
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Thank you for your reply bart12872

In the example below the key field is what I am using to group the data.

Once grouped in the calling job I pass the groups to the shared container.

In the shared container I'm evaluating fld_1 and fld_2. For a group where fld_1 is "1", there must be an "x", "z", and "r" in fld_2 across that group, and a single record is sent back to the calling job. There is similar logic when fld_1 is "2" or "3".

When I have one group in the input file of the calling job, it works fine. When I have multiple groups in the input file, all the records are evaluated together in the container instead of one group at a time.

Thanks.

Key, fld_1, fld_2
=-=-=-=-=-=-=-=-=
AAA, 1, x
AAA, 1, y
AAA, 1, z
AAA, 1, r

BBB, 2, d
BBB, 2, f
BBB, 2, g

CCC, 3, h
CCC, 3, j
CCC, 3, k
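
(For illustration only, a minimal Python sketch of the group-level check described above. The rule for fld_1 = "1" comes from the post; the rules for "2" and "3", the function name evaluate_group, and the output status values are invented.)

# Hypothetical sketch of the group-level evaluation: for fld_1 = "1",
# the group must contain "x", "z" and "r" in fld_2, and one summary
# record per group goes back to the calling job.
required_values = {
    "1": {"x", "z", "r"},
    # similar (unspecified here) rules would exist for "2" and "3"
}

def evaluate_group(key, rows):
    """rows holds the (fld_1, fld_2) pairs that share the same key."""
    fld_1 = rows[0][0]                      # fld_1 is constant within a group
    seen = {fld_2 for _, fld_2 in rows}     # every fld_2 value in the group
    needed = required_values.get(fld_1, set())
    status = "OK" if needed and needed <= seen else "INCOMPLETE"
    return (key, fld_1, status)             # one record back to the calling job

print(evaluate_group("AAA", [("1", "x"), ("1", "y"), ("1", "z"), ("1", "r")]))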
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Describe the contents of the shared container - what stages are there and what functions are they performing? Off the top of my head, this just looks like a Transformer using stage variables to do 'group change detection', perhaps supported by a Sort stage adding a Key Change column. Each time the group changes or when you hit EOD pass out your group result... whatever that is.
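
(As a rough illustration of that pattern outside DataStage: the stage-variable approach amounts to a classic control break. The Python below is only a sketch; the sample rows and the emit() helper are invented.)

# Control-break sketch: assumes the input is already sorted by Key,
# as a Sort stage in front of the Transformer would guarantee.
rows = [
    ("AAA", "1", "x"), ("AAA", "1", "y"), ("AAA", "1", "z"), ("AAA", "1", "r"),
    ("BBB", "2", "d"), ("BBB", "2", "f"), ("BBB", "2", "g"),
]

def emit(key, group_rows):
    # stand-in for "pass out your group result"
    print(key, "->", [fld_2 for _, _, fld_2 in group_rows])

current_key = None
group = []                                  # rows of the group being built

for key, fld_1, fld_2 in rows:
    if current_key is not None and key != current_key:
        emit(current_key, group)            # the key changed: flush the group
        group = []
    current_key = key
    group.append((key, fld_1, fld_2))

if group:                                   # end of data: flush the last group
    emit(current_key, group)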
-craig

"You can never have too many knives" -- Logan Nine Fingers
tbtcust
Premium Member
Posts: 230
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Hello chulett,

There is a Transformer that receives the groups and performs the evaluations, and a couple of Joins to create the output.
pavi
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

I believe this is a partitioning issue. When the data is sent to the shared container it is grouped as per your need, but once it gets into the shared container it is being re-partitioned, which breaks your group logic. If that is the case, it is better to use Same partitioning on the input link of the shared container so that re-partitioning doesn't happen. But keep in mind that it all depends on what operations you are performing in the shared container; if there is an operation that involves a key change, then re-partitioning is unavoidable.
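
(A toy Python sketch of why the partitioning method matters: hash partitioning on the grouping key keeps every row of a group on the same node, while a round-robin style re-partition scatters the group. The two-node count and the data are arbitrary.)

# Why partitioning matters for group logic (toy example, 2 "nodes").
keys = ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "CCC", "CCC", "CCC"]
nodes = 2

# Hash partitioning on the key: all rows sharing a key land on one node.
hash_parts = [hash(k) % nodes for k in keys]

# Round-robin: rows of the same group end up spread across nodes.
rr_parts = [i % nodes for i in range(len(keys))]

print("hash       :", list(zip(keys, hash_parts)))
print("round-robin:", list(zip(keys, rr_parts)))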
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

It would help if you explained your 'grouping evaluation' logic in the transformer.

Assuming this shared container is meant to be used in multiple jobs, I would suggest you make no assumptions about how the incoming data arrives. Sort it by group first, add a Key Change column to ease your grouping logic in the transformer, and use hash partitioning so that your groups stay together... unless you force the transformer to run sequentially or the job always runs on a single node.
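
(To show what the Key Change column buys you, a small Python sketch of "sort by group, then flag the first row of each group", which is roughly what the Sort stage option produces. The sample data and the position of the flag are assumptions.)

# Sort by the grouping key, then mark the first row of each group with 1.
rows = [("BBB", "d"), ("AAA", "x"), ("AAA", "y"), ("CCC", "h"), ("BBB", "f")]
rows.sort(key=lambda r: r[0])               # sort by group first

flagged = []
prev_key = None
for key, value in rows:
    key_change = 1 if key != prev_key else 0    # 1 only on a group's first row
    flagged.append((key, value, key_change))
    prev_key = key

for row in flagged:
    print(row)       # the flag is what drives the grouping stage variables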
-craig

"You can never have too many knives" -- Logan Nine Fingers
tbtcust
Premium Member
Posts: 230
Joined: Tue Mar 04, 2008 9:07 am

Post by tbtcust »

Thank you all.

I thought through what bart12872 and pavi wrote and redesigned the approach to the shared container.

I am treating the container as a job: I am using the key change column in the Sort stages, LastRowInGroup(), and control-break logic.
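
(For completeness, a short Python sketch of the "emit on the last row of the group" variant that LastRowInGroup() supports in the Transformer; here it is mimicked by peeking at the next row. The sample data is invented.)

# Emit the group result on the *last* row of each group.
rows = [("AAA", "x"), ("AAA", "y"), ("AAA", "z"), ("BBB", "d"), ("BBB", "f")]

acc = []
for i, (key, value) in enumerate(rows):
    acc.append(value)
    next_key = rows[i + 1][0] if i + 1 < len(rows) else None
    if key != next_key:                     # this is the group's last row
        print(key, "->", acc)               # pass the group result downstream
        acc = []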
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Glad I could help. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers