Passing multiple groups of recs to a shared container
Moderators: chulett, rschirm, roy
Hello All,
- How can I pass multiple groups of records to a shared container for processing at the group level?
- Is there a way to have a shared container wait/pause while a group of records is being processed?
I need to pass groups of records to a shared container to be transformed as a group, since there are inter-dependencies among the records within each group.
I have worked out how to pass the groups into the shared container. When I send a single group through the shared container it works fine, but when I send multiple groups, the groups get mixed together.
Thanks in advance for any help.
Well, a shared container should be seen as a simple DataStage job.
A shared container simply lets you factor out processing that will be reused in multiple jobs.
In other words, a shared container can't do more than a job can do.
You can't use a shared container to perform a process that you couldn't do in a job.
In your case, I don't understand your requirements regarding group level, records in each group, and dependencies.
Can you give an example?
Thank you for your reply, bart12872.
In the example below the key field is what I am using to group the data.
Once grouped in the calling job I pass the groups to the shared container.
In the shared container I'm evaluating fld_1 and fld_2. For a group where fld_1 is "1", there must be an "x", a "z", and an "r" in fld_2 across that group, and a single record is sent back to the calling job. There is similar logic when fld_1 is "2" or "3".
When there is one group in the input file of the calling job it works fine. When there are multiple groups in the input file, all the records are evaluated together in the container, instead of one group at a time.
Thanks.
Key, fld_1, fld_2
=-=-=-=-=-=-=-=-=
AAA, 1, x
AAA, 1, y
AAA, 1, z
AAA, 1, r
BBB, 2, d
BBB, 2, f
BBB, 2, g
CCC, 3, h
CCC, 3, j
CCC, 3, k
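For illustration, the group-level check described above could be sketched in plain Python like this. Only the fld_1 = "1" rule is spelled out in the post, so the rules for "2" and "3" below are placeholders:

```python
from itertools import groupby

# Rule table: fld_1 value -> set of fld_2 values that must all appear
# somewhere in the group. Only the "1" rule is from the post; the
# "2"/"3" entries are hypothetical placeholders.
REQUIRED = {
    "1": {"x", "z", "r"},
    "2": {"d", "f"},   # placeholder rule
    "3": {"h", "k"},   # placeholder rule
}

def evaluate_groups(rows):
    """rows: iterable of (key, fld_1, fld_2), already sorted by key.
    Yields one summary record per group."""
    for key, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        fld_1 = group[0][1]                       # assume constant within a group
        seen = {r[2] for r in group}
        ok = REQUIRED.get(fld_1, set()) <= seen   # all required values present?
        yield (key, fld_1, "PASS" if ok else "FAIL")

rows = [
    ("AAA", "1", "x"), ("AAA", "1", "y"), ("AAA", "1", "z"), ("AAA", "1", "r"),
    ("BBB", "2", "d"), ("BBB", "2", "f"), ("BBB", "2", "g"),
]
print(list(evaluate_groups(rows)))
```

The key point is that the check only works if each group's records are seen together, which is exactly what breaks when multiple groups interleave.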
Describe the contents of the shared container - what stages are there and what functions are they performing? Off the top of my head, this just looks like a Transformer using stage variables to do 'group change detection', perhaps supported by a Sort stage adding a Key Change column. Each time the group changes or when you hit EOD pass out your group result... whatever that is.
-craig
"You can never have too many knives" -- Logan Nine Fingers
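The "group change detection" idea above can be sketched outside DataStage in plain Python: the Transformer's stage variables become ordinary loop variables, and the group result is emitted on each key change and once more at end-of-data. This assumes input sorted by key:

```python
def group_change_scan(rows):
    """Mimic Transformer stage variables: detect key changes on sorted
    input and emit one accumulated result per group."""
    results = []
    prev_key, acc = None, set()
    for key, fld_2 in rows:
        if prev_key is not None and key != prev_key:   # group boundary
            results.append((prev_key, sorted(acc)))
            acc = set()
        acc.add(fld_2)
        prev_key = key
    if prev_key is not None:                           # flush at EOD
        results.append((prev_key, sorted(acc)))
    return results

print(group_change_scan([("AAA", "x"), ("AAA", "z"), ("BBB", "d")]))
```

A Sort stage with "Create Key Change Column" set would hand you the boundary test pre-computed instead of comparing against the previous key yourself.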
I believe this is a partitioning issue. When the data is sent to the shared container it is grouped as you need, but once inside the shared container it is being re-partitioned, which breaks your group logic. If so, it is better to use Same partitioning on the input link of the shared container so that re-partitioning doesn't happen. But keep in mind that it all depends on what operations you are performing in the shared container: if there is an operation that involves a key change, re-partitioning is inevitable.
It would help if you explained your 'grouping evaluation' logic in the transformer.
Assuming this shared container is meant to be used in multiple jobs then I would suggest you make no assumptions about how the incoming data is arriving. Sort it by group first, add a Key Change column to ease your grouping logic in the transformer and use hash partitioning so that your groups stay together... unless you force the transformer to run sequentially or the job always runs on a single node.
-craig
"You can never have too many knives" -- Logan Nine Fingers
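The hash-partitioning point can be illustrated in plain Python: hashing on Key sends every record of a group to the same partition, so per-group logic stays correct when the job runs on multiple nodes. (Python salts string hashes per process, so the partition numbers vary run to run, but the "one group, one partition" invariant always holds.)

```python
from collections import defaultdict

def hash_partition(rows, n_nodes):
    """Assign each (key, ...) record to a partition by hashing its key.
    All records sharing a key land on the same partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[0]) % n_nodes].append(row)
    return parts

rows = [("AAA", 1), ("BBB", 2), ("AAA", 3), ("CCC", 4)]
parts = hash_partition(rows, 2)
```

Round-robin or random partitioning, by contrast, would scatter a group's records across nodes, which is one way groups end up "mixed together".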