How to implement control break in PX?
Moderators: chulett, rschirm, roy
How to implement control break in PX?
Here's my problem:
- I have data entered in target database with the following structure :
KEY1 KEY2 KEY3
XXXX YYYYY 1
XXXX YYYYY 2
Think of it key1 as employee number and key2 division number or something similar; basically they do not change. On the other hand whenever there is a new entry for this division and employee the KEY3 goes up by 1.
I have data coming in that will have XXXX and YYYYY as keys. The challange is to get the numbers in KEY3 increased by 1 all the time only for the combination of XXXX YYYYs
Sounds simple, at first, but considering I will have more than one data coming in with XXXX and YYYY (i.e. the same data in KEY1 KEY2) the sequence will have to be incremented more than once.
I can capture the maximum value for the combination of XXXX / YYYY with SELECT MAX(KEY3) from ABCtable group by KEY1, KEY2. That's not the issue.
The biggest problem I have is how do I keep the value of KEY3 increase everytime I have more than one data coming in from the stream.
The second biggest problem is how do I make sure the sort sequence doesn't get out of sequence; since this is parallel processing. Would it help to chance the Execution mode to "Sequential"?
I hope I didn't make it sound the situation more complicated than it already is..
thanks..
I have new data coming in from a stream
- I have data entered in target database with the following structure :
KEY1 KEY2 KEY3
XXXX YYYYY 1
XXXX YYYYY 2
Think of it key1 as employee number and key2 division number or something similar; basically they do not change. On the other hand whenever there is a new entry for this division and employee the KEY3 goes up by 1.
I have data coming in that will have XXXX and YYYYY as keys. The challange is to get the numbers in KEY3 increased by 1 all the time only for the combination of XXXX YYYYs
Sounds simple, at first, but considering I will have more than one data coming in with XXXX and YYYY (i.e. the same data in KEY1 KEY2) the sequence will have to be incremented more than once.
I can capture the maximum value for the combination of XXXX / YYYY with SELECT MAX(KEY3) from ABCtable group by KEY1, KEY2. That's not the issue.
The biggest problem I have is how do I keep the value of KEY3 increase everytime I have more than one data coming in from the stream.
The second biggest problem is how do I make sure the sort sequence doesn't get out of sequence; since this is parallel processing. Would it help to chance the Execution mode to "Sequential"?
I hope I didn't make it sound the situation more complicated than it already is..
thanks..
I have new data coming in from a stream
Earthbound misfit I..
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Have you investigated the Surrogate Key Generator stage?
Otherwise you'd design it like you would a server job, substituting a Data Set stage for the Hashed File stage. And you would have to restrict execution to sequential mode, and guarantee that the Data Set is updated before the next row is processed. (Or take the overhead of a SELECT MAX... for every row processed.)
Otherwise you'd design it like you would a server job, substituting a Data Set stage for the Hashed File stage. And you would have to restrict execution to sequential mode, and guarantee that the Data Set is updated before the next row is processed. (Or take the overhead of a SELECT MAX... for every row processed.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
There is a FAQ on implementing a counter in a parallel transformer using the @PARTITIONNUM and @PARTITIONCOUNT variables.
Consider this design: sort by KEY1 and KEY2. In a lookup stage retrieve the KEY1, KEY2 and MAX(KEY3) from the database to return MAX_KEY3. This should be a very efficient lookup in a parallel job as the SQL will only be run once. In a transformer set up a counter variables as per the FAQ. Add some additional logic, if the KEY1 and KEY2 values change reset the counter to MAX_KEY3, otherwise increment it as per the FAQ.
This should give you an incrementing key for each key combination that is unique between partitions. There may be some missing values in the key sequence if one partition has more rows then the others.
Consider this design: sort by KEY1 and KEY2. In a lookup stage retrieve the KEY1, KEY2 and MAX(KEY3) from the database to return MAX_KEY3. This should be a very efficient lookup in a parallel job as the SQL will only be run once. In a transformer set up a counter variables as per the FAQ. Add some additional logic, if the KEY1 and KEY2 values change reset the counter to MAX_KEY3, otherwise increment it as per the FAQ.
This should give you an incrementing key for each key combination that is unique between partitions. There may be some missing values in the key sequence if one partition has more rows then the others.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The partition number and partition count are also available in the Increment property of the Surrogate Key Generator and Column Generator (and, I believe, the Row Generator) stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I am already capturing the maximum value for each key column like this.vmcburney wrote: Consider this design: sort by KEY1 and KEY2. In a lookup stage retrieve the KEY1, KEY2 and MAX(KEY3) from the database to return MAX_KEY3. This should be a very efficient lookup in a parallel job as the SQL will only be run once.
Here's the problem? How do i capture the change in the KEY1 value? (For the sake of argument let's assume I have only one key). I was able to capture the change in the key value using the RowProcPreviousValue using the BASIC Transformer. Your FAQ solution doesn't allow this because it uses the PX transformer.vmcburney wrote: In a transformer set up a counter variables as per the FAQ. Add some additional logic, if the KEY1 and KEY2 values change reset the counter to MAX_KEY3, otherwise increment it as per the FAQ.
I think we are not at the same page as far as my requirement is concerned. I have no problems establishing an ever increasing number for each data in the stream either using your method or using a BASIC transformer. My challange is to be able to reset the counter every time I have a control break. Using a stage variable doesn't allow me to reset the value. (If there is a way, I am not aware of it)vmcburney wrote: This should give you an incrementing key for each key combination that is unique between partitions. There may be some missing values in the key sequence if one partition has more rows then the others.
To help you visiualize the issue let me give you the following sample data:
Code: Select all
KEY MAXVALUE
------ ------------
AAAA 33
AAAA 33
AAAA 33
AAAA 33
BBBB 67
CCCC 88
CCCC 88
Code: Select all
KEY SEQNUMBER
AAAA 34
AAAA 35
AAAA 36
AAAA 37
BBBB 68
CCCC 89
CCCC 90
Earthbound misfit I..
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
If you partition and sort by KEY1 and KEY2 then all records belonging to a particular KEY1 / KEY2 value will be on the same partition in a sorted order. This means you can use the transformer stage variables to keep the previous value of KEY1/KEY2 and compare it to the incoming row.
Code: Select all
svNewKey input.KEY1 : '|' : input.KEY2
svKeyChange svNewKey <> svLastKey or svLastKey = ""
svLastKey input.KEY1 : '|' : input.KEY2
svCounter if KeyChange then input.MAX_KEY3 else svCounter + @NUMPARTITIONS
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Done..vmcburney wrote:If you partition and sort by KEY1 and KEY2 then all records belonging to a particular KEY1 / KEY2 value will be on the same partition in a sorted order.
In this case both svLastKey and svNewKey have the same values. I am not sure if LastKey will hold a different value than the NewKey..vmcburney wrote: This means you can use the transformer stage variables to keep the previous value of KEY1/KEY2 and compare it to the incoming row.
Code: Select all
svNewKey input.KEY1 : '|' : input.KEY2 svKeyChange svNewKey <> svLastKey or svLastKey = "" svLastKey input.KEY1 : '|' : input.KEY2 svCounter if KeyChange then input.MAX_KEY3 else svCounter + @NUMPARTITIONS
As a mattar of fact in the middle of the reply i tested this and I found to be that both NewKey and LastKey return the same values.
Am I missing something?
thanks..
Earthbound misfit I..