Brain Teaser...Complex Scneario
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 222
- Joined: Tue Aug 30, 2005 2:07 am
- Location: pune
- Contact:
Brain Teaser...Complex Scneario
Hi DSXians,
I came up with a Brain Teaser on Datastage. This problem is bugging us, hope the Datastage world will help me.
The Scenario is, my source data have a constrain on the Length of the String Field (10), if it crosses that limit, it is storing in a new row with the same primary keys by adding a Sequence number it. See the example below:
SOURCE_D KEY1 KEY2 SEQ_I String
2010-04-10 100 200 1 HELLO HOWRU
2010-04-10 100 200 2 I AM FINE
In my destination warehouse table, I am concatenating the String from SEQ_I 1 & 2. I need to see this data as
KEY1 KEY2 String START_D END_D
100 200 HELLO HOWRU I AM FINE 2010-04-10 9999-12-31
The above scenario is a simple scenario. Now I have scenario like below and all the data whatever I am mentioning is coming in a single source file and it is a History Load of 2 months at 1 time.
SOURCE_D KEY1 KEY2 SEQ_I String
2010-04-10 100 200 1 HELLO HOWRU
2010-04-10 100 200 2 I AM FINE
2010-04-12 100 200 3 WAT ABT YOU?
2010-04-14 100 200 4 ME FINE
2010-04-16 100 200 2 I AM NOT FINE
Now, I need my output should be show in warehouse as mentioned below.
KEY1 KEY2 String START_D END_D
100 200 HELLO HOWRU I AM FINE 2010-04-10 2010-04-11
100 200 HELLO HOWRU I AM FINE WAT ABT YOU? 2010-04-12 2010-04-13
100 200 HELLO HOWRU I AM FINE WAT ABT YOU? ME FINE 2010-04-14 2010-04-15
100 200 HELLO HOWRU I AM NOT FINE WAT ABT YOU? ME FINE 2010-04-16 9999-12-31
I tried many ways by using SQL, by using stage variables, I am not succeeded. Only thing you have to remember is everything is coming in one single file and we have to generate 4 output records as mentioned above.
Please do let me know, your inputs on this problem.
I came up with a Brain Teaser on Datastage. This problem is bugging us, hope the Datastage world will help me.
The Scenario is, my source data have a constrain on the Length of the String Field (10), if it crosses that limit, it is storing in a new row with the same primary keys by adding a Sequence number it. See the example below:
SOURCE_D KEY1 KEY2 SEQ_I String
2010-04-10 100 200 1 HELLO HOWRU
2010-04-10 100 200 2 I AM FINE
In my destination warehouse table, I am concatenating the String from SEQ_I 1 & 2. I need to see this data as
KEY1 KEY2 String START_D END_D
100 200 HELLO HOWRU I AM FINE 2010-04-10 9999-12-31
The above scenario is a simple scenario. Now I have scenario like below and all the data whatever I am mentioning is coming in a single source file and it is a History Load of 2 months at 1 time.
SOURCE_D KEY1 KEY2 SEQ_I String
2010-04-10 100 200 1 HELLO HOWRU
2010-04-10 100 200 2 I AM FINE
2010-04-12 100 200 3 WAT ABT YOU?
2010-04-14 100 200 4 ME FINE
2010-04-16 100 200 2 I AM NOT FINE
Now, I need my output should be show in warehouse as mentioned below.
KEY1 KEY2 String START_D END_D
100 200 HELLO HOWRU I AM FINE 2010-04-10 2010-04-11
100 200 HELLO HOWRU I AM FINE WAT ABT YOU? 2010-04-12 2010-04-13
100 200 HELLO HOWRU I AM FINE WAT ABT YOU? ME FINE 2010-04-14 2010-04-15
100 200 HELLO HOWRU I AM NOT FINE WAT ABT YOU? ME FINE 2010-04-16 9999-12-31
I tried many ways by using SQL, by using stage variables, I am not succeeded. Only thing you have to remember is everything is coming in one single file and we have to generate 4 output records as mentioned above.
Please do let me know, your inputs on this problem.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It seems you me that you're needing a vertical pivot and starting a new record when the sequence number resets to 1.
This is easily accomplished with a Sort stage, a Transformer stage and a Remove Duplicates stage. The technique can be found on DSXchange; search for vertical pivot.
This is easily accomplished with a Sort stage, a Transformer stage and a Remove Duplicates stage. The technique can be found on DSXchange; search for vertical pivot.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 730
- Joined: Tue Nov 04, 2008 10:14 am
- Location: Bangalore
As Ray Suggested you can use a sort stage and sort over the key columns and record start date then use a transformer with a pair of stage variables in such a way that you have both the current record sequence number and previous record sequence number then use two stage variable (svr1,svr2) ,
svr2:- keep the description of the latest record where sequence number was '1'.
svr1:-
1) If current record seq_number=1 then its description
2) If current record seq_number > Previous record seq_number then svr1=svr1:current_description
3) If current record seq_number < Previous record seq_number then svr1=svr2:current_description
Hope this helps
svr2:- keep the description of the latest record where sequence number was '1'.
svr1:-
1) If current record seq_number=1 then its description
2) If current record seq_number > Previous record seq_number then svr1=svr1:current_description
3) If current record seq_number < Previous record seq_number then svr1=svr2:current_description
Hope this helps
-
- Premium Member
- Posts: 783
- Joined: Mon Jan 16, 2006 10:17 pm
- Location: Sydney, Australia
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
What you ideally want is an array... but not sure how you could hold such in a Parallel job... there is the vector but I haven't figured out how that could be used as part of a transform stage...
However, you could try make your own array as long as you know of a character that won't exist in your field (maybe ~ or |) or you could always filter such characters out...
At it's simplest, use a stage variable to hold the current text and one to hold the number of fields within this text.
After first row
For the next row, check if your SEQ_I <= Count
If > append <delim>String
Else Replace SEQ_I numbered text field in string with new text (this will probably need to be a set of Field, Index and Substring calls)
After second row
By the time you get to 2010-04-16 you should have
So when you get to "2 I AM NOT FINE"
you do a
On output, you could either push out every record and then do a remove duplicates or only output at each date change replacing the <delim> with space.
An idea anyway, may have to play around a bit but hopefully the general idea is there... or find out how to use vectors... (I see the Function list provides an ElementAt but no indication how you actually populate the vector in the first place)
However, you could try make your own array as long as you know of a character that won't exist in your field (maybe ~ or |) or you could always filter such characters out...
At it's simplest, use a stage variable to hold the current text and one to hold the number of fields within this text.
After first row
Code: Select all
Text: HELLO HOWRU
Count: 1
If > append <delim>String
Else Replace SEQ_I numbered text field in string with new text (this will probably need to be a set of Field, Index and Substring calls)
After second row
Code: Select all
Text: HELLO HOWRU<delim>I AM FINE
Count: 2
Code: Select all
Text: HELLO HOWRU<delim>I AM FINE<delim>WAT ABT YOU?<delim>ME FINE
Count: 4
you do a
Code: Select all
svX: Index(Field(<Text>, <SEQ_I>)) - SEQ_I being 2 in this case
svY: Substr(1,svX)<new text>substr(Field(<Text>, SEQ_I+1)-1)
An idea anyway, may have to play around a bit but hopefully the general idea is there... or find out how to use vectors... (I see the Function list provides an ElementAt but no indication how you actually populate the vector in the first place)
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
No need for vector/array. Just concatenate to the existing string if the indicator is greater than one, or use the new string if the indicator is 1.
The Remove Duplicates stage captures the last of each group. That, along with sorted data, is all you need.
The Remove Duplicates stage captures the last of each group. That, along with sorted data, is all you need.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
The way I am reading it, that is not what the OP wants. Any subsequent SEQ_I that is equal to a previous SEQ_I should replace the text of that record.
Using their example
Should a
appear after those already given it should result in
and not restart the sequence, nor should it be the only output.
If they wanted only one record with the most recent activity for each SEQ_I then I can see a sort/transform/dedup working but this has more requirements than what I can see such a process resolving.... but then as per the great Homer Simpson "Every time I learn something new it pushes something old out of my brain" so maybe my imagination hasn't twigged the bit that pushes that method in to place
Using their example
Should a
Code: Select all
2010-04-20 100 200 1 How ya going?
Code: Select all
100 200 How ya going? I AM NOT FINE WAT ABT YOU? ME FINE 2010-04-20 9999-12-31
If they wanted only one record with the most recent activity for each SEQ_I then I can see a sort/transform/dedup working but this has more requirements than what I can see such a process resolving.... but then as per the great Homer Simpson "Every time I learn something new it pushes something old out of my brain" so maybe my imagination hasn't twigged the bit that pushes that method in to place
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: