Logic

samsuf2002 · Post by **samsuf2002** » Tue Sep 15, 2009 11:28 pm

Hi All,

I have requirement where for example I have sample data

ID|VALUE|NUMBER
100|AA1|1
100|AA1|2
100|AA1|3
100|AA1|10
100|AA1|11
100|AA1|12
100|BB1|23
100|BB1|24
100|BB1|25
100|BB1|50
100|BB1|51
100|BB1|52
100|BB1|53
100|BB1|54

I need the output to be

Code: Select all

ID|VALUE|NUMBER1|NUMBER2
100|AA1|1|3
100|AA1|10|12
100|BB1|23|25
100|BB1|50|54

Can anyone help me with any ideas to implement the logic...

Thanks in Advance

saraswati · Post by **saraswati** » Tue Sep 15, 2009 11:43 pm

i presume the last row to be : 100|BB1|50|52 instead of 100|BB1|51|52

saraswati · Post by **saraswati** » Tue Sep 15, 2009 11:45 pm

i presume the last row to be : 100|BB1|50|52 instead of 100|BB1|51|52

ray.wurlod · Post by **ray.wurlod** » Tue Sep 15, 2009 11:45 pm

Write it out in English first. This is always a good idea, as it focusses your thought processes. For example, is the break every third row, or the tens digit of NUMBER, or something else? Once you've figured that out, and documented it, you will have your transformation (source to target mapping) specification.

meet_deb85 · Post by **meet_deb85** » Wed Sep 16, 2009 12:20 am

Follow the following steps : -

1) Take the input into a transformer.
2) Make a column tens_place in the output link.
3) The derivation of that column should be input.NUMBER/10, (Cast it as an integer column)
4) Take an aggregator and group on the basis of ID,VALUE and tens_place.
5) Take out the maximum and minimum value for NUMBER.

samsuf2002 · Post by **samsuf2002** » Wed Sep 16, 2009 8:19 am

The break can be at any point, it should generate a new row with minimum and maximum value for every sequence stream. if any of the next value is greater than the next sequence value (like getting 8 after 5 followed by 9,10,11)) then there should be another row having the minimum as 8 and maximum as 11. Its like getting the range of all the sequence breaks for VALUE column. I have changed the example above to be understand properly.

And if there is a case where we get like 5,8,11 (single value breaks) then new row with minimum and maximum value will be same like 5|5,8|8,11|11.

Sorry for not explaining in the first post.

samsuf2002 · Post by **samsuf2002** » Wed Sep 16, 2009 12:19 pm

I was working on it and found one solution, if I could add another column to my input file say SEQ such that it look like

Code: Select all

ID|VALUE|NUMBER|SEQ
100|AA1|1|1
100|AA1|2|1
100|AA1|3|1
100|AA1|10|2
100|AA1|11|2
100|AA1|12|2
100|BB1|23|1
100|BB1|24|1
100|BB1|25|1
100|BB1|50|2
100|BB1|51|2
100|BB1|52|2
100|BB1|53|2
100|BB1|54|2
100|BB1|102|3
100|BB1|103|3
100|BB1|104|3
100|BB1|105|3
100|BB1|106|3
100|BB1|107|3
100|BB1|108|3

then if I group it in aggregator on ID,VALUE, and SEQ then I can get min and max values for each break...If anyone can suggest me if I am going right and how can I get the above format for SEQ column that will be great.

Sreenivasulu · Post by **Sreenivasulu** » Wed Sep 16, 2009 12:32 pm

Great !!! you found a solution as soon as you wrote the logic in english as requested by wurlod.

Regards
Sreeni

ray.wurlod · Post by **ray.wurlod** » Wed Sep 16, 2009 5:35 pm

So the secret was "non-contiguous" rather than "tens digit"?