how to check the set of 50 rows

Posted: Thu Nov 25, 2004 11:32 am
by ravikiran2712
Happy Thanksgiving, guys,
In one of my jobs I am getting 50 rows for each store/date combination into a Transformer. I have to check the 50 rows against one condition, and I can output them only if 80% of the 50 rows satisfy that condition. The problem is that I cannot do this with a single Transformer stage: I can decide whether to output the 50 rows only after I have read all 50, so I would have to buffer the rows instead of writing them to the output. I need suggestions on how to use the Transformer stage for this purpose. If anyone has a better idea, I can change the logic.
thank you,
ravi
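
To make the buffering requirement concrete, here is a minimal sketch in Python rather than DataStage. It assumes the rows arrive sorted by store/date; the field names (`store`, `date`, `qty`), the sample condition and the 5-row groups are hypothetical stand-ins for the real 50-row groups.

```python
from itertools import groupby
from operator import itemgetter

def emit_qualifying_groups(rows, condition, threshold_pct=80):
    """Yield every row of a (store, date) group only if enough rows pass."""
    key = itemgetter("store", "date")
    # groupby merges adjacent rows only, so the input must be sorted by key.
    for _, group in groupby(rows, key=key):
        buffered = list(group)  # hold the whole group before deciding
        passed = sum(1 for row in buffered if condition(row))
        # Integer comparison avoids floating-point rounding at the boundary.
        if passed * 100 >= threshold_pct * len(buffered):
            yield from buffered

# Hypothetical data: 5 rows per group instead of 50, to keep it short.
rows = (
    [{"store": 1, "date": "2004-11-25", "qty": q} for q in (5, 6, 7, 8, 0)]
    + [{"store": 2, "date": "2004-11-25", "qty": q} for q in (5, 0, 0, 8, 0)]
)
kept = list(emit_qualifying_groups(rows, lambda r: r["qty"] > 0))
```

Store 1 has 4 of 5 rows passing, so its whole group is emitted; store 2 has 2 of 5 and is dropped entirely.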

Posted: Thu Nov 25, 2004 5:07 pm
by xcb
Hi Ravi,

I would suggest you do this as a two-part process. The first part takes your input data, passes it through a Transformer and gives each record a rank/weight based on your condition. Then load this rank, along with your keys (store/date), into a UV table.

The second part again takes your input data into a Transformer, this time with a lookup to the pre-loaded UV table. Use a GROUP BY on the UV table with a SUM on the rank to get the total rank for each store/date combination. Within the Transformer, match your input data to the lookup data and place a constraint on the output link so that data passes through only if the total rank from the lookup is >= 80% of the rows in the group (for example, 40 of 50 if each row that meets the condition gets a rank of 1).

If you have a large volume of data you may find the lookup is slow, so it would be worthwhile to aggregate the data before you load the lookup.

Sorry if this isn't what you were after, at least it should give you another approach to how you are doing things.
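
The two-part idea above can be sketched outside DataStage as follows. This is only an illustration in Python: a plain dictionary stands in for the UV lookup table, the totals are aggregated before the lookup is used, and the field names and sample condition are hypothetical.

```python
from collections import defaultdict

def build_lookup(rows, condition):
    """Part 1: aggregate per (store, date) as [rows passing, rows seen]."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        entry = totals[(row["store"], row["date"])]
        entry[0] += 1 if condition(row) else 0  # the rank/weight: 1 or 0
        entry[1] += 1
    return dict(totals)

def filter_with_lookup(rows, lookup, threshold_pct=80):
    """Part 2: re-read the input and keep rows whose group meets the threshold."""
    for row in rows:
        passed, seen = lookup[(row["store"], row["date"])]
        if passed * 100 >= threshold_pct * seen:  # the output constraint
            yield row

# Hypothetical data, deliberately unsorted: this approach does not need sorting.
rows = [
    {"store": 1, "date": "2004-11-25", "qty": 5},
    {"store": 2, "date": "2004-11-25", "qty": 0},
    {"store": 1, "date": "2004-11-25", "qty": 6},
    {"store": 2, "date": "2004-11-25", "qty": 8},
    {"store": 1, "date": "2004-11-25", "qty": 7},
]
lookup = build_lookup(rows, lambda r: r["qty"] > 0)
kept = list(filter_with_lookup(rows, lookup))
```

The cost is that the input is read twice, once to build the lookup and once to filter against it.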

Posted: Thu Nov 25, 2004 5:09 pm
by xcb
Sorry - I just noticed that this is for PX. I don't know if my solution will work there.

can we do it without storing

Posted: Thu Nov 25, 2004 9:01 pm
by ravikiran2712
Hi, thanks for your reply.
Can we do the job without storing the data in a database? That takes a lot of time.

Posted: Thu Nov 25, 2004 10:11 pm
by xcb
I don't think so, at least not doing it the way that I have suggested. I've never done any parallel work so there may be a better and more elegant solution out there for that architecture.

Posted: Mon Nov 29, 2004 11:02 pm
by T42
Input -> Transformer -> Sort -> Transformer -> Output.

First transformer = use a counter method (search for it) that handles your conditions. Append the counter value to each record on output.

Sort the data so that, within each key, the record with the maximum counter value comes first.

The second transformer would handle the logic of "If this key group has a counter value above the threshold, trigger this behavior (send the record to output 1 using a constraint)."

Good luck.
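
One possible reading of the Transformer -> Sort -> Transformer outline above, sketched in Python rather than as DataStage stages. The function names, field names and the fixed `group_size` (50 in the original question, 5 here) are assumptions made for illustration.

```python
def add_running_counter(rows, condition):
    """First transformer: append a per-key running count of passing rows."""
    counts = {}
    for row in rows:
        key = (row["store"], row["date"])
        counts[key] = counts.get(key, 0) + (1 if condition(row) else 0)
        yield {**row, "counter": counts[key]}

def sort_max_counter_first(rows):
    """Sort: group by key, highest counter value first within each key."""
    return sorted(rows, key=lambda r: (r["store"], r["date"], -r["counter"]))

def pass_groups(rows, group_size, threshold_pct=80):
    """Second transformer: the first row of each key carries the group's
    final count, so the keep/drop decision is made once per key."""
    current_key, keep = None, False
    for row in rows:
        key = (row["store"], row["date"])
        if key != current_key:
            current_key = key
            keep = row["counter"] * 100 >= threshold_pct * group_size
        if keep:
            yield {k: v for k, v in row.items() if k != "counter"}

# Hypothetical data: 5 rows per group instead of 50.
rows = (
    [{"store": 1, "date": "2004-11-29", "qty": q} for q in (5, 6, 7, 8, 0)]
    + [{"store": 2, "date": "2004-11-29", "qty": q} for q in (5, 0, 0, 8, 0)]
)
counted = add_running_counter(rows, lambda r: r["qty"] > 0)
kept = list(pass_groups(sort_max_counter_first(counted), group_size=5))
```

Because the sort puts the highest counter first, the second step never has to buffer a group: it sees the final count on the first row of each key.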