Page 1 of 1

How to pick first/ last CHAR value based on key

Posted: Thu Aug 14, 2003 6:24 pm
by dsproj2003
Hi,

How do we pick first/ last CHAR value, based on a key,

For e.g. :-

If my input data is,say:
key----fld2---fld3
X------"AA"---"MEM"
X------"CC"---"SEC"
X------"BB"---"CPU"
Y------"SS"---"BIT"
Y------"TT"---"REM"

I need to pick first of fld2, and last of fld3 (lexographically)

So required output should be:
X-----"AA"----"SEC"
Y-----"SS"----"REM"

I understand we can use aggregator for integer fields (option min/ max)
How do we abt the first/ last fld for char type fields?

Pls give pointers..

Thanks.
Regards,
Nitin

Posted: Thu Aug 14, 2003 8:16 pm
by degraciavg
You can also find the FIRST and LAST aggregation function in the Aggregator stage. I think that will solve your problem.

regards,
vladimir

Posted: Thu Aug 14, 2003 9:58 pm
by dsproj2003
Hi,

I dont know why but I cant find the firts/ last aggregation function in the sub-properties in aggregator stage.

It only has MIN and MAX options, however those seem to be working for numeric fields only and not for char fields.

Is this probably not avbl in ther version I have?

I am using PX ver 6.0. DataSteg Designer ver 6.0.0.17

Any clues....?

Nitin

Posted: Thu Aug 14, 2003 11:13 pm
by degraciavg
it's not in PX?! duh![xx(]

I'm not familiar with PX yet but there's a work-around to your problem... well, IF we can sort your data... you will need two hash files - one to get the fist value and another to hold the last value. here's the trick:
1. create a hash file with key and fld2
2. sort key and fld2 in DESCENDING order and write your data into the hash file (this will give you the first value)
3. create another hash file with key and fld3
4. sort key and fld2 in ASCENDING order and write your data into the hash file (this will give you the last value)

it's a nice trick... and with better performance than doing aggregation [;)] hope this helps [:)]

vladimir

Posted: Fri Aug 15, 2003 12:06 am
by ray.wurlod
The "trick" that Vladimir described works because all writes to hashed files are destructive overwrites. However, parallel jobs do not have a hashed file stage, so you need to write to some kind of data set or table with a "replace existing rows" rule, for example a DataSet stage with "Overwrite" as the Update Policy.

(If that seems like gibberish to those of you who only know server jobs I'm sorry, but it can't be helped.) [8D]

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

Posted: Fri Aug 15, 2003 12:58 am
by degraciavg
indeed, that's soundz Grrrr...EEK to me! [:D]

nevertheless, we need focus here [:)]

how about this solution...
1. generate a surrogate key (just a plain sequence) for each record. so now you have key, skey, fld2, fld3
2. stage your data into a staging table, say tmpTable
3. use this query
select a.key, a.fld2 as FIRST
from tmpTable a
where a.skey = (select min(b.skey) from tmpTable b
where a.key = b.key)
and
select a.key, a.fld3 as LAST
from tmpTable a
where a.skey = (select max(b.skey) from tmpTable b
where a.key = b.key)

it works... unless conventional SQL doesn't work in PX too... hmmm... [}:)]

vladimir

Posted: Fri Aug 15, 2003 3:38 am
by ray.wurlod
The whole point of PX is to avoid touching down on disk wherever possible. A data set, in PX, is essentially a memory resident table. But it is not accessed via SQL.

Posted: Fri Aug 15, 2003 5:08 am
by degraciavg
Thanks Ray.

Btw, I've just read from the DS PX document that server jobs can be created and accessed using Shared Containers [:0] All server plug-ins (including hash files) are available.

Can the Shared Container be used to address Nitin's problems?

I don't have a PX server so I can't try it myself. Perhaps, Nitin would be kind enough to test it for us.

vladimir

Posted: Fri Aug 15, 2003 7:28 am
by WoMaWil
Hi,

and what about stage variables, if all key fields come together? Could be done via an external sorting or a sort stage.

one stage variable remembers if key is the same
one stage variable remembers the lowest lexographical of field1
one stage variable remembers the highst lexographical of field3
one stage variable remembers the last key

only when first variable says change then the values are written and so on.

Eventually add an additional last line for to get the last switch.



Wolfgang Huerter
=====================
Cologne, Germany

Posted: Fri Aug 15, 2003 6:20 pm
by ray.wurlod
(Vladimir)
It is true that server jobs can be encapsulated in shared containers and these can then be used in parallel jobs, though there are some restruictions. Even though you don't have PX licensed, you do have the Parallel Job Developer's Guide installed with your DS client software and so can read about the restrictions there.

Posted: Mon Aug 18, 2003 12:04 pm
by Teej
With 1 key, get first for a field, and last for a field?

* * *

[input stage 1] ---> [Filter Stage]
|
v
[Merge] -----> [whatever]
^
|
[input Stage 2] ---> [Filter Stage]


* * *

For the first input stage, get key, and first column.
For the second input stage, get key and second column.

Filter Stage are to be set to sort ascending for first column, and descending for second column. Then you are to retain the first duplicate. Be sure to have Null set to last for both (unless you want Null as a valid value one way or another).

Then merge using the key.

If you are doing this in the middle of a process, use a Transform stage to split up the records.

Extra fields? Send it one way or another, no problem.

-T.J.


* * *

... now if this can make breakfast, my life is complete.

Posted: Mon Aug 18, 2003 3:33 pm
by bigpoppa
Well, just to corroborate. There is no 'first' functionality in the PX aggregator. To get 'firsts', I've always written a very simple buildop that takes in sorted, grouped input and spits out the first record in each group.

If performance is critical for you, buildop will probably be your best bet. To create performant PX jobs, you should minimize the amount of times you sort and/or rehash your data. Buildop will take advantage of upstream sorts/hashes, and it introduces no sequential bottlenecks.

Posted: Tue Aug 19, 2003 8:04 am
by Teej
quote:Originally posted by bigpoppa
[br]Well, just to corroborate. There is no 'first' functionality in the PX aggregator. To get 'firsts', I've always written a very simple buildop that takes in sorted, grouped input and spits out the first record in each group.

I don't understand why it is necessary to write a buildop when Filter Stage does the job. Can you elaborate on the benefits of using a buildop over Filter Stage?

-T.J.


* * *

... now if this can make breakfast, my life is complete.