How to pick first/ last CHAR value based on key

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsproj2003
Participant
Posts: 21
Joined: Wed Oct 01, 2003 11:53 am

How to pick first/ last CHAR value based on key

Post by dsproj2003 »

Hi,

How do we pick first/ last CHAR value, based on a key,

For e.g. :-

If my input data is,say:
key----fld2---fld3
X------"AA"---"MEM"
X------"CC"---"SEC"
X------"BB"---"CPU"
Y------"SS"---"BIT"
Y------"TT"---"REM"

I need to pick the first of fld2 and the last of fld3 (lexicographically).

So required output should be:
X-----"AA"----"SEC"
Y-----"SS"----"REM"
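
To make it concrete, here is the reduction I'm after as a plain Python sketch (Python just for illustration; the real job is in DataStage):

```python
rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]

result = {}
for key, f2, f3 in rows:
    lo, hi = result.get(key, (f2, f3))
    # keep the lexicographically first fld2 and last fld3 per key
    result[key] = (min(lo, f2), max(hi, f3))

print(result)  # {'X': ('AA', 'SEC'), 'Y': ('SS', 'REM')}
```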

I understand we can use aggregator for integer fields (option min/ max)
How do we get the first/last value for char type fields?

Pls give pointers..

Thanks.
Regards,
Nitin
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

You can also find the FIRST and LAST aggregation function in the Aggregator stage. I think that will solve your problem.

regards,
vladimir
dsproj2003
Participant
Posts: 21
Joined: Wed Oct 01, 2003 11:53 am

Post by dsproj2003 »

Hi,

I don't know why, but I can't find the first/last aggregation function in the sub-properties of the Aggregator stage.

It only has MIN and MAX options; however, those seem to work for numeric fields only and not for char fields.

Is this perhaps not available in the version I have?

I am using PX ver 6.0, DataStage Designer ver 6.0.0.17.

Any clues....?

Nitin
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

it's not in PX?! duh![xx(]

I'm not familiar with PX yet, but there's a workaround to your problem... well, IF we can sort your data... You will need two hash files - one to hold the first value and another to hold the last value. Here's the trick:
1. create a hash file with key and fld2
2. sort by key and fld2 in DESCENDING order and write your data into the hash file (the final write per key is the lowest fld2, which gives you the first value)
3. create another hash file with key and fld3
4. sort by key and fld3 in ASCENDING order and write your data into the other hash file (the final write per key is the highest fld3, which gives you the last value)
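
The destructive-overwrite behaviour can be sketched with a Python dict standing in for the hash files (last write per key wins, illustrative only):

```python
rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]

first = {}  # hash file 1: key -> fld2
for key, f2, _ in sorted(rows, key=lambda r: (r[0], r[1]), reverse=True):
    first[key] = f2  # descending writes: the lowest fld2 lands last

last = {}   # hash file 2: key -> fld3
for key, _, f3 in sorted(rows, key=lambda r: (r[0], r[2])):
    last[key] = f3   # ascending writes: the highest fld3 lands last

print(first)  # first fld2 per key: X -> AA, Y -> SS
print(last)   # last fld3 per key:  X -> SEC, Y -> REM
```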

it's a nice trick... and with better performance than doing aggregation [;)] hope this helps [:)]

vladimir
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The "trick" that Vladimir described works because all writes to hashed files are destructive overwrites. However, parallel jobs do not have a hashed file stage, so you need to write to some kind of data set or table with a "replace existing rows" rule, for example a DataSet stage with "Overwrite" as the Update Policy.

(If that seems like gibberish to those of you who only know server jobs I'm sorry, but it can't be helped.) [8D]

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

indeed, that's soundz Grrrr...EEK to me! [:D]

nevertheless, we need focus here [:)]

how about this solution...
1. generate a surrogate key (just a plain sequence) for each record, so now you have key, skey, fld2, fld3 (note this picks first/last in skey order, so the data must already be sorted the way you want)
2. stage your data into a staging table, say tmpTable
3. use these two queries and join their results on key:

select a.key, a.fld2 as FIRST
from tmpTable a
where a.skey = (select min(b.skey) from tmpTable b
                where a.key = b.key)

select a.key, a.fld3 as LAST
from tmpTable a
where a.skey = (select max(b.skey) from tmpTable b
                where a.key = b.key)
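
Those correlated subqueries can be tried out with Python's built-in sqlite3 (illustrative sketch; the column is renamed from key to grp to dodge SQL keyword trouble, and skey is assumed to follow the desired order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tmpTable (grp text, skey integer, fld2 text, fld3 text)")
# skey is the surrogate sequence; rows assumed already ordered as desired
conn.executemany("insert into tmpTable values (?, ?, ?, ?)", [
    ("X", 1, "AA", "MEM"), ("X", 2, "BB", "CPU"), ("X", 3, "CC", "SEC"),
    ("Y", 4, "SS", "BIT"), ("Y", 5, "TT", "REM"),
])

# first of fld2 per group = the row holding the minimum surrogate key
first = dict(conn.execute(
    "select a.grp, a.fld2 from tmpTable a "
    "where a.skey = (select min(b.skey) from tmpTable b where a.grp = b.grp)"))
# last of fld3 per group = the row holding the maximum surrogate key
last = dict(conn.execute(
    "select a.grp, a.fld3 from tmpTable a "
    "where a.skey = (select max(b.skey) from tmpTable b where a.grp = b.grp)"))

print(first)  # {'X': 'AA', 'Y': 'SS'}
print(last)   # {'X': 'SEC', 'Y': 'REM'}
```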

it works... unless conventional SQL doesn't work in PX too... hmmm... [}:)]

vladimir
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The whole point of PX is to avoid touching down on disk wherever possible. A data set, in PX, is essentially a memory resident table. But it is not accessed via SQL.
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

Thanks Ray.

Btw, I've just read from the DS PX document that server jobs can be created and accessed using Shared Containers [:0] All server plug-ins (including hash files) are available.

Can the Shared Container be used to address Nitin's problems?

I don't have a PX server so I can't try it myself. Perhaps, Nitin would be kind enough to test it for us.

vladimir
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi,

and what about stage variables, if all rows for a key come together? That could be arranged via an external sort or a Sort stage.

one stage variable remembers whether the key is the same as before
one stage variable remembers the lexicographically lowest value of fld2
one stage variable remembers the lexicographically highest value of fld3
one stage variable remembers the last key

Only when the first variable signals a key change are the accumulated values written out, and so on.

Finally, add an additional last row in order to flush the last group.
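
That stage-variable logic can be sketched in plain Python (an illustration only, with the final flush playing the role of the additional last row):

```python
def first_last_per_key(rows):
    """Emit (key, lowest fld2, highest fld3) per key from key-sorted rows,
    mimicking the stage-variable approach described above."""
    out = []
    last_key = None
    low2 = high3 = None
    for key, f2, f3 in rows:
        if key != last_key:
            if last_key is not None:
                out.append((last_key, low2, high3))  # key changed: write group
            last_key, low2, high3 = key, f2, f3
        else:
            low2 = min(low2, f2)    # remember lexicographically lowest fld2
            high3 = max(high3, f3)  # remember lexicographically highest fld3
    if last_key is not None:
        out.append((last_key, low2, high3))  # flush the last group
    return out

rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]
print(first_last_per_key(rows))
# [('X', 'AA', 'SEC'), ('Y', 'SS', 'REM')]
```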



Wolfgang Huerter
=====================
Cologne, Germany
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

(Vladimir)
It is true that server jobs can be encapsulated in shared containers and these can then be used in parallel jobs, though there are some restrictions. Even though you don't have PX licensed, you do have the Parallel Job Developer's Guide installed with your DS client software and so can read about the restrictions there.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

With 1 key, get first for a field, and last for a field?

* * *

[input stage 1] ---> [Filter Stage] ---+
                                       |
                                       v
                                    [Merge] -----> [whatever]
                                       ^
                                       |
[input stage 2] ---> [Filter Stage] ---+


* * *

For the first input stage, get key, and first column.
For the second input stage, get key and second column.

The Filter Stages are to be set to sort ascending on the first column and descending on the second column; then retain the first duplicate. Be sure to have nulls set to sort last for both (unless you want null as a valid value one way or another).

Then merge using the key.

If you are doing this in the middle of a process, use a Transform stage to split up the records.

Extra fields? Send it one way or another, no problem.
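
As a sketch of that two-stream approach, with plain Python standing in for the Filter and Merge stages (illustrative only):

```python
rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]

def first_after_sort(pairs, descending=False):
    # Sort by value, then retain the first duplicate per key:
    # the Filter Stage role as described above.
    seen = {}
    for key, val in sorted(pairs, key=lambda p: p[1], reverse=descending):
        seen.setdefault(key, val)
    return seen

stream1 = first_after_sort((k, f2) for k, f2, _ in rows)          # ascending: first fld2
stream2 = first_after_sort(((k, f3) for k, _, f3 in rows), True)  # descending: last fld3
merged = {k: (stream1[k], stream2[k]) for k in stream1}           # the Merge stage role
print(merged)
# {'X': ('AA', 'SEC'), 'Y': ('SS', 'REM')}
```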

-T.J.


* * *

... now if this can make breakfast, my life is complete.
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

Well, just to corroborate. There is no 'first' functionality in the PX aggregator. To get 'firsts', I've always written a very simple buildop that takes in sorted, grouped input and spits out the first record in each group.

If performance is critical for you, a buildop will probably be your best bet. To create performant PX jobs, you should minimize the number of times you sort and/or rehash your data. A buildop will take advantage of upstream sorts/hashes, and it introduces no sequential bottlenecks.
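
The behaviour of such a buildop - first record of each group from sorted, grouped input - can be sketched in Python with itertools.groupby (illustrative only; a real buildop is C++):

```python
from itertools import groupby
from operator import itemgetter

def first_per_group(sorted_rows):
    """Yield the first record of each group from key-sorted input,
    mimicking the simple 'first' buildop described above."""
    for _, group in groupby(sorted_rows, key=itemgetter(0)):
        yield next(group)

rows = [("X", "AA", "MEM"), ("X", "BB", "CPU"), ("X", "CC", "SEC"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]
print(list(first_per_group(rows)))
# [('X', 'AA', 'MEM'), ('Y', 'SS', 'BIT')]
```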
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

quote: Originally posted by bigpoppa
Well, just to corroborate. There is no 'first' functionality in the PX aggregator. To get 'firsts', I've always written a very simple buildop that takes in sorted, grouped input and spits out the first record in each group.

I don't understand why it is necessary to write a buildop when Filter Stage does the job. Can you elaborate on the benefits of using a buildop over Filter Stage?

-T.J.


* * *

... now if this can make breakfast, my life is complete.
Post Reply