How to pick first/ last CHAR value based on key

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
dsproj2003
Participant
Posts: 21
Joined: Wed Oct 01, 2003 11:53 am

How to pick first/ last CHAR value based on key

Post by dsproj2003 »

Hi,

How do we pick first/ last CHAR value, based on a key,

For e.g. :-

If my input data is,say:
key----fld2---fld3
X------"AA"---"MEM"
X------"CC"---"SEC"
X------"BB"---"CPU"
Y------"SS"---"BIT"
Y------"TT"---"REM"

I need to pick the first of fld2 and the last of fld3 (lexicographically).

So required output should be:
X-----"AA"----"SEC"
Y-----"SS"----"REM"
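
To make it concrete, here is the reduction I'm after as a plain Python sketch (Python just for illustration; the real job is in DataStage):

```python
rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]

result = {}
for key, f2, f3 in rows:
    lo, hi = result.get(key, (f2, f3))
    # keep the lexicographically first fld2 and last fld3 per key
    result[key] = (min(lo, f2), max(hi, f3))

print(result)  # {'X': ('AA', 'SEC'), 'Y': ('SS', 'REM')}
```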

I understand we can use aggregator for integer fields (option min/ max)
How do we get the first/last value for char type fields?

Pls give pointers..

Thanks.
Regards,
Nitin
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

You can also find the FIRST and LAST aggregation function in the Aggregator stage. I think that will solve your problem.

regards,
vladimir
dsproj2003
Participant
Posts: 21
Joined: Wed Oct 01, 2003 11:53 am

Post by dsproj2003 »

Hi,

I don't know why, but I can't find the first/last aggregation function in the sub-properties of the Aggregator stage.

It only has MIN and MAX options; however, those seem to work for numeric fields only and not for char fields.

Is this perhaps not available in the version I have?

I am using PX ver 6.0, DataStage Designer ver 6.0.0.17.

Any clues....?

Nitin
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

it's not in PX?! duh![xx(]

I'm not familiar with PX yet, but there's a workaround to your problem... well, IF we can sort your data... You will need two hash files - one to hold the first value and another to hold the last value. Here's the trick:
1. create a hash file with key and fld2
2. sort by key and fld2 in DESCENDING order and write your data into the hash file (the final write per key is the lowest fld2, which gives you the first value)
3. create another hash file with key and fld3
4. sort by key and fld3 in ASCENDING order and write your data into the other hash file (the final write per key is the highest fld3, which gives you the last value)
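
The destructive-overwrite behaviour can be sketched with a Python dict standing in for the hash files (last write per key wins, illustrative only):

```python
rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]

first = {}  # hash file 1: key -> fld2
for key, f2, _ in sorted(rows, key=lambda r: (r[0], r[1]), reverse=True):
    first[key] = f2  # descending writes: the lowest fld2 lands last

last = {}   # hash file 2: key -> fld3
for key, _, f3 in sorted(rows, key=lambda r: (r[0], r[2])):
    last[key] = f3   # ascending writes: the highest fld3 lands last

print(first)  # first fld2 per key: X -> AA, Y -> SS
print(last)   # last fld3 per key:  X -> SEC, Y -> REM
```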

it's a nice trick... and with better performance than doing aggregation [;)] hope this helps [:)]

vladimir
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The "trick" that Vladimir described works because all writes to hashed files are destructive overwrites. However, parallel jobs do not have a hashed file stage, so you need to write to some kind of data set or table with a "replace existing rows" rule, for example a DataSet stage with "Overwrite" as the Update Policy.

(If that seems like gibberish to those of you who only know server jobs I'm sorry, but it can't be helped.) [8D]

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

indeed, that's soundz Grrrr...EEK to me! [:D]

nevertheless, we need focus here [:)]

how about this solution...
1. generate a surrogate key (just a plain sequence) for each record, so now you have key, skey, fld2, fld3 (note this picks first/last in skey order, so the data must already be sorted the way you want)
2. stage your data into a staging table, say tmpTable
3. use these two queries and join their results on key:

select a.key, a.fld2 as FIRST
from tmpTable a
where a.skey = (select min(b.skey) from tmpTable b
                where a.key = b.key)

select a.key, a.fld3 as LAST
from tmpTable a
where a.skey = (select max(b.skey) from tmpTable b
                where a.key = b.key)
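
Those correlated subqueries can be tried out with Python's built-in sqlite3 (illustrative sketch; the column is renamed from key to grp to dodge SQL keyword trouble, and skey is assumed to follow the desired order):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tmpTable (grp text, skey integer, fld2 text, fld3 text)")
# skey is the surrogate sequence; rows assumed already ordered as desired
conn.executemany("insert into tmpTable values (?, ?, ?, ?)", [
    ("X", 1, "AA", "MEM"), ("X", 2, "BB", "CPU"), ("X", 3, "CC", "SEC"),
    ("Y", 4, "SS", "BIT"), ("Y", 5, "TT", "REM"),
])

# first of fld2 per group = the row holding the minimum surrogate key
first = dict(conn.execute(
    "select a.grp, a.fld2 from tmpTable a "
    "where a.skey = (select min(b.skey) from tmpTable b where a.grp = b.grp)"))
# last of fld3 per group = the row holding the maximum surrogate key
last = dict(conn.execute(
    "select a.grp, a.fld3 from tmpTable a "
    "where a.skey = (select max(b.skey) from tmpTable b where a.grp = b.grp)"))

print(first)  # {'X': 'AA', 'Y': 'SS'}
print(last)   # {'X': 'SEC', 'Y': 'REM'}
```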

it works... unless conventional SQL doesn't work in PX too... hmmm... [}:)]

vladimir
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The whole point of PX is to avoid touching down on disk wherever possible. A data set, in PX, is essentially a memory resident table. But it is not accessed via SQL.
degraciavg
Premium Member
Posts: 39
Joined: Tue May 20, 2003 3:36 am
Location: Singapore

Post by degraciavg »

Thanks Ray.

Btw, I've just read from the DS PX document that server jobs can be created and accessed using Shared Containers [:0] All server plug-ins (including hash files) are available.

Can the Shared Container be used to address Nitin's problems?

I don't have a PX server so I can't try it myself. Perhaps, Nitin would be kind enough to test it for us.

vladimir
WoMaWil
Participant
Posts: 482
Joined: Thu Mar 13, 2003 7:17 am
Location: Amsterdam

Post by WoMaWil »

Hi,

and what about stage variables, if all rows for a key come together? That could be arranged via an external sort or a Sort stage.

one stage variable remembers whether the key is the same as before
one stage variable remembers the lexicographically lowest value of fld2
one stage variable remembers the lexicographically highest value of fld3
one stage variable remembers the last key

Only when the first variable signals a key change are the accumulated values written out, and so on.

Finally, add an additional last row in order to flush the last group.
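
That stage-variable logic can be sketched in plain Python (an illustration only, with the final flush playing the role of the additional last row):

```python
def first_last_per_key(rows):
    """Emit (key, lowest fld2, highest fld3) per key from key-sorted rows,
    mimicking the stage-variable approach described above."""
    out = []
    last_key = None
    low2 = high3 = None
    for key, f2, f3 in rows:
        if key != last_key:
            if last_key is not None:
                out.append((last_key, low2, high3))  # key changed: write group
            last_key, low2, high3 = key, f2, f3
        else:
            low2 = min(low2, f2)    # remember lexicographically lowest fld2
            high3 = max(high3, f3)  # remember lexicographically highest fld3
    if last_key is not None:
        out.append((last_key, low2, high3))  # flush the last group
    return out

rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]
print(first_last_per_key(rows))
# [('X', 'AA', 'SEC'), ('Y', 'SS', 'REM')]
```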



Wolfgang Huerter
=====================
Cologne, Germany
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

(Vladimir)
It is true that server jobs can be encapsulated in shared containers and these can then be used in parallel jobs, though there are some restrictions. Even though you don't have PX licensed, you do have the Parallel Job Developer's Guide installed with your DS client software and so can read about the restrictions there.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

With 1 key, get first for a field, and last for a field?

* * *

[input stage 1] ---> [Filter Stage] ---+
                                       |
                                       v
                                    [Merge] -----> [whatever]
                                       ^
                                       |
[input stage 2] ---> [Filter Stage] ---+


* * *

For the first input stage, get key, and first column.
For the second input stage, get key and second column.

The Filter Stages are to be set to sort ascending on the first column and descending on the second column; then retain the first duplicate. Be sure to have nulls set to sort last for both (unless you want null as a valid value one way or another).

Then merge using the key.

If you are doing this in the middle of a process, use a Transform stage to split up the records.

Extra fields? Send it one way or another, no problem.
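
As a sketch of that two-stream approach, with plain Python standing in for the Filter and Merge stages (illustrative only):

```python
rows = [("X", "AA", "MEM"), ("X", "CC", "SEC"), ("X", "BB", "CPU"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]

def first_after_sort(pairs, descending=False):
    # Sort by value, then retain the first duplicate per key:
    # the Filter Stage role as described above.
    seen = {}
    for key, val in sorted(pairs, key=lambda p: p[1], reverse=descending):
        seen.setdefault(key, val)
    return seen

stream1 = first_after_sort((k, f2) for k, f2, _ in rows)          # ascending: first fld2
stream2 = first_after_sort(((k, f3) for k, _, f3 in rows), True)  # descending: last fld3
merged = {k: (stream1[k], stream2[k]) for k in stream1}           # the Merge stage role
print(merged)
# {'X': ('AA', 'SEC'), 'Y': ('SS', 'REM')}
```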

-T.J.


* * *

... now if this can make breakfast, my life is complete.
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

Well, just to corroborate. There is no 'first' functionality in the PX aggregator. To get 'firsts', I've always written a very simple buildop that takes in sorted, grouped input and spits out the first record in each group.

If performance is critical for you, a buildop will probably be your best bet. To create performant PX jobs, you should minimize the number of times you sort and/or rehash your data. A buildop will take advantage of upstream sorts/hashes, and it introduces no sequential bottlenecks.
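
The behaviour of such a buildop - first record of each group from sorted, grouped input - can be sketched in Python with itertools.groupby (illustrative only; a real buildop is C++):

```python
from itertools import groupby
from operator import itemgetter

def first_per_group(sorted_rows):
    """Yield the first record of each group from key-sorted input,
    mimicking the simple 'first' buildop described above."""
    for _, group in groupby(sorted_rows, key=itemgetter(0)):
        yield next(group)

rows = [("X", "AA", "MEM"), ("X", "BB", "CPU"), ("X", "CC", "SEC"),
        ("Y", "SS", "BIT"), ("Y", "TT", "REM")]
print(list(first_per_group(rows)))
# [('X', 'AA', 'MEM'), ('Y', 'SS', 'BIT')]
```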
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

quote: Originally posted by bigpoppa
Well, just to corroborate. There is no 'first' functionality in the PX aggregator. To get 'firsts', I've always written a very simple buildop that takes in sorted, grouped input and spits out the first record in each group.

I don't understand why it is necessary to write a buildop when Filter Stage does the job. Can you elaborate on the benefits of using a buildop over Filter Stage?

-T.J.


* * *

... now if this can make breakfast, my life is complete.
Post Reply