Aggregator Grouping

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Aggregator Grouping

Post by JezT »

I am trying to aggregate some data from a source file containing records of telephone calls made on a certain day. When I pass the data into the aggregator, I want to group by the following fields in this order:

Date, Site, Call Application Number, Quarter Hour Slot, Half Hour Slot.

It seems to work fine with Date and Site but the problems begin with Call Application Number. It works so far in that all Applications beginning with a 1 come before any beginning with a 2. However, within all the ones beginning with a 1, there is no order to them. In other words, 110 may come before 1 and if there are multiple instances of the same number, they are not always grouped together. Is this because the Aggregator only looks at the first digit and if so, how can I change it to check the entire record before doing the grouping ?

Following in on from that, I have a CALL_OFFD column which needs to be totalled for each Quarter Hour, for each Application for each Site on the specified day. So therefore there should be 96 rows for each day (1 row for each Quarter hour slot during the day). I would assume therefore that if I SUM this column that would do the trick, but it doesn't. Not sure if this is the SUM function that isn't working or whether it is down to the Grouping not working.

Any suggestions ?
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

Is "Call Application Number" defined as an integer? It might be the metadata that causes the ORDER BY to be handled differently.

with regards to the second issue, test the SUM on something different. If the SUM works there, then the SUM function works; the issue is then somewhere else.
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As mentioned, you probably have the Call Number defined as a untrimmed string and you are seeing perfectly normal string sorting. You would need it to be numeric to get the sort order you are looking for.

As for the sum problem, you can't have the Half Hour Slot come after the Quarter Hour Slot in the grouping and drop out subtotals by Quarter Hour. You'd need to flip the order of the last two fields to get what you want... either that or drop the Half Hour Slot from the grouping.
-craig

"You can never have too many knives" -- Logan Nine Fingers
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Post by JezT »

Just in answer to your first point, the Call Application Number is an Integer.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's... interesting. We'd need to see specific examples of your data and the output order that you are seeing. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Post by JezT »

The fields in the file I am trying to sort appear as follows:

Code: Select all

110,21,11
110,5,3
1,1,1
1,1,1
111,45,23
1,17,9
1,25,13
1,25,13
1,25,13
12,69,35
1,29,15
1,33,17
137,45,23
1,53,27
1,65,33
1,69,35
1,77,39
1,85,43
1,89,43
1,9,5
204,69,35
207,77,39
307,45,23
307,9,5
36,5,3
62,61,31
The sort command I have put onto the sequential file is as follows:

Code: Select all

sh -c "sort -n -k 3"
As you can see from the list, it only appears to sort by the first character for all 3 columns. Just for clarification purposes, these are columns 3, 4 and 5 in the file.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? I think you are going to need to adjust your 'sort' command. Since you don't specify a delimiting character, 'space' is used by default to count the fields. It looks to me what you are getting is a sort where it is considering all three of your fields as one long field, especially with the inclusion of the -n option.

Try this and see how it works:

Code: Select all

sh -c "sort -n -t, -k 3,5"
If you want to sort on all three fields, that is - otherwise change the '-k 3,5' option to '-k 3,3' to sort only on the third field. In any case, I believe the key is to tell it via the -t option to use a comma as the field seperator.
-craig

"You can never have too many knives" -- Logan Nine Fingers
JezT
Participant
Posts: 75
Joined: Wed Sep 15, 2004 1:52 am

Post by JezT »

Thanks for that.

Tried the first option and it sorted most of the rows correctly but there were still some out of place. But when I tried the second option, it sorted them all correctly

Cheers

:D
Post Reply