Page 1 of 1
Select First Record only
Posted: Mon Feb 27, 2006 12:26 pm
by srinagesh
Hi !
My Source File is as follows
Code: Select all
Site ReplenishmentDate ReplenishmentQty
--------------------------------------------------------
A 12-march-06 1200
B 15-march-06 1500
A 17-march-06 2200
C 18-march-06 1600
C 19-march-06 1700
I need to take first records only (based on Site)
ie; My output should be
Code: Select all
Site ReplenishmentDate ReplenishmentQty
--------------------------------------------------------
A 12-march-06 1200
B 15-march-06 1500
C 18-march-06 1600
If I use a hash file, with key as Site.. I am getting the Last record (as shown below)
Code: Select all
Site ReplenishmentDate ReplenishmentQty
--------------------------------------------------------
B 15-march-06 1500
A 17-march-06 2200
C 19-march-06 1700
Please suggest me a way of acheiving this.
PS: Storing the data into a temporary database table with PrimaryKey as site and logic as "truncate and insert" is not an option. The requirement is such that we can use database at this stage.
Regards
Nagesh
Posted: Mon Feb 27, 2006 12:40 pm
by kumar_s
Hi,
There are several way of acheiving this. Pls do search the forum.
You are grouping based on the Site and getting the max of rest of the columns.
One of the way would be to add another column to flag out the change in group.
Set 2 stage variable in transformer like
Change = If Previous_col = Input_link.Site Then 0 Else 1
Previous_col = Input_link.Site
Assign Change to a column say Flag.
Pass the sorted data to the transformer and select only the records for which Flag =1.
Posted: Mon Feb 27, 2006 12:43 pm
by kcbland
Change your date to ISO format YYYY-MM-DD, pass thru a Sort stage and sort descending on site, date, then write to hash file. Last row into hash file with just site as key will be the oldest date for that site.
Posted: Mon Feb 27, 2006 5:48 pm
by rasi
If you are reading from database then do the order by in database SQL and then use a stage variable to hold the last record and compare with the current record. If it is same then do not send to output.
Posted: Mon Feb 27, 2006 6:39 pm
by vmcburney
Re: Select First Record only
Posted: Mon Feb 27, 2006 7:13 pm
by chowdary
Nagesh
First Sort the data on that particular column using sort stage and then in the transformer you can also use Transform function RowProcCompareWithPreviousValue to ignore duplicate values.
The routine will check the supplied value with the previous value.
If its same, then 1 is returned else 0 is returned. you can use this routing in the Stage variable.
I think this helps
Chowdary
[/b]
Posted: Tue Feb 28, 2006 4:15 am
by srinagesh
Hi !
I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.
ie., If the file is
Code: Select all
Site Date Qty
------------------------------------
A 2006-02-17 1300
A 2006-02-16 1500
A 2006-02-18 1200
Then the output should be
Code: Select all
Site Date Qty
------------------------------------
A 2006-02-17 1300
For this reason I cant use any sort or any other feature.
Please suggest me an alternative
Posted: Tue Feb 28, 2006 4:38 am
by Poornagirija
hi srinagesh, :D
Use aggregator stage along with Xfm.
Make Site(field) as group
for the rest of columns make derivation as FIRST.
Hope that will give your answer
![Confused :?](./images/smilies/icon_confused.gif)
Posted: Tue Feb 28, 2006 6:32 am
by loveojha2
You can achieve it through Stage Variables and Constraints
Create Two Stage Variables
Code: Select all
Prev_Key: Derivation Curr_Key (Initiallize with a value say '-1')
Curr_Key: Derivation Src.Site
Constraint for the Output Link
Hope this helps.
Posted: Tue Feb 28, 2006 8:03 am
by srinagesh
Thanx Guys,
It works.
-Nagesh
Posted: Wed Mar 01, 2006 12:00 am
by djm
Belatedly throwing in an approach using UNIX ...
Code: Select all
sort -s -k1,1 yourfile |
awk '
BEGIN \
{
prev_key = "";
}
( $1 != prev_key ) \
{
print $0;
}
{
prev_key = $1;
}
' > firstrowforkey
Caveats:
1) requires your sort command to have the "-s" option
2) requires my off-the-cuff awk script to be correct.
David
Posted: Wed Mar 01, 2006 2:08 am
by djm
And if your flavour of unix does not have "-s" for stable sorting, you may be able to use "sort -k1,1 -u" instead. According to the man pages I have seen, this will suppress duplicates based solely on the key field. Unfortunately, the man page for sort gives no indication which of the lines will be output (e.g. always the first, always the last, always random).
Unscientific testing on a far too small a sample set yielded a behaviour of the first instance being output - your actual mileage may vary! Does anyone know definitively?
David.
Posted: Wed Mar 01, 2006 4:52 pm
by gateleys
srinagesh wrote:Hi !
I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.
ie., If the file is
Code: Select all
Site Date Qty
------------------------------------
A 2006-02-17 1300
A 2006-02-16 1500
A 2006-02-18 1200
Then the output should be
Code: Select all
Site Date Qty
------------------------------------
A 2006-02-17 1300
For this reason I cant use any sort or any other feature.
Please suggest me an alternative
Use Aggregator. Group by Site. For Date and Qty, use the Derivation 'First'.
gateleys