Logic help

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Logic help

Post by vamsi.4a6 »

The source is oracle

id name St_Dt ed_Dt
-- --- ---- -----
1 a 10-jan-10 10-aug-11
1 a 11-jan-10 15-aug-11
2 b 11-jan-11 Null
2 b 12-feb-11 Null

O/p
--

id name St_Dt ed_Dt
-- --- ---- -----
1 a 10-jan-10 15-aug-11
2 b 12-feb-11 Null



For a particular id i need to fetch the record from source which is having maximum ed_dt,if it is null for all the records for particular key then i need to fetch the record which is having maximum St_Dt for that key
Last edited by vamsi.4a6 on Tue Apr 03, 2012 3:51 am, edited 3 times in total.
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Try using Oracle's COALESCE() function in your SELECT statement.
Choose a job you love, and you will never have to work a day in your life. - Confucius
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

I want implement through Datastage

1)extracted fields from source
2)In sort stage
Hashparition on id with both sorting and partition
ed_Dt with only sorting in desc
key column for stage is ed_Dt with desc

If it is null means then i need to fetch based upon St_Dt.How to achieve this logic through Datastage?
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

Could you put through a Remove Duplicates stage. Sort and partition on The key field (ID). Then Sort by End_Dt Desc then by Start_Date Desc, then select to retain the first record.

I believe that should work.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Perhaps an Aggregator with min(start dt) and max(end date).
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Re: Logic help

Post by qt_ky »

vamsi.4a6 wrote:The source is oracle
I guess the Oracle tidbit is irrelevant.

Not sure if this helps or not.... In sorting you can set the Nulls position so nulls will sort first or last.
Choose a job you love, and you will never have to work a day in your life. - Confucius
mobashshar
Participant
Posts: 91
Joined: Wed Apr 20, 2005 7:59 pm
Location: U.S.

Post by mobashshar »

Do this.
1. Read Oracle source.
2. Use Copy stage to split the rows in Two links as St_Dt and Ed_Dt.
3. Use Remove Duplicate Stage. Sort and Partition on ID and Sort only on St_Dt with Ascending and keep First Row.
4. Use Remove Duplicate Stage. Sort and Partition on ID and Sort only on Ed_Dt with Ascending and keep Last Row.
5. Use Join Stage to join the rows from St_Dt and Ed_Dt link.
You will get the desired result.
Hope this helps.
srinivas.g
Participant
Posts: 251
Joined: Mon Jun 09, 2008 5:52 am

Post by srinivas.g »

seperate the data using filter stage. first link is having ed_date not null records and second link is having ed_date null records after that use remove duplicate stage(first link) sort desc based on ed_date and keep first and same do it for second link sort based on st_date then use funnel to combine the both the links.
hope this will help you.
Srinu Gadipudi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Craig's solution is the most elegant offered. Group by id, min start_date, max end_date.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Re: Logic help

Post by Kryt0n »

vamsi.4a6 wrote:The source is oracle

id name St_Dt ed_Dt
-- --- ---- -----
1 a 10-jan-10 10-aug-11
1 b 11-jan-10 15-aug-11
2 a 11-jan-11 Null
2 b 12-feb-11 Null

O/p
--

id name St_Dt ed_Dt
-- --- ---- -----
1 a 10-jan-10 15-aug-11
2 b 12-feb-11 Null



For a particular id i need to fetch the record from source which is having maximum ed_dt,if it is null then i need to fetch the record based on St_Dt
Your output doesn't match your rule, if you wanted the record the with max end date, you would have

Code: Select all

1   b     11-jan-10  15-aug-11
and not

Code: Select all

1   a     10-jan-10  15-aug-11
Sorting by end date desc with nulls first and start date desc then remove duplicates, keeping first would meet the rule you stated. Keeping as per your output example would require a combination of aggregator and something that determines which "name" you keep
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Good catch.. the requirement itself is not clear ;)
Kandy
_________________
Try and Try again…You will succeed atlast!!
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Re: Logic help

Post by vamsi.4a6 »

Thanks for pointing the issue and please check the input and output.Now i hope it is correct.Can you please tell me know how to implement the above requirement
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

All you changed were the 'a' and 'b', so (to me anyway) that doesn't really change anything as I assumed they were wrong in the first place. However, you really haven't done anything to clarify your requirement.

"if it is null then i need to fetch the record based on St_Dt"

No idea what that means. We can see that for a 'normal' set you want the earliest start date with the latest end date, correct? However, when the end date is null it looks like you want the latest start date for that group, is that true?
-craig

"You can never have too many knives" -- Logan Nine Fingers
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

For a particulat key(id and name) i need to select the record which is having maximum start date and if it(start date) is null for all the records for that key i need to select based on end date

For better understanding i am explaining my requirement with two following examples

For ex-1

In this case St_Dt is not all for all the records for the key(1 and a)
id name St_Dt ed_Dt
-- --- ---- -----
1 a 10-jan-10 10-aug-11
1 a 11-jan-10 15-aug-11
1 a null 10-aug-11

o/p
--
id name St_Dt ed_Dt
-- --- ---- -----
1 a 11-jan-10 15-aug-11

For ex-2

id name St_Dt ed_Dt
-- --- ---- -----
2 b null 10-aug-11
2 b null 15-aug-11
2 b null 11-aug-11

here (start date) is null for all the records for the key ( 2 and b)so i need the fetch the record which is having maximum ed_dt

o/p
--
id name St_Dt ed_Dt
-- --- ---- -----
2 b null 15-aug-11
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

vamsi.4a6 wrote:For a particulat key(id and name) i need to select the record which is having maximum start date and if it(start date) is null for all the records for that key i need to select based on end date
Suddenly, the direction & nullability of the dates have reversed. :?

Now we're just at an Aggregator with max() on each date.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply