Hi,
Trying to implement following scenario in my DS job:
Field1 Date1 Date2
------ ------ -------
F1 D11 NULL
F1 D12 D11
F1 D13 D12
F1 D14 D13
F2 D21 NULL
F2 D22 D21
F3 D31 NULL
For given Field1, the Date2 is to be populated based on Date1 field.
I have more than 13million records in my source. I'm joining this source table with 6 dimension tables(some of them have 3million records).
For above requirement, I have used oracle join and lead function. The oracle join is OK interms of performance, but lead function taking longer time.
Please advice if any one has idea to improve the performance.
Is there any DS functionality (if any) to solve this problem?
Thanks
how to implement oracle lead function in datastage
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 30
- Joined: Sun Apr 06, 2008 9:58 pm
-
- Premium Member
- Posts: 232
- Joined: Fri Aug 04, 2006 1:20 am
- Location: Bangalore
Could You please expalin, how you want your output? You can always do it by using Stage Variables. For ex, if the following is your requirement, i will order by salary in descending and do the calculations in the stage variables
Code: Select all
The following example uses the LEAD function to return the salary from the next row, and to calulate the difference between the salary of the current row and the following row.
SELECT empno,
ename,
job,
sal,
LEAD(sal, 1, 0) OVER (ORDER BY sal) AS sal_next,
LEAD(sal, 1, 0) OVER (ORDER BY sal) - sal AS sal_diff
FROM emp;
EMPNO ENAME JOB SAL SAL_NEXT SAL_DIFF
---------- ---------- --------- ---------- ---------- ----------
7369 SMITH CLERK 800 950 150
7900 JAMES CLERK 950 1100 150
7876 ADAMS CLERK 1100 1250 150
7521 WARD SALESMAN 1250 1250 0
7654 MARTIN SALESMAN 1250 1300 50
7934 MILLER CLERK 1300 1500 200
7844 TURNER SALESMAN 1500 1600 100
7499 ALLEN SALESMAN 1600 2450 850
7782 CLARK MANAGER 2450 2850 400
7698 BLAKE MANAGER 2850 2975 125
7566 JONES MANAGER 2975 3000 25
7788 SCOTT ANALYST 3000 3000 0
7902 FORD ANALYST 3000 5000 2000
7839 KING PRESIDENT 5000 0 -5000
-
- Participant
- Posts: 30
- Joined: Sun Apr 06, 2008 9:58 pm
Hi DS_SUPPORT,
Our requirement is exactly similar to SAL_NEXT in your example.
Please advice whether the usage of stage variable will affect the performance, because it sounds like the system will process all records in sequence mode (I may be wrong as I have only few weeks experience in DS). Please clarify. My source has more than 13 million records.
Thanks for support.
Our requirement is exactly similar to SAL_NEXT in your example.
Please advice whether the usage of stage variable will affect the performance, because it sounds like the system will process all records in sequence mode (I may be wrong as I have only few weeks experience in DS). Please clarify. My source has more than 13 million records.
Thanks for support.
-
- Participant
- Posts: 30
- Joined: Sun Apr 06, 2008 9:58 pm
Hi Chulett,
In the below example, Date2 field is calculated as follows:
Select F1,Date1,
lead(Date1,1) over(partition by Field1 order by Date1) Date2
This query will give values as:
Field1 Date1 Date2
------ ------ -------
F1 D11 NULL
F1 D12 D11
F1 D13 D12
F1 D14 D13
F2 D21 NULL
F2 D22 D21
F3 D31 NULL
Basically the field "Date2" is derived from "Field1" and previous value of "Date1".
My doubt is that if the stage variable is used, how Datastage will process the records.... is it in the row level (row by row) or table level (all rows rows in parallel) ?
Pls let me know if you have any queries.
Thanks. [/list]
In the below example, Date2 field is calculated as follows:
Select F1,Date1,
lead(Date1,1) over(partition by Field1 order by Date1) Date2
This query will give values as:
Field1 Date1 Date2
------ ------ -------
F1 D11 NULL
F1 D12 D11
F1 D13 D12
F1 D14 D13
F2 D21 NULL
F2 D22 D21
F3 D31 NULL
Basically the field "Date2" is derived from "Field1" and previous value of "Date1".
My doubt is that if the stage variable is used, how Datastage will process the records.... is it in the row level (row by row) or table level (all rows rows in parallel) ?
Pls let me know if you have any queries.
Thanks. [/list]
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Ok, so actually no input examples, just output with the lead function. Fine. This is just bog standard 'group change detection' done with stage variables. As noted, you can't look ahead but you can easily look 'behind' and stash values from previous records.
So you'll need variables to save the current 'group' which is your Field1 and for the previous value of Date1. When the group changes, set Date2 to null otherwise while you are in the same group use the saved value of Date1 from the previous record for Date2.
It is important to note that this is possible because stage variables are evaluated in order and row by row. This is key, especially the former. And IF the 'lead' function in your source query is indeed the source of your slowness, this will alleviate that.
So you'll need variables to save the current 'group' which is your Field1 and for the previous value of Date1. When the group changes, set Date2 to null otherwise while you are in the same group use the saved value of Date1 from the previous record for Date2.
It is important to note that this is possible because stage variables are evaluated in order and row by row. This is key, especially the former. And IF the 'lead' function in your source query is indeed the source of your slowness, this will alleviate that.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers