SQL query to ETL

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

bobby
Participant
Posts: 86
Joined: Mon Jul 19, 2004 8:31 pm

SQL query to ETL

Post by bobby »

Hi,
The client gave us the requirement below and wants it done with ETL. I have a few questions, if you can help.
They are using the three source tables in the FROM clause; my question is how I can build an ETL job to reproduce this query.
Thanks
Bobby


INSERT INTO
dwadmin.FCT_REVENUE_SUMMARY_JAN_2004
(SELECT MAX(F.REVENUE_KEY),
DECODE(F.REVENUE_TYPE_KEY, 10000003, TRUNC(F.BILL_END_DT, 'MONTH'), TRUNC(POSTED_DT, 'MONTH')),
F.SUBSCRIBER_KEY,
F.SUBSCRIBER_DIM_KEY,
TRUNC(F.CYCLE_DT, 'MONTH'),
TRUNC(F.BILL_START_DT, 'MONTH'),
TRUNC(F.BILL_END_DT, 'MONTH'),
TRUNC(F.TRAN_DT, 'MONTH'),
TRUNC(F.POSTED_DT, 'MONTH'),
DECODE(SIGN(F.PPV_KEY), 1, TRUNC(F.POSTED_DT, 'MONTH'), TRUNC(F.BILL_END_DT, 'MONTH')),
DECODE(SIGN(F.PPV_KEY), 1, TRUNC(F.TRAN_DT, 'MONTH'), NULL),
F.PROGRAM_PACKAGE_DIM_KEY,
F.PROGRAM_PACKAGE_KEY,
F.SERVICE_DIM_KEY,
F.SERVICE_KEY,
DECODE(SIGN(F.PROGRAM_PACKAGE_KEY), 1, 'P',
DECODE(SIGN(F.PPV_KEY), 1, 'V',
DECODE(SIGN(F.SERVICE_KEY), 1, 'S', 'X'))),
DECODE(SIGN(F.PROGRAM_PACKAGE_KEY), 1, 'P' || F.PROGRAM_PACKAGE_KEY,
DECODE(SIGN(F.PPV_KEY), 1, 'V' || F.PPV_KEY,
DECODE(SIGN(F.SERVICE_KEY), 1, 'S' || F.SERVICE_KEY, NULL))),
F.PPV_KEY,
F.REVENUE_TYPE_KEY,
F.PARTIAL_BILLING_FLG,
SUM(F.BILLING_AMT),
F.SOURCE_OF_DATA,
S.REGION_KEY,
S.SUBSCRIBER_TYPE_KEY,
S.SUBSCRIBER_LANGUAGE_KEY,
S.VIP_KEY,
S.SUBSCRIBER_STATUS_KEY,
S.RURAL_URBAN_FLG,
S.PROMOTION_CAMP_KEY,
TRUNC(S.CREATE_DATE, 'MONTH'),
TRUNC(S.DISCONNECT_DT, 'MONTH'),
S.DISCONNECT_REASON_CD,
S.NO_OF_MTH_BILL_PERIOD,
S.DISCOUNT_CD,
TRUNC(SB.BILL_END_DT, 'MONTH'),
SB.SERVICE_DIM_KEY,
SB.SERVICE_KEY,
SB.A_LA_CARTE_LEVEL,
SUM(SB.BILLING_AMT),
SB.PROGRAM_PACKAGE_KEY,
SB.PROGRAM_PACKAGE_DIM_KEY,
SB.SUBS_PRODUCT,
0,
TRUNC(S.CONNECT_DT, 'MONTH') CONNECT_DT
FROM DWADMIN.FCT_ALL_REVENUE F,
DWADMIN.DIM_SUBSCRIBER S,
DWADMIN.DIM_SUBSCRIBER_BASIC SB
WHERE F.SUBSCRIBER_DIM_KEY = S.SUBSCRIBER_DIM_KEY AND
F.REVENUE_KEY = SB.REVENUE_KEY(+) AND
TO_CHAR(TRUNC(DECODE(F.REVENUE_TYPE_KEY, 10000003, F.BILL_END_DT, POSTED_DT), 'MONTH'),'YYYY-MM-DD HH24:MI:SS') = '2004-01-01 00:00:00'
GROUP BY DECODE(F.REVENUE_TYPE_KEY, 10000003, TRUNC(F.BILL_END_DT, 'MONTH'), TRUNC(POSTED_DT, 'MONTH')),
F.SUBSCRIBER_KEY,
F.SUBSCRIBER_DIM_KEY,
TRUNC(F.CYCLE_DT, 'MONTH'),
TRUNC(F.BILL_START_DT, 'MONTH'),
TRUNC(F.BILL_END_DT, 'MONTH'),
TRUNC(F.TRAN_DT, 'MONTH'),
TRUNC(F.POSTED_DT, 'MONTH'),
DECODE(SIGN(F.PPV_KEY), 1, TRUNC(F.POSTED_DT, 'MONTH'), TRUNC(F.BILL_END_DT, 'MONTH')),
DECODE(SIGN(F.PPV_KEY), 1, TRUNC(F.TRAN_DT, 'MONTH'), NULL),
F.PROGRAM_PACKAGE_DIM_KEY,
F.PROGRAM_PACKAGE_KEY,
F.SERVICE_DIM_KEY,
F.SERVICE_KEY,
DECODE(SIGN(F.PROGRAM_PACKAGE_KEY), 1, 'P',
DECODE(SIGN(F.PPV_KEY), 1, 'V',
DECODE(SIGN(F.SERVICE_KEY), 1, 'S', 'X'))),
DECODE(SIGN(F.PROGRAM_PACKAGE_KEY), 1, 'P' || F.PROGRAM_PACKAGE_KEY,
DECODE(SIGN(F.PPV_KEY), 1, 'V' || F.PPV_KEY,
DECODE(SIGN(F.SERVICE_KEY), 1, 'S' || F.SERVICE_KEY, NULL))),
F.PPV_KEY,
F.REVENUE_TYPE_KEY,
F.PARTIAL_BILLING_FLG,
F.SOURCE_OF_DATA,
S.REGION_KEY,
S.SUBSCRIBER_TYPE_KEY,
S.SUBSCRIBER_LANGUAGE_KEY,
S.VIP_KEY,
S.SUBSCRIBER_STATUS_KEY,
S.RURAL_URBAN_FLG,
S.PROMOTION_CAMP_KEY,
TRUNC(S.CREATE_DATE, 'MONTH'),
TRUNC(S.DISCONNECT_DT, 'MONTH'),
S.DISCONNECT_REASON_CD,
S.NO_OF_MTH_BILL_PERIOD,
S.DISCOUNT_CD,
TRUNC(SB.BILL_END_DT, 'MONTH'),
SB.SERVICE_DIM_KEY,
SB.SERVICE_KEY,
SB.A_LA_CARTE_LEVEL,
SB.PROGRAM_PACKAGE_KEY,
SB.PROGRAM_PACKAGE_DIM_KEY,
SB.SUBS_PRODUCT,
0,
TRUNC(S.CONNECT_DT, 'MONTH'));
commit;

update FCT_REVENUE_SUMMARY_JAN_2004
set subscriber_ct = 1
where revenue_key
in (
select max(revenue_key)
from FCT_REVENUE_SUMMARY_JAN_2004
group by
subscriber_key);
commit;
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

It is certainly do-able, but it will take a while to code and verify that you haven't overlooked something.

First of all I'd ask if they have any documentation on the original requirements from when this query was made. Technically you can just look through the query, but it would help to have an understanding of the original requirement here.

Also, if all they need is to get this automated within a DataStage-driven load, you can paste the query into user-defined SQL and make sure the stage has the same number of columns defined as the query outputs.

If you plan to break this up in DataStage there are many approaches; looking at it briefly, I would recommend this:

1) Save the functions (TRUNC, SIGN, DECODE, etc.) for the Transformer stage.

2) It looks like you could select from DWADMIN.DIM_SUBSCRIBER S and
DWADMIN.DIM_SUBSCRIBER_BASIC SB individually and write them to hashed files. Then preload them to memory and use them as lookups in a Transformer that selects from DWADMIN.FCT_ALL_REVENUE F (see the sketch after this list). This should help performance, although that depends on the database server, DS server and network, and it provides some traceability for debugging.

3) The GROUP BY could be done in an Aggregator stage or left in the SQL; this tends to vary by developer preference, and in my experience grouping is sometimes faster in the query and sometimes faster in the Aggregator stage.

4) I'd create a separate job for the INSERT and for the UPDATE statement. It's a pain when you end up with so many jobs, but breaking things out still seems to be the best practice.
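
As a rough sketch of point 2 (column lists lifted from the original query, with the key columns an assumption read off its join conditions), the two dimension extracts feeding the hashed files might look like this; the TRUNC calls are left out so the month truncation can be done in the Transformer, per point 1:

-- extract for the DIM_SUBSCRIBER hashed file, keyed on SUBSCRIBER_DIM_KEY
SELECT S.SUBSCRIBER_DIM_KEY,
S.REGION_KEY,
S.SUBSCRIBER_TYPE_KEY,
S.SUBSCRIBER_LANGUAGE_KEY,
S.VIP_KEY,
S.SUBSCRIBER_STATUS_KEY,
S.RURAL_URBAN_FLG,
S.PROMOTION_CAMP_KEY,
S.CREATE_DATE,
S.DISCONNECT_DT,
S.DISCONNECT_REASON_CD,
S.NO_OF_MTH_BILL_PERIOD,
S.DISCOUNT_CD,
S.CONNECT_DT
FROM DWADMIN.DIM_SUBSCRIBER S;

-- extract for the DIM_SUBSCRIBER_BASIC hashed file, keyed on REVENUE_KEY
SELECT SB.REVENUE_KEY,
SB.BILL_END_DT,
SB.SERVICE_DIM_KEY,
SB.SERVICE_KEY,
SB.A_LA_CARTE_LEVEL,
SB.BILLING_AMT,
SB.PROGRAM_PACKAGE_KEY,
SB.PROGRAM_PACKAGE_DIM_KEY,
SB.SUBS_PRODUCT
FROM DWADMIN.DIM_SUBSCRIBER_BASIC SB;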


hope this helps as a start...
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
bobby
Participant
Posts: 86
Joined: Mon Jul 19, 2004 8:31 pm

Post by bobby »

Hi there,
They have no requirements document; all they gave is the SQL, and they want it done in DataStage.
I have a couple of questions:
1) As you said, user-defined SQL: I think that would be the quick fix. Do you mean import everything into the plug-in metadata and then run the user-defined query?

2) Making the two hashed files: I think I would first have to look at more of the data to identify the business keys, i.e. build some kind of model before I can proceed.


So which do you suggest? Would the user-defined query help more?
Thanks,
Bobby
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Put the SELECT statement in one stage, run the results through a Transformer stage (even if you do nothing to the data) and have the final stage perform the INSERT (or bulk load).

Create another job for the UPDATE statement; similar design: SELECT in one stage, Transformer, UPDATE in the final stage.

Run the jobs from a job sequence, possibly (if appropriate) making the second dependent upon successful completion of the first.
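
For the second job, a minimal sketch of how the UPDATE might be split: the SELECT drives the stream of keys, and the final stage applies the update through user-defined SQL (the :1 bind placeholder is only illustrative, one parameter per input column):

-- source stage: the keys that should be flagged
SELECT MAX(REVENUE_KEY) AS REVENUE_KEY
FROM DWADMIN.FCT_REVENUE_SUMMARY_JAN_2004
GROUP BY SUBSCRIBER_KEY;

-- final stage: user-defined update, executed once per incoming row
UPDATE DWADMIN.FCT_REVENUE_SUMMARY_JAN_2004
SET SUBSCRIBER_CT = 1
WHERE REVENUE_KEY = :1;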
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The quickest approach is to turn the SELECT clause of the insert statement into a database view. Then you can import the metadata definition of the view straight into DataStage and retrieve rows from it.
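As a hedged sketch (the view name here is made up, and the column list is trimmed; the real view body would simply be the full SELECT from the INSERT above):

CREATE OR REPLACE VIEW DWADMIN.VW_REVENUE_SUMMARY_JAN_2004 AS
SELECT MAX(F.REVENUE_KEY) AS REVENUE_KEY,
F.SUBSCRIBER_KEY,
F.SUBSCRIBER_DIM_KEY,
S.REGION_KEY,
SUM(F.BILLING_AMT) AS BILLING_AMT
FROM DWADMIN.FCT_ALL_REVENUE F,
DWADMIN.DIM_SUBSCRIBER S,
DWADMIN.DIM_SUBSCRIBER_BASIC SB
WHERE F.SUBSCRIBER_DIM_KEY = S.SUBSCRIBER_DIM_KEY AND
F.REVENUE_KEY = SB.REVENUE_KEY(+) AND
TO_CHAR(TRUNC(DECODE(F.REVENUE_TYPE_KEY, 10000003, F.BILL_END_DT, F.POSTED_DT), 'MONTH'),'YYYY-MM-DD HH24:MI:SS') = '2004-01-01 00:00:00'
GROUP BY F.SUBSCRIBER_KEY,
F.SUBSCRIBER_DIM_KEY,
S.REGION_KEY;

The DataStage job then just selects from the view and writes the rows to the target table.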

The long approach is to rewrite it using DataStage functions. Transformation code such as DECODE and TRUNC would be removed from the SELECT statement and replaced by derivation code in a DataStage Transformer. This makes it easier to maintain and report on.
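
For example, with the functions stripped out, the fact extract might reduce to something like this (the DECODE/SIGN/TRUNC logic becomes derivations in the Transformer; the January filter from the original WHERE clause is probably still worth keeping in the source SQL so only one month is extracted):

SELECT F.REVENUE_KEY,
F.REVENUE_TYPE_KEY,
F.SUBSCRIBER_KEY,
F.SUBSCRIBER_DIM_KEY,
F.CYCLE_DT,
F.BILL_START_DT,
F.BILL_END_DT,
F.TRAN_DT,
F.POSTED_DT,
F.PROGRAM_PACKAGE_DIM_KEY,
F.PROGRAM_PACKAGE_KEY,
F.SERVICE_DIM_KEY,
F.SERVICE_KEY,
F.PPV_KEY,
F.PARTIAL_BILLING_FLG,
F.BILLING_AMT,
F.SOURCE_OF_DATA
FROM DWADMIN.FCT_ALL_REVENUE F;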

You can also remove the aggregation from the SELECT clause and put it into a DataStage Aggregator stage. You can remove your left outer join and make that table a lookup in a Transformer stage. In your current SQL the left outer join gives you a lot of NULL fields; doing this join in a transformer lets you add more logic around the handling of missing fields.

With the quick approach you need someone proficient in both Oracle and DataStage to make changes to the job; with the second approach you only need to be proficient in DataStage to maintain it.
bobby
Participant
Posts: 86
Joined: Mon Jul 19, 2004 8:31 pm

Post by bobby »

Hi,
I imported the table into the plug-in metadata, added all the columns to my rows,
and tried to view the data; it is taking forever.
What I am thinking now is to make two hashed files from DIM_SUBSCRIBER and DIM_SUBSCRIBER_BASIC
and then match those hashed files against FCT_ALL_REVENUE.


Could you please guide me:
looking at the query, which business keys can I use to join the dimension and fact tables?
Thanks,
Bobby
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The query will take forever; it contains a lot of DECODE statements, left outer joins and aggregation. It is placing the load for the transformation and matching on the RDBMS and doing it all in a single request. Moving these functions to DataStage moves the load onto the ETL server and performs them a row at a time.

In end-to-end processing the RDBMS approach may end up being faster; it all depends on the SQL optimisation and efficiency.

I like the approach of a single input stream with the bulk of the data: feed this to a transformer with two hashed file lookups, and then write it out. The aggregation may need to be performed on the input stream and on one or both of the hashed file lookups. Once you remove the joins you may find the aggregation is easier to leave in the RDBMS rather than in a DataStage Aggregator stage.
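
As a rough sketch, with the grouping left in the RDBMS the pre-aggregated input stream could look something like this (grouping columns trimmed here to a few of those in the original query):

SELECT F.SUBSCRIBER_DIM_KEY,
F.REVENUE_TYPE_KEY,
TRUNC(F.BILL_END_DT, 'MONTH') AS BILL_END_MONTH,
TRUNC(F.POSTED_DT, 'MONTH') AS POSTED_MONTH,
MAX(F.REVENUE_KEY) AS REVENUE_KEY,
SUM(F.BILLING_AMT) AS BILLING_AMT
FROM DWADMIN.FCT_ALL_REVENUE F
GROUP BY F.SUBSCRIBER_DIM_KEY,
F.REVENUE_TYPE_KEY,
TRUNC(F.BILL_END_DT, 'MONTH'),
TRUNC(F.POSTED_DT, 'MONTH');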
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Bobby

It depends on the number of records you have in your tables; it won't happen immediately when you start running the job. Check how long the job takes to run, and have your DBA with you to see whether anything can be done to make the query run faster, for example by adding an index. The aggregation will also take time, but it is better done in the database than in the DataStage Aggregator stage. If you are dealing with a big volume of records then go for the database aggregation.

Thanks
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland

Re: SQL query to ETL

Post by peternolan9 »

Bobby,
Let me guess: the client has written their ETL in PL/SQL plus Oracle views, etc.? If this is the case, I've put my comment on this on my FAQ page.

http://www.peternolan.com/faqs.htm#A004


My advice to customers who have done this and want to 'upgrade' what they have done with DS?

1. Take out a blank piece of paper.
2. Start designing your ETL properly to take advantage of your ETL tool.

I have advised lots of people not to try to write their ETL in PL/SQL. And it's a good thing they all ignore me, because it creates a lot of work for DS/INFA people to replace all that PL/SQL. ;-)

bobby wrote: Hi,
The client gave us the requirement below and wants it done with ETL. I have a few questions, if you can help.
They are using the three source tables in the FROM clause; my question is how I can build an ETL job to reproduce this query.
Thanks
Bobby
Best Regards
Peter Nolan
www.peternolan.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I too give the "blank piece of paper" advice, but alas it seems that most of the folks are deaf. Or just want to play with their shiny new DS toy!

Anyway, I can make a living doing it right the second time! :?

Folks, it's GREAT advice!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bobby
Participant
Posts: 86
Joined: Mon Jul 19, 2004 8:31 pm

Post by bobby »

vmcburney wrote: The quickest approach is to turn the SELECT clause of the insert statement into a database view. Then you can import the metadata definition of the view straight into DataStage and retrieve rows from it.

I tried this in TOAD and in the DataStage SELECT statement:

DECODE(F.REVENUE_TYPE_KEY, 10000003, TRUNC(F.BILL_END_DT, 'MONTH'), TRUNC(POSTED_DT, 'MONTH')),

TOAD gives me one result;
DataStage gives me something else.
Can you please advise?
Thanks,
Bobby
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Look carefully on the SQL tab to see whether there is any difference at all between the SQL in DataStage and that you're using in TOAD.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bobby
Participant
Posts: 86
Joined: Mon Jul 19, 2004 8:31 pm

Post by bobby »

Hi,
TRUNC(F.CYCLE_DT, 'MONTH')
gives me something like 01/02/04 in TOAD.

In DataStage the Timestamp data type does not help; I have to make the column VarChar, and then it gives
me a date like 01-feb-04. (Timestamp gives the error "attempt to convert string value unsuccessful", but the value shows as 01/02/04 in the source.)
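
One thing I am thinking of trying (just a guess on my part, assuming the difference comes from implicit date-to-string conversion, since TOAD and the DataStage connection may be using different NLS date settings) is to make the format explicit in the SELECT, for example:

-- return the truncated date as a fixed-format string instead of relying on the session date format
SELECT TO_CHAR(TRUNC(F.CYCLE_DT, 'MONTH'), 'YYYY-MM-DD HH24:MI:SS') AS CYCLE_MONTH
FROM DWADMIN.FCT_ALL_REVENUE F;

Would something like that be the right direction?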
Please advise.
Regards
Bobby