Extracting data

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Sachin.tilloo
Participant
Posts: 17
Joined: Wed Mar 07, 2007 12:54 am

Extracting data

Post by Sachin.tilloo »

Hi
i have devloped application in that i am extracting data from one table and creating the dat file. i am using datastage7.1 server edition . i am extracting the 90 million records but it is taking 18 hours for extraction i have also applied parallism to the query but only 1 hr reduced in that.
one more thing i am extracting it partition wise so please tell me how can i improve the performance for extractin. what changes i have to make in my job?

thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not enough information. What kind of database? How are you extracting? How are you "introducing parallelism"? (We don't want to advise anything you've already tried.) Do you have Enterprise Edition?

Remember that, as far as the database server is concerned, a running DataStage job is just another client application.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sachin.tilloo
Participant
Posts: 17
Joined: Wed Mar 07, 2007 12:54 am

Post by Sachin.tilloo »

ray.wurlod wrote:Not enough information. What kind of database? How are you extracting? How are you "introducing parallelism"? (We don't want to advise anything you've already tried.) Do you have Enterprise Editi ...
Hi
We r using oracle 9i database as i have stated 90 million records we r extracting the record we are extracting on basis of date wise there is day wise partion in table . and parallelism we have introduced in query itself
for extracting by giving 4 parallelism in query.
so please tell me how can i increase the extraction rate by datastage side not by oracle side.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Put parallelism into the DataStage job. That's why I asked about Enterprise Edition. However it's also possible in server jobs - you can have multiple connections to the database through the same stage, each selecting a subset of the rows (simple example, odds and evens on a two-way split, or day of the week for a seven way split). Keep the parallelism going through your job, writing to seven separate text files, and concatenate them (using cat command) afterwards. Instead of designing parallel streams in one job you could use multi-instance job and parameterize the constants in the WHERE clause.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sachin.tilloo
Participant
Posts: 17
Joined: Wed Mar 07, 2007 12:54 am

Post by Sachin.tilloo »

ray.wurlod wrote:Put parallelism into the DataStage job. That's why I asked about Enterprise Edition. However it's also possible in server jobs - you can have multiple connections to the database through the same st ...
i mean to say i have given parallelism in query we dont have parallel edition.
i ll give u the qyery

SELECT/*+parallel(A,8)*/
SUBSCRIBER_ID,
CALL_DIRECTION,
CALLING_NUMBER,
CALLED_NUMBER,
TO_CHAR(START_DATE, 'YYYY-MM-DD'),
START_TIME_HOUR,
START_TIME_MIN,
START_TIME_SEC,
CALL_TYPE,
DISTANCE_BAND,
CHARGEABLE_TIME,
COS,
MSC_ID,
CELL_ID,
PROMO_BAL_1,
PROMO_BAL_1_CHARGE,
PROMO_BAL_2,
PROMO_BAL_2_CHARGE,
CORE_BAL,CO
RE_BAL_CHARGE,
END_BALANCE,
BALANCE4_CHARGE,
BALANCE5_CHARGE,
BALANCE6_CHARGE,
BALANCE7_CHARGE,
BALANCE8_CHARGE,
BALANCE9_CHARGE,
BALANCE10_CHARGE,
CHARGEABLE_AMOUNT,
HOME_ZONE,
TO_CHAR(RECORD_DATE, 'YYYY-MM-DD HH24:MI:SS') FROM DEVDBA.R_PPS_CDR partition(P_PPS_CDR_20070203) A;


in this only one hour is reduce while doing extraction .
i want to tune the performance through datastage side.
Abburi
Participant
Posts: 31
Joined: Tue May 29, 2007 12:38 pm

Post by Abburi »

Sachin.tilloo wrote:
ray.wurlod wrote:Put parallelism into the DataStage job. That's why I asked about Enterprise Edition. However it's also possible in server jobs - you can have multiple connections to the database through the same st ...
i mean to say i have given parallelism in query we dont have parallel edition.
i ll give u the qyery

SELECT/*+parallel(A,8)*/
SUBSCRIBER_ID,
CALL_DIRECTION,
CALLING_NUMBER,
CALLED_NUMBER,
TO_CHAR(START_DATE, 'YYYY-MM-DD'),
START_TIME_HOUR,
START_TIME_MIN,
START_TIME_SEC,
CALL_TYPE,
DISTANCE_BAND,
CHARGEABLE_TIME,
COS,
MSC_ID,
CELL_ID,
PROMO_BAL_1,
PROMO_BAL_1_CHARGE,
PROMO_BAL_2,
PROMO_BAL_2_CHARGE,
CORE_BAL,CO
RE_BAL_CHARGE,
END_BALANCE,
BALANCE4_CHARGE,
BALANCE5_CHARGE,
BALANCE6_CHARGE,
BALANCE7_CHARGE,
BALANCE8_CHARGE,
BALANCE9_CHARGE,
BALANCE10_CHARGE,
CHARGEABLE_AMOUNT,
HOME_ZONE,
TO_CHAR(RECORD_DATE, 'YYYY-MM-DD HH24:MI:SS') FROM DEVDBA.R_PPS_CDR partition(P_PPS_CDR_20070203) A;


in this only one hour is reduce while doing extraction .
i want to tune the performance through datastage side.
Hi,

Check these points :

1) Check indexes on table DEVDBA.R_PPS_CDR.

2) Increase no of Transformers, distribute the data among these.

3) After second point collect them with a Link_Collector.

4) Load the data.

This will reduce the upload time :)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The best improvement will be obtained with multiple simultaneous SELECT queries, as I already said. The PARALLEL hint is not helping much since the result set is still being delivered sequentially. Read what I said earlier, this time more carefully.

Don't bother with a Link Partitioner - use parallel streams from the extraction phase. The Link Partitioner still uses a single input stream.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sachin.tilloo
Participant
Posts: 17
Joined: Wed Mar 07, 2007 12:54 am

Post by Sachin.tilloo »

Thanx for ur ge8 support . Expecting same in future
thanks again.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then please accept this gratuitous advice. This is a forum for professionals. We prefer - and, indeed, expect - a professional level of communication. Do you use SMS-style abbreviations in your work documentation? No? Then why do so here? You can write good English, please do so. If nothing else, it does help those for whom English is not a first language.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply