Few Questions with respect to Data Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Few Questions with respect to Data Stage

Post by SMKraj »

Hi,

I have the following questions:-

1. I am currently doing performance testing on a job that inserts data through the UDB DB2 stage, both serially and in parallel. The question is: when I load the data serially, performance is faster, but when I use parallel loading, performance is slower than the serial load!

I have increased the Array Size and the Transaction Limit in the UDB DB2 stage.

Please advise! :oops:

2. Given the above, I am now working on writing the data to a sequential file and then loading it into the table with the UDB Bulk Load stage, using the REPLACE option.

The question is that I need to concatenate 6 different sequential files and then load the result through the bulk load stage.

I can either use the cat command in UNIX or use a before-job routine, but I am not sure how to do either. :?:

Please send me your comments. Also, please let me know which of my approaches is better suited, or whether there is another possible approach!

Awaiting an early reply.
Thanks
Mahesh
raju_chvr
Premium Member
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Re: Few Questions with respect to Data Stage

Post by raju_chvr »

Please give us more details about your environment. One thing I am sure of, which you might have noticed, is that if you have only one processor, your parallel jobs will run slower than serial jobs because of time-sharing.

More details, such as whether you are using Parallel Extender, will certainly help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Few Questions with respect to Data Stage

Post by ray.wurlod »

SMKraj wrote: [original questions quoted in full above]
In the before-job subroutine field (which you find in the Job Properties window), choose ExecSH as the name of the routine. This will execute the command that is in the Input Value field.

In the Input Value field put a command to cat the six files, directing the output into a seventh file that DataStage can use. For example:

Code:

cat file1 file2 file3 file4 file5 file6 > file7
Another way, if you're using DS 6.x or later, would be to do a similar thing using the filter capability of a Sequential File stage; specify the command

Code:

cat file1 file2 file3 file4 file5 file6 
as the filter command; the Sequential File stage will read stdout from this command, which obviates the need for file7.
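To make the before-job concatenation a little more robust, here is a minimal POSIX sh sketch (the helper name and file names are hypothetical, not anything DataStage provides): it wraps the cat calls in a function that fails loudly if one of the input files is missing, whereas plain cat only prints a warning on stderr and keeps going.

```shell
# Hypothetical helper: concatenate the files named by the second and
# later arguments into the file named by the first argument.
# Fails with a clear message if any input file is missing/unreadable.
concat_inputs() {
    out=$1; shift
    : > "$out"                       # truncate/create the target file
    for f in "$@"; do
        [ -r "$f" ] || { echo "missing input: $f" >&2; return 1; }
        cat "$f" >> "$out"
    done
}
```

From ExecSH you could then invoke something like `concat_inputs file7 file1 file2 file3 file4 file5 file6` (sourcing the function from a small script), rather than a bare cat pipeline.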

As to the performance question, are you measuring DataStage's loading of the data file(s) for the DB2 bulk loader, or the performance of the DB2 bulk loader itself? This was not clear in your question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Re: Few Questions with respect to Data Stage

Post by SMKraj »

Hi Ray,

Thanks for all of your comments!

I am using a before-job routine to concatenate the files, but the problem is that it gives me the following error message:

***********

UDBBulkLoadJob..BeforeJob (ExecSH): Error when executing command: Cat Stage_Party_Varchar.txt Stage_Party_Integer.txt Stage_Party_Date.txt > Stage_data.txt
*** Output from command was: ***
SH: Cat: not found

************

Please send me your comments! :oops:

Also, what is the performance difference between loading data directly into the UDB database, versus writing the output to a sequential file and then using the UDB Bulk Load stage to load the data?

How can I improve the performance of an ETL job? What criteria should be kept in mind when designing, developing and testing an ETL job using DataStage v6? What techniques can be applied to help the performance of an ETL job?

What are the possible testing tools that can be used with DataStage?

What are containers? (Just a short note.)

Please let me know these details.
ray.wurlod wrote: [reply quoted in full above]
Thanks
Mahesh
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
It seems to me that your sh can't find cat. Did you perhaps write Cat? The command must be lower case: cat.

If the shell DataStage uses is not set up properly, it may be missing the correct PATH setting to find the cat utility (check with your system administrator). This usually happens when users' default shells are set to tcsh or ksh, whereas DataStage uses the regular (Bourne) shell.
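To check this diagnosis, a couple of lines you can run from the same sh that DataStage uses (a sketch; the /bin/cat path mentioned in the comment is an assumption to verify on your own machine):

```shell
# Is cat resolvable from sh's PATH at all? If not, print the PATH
# that sh is actually using so you can compare it with your login shell's.
command -v cat || echo "cat is not on sh's PATH: $PATH"

# If PATH turns out to be wrong for the DataStage user, a workaround is
# to call the utility by absolute path in the Input Value field, e.g.:
#   /bin/cat file1 file2 file3 > file7
# (verify the actual location first with: command -v cat)
```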

IHTH (I Hope This Helps)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Post by SMKraj »

Hi,

Thanks for your timely help!

I still have the questions from my earlier post: the performance difference between loading the database directly versus via a sequential file and the UDB Bulk Load stage, how to improve ETL job performance in DataStage v6, which testing tools can be used, and a short note on what containers are.

Please let me know these details.


roy wrote: [reply quoted in full above]
Thanks
Mahesh
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I'm no PX expert, but some things that work in regular server jobs can't be done in PX jobs and need to be encapsulated in containers (or shared containers).

Containers are, as their name implies, holders of a piece of design that can be reused in several jobs.

As for performance enhancements, it depends on whether you're building regular server jobs or PX jobs, since they work differently. I guess others here can elaborate better than me. In short, bulk loads generally work faster than regular load stages, but they don't always meet your needs; network traffic, memory and plenty more things need to be taken into consideration.

You probably need a consultant's opinion on your specific configuration.

IHTH (I Hope This Helps)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Mahesh,

Your most recent round of questions suggests that you are in need of the DataStage Essentials class (DS314Svr or DS314PX).

As a general rule, the Sequential File stage is the fastest for writing. Using the DB2 bulk loader stage, though, gives you the means to invoke the bulk loader automatically and perform a few other tasks around the edges. You trade raw speed for flexibility.

The fundamental principle of job design is "keep it simple". Build a series of simple jobs controlled by a sequence (or job control code) rather than one complex job.

Troubleshooting tools include the DataStage debugger, stage tracing, the data browser, server-side tracing, conditionally compiled statements in Routines, and so on. Testing primarily involves creating test data that include illegal values, out-of-range values, boundary values and the like, and determining whether actual output corresponds with expected output.

A container is a stage type that can encapsulate a subset of the components comprising a job design.
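As one illustration of the "actual output versus expected output" point, here is a minimal sh sketch (the function and file names are hypothetical, not a DataStage facility) that diffs a job's exported output file against a hand-prepared expected file:

```shell
# Hypothetical regression check: after a test run exports the target
# rows to a flat file, diff it against the expected file built from
# your test data (boundary values, illegal values, etc.).
check_output() {
    expected=$1
    actual=$2
    if diff -u "$expected" "$actual"; then
        echo "PASS: $actual matches $expected"
    else
        echo "FAIL: $actual differs from $expected" >&2
        return 1
    fi
}
```

A sequence's after-job routine could call this via ExecSH, so a mismatch fails the test run instead of passing silently.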
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Post by SMKraj »

All,

Thank you very much for all your responses. :wink:



ray.wurlod wrote: [reply quoted in full above]
Thanks
Mahesh