Few Questions with respect to Data Stage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Few Questions with respect to Data Stage

Post by SMKraj »

Hi,

I have the following questions:-

1. I am currently doing performance testing on a job that inserts data through the UDB DB2 stage, both serially and in parallel. The question is: when I load the data serially, performance is faster, but when I use parallel loading, performance is slower than the serial load!

I have increased the Array Size and the Transaction Limit in the UDB DB2 stage.

Please advise! :oops:

2. Given the above, I am now working on writing the data to a sequential file and then loading it into the table with the UDB Bulk Load stage, using the REPLACE option.

The question is that I need to concatenate 6 different sequential files and then load the result through the bulk load stage.

I can either use the cat command in UNIX or use a before-job routine, but I am not sure how to do either. :?:

Please send me your comments. Also, please let me know which of my approaches is better suited, or whether there is another possible approach!

Awaiting an early reply.
Thanks
Mahesh
raju_chvr
Premium Member
Premium Member
Posts: 165
Joined: Sat Sep 27, 2003 9:19 am
Location: USA

Re: Few Questions with respect to Data Stage

Post by raju_chvr »

Please give us more details about your environment. One thing I am sure of, which you might have noticed, is that if you have only one processor, your parallel jobs will run slower than serial jobs because of time-sharing.

More details, such as whether you are using Parallel Extender, will certainly help.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Few Questions with respect to Data Stage

Post by ray.wurlod »

SMKraj wrote: [original questions quoted in full above]
In the before-job subroutine field (which you find in the Job Properties window), choose ExecSH as the name of the routine. This will execute the command that is in the Input Value field.

In the Input Value field put a command to cat the six files, directing the output into a seventh file that DataStage can use. For example:

Code:

cat file1 file2 file3 file4 file5 file6 > file7
Another way, if you're using DS 6.x or later, would be to do a similar thing using the filter capability of a Sequential File stage; specify the command

Code:

cat file1 file2 file3 file4 file5 file6 
as the filter command; the Sequential File stage will read stdout from this command, which obviates the need for file7.
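To make the before-job concatenation a little more robust, here is a minimal POSIX sh sketch (the helper name and file names are hypothetical, not anything DataStage provides): it wraps the cat calls in a function that fails loudly if one of the input files is missing, whereas plain cat only prints a warning on stderr and keeps going.

```shell
# Hypothetical helper: concatenate the files named by the second and
# later arguments into the file named by the first argument.
# Fails with a clear message if any input file is missing/unreadable.
concat_inputs() {
    out=$1; shift
    : > "$out"                       # truncate/create the target file
    for f in "$@"; do
        [ -r "$f" ] || { echo "missing input: $f" >&2; return 1; }
        cat "$f" >> "$out"
    done
}
```

From ExecSH you could then invoke something like `concat_inputs file7 file1 file2 file3 file4 file5 file6` (sourcing the function from a small script), rather than a bare cat pipeline.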

As to the performance question, are you measuring DataStage's loading of the data file(s) for the DB2 bulk loader, or the performance of the DB2 bulk loader itself? This was not clear in your question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Re: Few Questions with respect to Data Stage

Post by SMKraj »

Hi Ray,

Thanks for all of your comments!

I am using a before-job routine to concatenate the files, but the problem is that it gives me the following error message:

***********

UDBBulkLoadJob..BeforeJob (ExecSH): Error when executing command: Cat Stage_Party_Varchar.txt Stage_Party_Integer.txt Stage_Party_Date.txt > Stage_data.txt
*** Output from command was: ***
SH: Cat: not found

************

Please send me your comments! :oops:

Also, what is the performance difference between loading data directly into the UDB database, versus writing the output to a sequential file and then using the UDB Bulk Load stage to load the data?

How can I improve the performance of an ETL job? What criteria should be kept in mind when designing, developing and testing an ETL job using DataStage v6? What techniques can be applied to help the performance of an ETL job?

What are the possible testing tools that can be used with DataStage?

What are containers? (Just a short note.)

Please let me know these details.
ray.wurlod wrote: [reply quoted in full above]
Thanks
Mahesh
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
It seems to me that your sh can't find cat. Did you perhaps write Cat? The command must be lower case: cat.

If the shell DataStage uses is not set up properly, it may be missing the correct PATH setting to find the cat utility (check with your system administrator). This usually happens when users' default shells are set to tcsh or ksh, whereas DataStage uses the regular (Bourne) shell.
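To check this diagnosis, a couple of lines you can run from the same sh that DataStage uses (a sketch; the /bin/cat path mentioned in the comment is an assumption to verify on your own machine):

```shell
# Is cat resolvable from sh's PATH at all? If not, print the PATH
# that sh is actually using so you can compare it with your login shell's.
command -v cat || echo "cat is not on sh's PATH: $PATH"

# If PATH turns out to be wrong for the DataStage user, a workaround is
# to call the utility by absolute path in the Input Value field, e.g.:
#   /bin/cat file1 file2 file3 > file7
# (verify the actual location first with: command -v cat)
```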

IHTH (I Hope This Helps)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Post by SMKraj »

Hi,

Thanks for your timely help!

I still have the questions from my earlier post: the performance difference between loading the database directly versus via a sequential file and the UDB Bulk Load stage, how to improve ETL job performance in DataStage v6, which testing tools can be used, and a short note on what containers are.

Please let me know these details.


roy wrote: [reply quoted in full above]
Thanks
Mahesh
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
I'm no PX expert, but some things that work in regular server jobs can't be done in PX jobs and need to be encapsulated in containers (or shared containers).

Containers are, as their name implies, holders of a piece of design that can be reused in several jobs.

As for performance enhancements, it depends on whether you're building regular server jobs or PX jobs, since they work differently. I guess others here can elaborate better than me. In short, bulk loads generally work faster than regular load stages, but they don't always meet your needs; network traffic, memory and plenty more things need to be taken into consideration.

You probably need a consultant's opinion on your specific configuration.

IHTH (I Hope This Helps)
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Mahesh,

Your most recent round of questions suggests that you are in need of the DataStage Essentials class (DS314Svr or DS314PX).

As a general rule, the Sequential File stage is the fastest for writing. Using the DB2 bulk loader stage, though, gives you the means to invoke the bulk loader automatically and perform a few other tasks around the edges. You trade raw speed for flexibility.

The fundamental principle of job design is "keep it simple". Build a series of simple jobs controlled by a sequence (or job control code) rather than one complex job.

Troubleshooting tools include the DataStage debugger, stage tracing, the data browser, server-side tracing, conditionally compiled statements in Routines, and so on. Testing primarily involves creating test data that include illegal values, out-of-range values, boundary values and the like, and determining whether actual output corresponds with expected output.

A container is a stage type that can encapsulate a subset of the components comprising a job design.
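As one illustration of the "actual output versus expected output" point, here is a minimal sh sketch (the function and file names are hypothetical, not a DataStage facility) that diffs a job's exported output file against a hand-prepared expected file:

```shell
# Hypothetical regression check: after a test run exports the target
# rows to a flat file, diff it against the expected file built from
# your test data (boundary values, illegal values, etc.).
check_output() {
    expected=$1
    actual=$2
    if diff -u "$expected" "$actual"; then
        echo "PASS: $actual matches $expected"
    else
        echo "FAIL: $actual differs from $expected" >&2
        return 1
    fi
}
```

A sequence's after-job routine could call this via ExecSH, so a mismatch fails the test run instead of passing silently.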
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SMKraj
Participant
Posts: 7
Joined: Mon Oct 20, 2003 10:59 pm

Post by SMKraj »

All,

Thank you very much for all your responses. :wink:



ray.wurlod wrote: [reply quoted in full above]
Thanks
Mahesh