Query on SAN filesystem

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
LD
Premium Member
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Query on SAN filesystem

Post by LD »

Hi,

We are on DS 8.1 on AIX server. The server is shared with few other applications as well. Some info on the system,

1) SMP system
2) 4 logical CPUs
3) 4GB Memory
4) SAN File system
5) Target DB on separate GRID

We are not getting the desired performance even for simple jobs like

OEE ---> Seq File or OEE ---> DataSet

The max performance we get is about 3-4K rec/sec. When I load the data into DataSet then following job gives better result

DataSet ---> Transformer/Copy ---> Dataset

This job runs at about 15-16K rec/sec. When I start loading the transformer with transformation logic for each column the performance falls to ~3K rec/sec. I was surprised to see these transformations can bring down the performance to such a level as most of them were simple if/else statements.

Can some one suggest, what can be done to tune the environment so that we get better performance for simple jobs like

OEE ---> Seq File or OEE ---> DataSet

I understand every section of the environment I listed before might be contributing to the low performance, but I'm more interested in knowing

1) Is 4GB of memory sufficient
2) What is the impact of SAN file system against local disc

Thanks,

Shashank
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

4GB is probably not sufficient. I believe IBM suggest a minimum of 8GB.

Local disk is preferred, but the reality is that most pSeries sites use SAN of some kind. The difference is minor, but local disk is preferred particularly for scratch disk.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
LD
Premium Member
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Thanks Ray,

Does the drastic drop in performance by adding more and more transformation to Transformer stage is the normal behavior of the stage or it indicates to some sort of resource crunch.

Thanks,

Shashank
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Definitely maybe. It depends. How complex are these transformations? How many are there? I wouldn't expect "drastic" changes unless these number differences were orders of magnitude. But, if the expressions are written inefficiently...
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
LD
Premium Member
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Hi Ray,

Please find below a sample if/else I'm using in transformer. This is to check the data quality and provide an appropriate value and a message in case of issue.

If IsNull(Lnk_SrcData.PATIENT_ACCOUNT_NO) Or
Trim(Lnk_SrcData.PATIENT_ACCOUNT_NO)='' Then
'~Mandatory Error - PATIENT_ACCOUNT_NO value is Null/Empty String. '
Else
If Len(Lnk_SrcData.PATIENT_ACCOUNT_NO) > 50 Then
'~Value Too Big Error - PATIENT_ACCOUNT_NO value is too big to be stored. '
Else
If Convert('`~!@#$%^&*()_-+=[{]}\|;:,<.>?/','',
Lnk_SrcData.PATIENT_ACCOUNT_NO)='' Then
'~Invalid Value Error - PATIENT_ACCOUNT_NO value is invalid. '
Else
Trim(Lnk_SrcData.PATIENT_ACCOUNT_NO):'~'

There are total 54 such checks. some for text, some for numbers and 5 for dates. Date checking logic is a bit complex but they are less in number.

Below is to test dates,

If IsNull(Lnk_SrcData.DISCHARGE_DATE) Or
Trim(Lnk_SrcData.DISCHARGE_DATE)='' Then
'~Mandatory Error - DISCHARGE_DATE value is Null/Empty String. '
Else
If Count(Lnk_SrcData.DISCHARGE_DATE,' ') = 0 Then
If IsValid('Date',
Field(Lnk_SrcData.DISCHARGE_DATE,'/',3):'-':(If
Len(Field(Lnk_SrcData.DISCHARGE_DATE,'/',1))=1 Then
'0':Field(Lnk_SrcData.DISCHARGE_DATE,'/',1) Else
Field(Lnk_SrcData.DISCHARGE_DATE,'/',1)):'-':(If
Len(Field(Lnk_SrcData.DISCHARGE_DATE,'/',2))=1 Then
'0':Field(Lnk_SrcData.DISCHARGE_DATE,'/',2) Else
Field(Lnk_SrcData.DISCHARGE_DATE,'/',2))) = 0 Then
'~Format Mismatch Error - DISCHARGE_DATE value is invalid
or not in desired format. '
Else
Field(Lnk_SrcData.DISCHARGE_DATE,'/',3):'-':(If
Len(Field(Lnk_SrcData.DISCHARGE_DATE,'/',1))=1 Then
'0':Field(Lnk_SrcData.DISCHARGE_DATE,'/',1) Else
Field(Lnk_SrcData.DISCHARGE_DATE,'/',1)):'-':(If
Len(Field(Lnk_SrcData.DISCHARGE_DATE,'/',2))=1 Then
'0':Field(Lnk_SrcData.DISCHARGE_DATE,'/',2) Else
Field(Lnk_SrcData.DISCHARGE_DATE,'/',2)):' 00:00:00~'
Else
If Count(Lnk_SrcData.DISCHARGE_DATE,' ') > 1 Then
'~Format Mismatch Error - DISCHARGE_DATE value is invalid
or not in desired formate. '
Else
If IsValid('Date',
Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',3):'-':(If
Len(Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',1))=1 Then
'0':Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',1) Else
Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',1)):'-':(If
Len(Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',2))=1 Then
'0':Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',2) Else
Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',2))) = 0 Then
'~Format Mismatch Error - DISCHARGE_DATE value is
invalid or not in desired format. '
Else
If IsValid('Time', '':(If
Count(Field(Lnk_SrcData.DISCHARGE_DATE,' ',2),':')=0 Then
Field(Lnk_SrcData.DISCHARGE_DATE,' ',2):':00:00' Else If
Count(Field(Lnk_SrcData.DISCHARGE_DATE,' ',2),':')=1 Then
Field(Lnk_SrcData.DISCHARGE_DATE,' ',2):':00' Else
Field(Lnk_SrcData.DISCHARGE_DATE,' ',2))) = 0 Then
'~Format Mismatch Error - DISCHARGE_DATE value
is invalid or not in desired format. '
Else
Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',3):'-':(If
Len(Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',1))=1 Then
'0':Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',1) Else
Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',1)):'-':(If
Len(Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',2))=1 Then
'0':Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',2) Else
Field(Field(Lnk_SrcData.DISCHARGE_DATE,' ',1),'/',2)):' ':(If
Count(Field(Lnk_SrcData.DISCHARGE_DATE,' ',2),':')=0 Then
Field(Lnk_SrcData.DISCHARGE_DATE,' ',2):':00:00' Else If
Count(Field(Lnk_SrcData.DISCHARGE_DATE,' ',2),':')=1 Then
Field(Lnk_SrcData.DISCHARGE_DATE,' ',2):':00' Else
Field(Lnk_SrcData.DISCHARGE_DATE,' ',2)):'~'


Thanks,

Shashank
Post Reply