Performance issue in reading 6GB file

ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

Hi, All,

I have a mainframe file as the source; it is 6 GB in size and has around 3500 fields. I read this file with a Sequential File stage, using the Schema File and RCP options, and pass it to a dataset through a Transformer. Reading the data from the dataset in another job (say, the 2nd job) takes around 25 minutes. So I wrote it to a File Set instead of a dataset (in the first job) and tried reading the data from the File Set (2nd job). There is no improvement. Is there any other way to improve the performance?
I used the same jobs to read a mainframe file of ~3800 fields, and it took 30 seconds to complete. But that file was small, not even 1 GB of data.

FYI, I also tried reading with the CFF (Complex Flat File) stage, but that takes 45 minutes.

I'd be grateful if anyone could help me out on this.

Thanks,
Ethel.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

You have two problem areas, and I'm not sure which one you mean:

a) reading from flat file and writing to a dataset
b) reading the dataset.

Is your mainframe file fixed length? If so, you might be able to improve your read performance by using multiple readers and most likely improve your dataset write performance by choosing an optimal APT_CONFIG_FILE configuration.

The second problem might also be addressed by using more parallel nodes. It does depend upon your DataStage server hardware setup, though.
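To illustrate why the fixed-length question matters for multiple readers: when every record has the same byte length, each reader can be handed a byte range that starts on a record boundary, with no need to scan for delimiters. The following is only a minimal Python sketch of that idea, not how DataStage implements it; the function name `split_ranges` and the sample sizes are hypothetical.

```python
def split_ranges(file_size: int, record_len: int, readers: int) -> list[tuple[int, int]]:
    """Return one (start_offset, byte_count) pair per reader.

    Every range starts and ends on a record boundary, which is only
    possible because all records share the same fixed length.
    """
    if record_len <= 0 or readers <= 0:
        raise ValueError("record_len and readers must be positive")
    if file_size % record_len != 0:
        raise ValueError("file size is not a whole number of records")

    total_records = file_size // record_len
    # Spread the records as evenly as possible; the first `extra`
    # readers each take one additional record.
    base, extra = divmod(total_records, readers)

    ranges = []
    offset = 0
    for i in range(readers):
        count = base + (1 if i < extra else 0)
        ranges.append((offset, count * record_len))
        offset += count * record_len
    return ranges


# Hypothetical example: 10 records of 100 bytes shared by 4 readers.
print(split_ranges(file_size=1000, record_len=100, readers=4))
```

With a variable-length or delimited file, a reader dropped at an arbitrary offset would first have to find the next record start, which is why the question about fixed length comes first.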
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

ArndW wrote:You have two problem areas, and I'm not sure which you consider:

a) reading from flat file and writing to a dataset
b) reading the dataset.

Is your mainframe file fixed length? If so, you might ...
My mainframe file is fixed length. Here is my design, for better understanding.

I have a fixed-width mainframe file, and I use a Sequential File stage to read it (the options I used are in my earlier post). It takes ~25 minutes to read. That is why I split the job: one job reads the data from the sequential file and writes it to a dataset, and the dataset is then used as the source for further processing.

Please correct me if I'm wrong anywhere.

Thanks.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

How many CPUs does your system have, and how many nodes are in your APT_CONFIG_FILE? Can you run timing experiments with multiple readers per node to see whether that speeds up reading your file? (To test this, write to a Peek stage instead of to the dataset.)
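A comparable baseline can be taken outside DataStage by timing a plain read of the source file that discards the data, much as the Peek test removes the dataset write from the measurement. This is a minimal Python sketch under assumptions: the function name `measure_read` and the 8 MB chunk size are hypothetical, and it measures raw file I/O only, not record parsing.

```python
import time


def measure_read(path: str, chunk_size: int = 8 * 1024 * 1024) -> tuple[int, float]:
    """Read the whole file in chunks, discard the data, and time it.

    Returns (bytes_read, elapsed_seconds).
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total, elapsed
```

Dividing the returned byte count by the elapsed seconds gives a raw read rate to compare with the job's run time. A second run on the same file may be served from the operating system's file cache, so the first run is usually the more honest number.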
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

ArndW wrote:How many CPUs does your system have and how many nodes are in your APT_CONFIG_FILE? Can you experiment on timing using multiple readers per node to see if you speed up reading your file (in order to test this, write to a PEEK stage instead of to the dataset)
We have 4 nodes in the config file. I've raised readers per node to 4 in the Sequential File stage.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

This issue is probably more closely related to how you have defined the record layout and the output record. If you include the Group etc., then it will take a long time to read the data, especially with rows that wide. I just had this issue with an 18 GB file and 1800 columns, and I was able to get it to read very quickly. I need to look at what I did and pass that along to you...

Stand by :D
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I would try turning on "Read from multiple Nodes".
Last edited by ArndW on Mon Oct 11, 2010 7:27 am, edited 2 times in total.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Do you also have "Read from multiple Nodes" turned on?
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

mhester wrote:This issue is probably more closely related to how you have defined the record layout and the output record. If you include the Group etc... then it will take a bunch of time to read the data especially with rows that wide. I just had this issue with an 18gb file and 1800 columns and I was able to get it to read very quickly. I need to look at what I did and pass that along to you....

Stand by :D
I did not include Group in the schema file (I expanded it manually to avoid subrecord creation). Also, my source is a Sequential File stage reading the fixed-length mainframe file, which passes to a Transformer, then to a Column Import stage, and then to a target dataset. The Column Import stage is RCP enabled.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What is your CPU usage during the job run? At the moment you don't know if I/O, I/O per process, or CPU Usage is the bottleneck.
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

ArndW wrote:What is your CPU usage during the job run? At the moment you don't know if I/O, I/O per process, or CPU Usage is the bottleneck. ...
I checked with DS admin for I/O process during that time and it was quite minimal it seems and so the CPU usage.
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

ArndW wrote:Do you also have "Read from multiple Nodes" turned on? ...
Yes. It's turned on.
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

mhester wrote:This issue is probably more closely related to how you have defined the record layout and the output record. If you include the Group etc... then it will take a bunch of time to read the data especially with rows that wide. I just had this issue with an 18gb file and 1800 columns and I was able to get it to read very quickly. I need to look at what I did and pass that along to you....

Stand by :D
Could you please tell us how you did that in your job?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

ethelvina wrote:...I checked with DS admin for I/O process during that time and it was quite minimal it seems and so the CPU usage.
Something isn't correct here - either your CPU or your I/O Bandwidth is going to max out in this case. Could your disk be on a SAN and thus heavy disk use might show up in the form of network I/O?
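A quick back-of-the-envelope check puts numbers on this. Using only the figures quoted in this thread (6 GB, 25 minutes with the Sequential File stage, 45 minutes with CFF), the effective rate works out to roughly 4 MB/s and 2 MB/s. The sketch below assumes 1 GB = 1024 MB and that the whole run time is spent on the read; the function name is hypothetical.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Effective read rate in MB/s, assuming 1 GB = 1024 MB."""
    return size_gb * 1024 / (minutes * 60)


# Figures taken from the posts above.
print(f"Sequential File stage: {throughput_mb_per_s(6, 25):.1f} MB/s")
print(f"CFF stage:             {throughput_mb_per_s(6, 45):.1f} MB/s")
```

Rates this low are usually well below what sequential disk reads can sustain, which fits the point that some resource should be visibly saturated somewhere, whether that is CPU, local disk, or network traffic to a SAN.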
ethelvina
Participant
Posts: 17
Joined: Sun Mar 30, 2008 10:05 pm

Post by ethelvina »

ArndW wrote:
ethelvina wrote:...I checked with DS admin for I/O process during that time and it was quite minimal it seems and so the CPU usage.
Something isn't correct here - either your CPU or your ...
I'm sorry..I meant to say "I checked with DS admin for I/O process during that time and it was quite minimal it seems and also the CPU usage"