Performance difference between fixed-width / Delimited file

ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Hi,

I am updating a table from a file.
Today it is taking more time than it did last night.

The only difference is that last night the file was in fixed-width format and today it is in pipe-delimited format.

Any insight will be helpful

Thanks

Kishan
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Admin -

I posted this message in this forum by mistake.
Can you please move it to the Server forum?

Kishan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Theoretically, at least, fixed-width format is always faster than delimited. The main reason for this is that, in delimited files, the process must read one character at a time, checking whether each is a delimiter character. With fixed-width data, data can be processed (a) in row-sized chunks and (b) in field-sized chunks. "Block move" commands are very, very efficient in most operating systems.
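
To make the difference concrete, here is a minimal Python sketch of the two parsing strategies. It is only an illustration of the logic, not how DataStage implements either format: the field widths, the function names and the sample layout are all hypothetical, and it is not a benchmark (Python's built-in string methods are compiled, so timings here would not reflect what an ETL engine does).

```python
from typing import List

# Hypothetical record layout: three fields of 5, 10 and 1 characters.
FIELD_WIDTHS = [5, 10, 1]


def parse_fixed(record: str, widths: List[int] = FIELD_WIDTHS) -> List[str]:
    """Cut fields at offsets known in advance; no search for delimiters."""
    fields = []
    offset = 0
    for width in widths:
        # Each field is taken as one chunk; trailing pad spaces are dropped.
        fields.append(record[offset:offset + width].rstrip())
        offset += width
    return fields


def parse_delimited(record: str, delimiter: str = "|") -> List[str]:
    """Inspect every character and start a new field at each delimiter."""
    fields = []
    current = []
    for ch in record:
        if ch == delimiter:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    # The last field has no trailing delimiter.
    fields.append("".join(current))
    return fields
```

Given a fixed-width record built from the fields `00042`, `Smith` (padded to 10 characters) and `Y`, `parse_fixed` returns the same three values that `parse_delimited` returns for `00042|Smith|Y`. The fixed-width version does work proportional to the number of fields, while the delimited version must test every character.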
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Kishan,
But also make sure this is the only reason.
Recheck with the previous file and make sure it again runs at the same full speed.
If your server is heavily loaded, you cannot expect full efficiency.

-Kumar
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Thanks for the insight,

I understand that fixed-width is supposed to be faster.
I had another issue, and that is why I am changing this file to a delimited one.

I had this job running with a delimited file and the data was all good.

I then made a change to the design and wrote to one more file, which takes care of the rejects and updates the table with 'N'.

I found that some records that were not even supposed to be picked for processing were updated with 'N'.

I checked those files and found that the particular ID was not there,
but it was being updated in the table.

I checked the logic and it does not seem to have any problem.

I am updating with the same key columns for the records with 'Y', which come from a delimited file.

The only difference is that the 'Y' file is delimited and the 'N' file is fixed-width.

When I had a delimited version of the 'N' file before, it worked fine.

I don't know if anybody has had a similar problem or if it's something I am doing wrong.

Thanks again

Regards,
Kishan
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Kishan,

I could not get into the full depth of your design. Can you explain which stages you used to achieve this task and what result you got that was unexpected?

-Kumar
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

I am sorry, this is a server job question.
I posted here by mistake.

OK: an OCI stage selects data from Oracle,
then a partitioner and a lookup against a hashed file,
writing the result to sequential files.


Then the files are appended in a before-job routine,
and an OCI stage updates the columns in a table.
There is one job for Yes and one for No.

I am filtering some data with the lookup in the earlier job.
When I update with a fixed-width file, I found that some of the data that was filtered out in the earlier job was updated with No.

It was processed perfectly when I did the same with a pipe-delimited file.

For now, I can't find any other issues,
so I thought this may be the problem.

My job is still running, and when it finishes tomorrow I should be able to say whether it processed all the data correctly.

Thanks
Kishan
Sreenivasulu
Premium Member
Posts: 892
Joined: Thu Oct 16, 2003 5:18 am

Post by Sreenivasulu »

Hi All,

As Roy says, processing a fixed-length file is faster since you avoid using the functions to remove the delimiters.

Regards
Sreenivasulu
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Sreenivasulu wrote: Hi All,

As Roy says, processing a fixed-length file is faster since you avoid using the functions to remove the delimiters.

Regards
Sreenivasulu

Another confusion: Roy vs. Ray :lol:

-Kumar
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

ak77 wrote: I am sorry, this is a server job question.
I posted here by mistake.

OK: an OCI stage selects data from Oracle,
then a partitioner and a lookup against a hashed file,
writing the result to sequential files.


Then the files are appended in a before-job routine,
and an OCI stage updates the columns in a table.
There is one job for Yes and one for No.

I am filtering some data with the lookup in the earlier job.
When I update with a fixed-width file, I found that some of the data that was filtered out in the earlier job was updated with No.

It was processed perfectly when I did the same with a pipe-delimited file.

For now, I can't find any other issues,
so I thought this may be the problem.

My job is still running, and when it finishes tomorrow I should be able to say whether it processed all the data correctly.

Thanks
Kishan

So, if I understand properly, the actual update of the column takes place in the before-job subroutine. May I know what approach you use to update the column?

-Kumar
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

No, I append the files in the before-job subroutine and then pass the data through a Transformer to update an Oracle table using the OCI8 stage.

Hope it is clear now, Kumar.

Kishan