Sequential File Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Sequential File Stage

Post by Nagac »

Hi

Will there be any difference in performance between (a) reading the file as a single column with the Sequential File stage and then using a Column Import stage to split it into multiple columns, and (b) reading the file as multiple columns directly in the Sequential File stage, before doing the rest of the transformation processing?

Thanks
Naga
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Probably, yes maybe. No clue if it would be faster / better / slower / worser or even if it would be all that different as there are too many variables at play. Honestly, the only way to properly answer the question would be to try both ways on your system with your data and see. And hopefully the volume of data would be large enough to make any metrics statistically meaningful.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Probably, because you are parsing in parallel mode. But, as Craig notes, you won't notice much of a difference with small data volumes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Ray and Craig are correct: performance is better when you define multiple columns in the source itself.
I had the same situation in one of my jobs, but I used the Field function in the Transformer to split the columns.
The surprising part was that the change in performance was almost negligible. With multiple columns in the source, the job finished in 200 sec, while it took 230 sec with only 1 column.
Half a minute is not a big deal.
Thanx and Regards,
ETL User
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Usually such jobs are limited by I/O speed and not CPU, and in both cases the same amount of I/O is being done. Parsing the columns directly in the stage should be somewhat more efficient than doing it with a Field() function, so the results that Chandra has seen are what I would expect; the increase in time is due to the higher CPU load.
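To illustrate the CPU side of that argument, here is a rough Python analogy (not DataStage code). It assumes that a Field-style call rescans the record from the start for every column, whereas parsing in the stage walks the record once; that is an assumption about the general shape of the work, not a statement about DataStage internals. The names `field`, `parse_with_field` and `parse_single_pass` are hypothetical.

```python
def field(record: str, delimiter: str, occurrence: int) -> str:
    """Return the occurrence-th (1-based) delimited field of record.

    Every call starts scanning at the beginning of the record, so
    extracting N columns this way repeats much of the same scanning.
    """
    start = 0
    # Skip over the first (occurrence - 1) delimiters.
    for _ in range(occurrence - 1):
        pos = record.find(delimiter, start)
        if pos == -1:
            return ""  # fewer fields than requested
        start = pos + len(delimiter)
    end = record.find(delimiter, start)
    return record[start:] if end == -1 else record[start:end]


def parse_with_field(record: str, delimiter: str, ncols: int) -> list[str]:
    """Single-column read, then one field() call per output column."""
    return [field(record, delimiter, i) for i in range(1, ncols + 1)]


def parse_single_pass(record: str, delimiter: str) -> list[str]:
    """Analogy for letting the reading stage split the record in one pass."""
    return record.split(delimiter)
```

Both functions produce the same columns; the per-column version simply does more scanning per record, which is extra CPU and no extra I/O. Timing the two over a large list of records (for example with `timeit`) would show the gap on your own machine.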
Nagac
Premium Member
Posts: 127
Joined: Tue Mar 29, 2011 11:39 am
Location: India

Post by Nagac »

Thanks Everyone.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It would be more interesting had Chandra advised the data volume on which the test was done, and if the Column Import stage had been used rather than a Transformer stage for the parsing.
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

@Ray,
The job I mentioned had around 17 million records.
I didn't use the Column Import stage. The test compared reading the file as a single column against reading it as multiple columns, with the single column split using the Field function only.
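For a sense of scale, the figures from this thread (about 17 million records, 200 sec versus 230 sec) can be turned into throughput and per-record cost. This is just arithmetic on the numbers reported above; the record count is approximate, so the results are too.

```python
records = 17_000_000      # "around 17 million records"
multi_col_secs = 200      # multiple columns defined in the Sequential File stage
single_col_secs = 230     # single column, split with Field in the Transformer

extra_secs = single_col_secs - multi_col_secs            # 30 seconds
overhead_pct = 100 * extra_secs / multi_col_secs         # 15.0 percent slower
multi_rate = records / multi_col_secs                    # 85,000 records/sec
single_rate = records / single_col_secs                  # about 73,913 records/sec
extra_us_per_record = 1_000_000 * extra_secs / records   # about 1.76 microseconds

print(f"overhead: {overhead_pct:.1f}%")
print(f"throughput: {multi_rate:,.0f} vs {single_rate:,.0f} records/sec")
print(f"extra cost: {extra_us_per_record:.2f} microseconds per record")
```

So the single-column approach was roughly 15% slower in this one test, which works out to under two microseconds of extra work per record.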