Hi
Will there be any difference in performance between (a) reading the file as a single column with the Sequential File stage and then using a Column Import stage to split it into multiple columns, and (b) reading the file as multiple columns directly in the Sequential File stage, before doing the rest of the transformation processing?
Thanks
Naga
Sequential File Stage
Probably, yes maybe. No clue if it would be faster / better / slower / worser or even if it would be all that different as there are too many variables at play. Honestly, the only way to properly answer the question would be to try both ways on your system with your data and see. And hopefully the volume of data would be large enough to make any metrics statistically meaningful.
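One way to make the "try both ways and see" comparison a little more trustworthy is to time each variant several times and compare medians rather than single runs. The sketch below is only an illustration in Python, not DataStage. `time_runs`, `variant_single_column` and `variant_multi_column` are hypothetical names, and the two variants are small in-memory stand-ins for the two job designs, not the real stages.

```python
import statistics
import time
from typing import Callable, List


def time_runs(job: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run `job` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-in data: 50,000 comma-delimited records held in memory.
RECORDS = ["%d,name%d,%d" % (i, i, i * 2) for i in range(50_000)]


def variant_single_column() -> int:
    """Stand-in for design (a): take each record as one string, split it afterwards."""
    whole_lines = list(RECORDS)  # "read as a single column"
    rows = [line.split(",") for line in whole_lines]
    return len(rows)


def variant_multi_column() -> int:
    """Stand-in for design (b): split each record into columns as it is read."""
    return sum(1 for _ in (line.split(",") for line in RECORDS))


if __name__ == "__main__":
    for name, job in [("single column", variant_single_column),
                      ("multiple columns", variant_multi_column)]:
        runs = time_runs(job)
        print(f"{name}: median {statistics.median(runs):.4f} s over {len(runs)} runs")
```

For a real test, each `job` would have to launch the actual job design against the actual file; the stand-ins here only show the shape of the measurement.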
-craig
"You can never have too many knives" -- Logan Nine Fingers
Ray and Craig are correct: performance is better when you define the multiple columns in the source itself.
I had the same situation in one of my jobs, but I used the Field function in the Transformer to divide the columns.
The surprising part was that the change in performance was almost negligible. With multiple columns in the source, the job finished in 200 sec, while it took 230 sec when reading only one column.
Half a minute is not a big deal.
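To put those two timings in proportion, the difference can be expressed as a relative overhead. This is a minimal sketch: `relative_overhead` is a hypothetical helper name, and the 200 sec and 230 sec figures are the ones quoted above.

```python
def relative_overhead(baseline_sec: float, other_sec: float) -> float:
    """Return how much longer `other_sec` is than `baseline_sec`, as a fraction."""
    if baseline_sec <= 0:
        raise ValueError("baseline must be positive")
    return (other_sec - baseline_sec) / baseline_sec


multi_column_sec = 200.0   # multiple columns defined in the source
single_column_sec = 230.0  # one column in the source, split with Field()

overhead = relative_overhead(multi_column_sec, single_column_sec)
# prints: extra time: 30 s (15%)
print(f"extra time: {single_column_sec - multi_column_sec:.0f} s ({overhead:.0%})")
```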
Thanx and Regards,
ETL User
Usually such jobs are limited by I/O speed and not CPU, and in both cases the same amount of I/O is being done. Parsing the columns directly in the stage should be somewhat more efficient than doing it with a Field() function, so the results that Chandra has seen are what I would expect; the increase in time is due to higher CPU load.
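The CPU argument can be pictured with a rough Python analogue. This is an assumption-laden sketch, not a description of how either stage is implemented: `field` is a hypothetical stand-in for a Field()-style call (1-based Nth delimited substring), and it assumes the Transformer derivations call it once per output column.

```python
def field(text: str, delimiter: str, occurrence: int) -> str:
    """Rough analogue of a Field()-style call: the 1-based Nth delimited substring.

    Returns an empty string when there are fewer than `occurrence` fields.
    """
    parts = text.split(delimiter)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else ""


record = "1001,Widget,4,19.99"

# Transformer-style: one call per output column, each call scanning the record again.
per_column = [field(record, ",", n) for n in range(1, 5)]

# Stage-style: the record is tokenised once into all of its columns.
single_pass = record.split(",")

print(per_column == single_pass)  # prints: True
```

Both approaches read the same bytes and produce the same columns; the per-column version simply does more scanning work per record, which is consistent with a modest CPU-driven increase rather than an I/O-driven one.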
It would be more interesting had Chandra advised the data volume on which the test was done, and had the Column Import stage been used rather than a Transformer stage for the parsing.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.