How to read a text file of 5GB
Hi,
Can anyone tell me how to process a pipe-delimited text file that is more than 5 GB in size?
Other than splitting the file into multiple pieces, is there any other way or stage through which the file can be read in one pass?
Thanks,
Rajee
Re: How to read a text file of 5GB
Use a File Set stage instead of a Sequential File stage as the source reader in your job.
Karthik
Make It Work Make It Right Make It Fast
Why can't you read a file of over 5 GB in DataStage?
Re: How to read a text file of 5GB
Thanks for your response, Karthik.
But can a File Set stage read a text file?
Thanks,
Rajee
The Sequential File stage, for one. CFF as well.
CFF is the Complex Flat File stage. The Sequential File stage is one of the fastest possible means of reading a file. Is the file fixed-length or variable-length?
Depending on your DataStage version, you can improve performance (with varied results, depending on how your file is structured) by setting one of the following properties in the Sequential File stage:
Read From Multiple Nodes = Yes
or
Number of Readers Per Node = N
Furthermore, performance also depends on what you do after reading the file. For example, you might improve performance by reading the file without metadata, with each row as a single column, and then applying the schema in parallel with a Column Import stage directly afterwards.
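To make the idea of several readers on one file more concrete, here is a minimal Python sketch. It is only a conceptual illustration and makes no claim about how DataStage implements these properties. The function names (`split_ranges`, `read_range`, `read_with_readers`) are hypothetical, and it assumes newline-terminated, variable-length records. Each reader gets a byte range and skips forward to the next record boundary so that no row is read twice or cut in half.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, readers):
    """Divide [0, size) into at most `readers` contiguous byte ranges."""
    if size == 0:
        return []
    step = -(-size // readers)  # ceiling division
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def read_range(path, start, end):
    """Return the complete rows whose first byte lies in [start, end)."""
    rows = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard up to the next newline, so a
            # reader never begins in the middle of a record.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            # Keep the row as one raw column; splitting is deferred.
            rows.append(line.rstrip(b"\r\n"))
    return rows


def read_with_readers(path, readers):
    """Read the whole file with several range readers, preserving row order."""
    ranges = split_ranges(os.path.getsize(path), readers)
    with ThreadPoolExecutor(max_workers=readers) as pool:
        parts = pool.map(lambda r: read_range(path, *r), ranges)
        return [row for part in parts for row in part]
```

The rows come back unsplit, which mirrors reading each row as a single column; a later step could apply `row.split(b"|")` to add the columns, in the same spirit as importing the schema after the read.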
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
That makes multiple readers a non-option. How many columns does the file have, and how long does it take to read the file when the only other stage in your job is a Copy stage (or a Peek stage)?
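For a rough point of comparison outside DataStage, the same kind of baseline (read only, no downstream work) can be sketched in Python. This is an assumption-laden illustration, not part of any DataStage job: `time_raw_read` is a hypothetical helper that only counts newline characters, so it measures raw I/O and nothing else.

```python
import time


def time_raw_read(path, chunk_size=1024 * 1024):
    """Read the file in binary chunks, count newlines, return (rows, seconds)."""
    rows = 0
    started = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            rows += chunk.count(b"\n")
    return rows, time.perf_counter() - started
```

If a read-only job takes far longer than a plain pass like this over the same file, the time is probably going somewhere other than the disk read itself.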
Thanks for your response.
Let me cross-check my job to make sure the options you mention are set.
I am reading the entire row in a single column only.
I had heard that a Sequential File stage can read a file of at most 2 GB, and that this is a limitation of the stage. After this discussion, I understand that what I heard is just a myth.
Thanks,
Rajee
The maximum number of columns is 30, but I do have lookup and transform stages in the job.
Thanks,
Rajee