Large file to split
Hi folks,
I have a source file of about 120 GB that I need to split into several smaller files before I can work with it.
The file is loaded into multiple tables
by multiple DataStage jobs.
The problem is that I don't know the metadata, so I don't know how to split the file into separate files.
Please help me solve this issue.
Thanks in advance.
Re: Large file to split
vsi wrote: Now in order to split this file into separate files i don't know the metadata.
Please help me to solve this issue.
Remember, DataStage is metadata driven. Even the file-split logic will be metadata driven. If you have no idea about the metadata, how are you going to split the file?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Re: Large file to split
Thanks for your response, Dsguru.
I am sorry, I have loaded the metadata now,
and the job design is as follows:
seq.file -----> Transformer ------> multiple sequential files.
In the Transformer I used these constraints:
@INROWNUM<10000 ----- FIRSTFILE
@INROWNUM>10001AND@INROWNUM
and so on, with a similar condition for each output sequential file,
but it is not working.
Is there any other method to split the file into multiple files?
Thanks in advance.
If you're on the server engine, then use a Link Partitioner to achieve this.
If you're on PX, then your condition should work:
for the first link have @INROWNUM < 10000
for the second have @INROWNUM >= 10000 and @INROWNUM < 20000
and so on...
What error are you getting?
You can also split the file at the OS level by using the split command.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
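As a rough illustration of the OS-level idea above, here is a minimal Python sketch of splitting by line count, in the spirit of what `split -l` does. The function name `split_by_lines`, the `part_` prefix and the four-digit suffix are made up for this example. It assumes newline-terminated records and knows nothing about headers or record groups.

```python
from pathlib import Path


def split_by_lines(source, out_dir, lines_per_file, prefix="part_"):
    """Stream `source` into numbered files of at most `lines_per_file` lines each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    # Binary mode: bytes pass through untouched, and only one line is held in memory.
    with open(source, "rb") as src:
        for index, line in enumerate(src):
            if index % lines_per_file == 0:
                # Time to start the next output file.
                if out is not None:
                    out.close()
                path = out_dir / f"{prefix}{len(paths):04d}"
                out = open(path, "wb")
                paths.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return paths
```

Because the input is read one line at a time, memory use does not grow with the size of the source file. Whether the resulting pieces are usable still depends on the record layout.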
Remember that in a parallel job, @OUTROWNUM and @INROWNUM are evaluated separately on each node of the job run. That means that if you give a constraint @INROWNUM<=1000 and the job runs on a 4-node configuration, the Transformer sends out 1000 rows from each node, giving you 4000 rows in your file.
True. Use this amazing post by vmcburney to handle partition numbers and the number of partitions. Constrain it accordingly.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
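To make the per-node counting concrete, here is a small Python simulation (not DataStage code). It assumes rows are dealt round-robin across partitions and that partition numbers start at 0. The expression in `global_row_number` is one commonly used way to derive a job-wide row number from @INROWNUM, @PARTITIONNUM and @NUMPARTITIONS; it is an assumption here, not a quotation from the post mentioned above, and it only holds under round-robin partitioning.

```python
def rows_passing_constraint(total_rows, num_partitions, limit):
    """Count rows that pass `@INROWNUM <= limit` when every partition keeps its own counter."""
    passed = 0
    for partition in range(num_partitions):
        # Round-robin: partition p receives rows p, p + N, p + 2N, ...
        rows_in_partition = len(range(partition, total_rows, num_partitions))
        passed += min(rows_in_partition, limit)
    return passed


def global_row_number(inrownum, partitionnum, numpartitions):
    """Job-wide 1-based row number from per-partition values (round-robin assumed)."""
    return (inrownum - 1) * numpartitions + partitionnum + 1


# 4 nodes, constraint @INROWNUM <= 1000: each node lets 1000 rows through.
print(rows_passing_constraint(100_000, 4, 1000))
```

A constraint written against the derived global number, rather than against @INROWNUM directly, would then select the same rows regardless of how many nodes the job runs on.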
Thanks for your responses, folks.
Environment: parallel
Version: 7.5.2
Operating system: Linux
Regarding the split command: the file has header and detail records,
like
Ex: customer id, customer group
with reference to this information: insurance, tax, address, zipcode ----------------------------.
and the file has fixed-width columns.
Even though it is a parallel job, the environment is not fully configured, I mean the APT configuration.
Please give your valuable ideas to resolve the issue.
Thanks in advance.
Can you give me more information:
Why does splitting this file make your life better? What is the problem if you just leave it alone? (I am sincere about asking this - the reason you want to split the file influences the way you need to split it.)
Run these commands on your file and paste the output here:
Code: Select all
ls -l file
wc file
What is the record format?
or... what are the record formats?
These detail records you mentioned:
is it like this kind of pattern?
Code: Select all
HDR:vsi
DTL:tax#1
DTL:tax#2
DTL:tax#3
HDR:jgreve
DTL:tax#1
DTL:tax#2
HDR:wurlod
DTL:tax#1
DTL:tax#2
DTL:tax#3
DTL:tax#4
or like this:
Code: Select all
:NAME:TAX_1:TAX_2:TAX_3:TAX_4:TAX_5:TAX_6:TAX_7:TAX_8:ADDR_1:ADD_R2
:vsi:tax#1 :tax#2 :tax#3:::::
:jgreve :tax#1 :tax#2::::::
:wurlod :tax#1 :tax#2 :tax#3 :tax#4::::
I can't tell you without knowing what your data looks like.
Ok... actually I could tell you to do something, but it is likely to cause harm instead of helping you.
Post some example records, if you can - change everybody's name
and tax id if you have to.
John G.
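If the file does turn out to follow the first pattern (a header record followed by its detail records), a plain line-count split could separate a header from its details. The Python sketch below only starts a new output file on a header line. The `HDR:` prefix comes from the toy example above; the function name, the `chunk_` file naming and the `min_lines_per_file` parameter are hypothetical, and the real record layout is still unknown.

```python
from pathlib import Path


def split_on_group_boundaries(source, out_dir, min_lines_per_file, header_prefix=b"HDR:"):
    """Split `source` so that no header/detail group is spread over two output files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    out = None
    lines_in_file = 0
    with open(source, "rb") as src:
        for line in src:
            starts_group = line.startswith(header_prefix)
            # Roll over only at a header line, once the current file is big enough.
            if out is None or (starts_group and lines_in_file >= min_lines_per_file):
                if out is not None:
                    out.close()
                path = out_dir / f"chunk_{len(paths):04d}"
                out = open(path, "wb")
                paths.append(path)
                lines_in_file = 0
            out.write(line)
            lines_in_file += 1
    if out is not None:
        out.close()
    return paths
```

Output files can run slightly over `min_lines_per_file`, because the current group is always finished before the next file is opened.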
Re: Can you give me more information:
Thanks for your response.
The reasons for splitting the file are:
1. The file is too large: 120 GB.
2. When I run parallel jobs on files of this volume I get heap allocation errors.
3. The node configuration for our parallel environment is not done yet (they are still working on it).
4. The same file is the source for 11 ETL jobs.
:NAME:TAX_1:TAX_2:TAX_3:TAX_4:TAX_5:TAX_6:TAX_7:TAX_8:ADDR_1:ADD_R2
:vsi:tax#1 :tax#2 :tax#3:::::
:jgreve :tax#1 :tax#2::::::
:wurlod :tax#1 :tax#2 :tax#3 :tax#4::::
I will provide any further details if you need them.
Thanks in advance.
Another clarification needed
Does only the first line of your big file have the column names, like this?
Code: Select all
:NAME:TAX_1:TAX_2:TAX_3:TAX_4:TAX_5:TAX_6:TAX_7:TAX_8:ADDR_1:ADD_R2
:vsi:tax#1 :tax#2 :tax#3:::::
:jgreve :tax#1 :tax#2::::::
:wurlod :tax#1 :tax#2 :tax#3 :tax#4::::
If this is the case then, as mentioned by DSDuru2B, you can use the split command.
or
Does the column-name line repeat at regular/irregular intervals, like below?
Code: Select all
:NAME:TAX_1:TAX_2:TAX_3:TAX_4:TAX_5:TAX_6:TAX_7:TAX_8:ADDR_1:ADD_R2
:vsi:tax#1 :tax#2 :tax#3:::::
:jgreve :tax#1 :tax#2::::::
:wurlod :tax#1 :tax#2 :tax#3 :tax#4::::
:NAME:TAX_1:TAX_2:TAX_3:TAX_4:TAX_5:TAX_6:TAX_7:TAX_8:ADDR_1:ADD_R2
:abc:tax#1 :tax#2 :tax#3:::::
:def :tax#1 :tax#2::::::
:ghi :tax#1 :tax#2 :tax#3 :tax#4::::
Then we need a different approach.
Narasimha Kade
Finding answers is simple, all you need to do is come up with the correct questions.
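For the first case (column names only on the first line), one possible variation is to copy that line to the top of every piece, so that each piece can be read with the same metadata as the original. This is a Python sketch under that assumption. The name `split_keeping_header` and the `part_` file naming are invented for the example, and whether the downstream jobs want the column-name line repeated is not stated in the thread.

```python
from itertools import islice
from pathlib import Path


def split_keeping_header(source, out_dir, data_lines_per_file):
    """Split `source` by line count, writing its first line at the top of every piece."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(source, "rb") as src:
        header = src.readline()
        while True:
            # Peek at the next data line so no empty piece is created at end of file.
            first = src.readline()
            if not first:
                break
            path = out_dir / f"part_{len(paths):04d}"
            with open(path, "wb") as out:
                out.write(header)
                out.write(first)
                # Stream the rest of this piece without loading it into memory.
                out.writelines(islice(src, data_lines_per_file - 1))
            paths.append(path)
    return paths
```

If the column-name line repeats inside the file instead, this sketch would treat the repeats as ordinary data lines, so it does not cover the second case.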
ray.wurlod wrote: Why am I paying more taxes than the others?
Well, this is just a toy example.
I didn't want to put all the tax fields in there, or we'd be looking at something like this to support your client list, yes?
Code: Select all
TAX_1:TAX_2: ... :TAX_999:TAX_1000
John G.