I have a performance issue reading a .csv file that contains 10,254,768 rows of data.
The job flow is as below:
Seq File > Transformer > Sort > Transformer > Oracle stages (two of them: one capturing reject data, the other good data)
Our process runs on two nodes only.
As I was not able to pinpoint the issue, I split the job into two, assuming the performance hindrance was in reading the file.
It was taking 5 minutes to read 10% of the file, which makes the job run for almost 2 hours depending on the server load. The only hiccup is reading the huge volume of data; the data itself is upserted into the DB within 15 minutes, which I guess is acceptable for now.
Is there any way I can improve the performance?
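One quick check before tuning the job: time a raw read of the file outside the ETL tool to see whether the 5-minutes-per-10% rate is disk I/O or the job design. A minimal shell sketch (the file name, row counts, and chunk size here are made-up stand-ins for the real 10M-row file):

```shell
#!/bin/sh
# Generate a small sample CSV standing in for the real input file
# (hypothetical name; substitute your actual path on the server).
seq 1 100 | sed 's/.*/&,value/' > sample.csv

# 1. Raw sequential read speed -- if this alone is slow on the real
#    file, the disk/filesystem is the bottleneck, not the job design.
time wc -l sample.csv

# 2. Split the file into fixed-size chunks so several readers can
#    work on it in parallel (for example, by pointing the reading
#    stage at a file pattern instead of a single file).
split -l 25 sample.csv part_

# List the resulting chunks (four files of 25 rows each).
ls part_*
```

If the raw `wc -l` on the real file finishes quickly, the time is being spent inside the job rather than on I/O, and parallelizing the read (multiple readers or multiple input files) is the direction to investigate.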
We started receiving these files five days ago, and I am not sure whether we will get the same volume in future. I will be following up with the source on why the volume has suddenly spiked by 50%.
Any help on this issue is appreciated.
PS: I tried searching the forum, but I gave up after scanning 50-60 pages of results.
I'm sorry if I have overlooked anything.