
How to handle huge historical record loads in Azure DataLake

Posted: Sat Jan 11, 2020 7:11 am
by satheesh_color
Hi All,

We have a scenario where we have to load 500 million historical records from SQL Server into Azure Data Lake, followed by roughly 200k daily/incremental records. We need to capture the changed records and load those into the data lake as well.

The catch here is that we don't have timestamp columns. We are looking for your thoughts and assistance w.r.t. the DataStage job design.


Thanks & Regards,
S.R

Posted: Wed Jan 15, 2020 10:28 am
by asorrell
My first thought is to request that they add a timestamp column to the source.

Is your problem that you are trying to identify and update changed records?

I have the same problem with another client whose feed is far larger than yours. They say they can't modify the source since it's a mainframe file, so they threw hardware at the problem and reprocess the entire feed (clear and reload) each time it comes in.
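If a full clear-and-reload isn't an option, another common workaround when there is no timestamp is to keep a snapshot of row hashes from the previous extract and compare each new full extract against it, which is roughly what a DataStage Change Capture stage does with a before/after data set. Below is a minimal Python sketch of that idea only; the key column name, file names, and paths are placeholders, not anything from your environment.

import csv
import hashlib

KEY_COLUMN = "record_id"              # assumed business key (hypothetical)
PREVIOUS_HASHES = "prev_hashes.csv"   # hash snapshot from the last run
CURRENT_EXTRACT = "current_extract.csv"
DELTA_OUTPUT = "changed_records.csv"

def row_hash(row, key):
    """Hash all non-key columns so any value change alters the digest."""
    payload = "|".join(v for k, v in sorted(row.items()) if k != key)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Load the hashes produced by the previous run (key -> digest).
previous = {}
try:
    with open(PREVIOUS_HASHES, newline="") as f:
        previous = {r["key"]: r["digest"] for r in csv.DictReader(f)}
except FileNotFoundError:
    pass  # first run: every record is treated as new/changed

new_hashes = []
with open(CURRENT_EXTRACT, newline="") as src, \
     open(DELTA_OUTPUT, "w", newline="") as out:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        digest = row_hash(row, KEY_COLUMN)
        key = row[KEY_COLUMN]
        if previous.get(key) != digest:  # new or changed record
            writer.writerow(row)
        new_hashes.append({"key": key, "digest": digest})

# Persist the current hashes so the next incremental run can compare against them.
with open(PREVIOUS_HASHES, "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["key", "digest"])
    w.writeheader()
    w.writerows(new_hashes)

At 500 million rows you would of course do the compare inside the parallel job (or push the hashing down to the database) rather than in a script like this, but the logic is the same: hash, compare to the previous snapshot, and only the deltas go to the data lake.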

Posted: Wed Jan 15, 2020 9:09 pm
by satheesh_color
Hi asorrell,

Thanks for your response. I'm in the same situation: no changes are allowed and the source will not be modified.



Regards,
Satheesh.R