Hi All,
We have a scenario where we need to load 500 million historical records from SQL Server into Azure Data Lake, followed by a daily incremental load of roughly 200k records. For the incremental load we need to capture only the changed records and then load them into the data lake.
The catch is that the source has no timestamp columns. We are looking for your thoughts and assistance, particularly with respect to the DataStage jobs.
Thanks & Regards,
Thanks & Regards,
S.R
How to handle huge historical record loads in Azure DataLake
My first thought is to request that they add a timestamp column to the source.
Is your problem that you are trying to identify and update changed records?
I have the same problem with another client, with a feed far larger than yours. They say they can't modify the source since it's a mainframe file, so they threw hardware at the problem and reprocess the entire feed (clear and reload) whenever it comes in.
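For a source with no timestamp column, one well-known alternative to a full clear-and-reload is checksum-based change detection: hash each row's non-key columns, persist the hashes alongside the business key, and on the next extract compare hashes to classify rows as inserts, updates, or deletes. This is the same idea DataStage's Change Capture stage implements. A minimal Python sketch of the comparison logic (all names here are illustrative, not DataStage code):

```python
import hashlib

def row_hash(row, columns):
    """Concatenate the tracked column values with a delimiter and hash them.
    The hash stands in for a timestamp: if it changes, the row changed."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def diff_snapshots(previous, current, key, columns):
    """Compare two full extracts (lists of dicts) by per-row hash.
    Returns the business keys of inserted, updated, and deleted rows."""
    prev_hashes = {r[key]: row_hash(r, columns) for r in previous}
    curr_hashes = {r[key]: row_hash(r, columns) for r in current}
    inserts = [k for k in curr_hashes if k not in prev_hashes]
    deletes = [k for k in prev_hashes if k not in curr_hashes]
    updates = [k for k in curr_hashes
               if k in prev_hashes and curr_hashes[k] != prev_hashes[k]]
    return inserts, updates, deletes

# Illustrative use: row 2 changed, row 3 is new, nothing was deleted.
prev = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
curr = [{"id": 1, "name": "a"}, {"id": 2, "name": "B"}, {"id": 3, "name": "c"}]
ins, upd, dele = diff_snapshots(prev, curr, "id", ["name"])
# ins == [3], upd == [2], dele == []
```

The trade-off is that you still have to extract the full 500 million rows each cycle to compute the current hashes; the saving is on the write side, since only the changed rows move into the data lake. At that volume you would keep the hash snapshot in a persisted dataset rather than in memory as this sketch does.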