Insert and update records into SQL Server
Moderators: chulett, rschirm, roy
Hi,
I am loading data into SQL Server (using the Dynamic RDBMS stage) with an insert-or-update query. The record count is around 1 million, and I have to refresh the data daily using this job. When I run the same job again, it loads the entire set of records once more; I need to load only the new records.
Thanks
samba
-
- Participant
- Posts: 437
- Joined: Fri Oct 21, 2005 10:00 pm
1. Your database table should have a primary key, or at least a unique constraint.
2. Is there a way you can select only a delta from your million-row source table? An update datetime, perhaps? If so, you can store this off and pass it into your job as a parameter in your SELECT statement.
3. With the unique constraint in place, you should be able to specify an upsert: insert then update (if there are more new rows than existing rows) or update then insert (if there are more rows to update than to insert).
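A rough T-SQL sketch of points 1 and 3 (all table and column names here are placeholders, not from the original post):

```sql
-- 1. Unique constraint on the business key
ALTER TABLE target_table
    ADD CONSTRAINT uq_target_key UNIQUE (key_col);

-- 3. "Update then insert" upsert: run the UPDATE first when most
--    incoming rows already exist in the target.
UPDATE t
SET    t.col1 = s.col1
FROM   target_table t
JOIN   staging_table s ON s.key_col = t.key_col;

INSERT INTO target_table (key_col, col1)
SELECT s.key_col, s.col1
FROM   staging_table s
WHERE  NOT EXISTS (SELECT 1 FROM target_table t
                   WHERE t.key_col = s.key_col);
```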
Keith Williams
keith@peacefieldinc.com
Then get a unique constraint on the table (which will help job performance as well) and perform an update then insert; that will do the trick for you.
Keith Williams
keith@peacefieldinc.com
If your update and insert volumes are both very large, try splitting your updates and your inserts into two separate data sets (a Change Capture stage will tell you which is which, or use any timestamp you put in the source data). This way you can update and insert separately, which reduces database operations.
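For example, the split can be done with two simple existence checks (staging_table, target_table, and key_col are assumed names for illustration):

```sql
-- Rows whose key already exists in the target: route to the update link
SELECT s.*
FROM   staging_table s
WHERE  EXISTS (SELECT 1 FROM target_table t WHERE t.key_col = s.key_col);

-- Rows whose key is new: route to the insert link
SELECT s.*
FROM   staging_table s
WHERE  NOT EXISTS (SELECT 1 FROM target_table t WHERE t.key_col = s.key_col);
```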
HTH,
Mat
-
- Participant
- Posts: 407
- Joined: Mon Jun 27, 2005 8:54 am
- Location: Walker, Michigan
- Contact:
You could add a column with the datatype timestamp/ROWVERSION (I believe this works in both MSSQL and Sybase). This system column type stores the database timestamp automatically for you whenever an insert or an update is done to the table. If you keep track of this varbinary value, you can use it to determine which records changed, and you can pull just those records to use in an upsert.

It would be basically the same thing if you had a datetime column and all of your applications (or a trigger) always updated that column as well. However, if someone did not update that column, you would lose that update. That is why I like the timestamp datatype.
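A small illustration of the automatic behaviour (T-SQL; the demo table name is made up):

```sql
CREATE TABLE demo (id int PRIMARY KEY, val int, rv timestamp);

INSERT INTO demo (id, val) VALUES (1, 10);  -- rv is filled in by the server
SELECT rv FROM demo WHERE id = 1;

UPDATE demo SET val = 11 WHERE id = 1;      -- rv advances automatically
SELECT rv FROM demo WHERE id = 1;           -- a new, larger varbinary value
```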
You would lose the records that need to be deleted using this approach. But, you might not care for your purpose.
One method to implement this approach:
Add a column with the datatype timestamp to your tables that you need to copy.
--> Eg) timestamp timestamp
Create a table to keep track of the database timestamp between runs (note that table is a reserved word in SQL Server, so it needs brackets):

CREATE TABLE ts_track (
    [table]          varchar(128) NOT NULL,
    [timestamp]      timestamp    NOT NULL,
    old_timestamp    varbinary(8) NULL,
    row_update_dtime datetime     NOT NULL
)
Insert a record into the ts_track table:

INSERT INTO ts_track ([table], old_timestamp, row_update_dtime)
VALUES ('your_table', 0x0, getdate())
Do a full sync.
Update ts_track before next read so that we do not lose record modifications. This could result in a few extra rows, but should be no biggie.
Job1:

UPDATE ts_track
SET old_timestamp = [timestamp],
    row_update_dtime = getdate()
WHERE [table] = 'your_table'
Job2:
The next time you read, do a join with the tracking table.
Eg)

SELECT T1.*
FROM your_table T1,
     ts_track T2
WHERE T1.[timestamp] >= T2.old_timestamp
  AND T2.[table] = 'your_table'
samba wrote: kwwilliams, but it's taking time to update or insert the entire data set, nearly 25 hours.

What indexes do you have on the table? That is why I said this would improve your performance. My guess is that a unique constraint on your key field (the same key that is in your WHERE clause) would significantly increase your performance. One million rows is not an unreasonable amount of data; I have a job that does this with 7 million rows in just over an hour.
Keith Williams
keith@peacefieldinc.com
Samba, if you have a column in your source table that maintains the timestamp of when each record was last inserted or updated, then the easiest thing for you to do is simply select the changed records from your source table.
In the DRS stage, select user-defined SQL, and in the SQL select only the changed records.
Say you have a table XYZ with four columns A, B, C, DT, where DT is the column that maintains the timestamp of when each record was last modified (inserted or updated).
In the user-defined SQL you could do something like this:
Code: Select all

SELECT A, B, C, DT
FROM XYZ
WHERE DT BETWEEN
      -- the MAX(DT) must come from the copy of the data loaded last
      -- time (the target table, called target_XYZ here as an assumed
      -- name); selecting MAX(DT) from XYZ itself would return only
      -- the rows that share the maximum DT
      (SELECT MAX(DT) FROM target_XYZ)
  AND GETDATE()
In your target DRS stage, choose the option "update then insert" or "insert then update", depending on the mix of updates and inserts you get.
This will also solve your initial problem of records getting loaded twice.
I hope it helps.
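Alternatively, combining this with Keith's earlier suggestion: store the last-load timestamp off somewhere and pass it into the user-defined SQL as a DataStage job parameter (#LAST_LOAD_DT# is a made-up parameter name for this sketch):

```sql
SELECT A, B, C, DT
FROM   XYZ
WHERE  DT > '#LAST_LOAD_DT#'
```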
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I agree with GURU2b.
That's why I stated this earlier:
2. Is there a way that you can only select a delta from your million row source table? An update datetime perhaps? If so you can store this off and pass it into your job as a parameter in your select statement.
Keith Williams
keith@peacefieldinc.com
jasper wrote: A completely different scenario: if the number of inserts/updates is large and it's difficult to find the changed records, just truncate and reload. I'm not used to SQL Server, but on Oracle this is the fastest solution once you reach about 10% inserts/updates.

What if you use the Change Capture stage?
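A minimal sketch of the truncate-and-reload approach (table names are assumptions; in practice a bulk load would usually replace the plain INSERT...SELECT):

```sql
-- Wipe the target and reload it in full; this can be fastest once a
-- large fraction of the rows change on every run anyway.
TRUNCATE TABLE target_table;

INSERT INTO target_table (key_col, col1)
SELECT key_col, col1
FROM   source_table;
```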
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'