Teradata aborts half way through

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Teradata aborts half way through

Post by siddesai »

Hey Everyone,
I got a problem!

I 'm trying to load about 15 million rows in Teradata enterprise using bulk load. To be specific, using FastLoad. My dataloads keeps on failing around 9-10 million rows and that too inconsistently(meaning could be more or less then specified).

I dont have any access rights to configure Teradata. But, when I check FastLoad report it says "Unable to insert rows in database, increase size of your database". Well, the error pretty much says what is to be done but we cleared lot more data from database to get more space and which was quiet sufficient to load those 15 million rows but still it kept on failing and aborting. I tried to load the queries using Teradata client and it seemed to be working fine. Although, I didn't finish with it.

I noticed that the number of AMPs used by Teradata is about 96 and from what I know if it uses a single AMP then the unit will be 100 which means entire data is loaded in a single AMP.So, is that single AMP running out of space? Do I need to explicitly specify how many AMPs to use? I believe Teradata handles tablespace on its own?

Is there a limit to number of rows that can be processed by teradata using FastLoad? Also, I m joining data from almost 14 tables and funneling them to a single table in a single job.

Is funneling an issue? Or is it FastLoad which is not able to handle amount of data? I guess being a datawarehousing database it should be able to handle it but you never know.

Thanks for all your help.
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

i had similar problem in one of our job, then we found that the ORCH table created for this teradata table was having very high skewness factor..
so even for loading 20 GB file it was taking 400 GB space in the database...

properly defining the number of session and partitions will help.
also try specifying the workdb and target DB seperately.
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

1. No u cannot change how TD handles data distribution. if u have 96 AMPS it will distribute data on them using its own hashing algos. If u think all data is on one amp it means there is a serious skew factor in ur data i.e. non uniqueness and kills the basic theory of parallelism when it comes to data fetching and inserting . amps r virtual processes and u cannot tell TD what amps to use. Yes you can define indexes so u can evenly distribute data.

2. Yes teradata will handle space and DB mang internally and u do not have to worry about that. However as u say, and i hope not, the data is on one amp. if one amp runs out of space the whole system will run out of space even if u physically have space in the DB. As i mentioned earlier, you must define an index to get rid of skewness.

3. If the report files says increase the size of the database. then you should do it.

4. u have said u r using Teradata enterprise using bulk load. To be specific, using FastLoad. TDEnt is a separate stage and only for stage which reads and writes in parallel. it uses teradata fastload protocols hence has the same limitations as fastload which is no updates, appends. I hope u r doing it on an empty table as fast load only permits that.

5. there is no limitations to how many rows you can load using TD client utilities

6. as u r joinin tables. u might want to increase the buffer size of the flows in DS. which is by default 3mb.


Teradata Certified Master V2R5.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

U isn't undertaking this task. siddesai is.
:roll:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Post by siddesai »

@Keshav
we found that the ORCH table created for this teradata table was having very high skewness factor..
ORCH tables will only come into play when you are using TeraData Enterprise stage. But, I was using PX stage(which I should have mentioned) so that wasn't an issue for me.
Thanks mate

@hamzaqk
Yes, you were right about the way TeraData handles data. It basically relies on indexes generated on tables and based on those indexes it allocates AMP's. I was dealing with sufficient number of AMPs but still it used only one AMP which was out of space. I checked my schema and changed index to more unique column and since then it works like a charm ;)
if one amp runs out of space the whole system will run out of space even if u physically have space in the DB
I didn't get a call from our TeraData DBA that he has run out of space so I guess that only applies to a single database within a master database. If tha's exactly what you mean, my apologies.
If the report files says increase the size of the database. then you should do it.
Nope, they are not always trustworthy
TDEnt is a separate stage and only for stage which reads and writes in parallel. it uses teradata fastload protocols hence has the same limitations as fastload which is no updates, appends.
Unfortunately, I didn't design the job so I couldn't switch to TeraData Enterprise but could only modify the existing ones.
I hope u r doing it on an empty table as fast load only permits that.
Naah, its not really required. You can set the property "Not to drop table" and can use your own "SQL query" to empty/delete table. If you have specified not to drop the table then FastLoad simple clears the table.

Also, does TeraData behave like Oracle when you delete rows from database? As in archive them or something like that? or simply empties it?
hamzaqk
Participant
Posts: 249
Joined: Tue Apr 17, 2007 5:50 am
Location: islamabad

Post by hamzaqk »

1. Teradata has its own way when it comes to backing up and archieving data. it uses 'journals' to do so. Depending on your needs you can select and define upon table creation. For examples you can make a table fallback which means for every copy that is present in the table teradata will create a copy of it in another table but on a different AMP. so even when the amp which the original copy is on is down. you still have access to the same record on a different amp. this does adds to storage overhead. Then you have transient and permanent journals which are used to keep track of transactions. i.e before and after images of a the table records based on what kind of journal you use. This makes the data available even when amp is down. Hope this helps!

2. When an amp runs out of space it means the whole system will run out of space as amp is a system vproc and has nothin to do with a single database. for exp if u have a 4 amp system and 2 go down.. the system will go down. however if one amp goes down TD will use other 3 to run the system as it were before by using fallbacks copies of the data.

3. i will stll stick to what i said and that is you can only use Fastload to load empty tables. and i think u r sayin the same but in different words as u said that it clears up the table itself before starting loading.

Teradata Certified Master V2R5
Post Reply