Estimate table or disk space routine.

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Estimate table or disk space routine.

Post by kduke »

Here is a routine that if you save the metadata in DS_METADATA and enter the row counts then you can estimate how much disk space it will take up.

Code: Select all

* -----------------------------------------------------------------
* KgdEstimateDiskSpace(DsMetadataId, OracleStyle, RowCount)
* Decription: Estimate disk space from DS_METADATA.
* Written by: Kim Duke
* Notes:
* -----------------------------------------------------------------
* $INCLUDE DSINCLUDE JOBCONTROL.H
      Ans = ''
      RoutineName = "KgdEstimateDiskSpace"
      Usage = RoutineName:'(DsMetadataId, OracleStyle, RowCount)'
* -----------------------------------------------------------------
* initialize variables
* -----------------------------------------------------------------
      True = 1
      False = 0
      Other = True
      Forever = True
      Today = oconv(@DATE, "D4-")
      ErrMsg = ''
      NullRec = ''
      CrLf = char(13):char(10)
      DsMetadataSep = '\'
      DebugMode = True
* -----------------------------------------------------------------
      FileName = 'DS_METADATA'
      open FileName to FilePtr else
         ErrMsg = 'Can not open ':FileName:' file.'
         goto ErrRtn
      end
      DsMetadataPtr = FilePtr
* -----------------------------------------------------------------
      read DsMetadataRec from DsMetadataPtr, DsMetadataId then
         Occurrence = 1
* -----------------------------------------------------------------
FindColumns:
         FindStr "CMetaColumn" In DsMetadataRec, Occurrence Setting AttrNo, MvNo, SvNo Then
* -----------------------------------------------------------------
* number of column definitions in collection
* -----------------------------------------------------------------
            NoOfFields = Field(DsMetadataRec<AttrNo>, "/", 2, 1)
            If NoOfFields = 0 Then
               * Stored procedures use the second columns collection in the record.
               * Go get the next one.
               Occurrence += 1
               GoTo FindColumns
            End
            TableName = field(DsMetadataId, DsMetadataSep, dcount(DsMetadataId, DsMetadataSep))
            DiskSpace = 0
            * NoOfFields = dcount(DsMetadataRec<AttrNo+1>,
            for MvNo=1 to NoOfFields
               ColumnName = DsMetadataRec<AttrNo+1, MvNo>
               if DebugMode then print 'ColumnName=':ColumnName
               ColumnDescription = DsMetadataRec<AttrNo+2, MvNo>
               DataElement = DsMetadataRec<AttrNo+3, MvNo>
               if DebugMode then print 'DataElement=':DataElement
               SQLDataTypeCode = DsMetadataRec<AttrNo+4, MvNo>
               if DebugMode then print 'SQLDataTypeCode=':SQLDataTypeCode
               Precision = DsMetadataRec<AttrNo+5, MvNo>
               Scale = DsMetadataRec<AttrNo+6, MvNo>
               IsNullable = DsMetadataRec<AttrNo+7, MvNo>
               if DebugMode then print 'IsNullable=':IsNullable
               KeyPosition = DsMetadataRec<AttrNo+8, MvNo>
               DisplayWidth = DsMetadataRec<AttrNo+9, MvNo>
               Factor = 1
               Begin case
                  case SQLDataTypeCode = -7
                     SQLType = "BIT"
                     OracleType = "VARCHAR2"
                  case SQLDataTypeCode = -6
                     SQLType = "TINYINT"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = -5
                     SQLType = "BIGINT"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = -4
                     SQLType = "LONGVARBINARY"
                     OracleType = "VARCHAR2"
                  case SQLDataTypeCode = -3
                     SQLType = "VARBINARY"
                     OracleType = "VARCHAR2"
                  case SQLDataTypeCode = -2
                     SQLType = "BINARY"
                     OracleType = "VARCHAR2"
                  case SQLDataTypeCode = -1
                     SQLType = "LONGVARCHAR"
                     OracleType = "VARCHAR2"
                  case SQLDataTypeCode = 0
                     SQLType = "VARCHAR"
                     OracleType = "VARCHAR2"
                     Factor = .8
                  case SQLDataTypeCode = 1
                     SQLType = "CHAR"
                     OracleType = "CHAR"
                  case SQLDataTypeCode = 2
                     SQLType = "NUMERIC"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 3
                     SQLType = "NUMERIC"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 4
                     SQLType = "INTEGER"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 5
                     SQLType = "SMALLINT"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 6
                     SQLType = "FLOAT"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 7
                     SQLType = "REAL"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 8
                     SQLType = "DOUBLE"
                     OracleType = "NUMBER"
                  case SQLDataTypeCode = 9
                     SQLType = "DATE"
                     OracleType = "DATE"
                  case SQLDataTypeCode = 10
                     SQLType = "TIME"
                     OracleType = "DATE"
                  case SQLDataTypeCode = 11
                     SQLType = "TIMESTAMP"
                     OracleType = "DATE"
                  case SQLDataTypeCode = 12
                     SQLType = "VARCHAR"
                     OracleType = "VARCHAR2"
                     Factor = .8
                  case True
                     SQLType = "VARCHAR"
                     OracleType = "VARCHAR2"
               end case
               if OracleStyle then
                  If OracleType = "NUMBER" Then
                     Factor = .25
                     If Precision = @NULL Then
                        DataType = OracleType
                     End Else
                        If Scale = @NULL Then
                           DataType = OracleType : "(" : Precision : ")"
                        End Else
                           DataType = OracleType : "(" : Precision : "," : Scale : ")"
                        End
                     End
                  End Else
                     if OracleType = "DATE" then
                        DataType = OracleType
                        Factor = .5
                     end else
                        DataType = OracleType : "(" : Precision : ")"
                     end
                  End
               end else
                  If SQLType = "NUMERIC" Then
                     Factor = 1
                     If Precision = @NULL Then
                        DataType = SQLType
                     End Else
                        If Scale = @NULL Then
                           DataType = SQLType : "(" : Precision : ")"
                        End Else
                           DataType = SQLType : "(" : Precision : "," : Scale : ")"
                        End
                     End
                  End Else
                     if SQLType = "DATE" or SQLType = "TIME" or SQLType = "TIMESTAMP" then
                        DataType = SQLType
                        Factor = 1
                     end else
                        DataType = SQLType : "(" : Precision : ")"
                     end
                  End
               end
               if DebugMode then print 'DataType=':DataType
               if OracleStyle = 2 then
                  Factor = 1
               end
               DiskSpace += Factor * Precision * RowCount
            next MvNo

            Ans = DiskSpace "20R,"
         end else
            ErrMsg = 'Could not find CMetaColumn in ':DsMetadataId
            goto ErrRtn
         end
      end else
         ErrMsg = 'Could not find ':DsMetadataId:' in DS_METADATA.'
         goto ErrRtn
      end

      goto TheEnd
* -----------------------------------------------------------------
ErrRtn:
      if DebugMode then print
      if DebugMode then print ErrMsg
      if DebugMode then print
      Ans = ErrMsg
* -----------------------------------------------------------------
TheEnd:
      if DebugMode then print
      if DebugMode then print
      if DebugMode then print Ans
      if DebugMode then print
Mamu Kim
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Lot of wasted code in there from KgdCreateDdlMetadata. It was copied from this routine posted on here as well. Someone else can clean out the code they do not want also change the factors.
Mamu Kim
Athorne
Participant
Posts: 57
Joined: Wed Feb 04, 2004 1:37 pm

Post by Athorne »

:oops: Forgive my ignorance, can you give an example of how to use this script. The language looks a lot like perl code, I'm just wondering how I would invoke this script to estimate disk space usage. I'm running DataStage 7.1r1 on AIX 5.2.

Thanks,
Andy
Andrew Thorne

Lead Software Engineer
PeopleSoft Technical Architecture
Enterprise Rent-A-Car
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

This is a server routine. When you save metadata in a link then it gets saved to DS_METADATA. If you know how many records are in a table and you haved saved or imported the metadata then you can use this to estimate disk space. This would work on sequential and hash files as well. I have never wanted to know all disk space of a whole database but this would calculate it.

I use it in test mode. If I am low on disk space and I need to load 10 million rows to a sequential file then I would save the metadata. Run this routine in test mode and see if I have enough disk space to load this file.
Mamu Kim
Post Reply