Profiling time and space?

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

vairus
Participant
Posts: 52
Joined: Thu Feb 07, 2008 8:02 am
Location: Johannesburg

profiling time and space ?

Post by vairus »

Hi,

I am new to Information Analyzer and have a lot of questions.

I am running column analysis on a 2.3 GB flat file with 6.6 million rows.

How much space is needed for this table in DB2?

I am using 2 processors at 3.6 GHz, a 160 GB hard disk and 3 GB of RAM. How long will it take to finish column analysis for the above table?

During analysis it creates datasets in the ibm/informationserver/server/dataset folder and clears all the files after the process finishes. Is the dataset folder temporary space for IA?

How can I monitor the status of a job while it is running?

Thanks in advance guys

Vairamuthu
vairamuthu
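
As a rough starting point for the sizing question above, the average row width can be worked out from the two figures given in the post (2.3 GB and 6.6 million rows). This is only back-of-envelope arithmetic, assuming decimal gigabytes; it does not say how much space Information Analyzer actually uses in DB2, which depends on what analysis results it stores.

```python
# Back-of-envelope arithmetic from the figures in the post above.
# Assumption: "GB" means decimal gigabytes (10**9 bytes).
FILE_SIZE_BYTES = 2.3 * 10**9
ROW_COUNT = 6_600_000


def average_row_bytes(file_size_bytes: float, row_count: int) -> float:
    """Return the mean number of bytes per row in the flat file."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    return file_size_bytes / row_count


if __name__ == "__main__":
    width = average_row_bytes(FILE_SIZE_BYTES, ROW_COUNT)
    print(f"Average row width: {width:.0f} bytes")
```

The result is a lower bound on the raw data volume per row, not an estimate of tablespace usage.
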
Aruna Gutti
Premium Member
Posts: 145
Joined: Fri Sep 21, 2007 9:35 am
Location: Boston

Post by Aruna Gutti »

I can give an answer for one of your questions.

You can monitor the Column Analysis jobs in the Director client, in the ANALYZERPROJECT project.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Use DB2 Control Center to see which tablespaces are used in IADB and ANALYZERPROJECT and the amount of space allocated to and used by each. Table space management is usually set to "automatic".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vairus
Participant
Posts: 52
Joined: Thu Feb 07, 2008 8:02 am
Location: Johannesburg

Post by vairus »

Thanks for your replies, Aruna and Ray.
vairamuthu
mee
Participant
Posts: 23
Joined: Sat Mar 20, 2004 12:22 am
Location: None

Post by mee »

Can someone shed some light on what IA is doing under the covers? And why does it need the IADB and ANALYZERPROJECT tables?

Thanks in advance.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Information Analyzer analysis tasks are run as DataStage jobs in the ANALYZERPROJECT DataStage project.

The Information Analyzer database (IADB) is used to store the results of analysis - for example the results of column analysis are used in performing table analysis and cross-table analysis.
mee
Participant
Posts: 23
Joined: Sat Mar 20, 2004 12:22 am
Location: None

Post by mee »

Ray, thanks for the response.

I do have some follow-up questions and would appreciate further clarity on these.

We have some large files (a few GB each) that we receive from outside vendors, and one major problem is the quality of the data files. We also have a fixed time window in which the profiling must complete so that issues can be reported back to the vendors. We are likely to do column profiling as well as primary key inference against these files. The columns are of type VARCHAR(256). What guidance is there on hardware and storage? I am looking for the approximate number of CPUs/cores, memory size and disk size needed to complete the job in approximately 2 hours.

Secondly, the file sizes are likely to grow down the line (but the profiling functionality will remain the same). Is there any way I can maintain the same 2-hour window for column profiling and key inference by doing some data partitioning and parallel processing? If so, how would that be done?
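
To make the 2-hour window concrete, the sustained read rate that a given file size implies, and the number of partitions needed if each partition sustains a known rate, can be estimated as in the sketch below. The file size and per-partition rate used in the example are hypothetical placeholders, not measured Information Analyzer figures; the per-partition rate would have to be measured on the actual hardware. The sketch also assumes throughput scales linearly with the number of partitions, which is optimistic.

```python
import math


def required_mb_per_sec(file_size_gb: float, window_hours: float) -> float:
    """Sustained throughput (MB/s) needed to process the file within the window.

    Assumes decimal units: 1 GB = 1000 MB.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return file_size_gb * 1000 / (window_hours * 3600)


def partitions_needed(file_size_gb: float, window_hours: float,
                      mb_per_sec_per_partition: float) -> int:
    """Partitions needed, assuming linear scaling across partitions (optimistic)."""
    if mb_per_sec_per_partition <= 0:
        raise ValueError("mb_per_sec_per_partition must be positive")
    rate = required_mb_per_sec(file_size_gb, window_hours)
    return max(1, math.ceil(rate / mb_per_sec_per_partition))


if __name__ == "__main__":
    # Hypothetical inputs: a 36 GB file and 2 MB/s measured per partition.
    print(required_mb_per_sec(36, 2))
    print(partitions_needed(36, 2, 2.0))
```

If the file size doubles while the per-partition rate stays the same, the partition count roughly doubles under this model, which is the basic argument for holding the window fixed by adding parallelism.
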

Lastly, how do I perform "join" analysis between two files to determine the join key between them?

Thanks in advance.
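
The join question above can be illustrated outside of Information Analyzer with a simple value-overlap check: for each pair of columns, measure what fraction of the distinct values in the smaller column also appear in the other. This is a minimal sketch of the general idea, not IA's own cross-table analysis. The delimiter, the presence of a header row, and the `min_overlap` threshold are assumptions, and the function names are hypothetical. It holds every distinct value in memory, so it would need a different approach for multi-GB files.

```python
import csv
from itertools import product


def distinct_values(path, delimiter=","):
    """Map each column name to the set of distinct non-empty values in it."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        columns = {name: set() for name in (reader.fieldnames or [])}
        for row in reader:
            for name, value in row.items():
                # Skip overflow fields (key None) and empty or missing values.
                if name in columns and value:
                    columns[name].add(value)
    return columns


def candidate_join_keys(left_path, right_path, min_overlap=0.9):
    """Return (left_col, right_col, overlap) tuples, best overlap first.

    Overlap is shared distinct values divided by the smaller distinct count.
    """
    left = distinct_values(left_path)
    right = distinct_values(right_path)
    candidates = []
    for (lcol, lvals), (rcol, rvals) in product(left.items(), right.items()):
        if not lvals or not rvals:
            continue
        overlap = len(lvals & rvals) / min(len(lvals), len(rvals))
        if overlap >= min_overlap:
            candidates.append((lcol, rcol, overlap))
    return sorted(candidates, key=lambda item: item[2], reverse=True)
```

A high overlap only suggests a candidate key pair; low-cardinality columns (flags, codes) can overlap by coincidence, so the candidates still need to be reviewed.
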