Limiting the number of rows being analyzed

This forum contains ProfileStage posts and now focuses at newer versions Infosphere Information Analyzer.

Moderators: chulett, rschirm

Post Reply
pk7
Participant
Posts: 44
Joined: Fri Aug 03, 2007 8:35 am
Location: Kansas CIty

Limiting the number of rows being analyzed

Post by pk7 »

Does anyone know of a creative way to limit the number of rows being analyzed by Info Analyzer for large tables? The WHERE CLAUSE under Project Properties is not very useful.

Ideally I'd like to limit by percentage (i.e. I want to analyze a RANDOM 10% of the rows) or better still, by date range (i.e. I want to analyze all records where Load_Date, a column in most of our tables, is sometime in 2010).

Creating new tables or views with a subset of the original table is not an option for us.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

You can select to profile a sample when you run the column analysis, with options like random x%, or every nth record.

Otherwise, create a virtual table with the exact sql you would use to filter on, and then profile the virtual table.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Stuart beat me to it! You can specify sampling anywhere you change analysis settings - from project-wide down to at run time.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pk7
Participant
Posts: 44
Joined: Fri Aug 03, 2007 8:35 am
Location: Kansas CIty

Post by pk7 »

stuartjvnorton wrote:You can select to profile a sample when you run the column analysis, with options like random x%, or every nth record.

Otherwise, create a virtual table with the exact sql you would use to filter on, and then profile the virtual table.
Stuart and Ray! Thanks for responding (you can always count on the Aussies!)

My understanding is that the sampling is done at Col Analysis time but is used at Table Analysis time. That does not limit the number of rows actually being analyzed during Col Analysis. I have the sample size set to ONE row. The net effect is that Col Analysis is still profiling ALL rows in the source table but creating a very small sample set. So, that's not my problem.

The problem is that I want to reduce the number of rows that come in for Col Analysis. So, the suggestion to create a virtual table would work (I assume you mean to create an Oracle view on the source table.) However, that is not going to help me in this case as the tables are now all Col Analyzed and baselined. Now I need to do a new analysis on the same tables to compare to the baseline. So I am forced to hit the same big tables again! Creating virtual tables won't help in this case.

Any other ideas to limit WITHIN Info Analyzer rather that outside? it would be great if I could limit by one of the date fields using the I.A. WHERE clause but it won't accomodate single or double quotes needed in the query.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

pk7 wrote:The problem is that I want to reduce the number of rows that come in for Col Analysis. So, the suggestion to create a virtual table would work (I assume you mean to create an Oracle view on the source table.) However, that is not going to help me in this case as the tables are now all Col Analyzed and baselined. Now I need to do a new analysis on the same tables to compare to the baseline. So I am forced to hit the same big tables again! Creating virtual tables won't help in this case.
I'm talking about inside IA.
When you select Column Analysis from the Invetsigate menu, one of the options beside "Run Column Analysis" is "Create Virtual Table".
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... and the virtual table can be based on a sample of the rows in the base table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
pk7
Participant
Posts: 44
Joined: Fri Aug 03, 2007 8:35 am
Location: Kansas CIty

Post by pk7 »

stuartjvnorton wrote:
pk7 wrote:The problem is that I want to reduce the number of rows that come in for Col Analysis. So, the suggestion to create a virtual table would work (I assume you mean to create an Oracle view on the source table.) However, that is not going to help me in this case as the tables are now all Col Analyzed and baselined. Now I need to do a new analysis on the same tables to compare to the baseline. So I am forced to hit the same big tables again! Creating virtual tables won't help in this case.
I'm talking about inside IA.
When you select Column Analysis from the Invetsigate menu, one of the options beside "Run Column Analysis" is "Create Virtual Table".

Thank you, Stuart and Ray!
Post Reply