How to find tables containing lots of unused space?
Have you ever wondered why a table segment consumes as much space as it does, and how to tell whether the space allocated to each segment is actually used for storing data rather than being mostly empty? These questions bothered me from time to time, and I was looking for a method that would not require licensing any packs (like the Diagnostic Pack for the Segment Advisor, because it requires AWR) and would not generate lots of I/O by scanning the segments. In the end I found a simple solution for this…
Starting with 10gR1, Oracle provides the procedure DBMS_SPACE.CREATE_TABLE_COST, which helps estimate the size of a table segment based on the expected number of rows, the average row size (in bytes), the PCT_FREE setting for the table, and the name of the tablespace where the table would be created. The output contains the size of the user data and the size consumed by the table segment.
DBMS_SPACE.CREATE_TABLE_COST (
   tablespace_name IN  VARCHAR2,
   avg_row_size    IN  NUMBER,
   row_count       IN  NUMBER,
   pct_free        IN  NUMBER,
   used_bytes      OUT NUMBER,
   alloc_bytes     OUT NUMBER);
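For example, here is a quick sanity check of the procedure with made-up inputs (the USERS tablespace name is just an assumption; substitute one that exists in your database):

set serveroutput on

declare
  v_used  number;
  v_alloc number;
begin
  -- estimate a table of 1,000,000 rows, 120 bytes per row on average,
  -- created with pct_free 10 in the USERS tablespace
  sys.dbms_space.create_table_cost(tablespace_name => 'USERS',
                                   avg_row_size    => 120,
                                   row_count       => 1000000,
                                   pct_free        => 10,
                                   used_bytes      => v_used,
                                   alloc_bytes     => v_alloc);
  dbms_output.put_line('used_bytes  = ' || v_used);
  dbms_output.put_line('alloc_bytes = ' || v_alloc);
end;
/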
I realized the procedure could be used to check whether the actual size of a table differs from the expected size, since all the input metrics are readily available: PCT_FREE and the tablespace name come from the properties of the table, while the number of rows and the average row size are collected when statistics are gathered for the table. The same procedure can be used for table partition segments as well, since the same metrics are available for them.
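All four inputs can be pulled straight from the data dictionary; for a single table it looks like this (SCOTT.EMP is just a placeholder):

select tablespace_name, pct_free, num_rows, avg_row_len, last_analyzed
  from dba_tables
 where owner = 'SCOTT'       -- placeholder owner
   and table_name = 'EMP';   -- placeholder table name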
To make things simpler, I wrote a function that returns the estimated size of the segment, so I can use it in a query (it returns NULL in case of an error; errors usually happen when any of the inputs is NULL or the row count or row size is 0):
create or replace function table_space_estimate(p_tablespace_name in varchar2,
                                                p_avg_row_size    in number,
                                                p_num_rows        in number,
                                                p_pct_free        in number)
  return number
as
  v_ub number;  -- used bytes (not returned)
  v_ab number;  -- allocated bytes
begin
  sys.dbms_space.create_table_cost(tablespace_name => p_tablespace_name,
                                   avg_row_size    => p_avg_row_size,
                                   row_count       => p_num_rows,
                                   pct_free        => p_pct_free,
                                   used_bytes      => v_ub,
                                   alloc_bytes     => v_ab);
  return v_ab;
exception
  -- NULL or zero inputs (e.g. missing statistics) make the procedure
  -- raise an error; return NULL in that case
  when others then
    return null;
end;
/
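With the function in place, a single schema can be checked directly from SQL, for example (the SCOTT schema is a placeholder):

select owner,
       table_name,
       trunc(table_space_estimate(tablespace_name,
                                  avg_row_len,
                                  num_rows,
                                  pct_free) / 1024 / 1024) est_mbytes
  from dba_tables
 where owner = 'SCOTT';   -- placeholder schema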
I wrote a query that estimates the size of the top 500 largest table and table partition segments and reports them with the biggest estimated “wasted space” first. It uses information in DBA_SEGMENTS, DBA_TABLES and DBA_TAB_PARTITIONS to provide the estimates. Try it in your test environment; the results might surprise you! And I guess you’ll know how to change it if estimates for other tables are needed. Here it is:
select j.*,
       decode(mbytes, 0, 0, trunc((est_mbytes - mbytes) / mbytes * 100)) est_size_incr_pct,
       est_mbytes - mbytes est_size_incr_mbytes
  from (select g.*,
               trunc(table_space_estimate(g.tablespace_name,
                                          g.avg_row_len,
                                          g.num_rows,
                                          g.pct_free) / 1024 / 1024) est_mbytes
          from (select s.owner,
                       s.segment_name,
                       s.partition_name,
                       s.tablespace_name,
                       decode(segment_type, 'TABLE PARTITION', p.avg_row_len, t.avg_row_len) avg_row_len,
                       decode(segment_type, 'TABLE PARTITION', p.num_rows, t.num_rows) num_rows,
                       decode(segment_type, 'TABLE PARTITION', p.pct_free, t.pct_free) pct_free,
                       decode(segment_type, 'TABLE PARTITION', p.last_analyzed, t.last_analyzed) last_analyzed,
                       s.mbytes
                  from (select *
                          from (select tablespace_name,
                                       owner,
                                       segment_name,
                                       trunc(bytes / 1024 / 1024) mbytes,
                                       segment_type,
                                       a.partition_name
                                  from dba_segments a
                                 where segment_type in ('TABLE', 'TABLE PARTITION')
                                   and owner != 'SYS'
                                 order by bytes desc)
                         where rownum <= 500) s,
                       dba_tables t,
                       dba_tab_partitions p
                 where t.owner(+) = s.owner
                   and t.table_name(+) = s.segment_name
                   and p.table_owner(+) = s.owner
                   and p.table_name(+) = s.segment_name
                   and p.partition_name(+) = s.partition_name) g
         order by mbytes desc) j
 order by est_mbytes - mbytes;
And here is a sample output:
The most important columns are:
- MBYTES - the actual size of the segment;
- EST_MBYTES - the estimated size of the segment;
- EST_SIZE_INCR_MBYTES - the difference between the two; a large negative value means the segment is considerably bigger than the estimate and likely contains lots of unused space.
I found that the actual benefit after moving a table can differ from the estimate: for some tables it was as estimated, but for others it was not. Even when the statistics were gathered with estimate_percent=100 the situation was no better. I think the procedure cannot handle all the cases properly, but anyway, it is a very good starting point for finding where space might have been wasted.
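If a table does turn out to be bloated, the usual way to reclaim the space is to rebuild the segment. A minimal sketch (the table and index names are hypothetical; note that a plain MOVE locks the table and invalidates its indexes):

alter table scott.big_table move;          -- rebuilds the segment compactly
alter index scott.big_table_pk rebuild;    -- indexes are invalidated by the move

-- on ASSM tablespaces, an online alternative that keeps indexes usable:
alter table scott.big_table enable row movement;
alter table scott.big_table shrink space;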