How to find tables containing lots of unused space?
Have you ever wondered why a table segment consumes as much space as it does, and how to tell whether the space allocated to each segment is actually used for storing data rather than being mostly empty? These questions bothered me from time to time, and I was looking for a method that would not require licensing any packs (like the Diagnostic Pack for the Segment Advisor, because it requires AWR) and would not generate lots of I/O by scanning the segments. In the end I found a simple solution for this…
Starting with 10gR1, Oracle provides the procedure DBMS_SPACE.CREATE_TABLE_COST, which helps estimate the size of a table segment based on the expected number of rows, the average row size (in bytes), the PCT_FREE setting for the table, and the name of the tablespace where the table would be created. The output contains the size of the user data and the size consumed by the table segment.
DBMS_SPACE.CREATE_TABLE_COST (
   tablespace_name IN  VARCHAR2,
   avg_row_size    IN  NUMBER,
   row_count       IN  NUMBER,
   pct_free        IN  NUMBER,
   used_bytes      OUT NUMBER,
   alloc_bytes     OUT NUMBER);
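For example, here is a quick sanity check of the procedure with made-up inputs (the USERS tablespace name is just an assumption; substitute one that exists in your database):

set serveroutput on

declare
  v_used  number;
  v_alloc number;
begin
  -- estimate a table of 1,000,000 rows, 120 bytes per row on average,
  -- created with pct_free 10 in the USERS tablespace
  sys.dbms_space.create_table_cost(tablespace_name => 'USERS',
                                   avg_row_size    => 120,
                                   row_count       => 1000000,
                                   pct_free        => 10,
                                   used_bytes      => v_used,
                                   alloc_bytes     => v_alloc);
  dbms_output.put_line('used_bytes  = ' || v_used);
  dbms_output.put_line('alloc_bytes = ' || v_alloc);
end;
/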
I realized the procedure could be used to check whether the actual size of a table differs from the expected size, since all the input metrics are readily available: PCT_FREE and the tablespace name come from the properties of the table, while the number of rows and the average row size are collected when statistics are gathered for the table. The same procedure can be used for table partition segments as well, since the same metrics are available for them.
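All four inputs can be pulled straight from the data dictionary; for a single table it looks like this (SCOTT.EMP is just a placeholder):

select tablespace_name, pct_free, num_rows, avg_row_len, last_analyzed
  from dba_tables
 where owner = 'SCOTT'       -- placeholder owner
   and table_name = 'EMP';   -- placeholder table name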
To make things simpler, I wrote a function that returns the estimated size of the segment, so I can use it in a query (it returns NULL in case of an error; errors usually happen when any of the inputs is NULL or the row count or row size is 0):
create or replace function table_space_estimate(p_tablespace_name in varchar2,
                                                p_avg_row_size    in number,
                                                p_num_rows        in number,
                                                p_pct_free        in number)
  return number
as
  v_ub number;  -- used bytes (not returned)
  v_ab number;  -- allocated bytes
begin
  sys.dbms_space.create_table_cost(tablespace_name => p_tablespace_name,
                                   avg_row_size    => p_avg_row_size,
                                   row_count       => p_num_rows,
                                   pct_free        => p_pct_free,
                                   used_bytes      => v_ub,
                                   alloc_bytes     => v_ab);
  return v_ab;
exception
  -- NULL or zero inputs (e.g. missing statistics) make the procedure
  -- raise an error; return NULL in that case
  when others then
    return null;
end;
/
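With the function in place, a single schema can be checked directly from SQL, for example (the SCOTT schema is a placeholder):

select owner,
       table_name,
       trunc(table_space_estimate(tablespace_name,
                                  avg_row_len,
                                  num_rows,
                                  pct_free) / 1024 / 1024) est_mbytes
  from dba_tables
 where owner = 'SCOTT';   -- placeholder schema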
I wrote a query that estimates the size of the top 500 largest table and table partition segments and reports them with the biggest estimated “wasted space” first. It uses information in DBA_SEGMENTS, DBA_TABLES and DBA_TAB_PARTITIONS to provide the estimates. Try it in your test environment; the results might surprise you! And I guess you’ll know how to change it if estimates for other tables are needed. Here it is:
select j.*,
       decode(mbytes, 0, 0, trunc((est_mbytes - mbytes) / mbytes * 100)) est_size_incr_pct,
       est_mbytes - mbytes est_size_incr_mbytes
  from (select g.*,
               trunc(table_space_estimate(g.tablespace_name,
                                          g.avg_row_len,
                                          g.num_rows,
                                          g.pct_free) / 1024 / 1024) est_mbytes
          from (select s.owner,
                       s.segment_name,
                       s.partition_name,
                       s.tablespace_name,
                       decode(segment_type, 'TABLE PARTITION', p.avg_row_len, t.avg_row_len) avg_row_len,
                       decode(segment_type, 'TABLE PARTITION', p.num_rows, t.num_rows) num_rows,
                       decode(segment_type, 'TABLE PARTITION', p.pct_free, t.pct_free) pct_free,
                       decode(segment_type, 'TABLE PARTITION', p.last_analyzed, t.last_analyzed) last_analyzed,
                       s.mbytes
                  from (select *
                          from (select tablespace_name,
                                       owner,
                                       segment_name,
                                       trunc(bytes / 1024 / 1024) mbytes,
                                       segment_type,
                                       a.partition_name
                                  from dba_segments a
                                 where segment_type in ('TABLE', 'TABLE PARTITION')
                                   and owner != 'SYS'
                                 order by bytes desc)
                         where rownum <= 500) s,
                       dba_tables t,
                       dba_tab_partitions p
                 where t.owner(+) = s.owner
                   and t.table_name(+) = s.segment_name
                   and p.table_owner(+) = s.owner
                   and p.table_name(+) = s.segment_name
                   and p.partition_name(+) = s.partition_name) g
         order by mbytes desc) j
 order by est_mbytes - mbytes;
And here is a sample output:
The most important columns are:
- MBYTES - the actual size of the segment;
- EST_MBYTES - the estimated size of the segment;
- EST_SIZE_INCR_MBYTES - the difference between the two; a large negative value means the segment is considerably bigger than the estimate and likely contains lots of unused space.
I found that the actual benefit after moving a table can differ from the estimate: for some tables it was as estimated, but for others it was not. Even when the statistics were gathered with estimate_percent=100 the situation was no better. I think the procedure cannot handle all the cases properly, but anyway, it is a very good starting point for finding where space might have been wasted.
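If a table does turn out to be bloated, the usual way to reclaim the space is to rebuild the segment. A minimal sketch (the table and index names are hypothetical; note that a plain MOVE locks the table and invalidates its indexes):

alter table scott.big_table move;          -- rebuilds the segment compactly
alter index scott.big_table_pk rebuild;    -- indexes are invalidated by the move

-- on ASSM tablespaces, an online alternative that keeps indexes usable:
alter table scott.big_table enable row movement;
alter table scott.big_table shrink space;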