I was looking the other day to get a general view about the sizes of databases within my estate. To do this I started to write a query which basically summed up the sizes of the tablespaces and then did a count based on a minimum and maximum value. The query looked a bit like this:

select
‘Size 1TB to 1.2TB’ , count(*) from (
select sum(tablespace_size/1024/1024/1024)
from mgmt$db_tablespaces
group by target_name
having sum(tablespace_size/1024/1024/1024)
between 1000 and 1200);
UNION ALL
select ‘Size 0.8TB to 1TB’, count(*) from (
select sum(tablespace_size/1024/1024/1024)
from mgmt$db_tablespaces
group by target_name
having sum(tablespace_size/1024/1024/1024)
between 800 and 1000);

As I was writing this I did think there must be a better way to generate this without having to specify specific boundaries and repeat the query over and over again, maybe generate some sort of histogram to see the spread of data. With a bit of digging I came across this:

http://www.oralytics.com/2013/04/part-1getting-started-with-statistics.html

http://www.oralytics.com/2013/04/part-2getting-start-with-statistics-for.html

http://www.oralytics.com/2014/07/bucketwidth-calculating-size-of-bucket.html

**These are definitely worth a read and were the basis for this blog, thank you @brendantierney**

This lead me to the SQL function WIDTH_BUCKET. The next thing I needed to do was translate the example given to calculating the spread of database sizes within 12c OEM.

WIDTH_BUCKET requires 4 inputs:

- Expression
- Min value of the first bucket
- Max value of the last bucket
- Number of buckets

Easiest way to do this is build up the query in parts:

The expression is the size of the databases:

sum(tablespace_size/1024/1024/1024)

So for the min and max values I can use this:

select min(sum(tablespace_size/1024/1024/1024)) from mgmt$db_tablespaces group by target_name

select max(sum(tablespace_size/1024/1024/1024)) from mgmt$db_tablespaces group by target_name

Number of buckets is:

10

So the first iteration of the query is:

select
target_name,
sum(tablespace_size/1024/1024/1024),
width_bucket(sum(tablespace_size/1024/1024/1024),
(select min(sum(tablespace_size/1024/1024/1024)) from mgmt$db_tablespaces group
by target_name),
(select max(sum(tablespace_size/1024/1024/1024)) from mgmt$db_tablespaces group
by target_name),10)
from
mgmt$db_tablespaces
group by target_name

The output is as follows (not all of it):

Target Name Size GB Width Bucket
DB1 249.8046875 1
DB2 906.9814453125 1
DB3 2091.9165802001953125 2
DB4 8.25 1
DB5 82.8408050537109375 1

So this tells us the size of the database and which bucket in terms of size that fits into (between bucket 1 and 10, 1 being the lowest). So the next thing I need to do is look at the estate as a whole and look at the numbers of databases in each bucket. The width_bucket column is given an alias and we simply count that alias for each width bucket. The output is below.

select intvl, count(*) freq from
(select width_bucket(sum(tablespace_size/1024/1024/1024),(select min(sum(tablespace_size/1024/1024/1024)) from mgmt$db_tablespaces group by target_name),(select max(sum(tablespace_size/1024/1024/1024)) from mgmt$db_tablespaces group by target_name),10) intvl
from
mgmt$db_tablespaces
group by target_name)
group by intvl
order by 1

Bucket Count
1 695
2 15
3 2
4 1
5 2
6 5
7 3
8 4
9 1
10 3
11 1

Where did 11 come from? The 11^{th} bucket is for anything that doesn’t fit in the maximum and minimum values. (but you specified the max and min values based on the data!!!). This is due to the way that the data is rounded up or down by the width_bucket function. Hence if you change width_bucket expression from:

sum(tablespace_size/1024/1024/1024)

To:

round(sum(tablespace_size/1024/1024/1024),0)

All the data falls into the relevant buckets.

The final piece of the puzzle is to provide some idea on what those bucket sizes are (note that we round all values to 0 apart from the max size to ensure everything falls within our 10 bucket values):

select bucket, round(max(db_size),0) MAX_SIZE_GB, count(*) from (
select target_name,
sum(tablespace_size/1024/1024/1024) db_size,
width_bucket(round(sum(tablespace_size/1024/1024/1024),0),
(select round(min(sum(tablespace_size/1024/1024/1024)),0) from mgmt$db_tablespaces group by target_name),
(select round(max(sum(tablespace_size/1024/1024/1024)),1) from mgmt$db_tablespaces group by target_name),10) bucket
from
mgmt$db_tablespaces
group by target_name)
group by bucket
order by 1

Bucket Database Max Size GB Count(*)
1 1207 695
2 2425 15
3 3404 2
4 4768 1
5 5590 2
6 6301 5
7 7859 3
8 9716 4
9 10328 1
10 12168 4

So it looks like most of my databases are under 1.2TB. Now I know my general spread of data I can re-run the query and input my own value:

select bucket, round(max(db_size),0) MAX_SIZE_GB, count(*) from (
select target_name,
sum(tablespace_size/1024/1024/1024) db_size,
width_bucket(round(sum(tablespace_size/1024/1024/1024),0),
(select round(min(sum(tablespace_size/1024/1024/1024)),0) from mgmt$db_tablespaces group by target_name),
1207,10) bucket
from
mgmt$db_tablespaces
group by target_name)
group by bucket
order by 1

Bucket Database Max Size GB Count(*)
1 120 607
2 234 33
3 326 18
4 419 7
5 584 4
6 704 10
7 798 4
8 907 2
9 1038 4
10 1158 5
11 12168 38

Ok so further analysis makes it look like most of my databases are less than 120GB, this is getting embarrassing! Also notice that the max bucket now stops just under 1207 GB and everything above this is captured in 11^{th} bucket. I am going to stop there but hopefully you can see the benefit of this v cool SQL Function.