I’ve been toying with the idea of writing some more articles on statistics (maybe an additional chapter in the SQLT book), because it is a subject that keeps coming up and and seems to need more explanation than can be found in the documentation. Just as I’d got round to it, a new version of Oracle was released with new features, so this time I’m going to jump straight into 12c and talk about that instead.
As this small area of Oracle is vast just by itself, I’ll chew small pieces in the order that they come across my desk.
One of the new statistics features that was of interest to me (because I was always fixing tuning problems caused by situations that this feature is designed to fix) was the Top Frequency Histogram.
Let’s just quickly recap on what a Frequency histogram is. 254 bins (or fewer) that count the number of values of the specific value. So for example if I had only three rows: 1, 2 and 3 I would have 3 bins (labelled 1, 2 and 3) each with a value of 1. If I had four values of 1,2,3,3, I would still have 3 bins with values 1,1,2. So the last bin indicates that “3″ was seen twice. Simple enough.
The maximum number of bins in Oracle is 254 and as we all know there are almost always more than 254 bins for frequency distributions (what would the frequency histogram look like for that!). With 12c we have a new concept, the Top Frequency histogram. In this kind of histogram we assume that even though we have more than 254 distinct values, the popular values are still within 254 bins. Or to put it another way, by ignoring some values we can make a histogram which includes the vast majority of actual values thus giving us a better chance to develop a good execution plan. This type of histogram should be an improvement over HEIGHT BALANCED histograms which were always losing popular values.
So when you next see TOP-FREQUENCY in the HISTOGRAM column of DBA_TAB_COL_STATISTICS you should be reassured.