According to AASHTO's Highway Safety Manual, a Safety Performance Function (SPF) is an equation, developed by statistical modeling, that is used to predict the average number of crashes per year at a location as a function of exposure and, in some cases, roadway or intersection characteristics (e.g., number of lanes, traffic control, or median type).
To understand what these functions are and what they are for, we first need a few concepts, some definitions, and maybe a few illustrations. I'll try to work backward from the term "safety performance function" to the essential knowledge required for generating one. The definition above contains a few terms whose meaning we should know in order to understand an SPF properly. What is safety? What is meant by the performance of safety? What is a statistical model? And what is exposure? Other terms will pop up along the way to developing an SPF.
Safety
When do we feel safe on the road? You would probably say it is when we can drive comfortably and do not feel any imminent danger, right? So what happens if we do not feel safe, or the road is dangerous? Accidents probably happen, and of different kinds: you may run off the road, hit a deer, a pedestrian, or another vehicle, bump into a road sign, etc. So safety is directly related to the potential for accidents, and if we can quantify that potential we have a measure of safety. In traffic and transportation engineering, safety is defined by the number of accidents (of a specific type and/or severity) in a population of transportation units (such as road segments, intersections, etc.). Okay, now we have new terms to define, such as accident type, severity, and population of units.
Population
A population is defined by some common features, for example, the population of women between 5 and 6 feet tall. Similarly, the signalized T-intersections in Denver form a population. As these examples suggest, it is we who define the population by choosing the common features (e.g. all intersections are T-shaped, signalized, and located in Denver). Now let's do an exercise. Say the only common features of the transportation units are that they are intersections and are located in Denver. What would the size of this population be? Let's assume 2000 units. Now add another feature that has to be common among the units, say being T-shaped, and the size will drop to, say, 500. If we also require them to be signalized, it may drop to about 150 units. As we add more common features to the definition of our population, its size decreases but it becomes more homogeneous, i.e. its units become more similar to each other. At some point we may have added so many common features that no unit is left. Even that is OK: we still have a population, although it includes no units.
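The narrowing exercise can be sketched in a few lines of Python. The records, the field names (`city`, `legs`, `signalized`), and the helper `select` are all made up for illustration; the only point is that each added common feature can shrink the population, possibly down to zero.

```python
# Hypothetical intersection records; every field and value is invented.
intersections = [
    {"id": 1, "city": "Denver", "legs": 3, "signalized": True},
    {"id": 2, "city": "Denver", "legs": 3, "signalized": False},
    {"id": 3, "city": "Denver", "legs": 4, "signalized": True},
    {"id": 4, "city": "Denver", "legs": 4, "signalized": False},
    {"id": 5, "city": "Denver", "legs": 3, "signalized": True},
    {"id": 6, "city": "Boulder", "legs": 3, "signalized": True},
]


def select(units, **features):
    """Return the units whose fields match every given common feature."""
    return [u for u in units if all(u.get(k) == v for k, v in features.items())]


# Each added common feature narrows the population.
in_denver = select(intersections, city="Denver")
t_shaped = select(intersections, city="Denver", legs=3)
signalized_t = select(intersections, city="Denver", legs=3, signalized=True)
# Too many (or too strict) features can leave an empty population.
five_leg = select(intersections, city="Denver", legs=5)

print(len(in_denver), len(t_shaped), len(signalized_t), len(five_leg))
# prints: 5 3 2 0
```

The last selection is empty, yet it is still a perfectly well-defined population; it just happens to contain no units.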
The question is, then, what populations are used in developing SPFs? As demonstrated above, we do not want a population defined by too many common features, because we need a sample large enough to give the modeling procedure adequate statistical power. A small sample size results in unreliable estimates for the SPF, and we do not want that. Therefore, a few of the most important common features suffice to define the population. We should keep in mind, however, that the units in our sample are not exactly the same; they are just similar to each other on the basis of the chosen features.
Sample
So what is the sample we talked about above? What is the difference between a sample and a population? I would say that a population is all the units that could possibly exist with a certain set of common features. Can we get the data for all of them? No! So we collect data for a sample from that population and assume that this sample is representative of the population. Then we do our calculations and whatever statistical modeling we want on this sample and attribute the results to the population of interest from which the sample was collected. One must know, however, that the exact true safety of the population cannot be determined; what we find from the sample analysis is an estimate of it. That is why, for example, the safety determined from a sample is said to be an estimate of the expected value of the true safety (for the population).
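Here is a minimal sketch of that idea, assuming a made-up population whose yearly accident counts are simply random integers between 0 and 10 (not real data, and not a realistic accident distribution). In this toy setting we can look at the whole population, which is exactly what we cannot do in practice, so we can compare the sample estimate with the value it is trying to estimate.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: yearly accident counts at 2000 intersections.
# The counts are invented (uniform integers from 0 to 10), not real data.
population = [random.randint(0, 10) for _ in range(2000)]

# In practice we cannot observe every unit, so we collect a sample.
sample = random.sample(population, 150)

population_mean = statistics.mean(population)  # the "true" value, normally unknown
sample_mean = statistics.mean(sample)          # our estimate of it

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two numbers will be close but not identical, and a different sample would give a slightly different estimate.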
Units
Now let's look at the matter from a different angle. Let's pick a transportation unit in Denver, for example an intersection with certain features (e.g. T-intersection, signalized, with exclusive right- and left-turn lanes, with lighting at night, equal AADT (annual average daily traffic) on all legs, lane widths of 12 ft, with sidewalks). We can say that this intersection belongs to a population of intersections with the same features. We do not know how many units are, or can be, in that population, but we know that they are very similar to each other because we selected many common features to define it. Since the intersections in that population are very similar to each other, their safety should be similar too; otherwise there is some safety-related feature we are not considering in defining the population. By a safety-related feature I mean one that affects the potential for accidents; a bird's nest close to the intersection, for example, is not a safety-related feature. The units' accident counts will still not be exactly the same, because we may have missed some safety-related feature in our definition and there is always some randomness in nature. What we can say is that there is an average value (expected value) for the number of accidents in this population. Let's call this average µ1.
Now we pick another intersection with different features (e.g. 4-leg intersection, unsignalized, no lighting at night, lane widths of 10 ft, no sidewalks). Similarly, there is an average value (expected value) for the number of accidents in the population it belongs to. Let's call this average µ2.
Each of these hypothetical populations has its own average safety measure (number of accidents). Now, if we choose all the intersections in Denver, we end up with a sample that includes all the previously picked intersections, because we have broadened our criteria to intersections of any type located in Denver. This is still a population of units with a common feature (they are all intersections), and we can determine a safety measure for it. This is where statistical modeling comes into play, to find the average value for the number of accidents in this bigger sample of interest.
What we know is that each intersection in our sample itself comes from a hypothetical population with its own average safety value. For this big sample we now want a single average safety, which is essentially the average of all those averages, as the final safety measure. That is again an expected value, and we can call it µ(µi), or E(µ): the overall average number of accidents over all the units (intersections in Denver).
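As a tiny illustration of the "average of averages", suppose we somehow knew the expected value µi for each unit in the big sample. We never do in reality, and the numbers below are invented. Units that share the same features share the same µi, which is why some values repeat.

```python
import statistics

# Hypothetical expected yearly accident counts (mu_i), one per intersection
# in the big sample. These are unknown in practice; the values are invented.
# Repeated values stand for units from the same hypothetical population.
mu_by_unit = [1.2, 1.2, 3.5, 3.5, 3.5, 0.8, 2.0, 2.0]

# E(mu): the average of the per-unit averages.
overall_mean = statistics.mean(mu_by_unit)

print(f"E(mu) = {overall_mean:.4f}")
```

Note that E(µ) depends on the mix of units: with more units of the 3.5 kind in the sample, the overall average would move up.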
One might say this is not very accurate, as there are too many differences among the safety-related features of these units. Well, that is true, and that is why it is called an estimate of the safety and not the exact true value. It is also still a property of this sample, and not of the population from which this (huge) sample is drawn.
(TO BE CONTINUED ON VARIANCE OF AVERAGES)