### Tags: error, matlab, programming, regression, regstatsfunction, returning, root, squared, value, working

# Root Mean Squared Error


6,370 words with 5 Comments; published: Sat, 26 Apr 2008 22:54:00 GMT

I am working on a regression problem and am using the regstats function. I am returning the mean squared error value of my result (and taking the square root) and comparing it to my own calculation of the RMS error. I am finding a difference between the two. The MATLAB mse is dividing the squared error by (n - var - 1), where n is the number of data points and var is the number of variables. My question is: why does MATLAB subtract the number of variables from the number of data points? Thanks.
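
The discrepancy can be reproduced outside MATLAB. The following is a minimal Python sketch, not MATLAB's implementation: it fits a one-predictor line by ordinary least squares on made-up data and computes the RMS error with both denominators. The data and the helper name `fit_line` are illustrative assumptions, not taken from the question.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x (one predictor plus intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(r ** 2 for r in residuals)

n = len(x)  # number of data points
p = 1       # number of predictor variables (the intercept is not counted)

rmse_plain = math.sqrt(sse / n)          # hand calculation dividing by n
rmse_dof = math.sqrt(sse / (n - p - 1))  # dividing by the residual degrees of freedom

print(rmse_plain, rmse_dof)
```

The two values differ by the factor sqrt(n / (n - p - 1)), which is the kind of gap described in the question.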

*http://matlab.questionfor.info/q_matlab_52374.html*

All Comments


- 5 Comments
- In article <ef56432.-1.matlab.questionfor.info.webcrossing.raydaftYaTP>, John Sheldon <john.sheldon.matlab.questionfor.info.gmail.com> wrote:

> I am working on a regression problem and am using the regstats function. I am returning the mean squared error value of my result (and taking the square root) and comparing it to my own calculation of the RMS error. I am finding a difference between the two. The matlab mse is dividing the squared error by (n - var - 1) where n is the number of data points and var is the number of variables. My question is why does matlab subtract the number of variables from the number of data points? Thanks.

In my (limited) experience, when the number of variables is subtracted out, it usually has to do with "degrees of freedom".

--

I was very young in those days, but I was also rather dim.

-- Christopher Priest

#1; Sat, 26 Apr 2008 22:56:00 GMT

- John Sheldon wrote:

> I am working on a regression problem and am using the regstats function. I am returning the mean squared error value of my result (and taking the square root) and comparing it to my own calculation of the RMS error. I am finding a difference between the two. The matlab mse is dividing the squared error by (n - var - 1) where n is the number of data points and var is the number of variables. My question is why does matlab subtract the number of variables from the number of data points? Thanks.

John, that's the standard definition for the MSE as an estimator of the population variance sigma^2. If it helps, imagine what it would be if you had no vars, only a constant -- the usual (unbiased) estimator of the variance for a normal distribution. Hope this helps.
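
Written out for the constant-only case, as a sketch in notation that is assumed here rather than taken from the thread (\bar{y} for the sample mean, s^2 for the variance estimator):

```latex
% Constant-only model: every fitted value is the sample mean, so p = 0.
\[
  \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \bar{y})^2,
  \qquad
  s^2 = \frac{\mathrm{SSE}}{n - 0 - 1}
      = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 .
\]
```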

- Peter Perkins

The MathWorks, Inc.

#2; Sat, 26 Apr 2008 22:57:00 GMT

- On May 8, 4:13 pm, "John Sheldon" <john.shel....matlab.questionfor.info.gmail.com> wrote:

> I am working on a regression problem and am using the regstats function. I am returning the mean squared error value of my result (and taking the square root) and comparing it to my own calculation of the RMS error. I am finding a difference between the two. The matlab mse is dividing the squared error by (n - var - 1) where n is the number of data points and var is the number of variables. My question is why does matlab subtract the number of variables from the number of data points? Thanks.

If you estimate the MSE of a model with p+1 parameters using n observations that were not used to estimate the parameters, then

MSE = SSE/n

yields an unbiased estimate.

However, that formula will yield a biased estimate if those n observations were used to estimate the p+1 parameters. In the latter case the residuals have only n-(p+1) degrees of freedom, and the formula for an unbiased estimate is

MSE = SSE/(n-p-1).
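
The bias described above can be checked by simulation. This is a rough Python sketch under assumed settings (a hypothetical true line y = 2 + 0.5*x, Gaussian noise with variance 1, n = 10, p = 1); none of these values come from the thread, and the helper name `sse_of_line_fit` is made up for illustration.

```python
import random

def sse_of_line_fit(x, y):
    """Residual sum of squares after an OLS fit of y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

random.seed(0)   # fixed seed so the run is repeatable
n, p = 10, 1     # 10 observations, 1 predictor plus an intercept
sigma = 1.0      # assumed true noise standard deviation (variance 1.0)
x = [float(i) for i in range(n)]
trials = 20000

total_biased = 0.0
total_unbiased = 0.0
for _ in range(trials):
    # Hypothetical true model: y = 2 + 0.5*x + Gaussian noise.
    y = [2.0 + 0.5 * xi + random.gauss(0.0, sigma) for xi in x]
    sse = sse_of_line_fit(x, y)  # same data used for fitting and for the error
    total_biased += sse / n
    total_unbiased += sse / (n - p - 1)

mean_biased = total_biased / trials
mean_unbiased = total_unbiased / trials
print(mean_biased, mean_unbiased)
```

With a true noise variance of 1, the average of SSE/n should settle near (n - p - 1)/n = 0.8 and the average of SSE/(n - p - 1) near 1, up to simulation noise.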

In the simplest case where the model is just the mean value, p = 0 and the MSE is just the usual unbiased estimate of the variance, i.e., the sample variance with n-1 in the denominator.
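
As a quick check of the mean-only case, here is a small Python sketch using the standard `statistics` module on a made-up sample. `statistics.variance` divides by n-1 and `statistics.pvariance` divides by n, so they should match the two formulas with p = 0.

```python
import statistics

# Made-up sample for illustration only.
y = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(y)
p = 0  # no predictors; the model is just the mean

mean_y = sum(y) / n
sse = sum((yi - mean_y) ** 2 for yi in y)

mse_unbiased = sse / (n - p - 1)  # divides by n - 1
mse_biased = sse / n              # divides by n

print(mse_unbiased, statistics.variance(y))
print(mse_biased, statistics.pvariance(y))
```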

It is assumed that n > p+1 and the system of n equations for p+1 variables is overdetermined.

If n = p+1 then the system of n equations for n variables should have an exact solution. Therefore SSE = 0, the ratio SSE/(n-p-1) is indeterminate (0/0), and MSE is undefined.

Hope this helps.

Greg

#3; Sat, 26 Apr 2008 22:58:00 GMT

- On May 9, 8:56 am, Greg Heath <h....matlab.questionfor.info.alumni.brown.edu> wrote:

> If n = p+1 then the system of n equations for n variables should have an exact solution. Therefore SSE = 0, the ratio SSE/(n-p-1) is indeterminate (0/0), and MSE is undefined.

If n = p+1 then the system of n equations for n variables should have an exact solution. Therefore SSE = 0, and the biased estimate SSE/n = 0. However, the ratio SSE/(n-p-1) is indeterminate (0/0), and the unbiased estimate for MSE is undefined.
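
The n = p+1 case can be illustrated with two made-up points and a straight-line fit (p = 1). This is only a sketch; the guard that returns NaN when the degrees of freedom are zero is an illustrative choice, not a description of what MATLAB does.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Two made-up points: a line with an intercept passes through both exactly.
x = [1.0, 3.0]
y = [2.0, 8.0]
n, p = len(x), 1

a, b = fit_line(x, y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

dof = n - p - 1        # residual degrees of freedom, zero here
mse_biased = sse / n   # well defined, equals zero
# 0/0 is indeterminate, so report NaN instead of dividing.
mse_unbiased = sse / dof if dof > 0 else float("nan")

print(sse, mse_biased, mse_unbiased)
```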

Hope this helps.

Greg

#4; Sat, 26 Apr 2008 22:59:00 GMT

- Greg Heath wrote:

> If n = p+1 then the system of n equations for n variables should have an exact solution. Therefore SSE = 0, and the biased estimate SSE/n = 0. However, the ratio SSE/(n-p-1) is indeterminate (0/0), and the unbiased estimate for MSE is undefined.
>
> Hope this helps.
>
> Greg

Thanks Greg,

Your explanation was very clear and helpful. I thought it was something simple, I just couldn't put my finger on it at the time.

Cheers,

John

#5; Sat, 26 Apr 2008 23:00:00 GMT
