Statistically Significant

Andrew Landgraf's Blog

Rounding in R


Forgive me if you are already aware of this, but I found it quite alarming. I know that most code is interpreted by the computer in binary and we input in decimal, so problems can arise in conversion and with floating point. But the example I have below is so simple that it really surprised me.

I was converting a function from R into MATLAB so that a colleague could use it. I tested it out on the same data and got slightly different results. Digging into the problem, the difference was due to the fact that R was rounding 4.5 to 4 and MATLAB was rounding it to 5. I thought the “4.5” must have really been “4.49999…”. But that was not so.

For example, this is the result of the round function for a few numbers.
> round(0.5,0)
[1] 0
> round(1.5,0)
[1] 2
> round(2.5,0)
[1] 2
> round(3.5,0)
[1] 4
> round(4.5,0)
[1] 4
> round(5.5,0)
[1] 6
> round(6.5,0)
[1] 6

Do you see a pattern?

I tried this on versions 2.13.1 and 2.14.0. I ran the same with MATLAB and it gave the results I expected. I am not any kind of expert in computer science, so I was not sure why this was happening. Converting any decimal number that ends in .5 into binary results in a finite-length binary number. For example, 4.5 is 100.1 in binary. Because of this, I wouldn’t think the error would be due to floating point, but I really don’t know.
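A quick way to check that claim is to look at the stored values directly. This is only a minimal sketch; the variable names `x` and `is_exact_tie` are mine and are not part of the original test.

```r
# Values whose fractional part is exactly one half
x <- seq(0.5, 6.5, by = 1)

# Printing 20 decimal places shows whether any stray digits were stored
print(sprintf("%.20f", x))

# The fractional part compares equal to 0.5 without any tolerance
is_exact_tie <- (x - floor(x)) == 0.5
print(all(is_exact_tie))

# Contrast: 0.1 has no finite binary expansion, so its stored value is slightly off
print(sprintf("%.20f", 0.1))
```

If every entry of `is_exact_tie` is TRUE, the inputs really are exact ties and the behaviour of round cannot be blamed on representation error.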

Looking at the documentation for round, I found the reason. It states in the notes, “Note that for rounding off a 5, the IEC 60559 standard is expected to be used, ‘go to the even digit’.” It is a little comforting knowing that there is a logic behind it and that R is abiding by some standard. But why isn’t MATLAB abiding by the same standard? Also, I think most people expect numbers ending in .5 to round up, not to the nearest even digit.

Comments

Analytic Bastard
kudos Blaise
Anonymous
Andrew wrote "Also, I think most people expect numbers ending in .5 to round up (not the nearest even digit)". This kind of rounding is in German called "kaufmännische Rundung" (rounding in commerce). For this purpose I use the following function:

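#Definition of a function for "rounding in commerce"
cround = function(x,n){
  vorz = sign(x)       # remember the sign ("Vorzeichen")
  z = abs(x)*10^n      # shift the digit to be rounded into the units place
  z = z + 0.5
  z = trunc(z)         # adding 0.5 and truncating rounds halves up
  z = z/10^n           # shift back to the original scale
  z*vorz               # restore the sign
}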

# Example
> round(seq(0.5,6.5,1),0)
[1] 0 2 2 4 4 6 6
> cround(seq(0.5,6.5,1),0)
[1] 1 2 3 4 5 6 7
cellocgw
This "round to even" approach has been accepted by just about everyone (except MATLAB and, no surprise, Microsoft Excel).
Sadly, the flame wars over "round to even" vs. "round up" continue, rather the way people argue about "0.999… != 1"

PS: @a Tom: I'm highly skeptical of your claim about 2.46 -> 3. Do you have a citation?
a Tom
I'm ever amazed that something so seemingly basic can have so many different approaches.

I understand that in many Middle Eastern countries they start with the far right digit and round up or down one digit at a time, so 2.46 is rounded to 3!
Blaise
This is discussed in Don Knuth's classic Seminumerical Algorithms (1969). He gives the following example of what can happen when 5s are always rounded upwards in 8-digit decimal arithmetic. Suppose u = 1.0000000 and v = 0.55555555. Then u + v rounds to 1.5555556. If we subtract v from this result we get u' = 1.0000001. Adding and then subtracting v from u' gives 1.0000002, and if we do it again we get 1.0000003, and so on. He says "This phenomenon, called drift, will not occur when we use a stable rounding rule based on the parity of the least significant digit."
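The drift is easy to reproduce. The sketch below is my own illustration, not Knuth's code. It assumes 8-significant-digit decimal arithmetic with u = 1.0000000 and v = 0.55555555, the values for which every step lands exactly on a tie. To keep the arithmetic exact, numbers are stored as whole counts of 1e-8, and the helper names (`round_half_up`, `round_half_even`, `drift`) are hypothetical.

```r
# Numbers are stored as whole counts of 1e-8 so every operation is exact.
# For values in [1, 10), keeping 8 significant digits means rounding
# the count to a multiple of 10.

round_half_up <- function(n) {
  q <- n %/% 10
  r <- n %% 10
  (q + (r >= 5)) * 10            # a trailing 5 always rounds up
}

round_half_even <- function(n) {
  q <- n %/% 10
  r <- n %% 10
  round_up <- (r > 5) | (r == 5 & q %% 2 == 1)
  (q + round_up) * 10            # a trailing 5 rounds to the even neighbour
}

# Repeatedly add and subtract v, rounding after each operation
drift <- function(round_fn, steps = 3) {
  u <- 100000000                 # 1.0000000
  v <- 55555555                  # 0.55555555
  out <- numeric(steps)
  for (i in seq_len(steps)) {
    u <- round_fn(u + v)
    u <- round_fn(u - v)
    out[i] <- u
  }
  out
}

print(sprintf("%.7f", drift(round_half_up) / 1e8))
print(sprintf("%.7f", drift(round_half_even) / 1e8))
```

Under these assumptions the round-half-up sequence creeps upward by one unit in the last place on every pass, while the round-half-even sequence returns to the starting value each time.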
Anonymous
I was the #2 anonymous poster. Echoing Ben, I think that for ease of teaching, the "round 5 up" method is taught to children (and adults?) below the university level, and only if you go on to advanced work is the more complicated method taught.

Can you imagine trying to teach a 10 or 12 year old the IEC 60559 standard? Unfortunately, "round 5 up" is the method most adults are used to…

I agree, it is a little troubling that Matlab doesn't abide by the standard. Yet another reason to stick with R!
Ben Bolker
Wikipedia ( http://en.wikipedia.org/wiki/Rounding#Round_half_to_even ) says of round-to-even:

This method also treats positive and negative values symmetrically, and therefore is free of overall bias if the original numbers are positive or negative with equal probability. In addition, for most reasonable distributions of y values, the expected (average) value of the rounded numbers is essentially the same as that of the original numbers, even if the latter are all positive (or all negative). However, this rule will still introduce a positive bias for even numbers (including zero), and a negative bias for the odd ones.

So round-to-even seems to have *slightly* better numerical properties than "round ties away from zero", which is what is (I think) most often taught, because it's easier to understand. http://www.mathworks.com/matlabcentral/fileexchange/6752 gives a MATLAB function for "round to even".

If I had to guess I would predict that in borderline cases (which this certainly is) MATLAB would favor "do what will lead to happier users" and R would favor "do what is thought to be the best numerical practice".
Anonymous
Hi,
I'm not sure I understand what you mean by "expected results"?

Regarding rounding, I was taught to round numbers ending in "1, 2, 3, and 4" *down*, and numbers ending in "6, 7, 8, 9" *up*. Then, specifically regarding "5": if the preceding digit is odd, round up, and if the preceding digit is even, round down.

As you can see, this will then result in 50% of the numbers being rounded up, and 50% rounded down. If you round *down* on "1, 2, 3, 4" and round up on "5, 6, 7, 8, 9" you are rounding up 5/9 of the time, and so introducing a bias.
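The size of that bias can be checked with a small experiment. This is only a rough sketch: the grid of one-decimal values from 0.0 to 99.9 is my own choice for illustration, and `floor(x + 0.5)` stands in for "always round 5 up" on non-negative numbers.

```r
# One-decimal values 0.0, 0.1, ..., 99.9; every k + 0.5 is stored exactly
x <- (0:999) / 10

half_even <- round(x)         # R's default: ties go to the even digit
half_up   <- floor(x + 0.5)   # ties always go up (valid here because x >= 0)

print(sum(half_even))         # 49950, the same as the exact sum of the inputs
print(sum(half_up))           # 50000

# Average shift introduced by always rounding ties up
print(mean(half_up) - mean(half_even))
```

On this grid the two rules disagree only on the 50 ties that sit above an even digit, which is exactly where the extra 50 in the round-up total comes from.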

It sounds like R is handling it the way I would. Is that what you were wondering about?
Anonymous
To learn something about how computers handle numbers, especially as it relates to statistics and econometrics:

B. D. McCullough and H. D. Vinod
"The Numerical Reliability of Econometric Software,"
Journal of Economic Literature 37(2), 633-665, 1999

A temporary link is available here:
http://www.pages.drexel.edu/~bdm25/jel.pdf
