# Programming Lecture 8.docx

University of Waterloo

Computer Science

CS 137

Andrew Morton

Fall

Programming Lecture 8
October 9, 2012
Floating-Point Numbers
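Scientific notation
o $-2.603 \times 10^{30}$
Type         Storage   Precision    Exponent
float        4 bytes   ~ 7 digits   ± 38
double*      8 bytes   ~ 16 digits  ± 308
long double  12 bytes  ~ 18 digits  ± 4931
(long double is platform-dependent: 12 bytes is the x86 80-bit extended format, which gives about 18 digits; ~ 34 digits would require 16-byte quad precision)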
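The figures in the table can be checked on a given machine with the macros in <float.h>. The sketch below is only an illustration; the function name print_float_limits is made up for this example. Note that FLT_DIG and DBL_DIG report the number of decimal digits guaranteed to survive a round trip, which on IEEE 754 systems is 6 and 15, slightly below the approximate 7 and 16 quoted above. The long double line varies from platform to platform.

```c
#include <float.h>
#include <stdio.h>

/* Print storage size, guaranteed decimal digits, and the largest
   power of ten for each floating-point type on this platform.
   (print_float_limits is a hypothetical helper name.) */
void print_float_limits(void)
{
    printf("float        %zu bytes  %d digits  10^%d\n",
           sizeof(float), FLT_DIG, FLT_MAX_10_EXP);
    printf("double       %zu bytes  %d digits  10^%d\n",
           sizeof(double), DBL_DIG, DBL_MAX_10_EXP);
    printf("long double  %zu bytes  %d digits  10^%d\n",
           sizeof(long double), LDBL_DIG, LDBL_MAX_10_EXP);
}
```

Calling print_float_limits() from main prints one line per type.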
Integer types
o float x = 1.0f;
double y = 1.0; // defaults to “double”
o #include <stdio.h>
int main(void)
{
    double x = -2.6302e30;
    printf("%zu\n", sizeof(double)); // size of a double in bytes
    printf("%f\n", x); // floating point
    printf("%.2e\n", x); // scientific notation with 2 decimal places
    printf("%g\n", x); // shorter of %f or %e
    return 0;
}
Output:
8
-2630200000000000128242488967168.000000
-2.63e+30
-2.6302e+30
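o double precision (64 bits)
Sign      1 bit
Exponent  11 bits
Fraction  52 bits
Value = $(-1)^{s} \times 1.f \times 2^{e-1023}$, where s is the sign bit, e is the stored exponent, and f is the fraction bits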
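To see the three fields of a double, the raw bits can be copied into a 64-bit integer and masked. This is a minimal sketch that assumes double is the 64-bit IEEE 754 format; the struct double_fields and the function split_double are names invented for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 64-bit double. */
struct double_fields {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 11 bits, stored with a bias of 1023 */
    uint64_t fraction;  /* 52 bits */
};

/* Split x into sign, stored exponent, and fraction.
   Assumes double is IEEE 754 binary64. */
struct double_fields split_double(double x)
{
    uint64_t bits;
    struct double_fields f;

    memcpy(&bits, &x, sizeof bits);               /* copy the raw 64 bits */
    f.sign     = (unsigned)(bits >> 63);          /* top bit */
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);/* next 11 bits */
    f.fraction = bits & (((uint64_t)1 << 52) - 1);/* low 52 bits */
    return f;
}
```

For 1.0 this gives sign 0, stored exponent 1023, and fraction 0, which matches the formula: 1.0 × 2^(1023-1023). The formula covers normalized values only; stored exponents 0 and 2047 are reserved for zero, subnormals, infinity, and NaN.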
Provides an APPROXIMATE representation for real numbers
Irrational numbers cannot be represented exactly
Neither can many decimal fractions: $0.1_{10} = 0.000110011\ldots_{2}$
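Because 0.1 has no finite binary expansion, the stored value is slightly off, and the error accumulates under repeated addition. The sketch below illustrates this; sum_of_tenths is a name chosen for the example, and the behaviour described assumes arithmetic carried out in IEEE 754 double precision.

```c
/* Add 0.1 to itself n times and return the running total.
   (sum_of_tenths is a hypothetical helper name.) */
double sum_of_tenths(int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += 0.1;   /* each step adds the rounded value of 0.1 */
    }
    return total;
}
```

On typical IEEE 754 systems, sum_of_tenths(10) comes out just below 1.0, so comparing it to 1.0 with == fails.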
Representation error
o r: real number to represent
π, √2
o p: approximate representation
o Absolute error = |p-r|
|3.14 - π| ≈ 0.0015927
o Relative error
|p-r|/|r| ≈ 0.000507
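Both error measures are one-liners in C. The following sketch uses the same symbols as the definitions, p for the approximation and r for the real value; the function names are made up for the illustration. Since relative_error divides by |r|, it is undefined for r = 0 and grows quickly as r approaches 0.

```c
#include <math.h>

/* Absolute error |p - r| of approximation p to value r. */
double absolute_error(double p, double r)
{
    return fabs(p - r);
}

/* Relative error |p - r| / |r|; r must be nonzero. */
double relative_error(double p, double r)
{
    return fabs(p - r) / fabs(r);
}
```

With p = 3.14 and r = 3.14159265358979323846, these reproduce the two figures for π given in the notes, to the digits shown.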
Can be large if r is small
o Care is required when
Subtracting nearly equal values
Dividing by very large numbers
Testing for equality
Eg:
o if (x == y)… RISKY
o if (x - y < 0.0001 && y - x < 0.0001)… SAFER
o if (fabs(x - y) < 0.0001)... SAFER (same test; fabs is declared in <math.h>)
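The tolerance test can be wrapped in a small helper so that the comparison reads the same everywhere. This is a sketch only: approx_equal is a hypothetical name, and the tolerance is passed in rather than fixed at 0.0001, since a suitable value depends on the size of the numbers being compared.

```c
#include <math.h>

/* Return 1 if x and y differ by less than tol, otherwise 0.
   (approx_equal is a hypothetical helper, not a library function.) */
int approx_equal(double x, double y, double tol)
{
    return fabs(x - y) < tol;
}
```

For example, 0.1 + 0.2 == 0.3 is false in IEEE 754 double arithmetic, while approx_equal(0.1 + 0.2, 0.3, 0.0001) returns 1.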
Testing against constants is sometimes okay
Eg:
