Programming
Lecture 8
October 9, 2012

Floating-Point Numbers

• Scientific notation
  o -2.603 × 10^30

    Type          Storage    Precision     Exponent
    float         4 bytes    ~7 digits     ±38
    double*       8 bytes    ~16 digits    ±308
    long double   12 bytes   ~34 digits    ±4931

    * Floating-point literals default to double.
    Note: long double is platform-dependent. The 12-byte x86 format carries about 18 digits; ~34 digits corresponds to the 16-byte IEEE quadruple format.

• Floating-point types
  o float x = 1.0f;
    double y = 1.0;   // defaults to "double"
  o #include <stdio.h>

    int main(void)
    {
        double x = -2.6302e30;
        printf("%zu\n", sizeof(double));  // size of a double in bytes
        printf("%f\n", x);                // floating point
        printf("%.2e\n", x);              // scientific notation with 2 decimal places
        printf("%g\n", x);                // shortest of %f or %e
        return 0;
    }

    ▪ Output:
      ▪ 8
      ▪ -2630200000000000128242488967168.000000
      ▪ -2.63e+30
      ▪ -2.6302e+30
  o double precision
    ▪ Sign: 1 bit
    ▪ Exponent: 11 bits
    ▪ Fraction: 52 bits
    ▪ Value = (-1)^sign × 1.f × 2^(exponent - 1023)
• Provides an APPROXIMATE representation for real numbers
  o Irrational numbers cannot be represented exactly
  o Neither can many simple decimal fractions: 0.1 (base 10) = 0.000110011… (base 2), which never terminates
• Representation error
  o r: real number to represent
    ▪ π, √2
  o p: approximate representation
  o Absolute error = |p - r|
    ▪ |3.14 - π| ≈ 0.0015927
  o Relative error = |p - r| / |r|
    ▪ |3.14 - π| / |π| ≈ 0.000507
    ▪ Can be large if r is small
  o Care is required when
    ▪ Subtracting nearly equal values
    ▪ Dividing by very large numbers
    ▪ Testing for equality
• Eg:
  o if (x == y) …                               RISKY
  o if (x - y < 0.0001 && y - x < 0.0001) …     SAFER
  o if (fabs(x - y) < 0.0001) …                 (same test; fabs is declared in <math.h>)
• Testing against constants is sometimes okay
• Eg:
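
The absolute and relative error definitions and the tolerance test from the equality bullets above can be collected into a few small C helpers. This is a minimal sketch rather than code from the lecture: the function names abs_error, rel_error and nearly_equal, and the choice to pass the tolerance as a parameter, are assumptions made for illustration.

```c
#include <math.h>

/* Absolute error |p - r| of the approximation p to the true value r. */
double abs_error(double p, double r)
{
    return fabs(p - r);
}

/* Relative error |p - r| / |r|. The caller must ensure r is nonzero;
   the result grows quickly as r approaches zero. */
double rel_error(double p, double r)
{
    return fabs(p - r) / fabs(r);
}

/* Tolerance-based comparison in the style of the notes:
   returns 1 when |x - y| < tol, otherwise 0. */
int nearly_equal(double x, double y, double tol)
{
    return fabs(x - y) < tol;
}
```

With these helpers, abs_error(3.14, 3.141592653589793) comes out close to the 0.0015927 quoted above, and nearly_equal(0.1 + 0.2, 0.3, 1e-9) returns 1 even though 0.1 + 0.2 == 0.3 is false in ordinary IEEE 754 double arithmetic. A fixed tolerance such as 0.0001 only makes sense when the values being compared have a known scale.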

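The caution about subtracting nearly equal values can be illustrated by computing (1 + x) - 1, which is mathematically just x. The sketch below is an illustration under assumptions, not part of the lecture: the function name cancellation_rel_error is hypothetical, and the volatile qualifiers are there only to force each intermediate result to be rounded to double.

```c
#include <math.h>

/* Relative error of evaluating (1 + x) - 1 in double precision.
   The exact answer is x (which must be nonzero here). Rounding 1 + x
   to a double discards low-order digits of x, and the subtraction
   then exposes that loss. */
double cancellation_rel_error(double x)
{
    volatile double sum = 1.0 + x;    /* rounded to the nearest double */
    volatile double diff = sum - 1.0; /* nearly equal operands cancel */
    return fabs(diff - x) / fabs(x);
}
```

For x = 0.5 every step is exact and the function returns 0. For x = 1e-15 the sum 1 + x can only keep about one significant digit of x, so the relative error of the difference is on the order of 10 percent, even though each individual operation is correctly rounded.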