====================================================================
Float
====================================================================

FriCAS provides two kinds of floating point numbers.  The domain Float
implements a model of arbitrary precision floating point numbers.  The
domain DoubleFloat makes hardware floating point arithmetic available
in FriCAS.  The actual model of floating point that DoubleFloat
provides is system-dependent.  For example, on the IBM System/370
FriCAS uses IBM double precision, which has fourteen hexadecimal digits
of precision, or roughly sixteen decimal digits.

Arbitrary precision floats allow the user to specify the precision at
which arithmetic operations are computed.  Although this is an
attractive facility, it comes at a cost: arbitrary precision floating
point arithmetic typically takes twenty to two hundred times longer
than hardware floating point arithmetic.

---------------------
Introduction to Float
---------------------

Scientific notation is supported for input and output of floating
point numbers.  A floating point number is written as a string of
digits containing a decimal point, optionally followed by the letter
"E" and an exponent.

We begin by doing some calculations using arbitrary precision floats.
The default precision is twenty decimal digits. ::

  1.234
    1.234
                      Type: Float

A decimal base for the exponent is assumed, so the number
1.234E2 denotes 1.234 x 10^2. ::

  1.234E2
    123.4
                      Type: Float
                      
The normal arithmetic operations are available for floating point numbers. ::

  sqrt(1.2 + 2.3 / 3.4 ^ 4.5)
    1.0996972790 671286226
                      Type: Float

--------------------
Conversion Functions
--------------------

You can use conversion to go back and forth between Integer, Fraction
Integer and Float, as appropriate. ::

  i := 3 :: Float
    3.0
                      Type: Float

  i :: Integer
    3
                      Type: Integer

  i :: Fraction Integer 
    3
                      Type: Fraction Integer

Since you are explicitly asking for a conversion, you must take
responsibility for any loss of exactness. ::

  r := 3/7 :: Float 
    0.4285714285 7142857143
                      Type: Float

  r :: Fraction Integer
    3
    -
    7
                      Type: Fraction Integer

This conversion cannot be performed: use truncate or round if that is
what you intend. ::

  r :: Integer
   Cannot convert from type Float to Integer for value
   0.4285714285 7142857143

The operation truncate discards the fractional part, rounding toward
zero, ::

  truncate 3.6
    3.0
                      Type: Float

and round returns the nearest integral Float. ::

  round 3.6
    4.0
                      Type: Float

  truncate(-3.6)
    - 3.0
                      Type: Float

  round(-3.6)
    - 4.0
                      Type: Float
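
Because truncate and round return values of type Float, a further
conversion is needed to obtain an Integer.  A minimal sketch, assuming
that an integral Float converts to Integer as i did above and that
wholePart (an operation not otherwise covered in this section) is
available for Float; output is omitted. ::

  round(3.6) :: Integer
  truncate(-3.6) :: Integer
  wholePart 3.6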

The operation fractionPart computes the fractional part of x, that is,
x - truncate x. ::

  fractionPart 3.6
    0.6
                      Type: Float

The operation digits sets the precision, measured in decimal digits,
and returns the previous setting. ::

  digits 40 
    20
                      Type: PositiveInteger

  sqrt 0.2
    0.4472135954 9995793928 1834733746 2552470881
                      Type: Float

  pi()$Float
    3.1415926535 8979323846 2643383279 502884197
                      Type: Float

The precision is only limited by the computer memory available.
Calculations at 500 or more digits of precision are not difficult. ::

  digits 500
    40
                      Type: PositiveInteger

  pi()$Float
  3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 592307816
  4 0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 505822317
  2 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 442881097
  5 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 454326648
  2 1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 917153643
  6 7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 575959195
  3 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 891227938
  1 830119491
                      Type: Float

Reset digits to its default value. ::

  digits 20
    500
                      Type: PositiveInteger
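
The current precision can also be inspected without changing it.  A
brief sketch, assuming that digits and bits called with no arguments
report the precision in decimal and binary digits respectively; output
is omitted since it depends on the current setting. ::

  digits()
  bits()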

Numbers of type Float are represented as a record of two integers,
namely the mantissa and the exponent, where the base of the exponent
is binary.  That is, the floating point number (m,e) represents the
number m x 2^e.  A consequence of using a binary base is that decimal
numbers cannot, in general, be represented exactly.
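
The two components can be examined directly.  The following is a
sketch, assuming the operations mantissa, exponent and the
two-argument form of float (which rebuilds m x 2^e) behave as their
names suggest; the names t, m and e are chosen here for illustration,
and output is omitted because the stored integers depend on the
current precision. ::

  t := 0.1
  m := mantissa t
  e := exponent t
  float(m, e)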

----------------
Output Functions
----------------

A number of operations exist for specifying how numbers of type Float
are to be displayed.  By default, spaces are inserted every ten digits
in the output for readability. Note that you cannot include spaces in
the input form of a floating point number, though you can use
underscores.

Output spacing can be modified with the outputSpacing operation.  The
following turns off the spacing and then displays the value of x. ::

  outputSpacing 0; x := sqrt 0.2
    0.44721359549995793928
                      Type: Float

Issue this to have the spaces inserted every 5 digits. ::

  outputSpacing 5; x
    0.44721 35954 99957 93928
                      Type: Float

By default, the system displays floats in either fixed format
or scientific format, depending on the magnitude of the number. ::

  y := x/10^10
    0.44721 35954 99957 93928 E -10
                      Type: Float

A particular format may be requested with the operations
outputFloating and outputFixed. ::

  outputFloating(); x 
    0.44721 35954 99957 93928 E 0
                      Type: Float

  outputFixed(); y 
    0.00000 00000 44721 35954 99957 93928
                      Type: Float

Additionally, you can ask for n digits to be displayed after the
decimal point. ::

  outputFloating 2; y 
    0.45 E -10
                      Type: Float

  outputFixed 2; x 
    0.45
                      Type: Float

This resets the output printing to the default behavior. ::

  outputGeneral()
                      Type: Void

----------------------------------------
Example: Determinant of a Hilbert Matrix
----------------------------------------

Consider the problem of computing the determinant of a 10 by 10
Hilbert matrix.  With rows and columns numbered from 0, the (i,j)-th
entry of a Hilbert matrix is given by 1/(i+j+1).

First do the computation using rational numbers to obtain the
exact result. ::

  a: Matrix Fraction Integer:=matrix[ [1/(i+j+1) for j in 0..9] for i in 0..9]
         +    1   1   1   1   1   1   1   1    1+
         |1   -   -   -   -   -   -   -   -   --|
         |    2   3   4   5   6   7   8   9   10|
         |                                      |
         |1   1   1   1   1   1   1   1    1   1|
         |-   -   -   -   -   -   -   -   --  --|
         |2   3   4   5   6   7   8   9   10  11|
         |                                      |
         |1   1   1   1   1   1   1    1   1   1|
         |-   -   -   -   -   -   -   --  --  --|
         |3   4   5   6   7   8   9   10  11  12|
         |                                      |
         |1   1   1   1   1   1    1   1   1   1|
         |-   -   -   -   -   -   --  --  --  --|
         |4   5   6   7   8   9   10  11  12  13|
         |                                      |
         |1   1   1   1   1    1   1   1   1   1|
         |-   -   -   -   -   --  --  --  --  --|
         |5   6   7   8   9   10  11  12  13  14|
         |                                      |
         |1   1   1   1    1   1   1   1   1   1|
         |-   -   -   -   --  --  --  --  --  --|
         |6   7   8   9   10  11  12  13  14  15|
         |                                      |
         |1   1   1    1   1   1   1   1   1   1|
         |-   -   -   --  --  --  --  --  --  --|
         |7   8   9   10  11  12  13  14  15  16|
         |                                      |
         |1   1    1   1   1   1   1   1   1   1|
         |-   -   --  --  --  --  --  --  --  --|
         |8   9   10  11  12  13  14  15  16  17|
         |                                      |
         |1    1   1   1   1   1   1   1   1   1|
         |-   --  --  --  --  --  --  --  --  --|
         |9   10  11  12  13  14  15  16  17  18|
         |                                      |
         | 1   1   1   1   1   1   1   1   1   1|
         |--  --  --  --  --  --  --  --  --  --|
         +10  11  12  13  14  15  16  17  18  19+
                     Type: Matrix Fraction Integer

This version of determinant uses Gaussian elimination. ::

  d:= determinant a
                              1
    -----------------------------------------------------
    46206893947914691316295628839036278726983680000000000
                     Type: Fraction Integer

  d :: Float
    0.21641 79226 43149 18691 E -52
                     Type: Float

Now use hardware floats. Note that a semicolon (;) is used to prevent
the display of the matrix. ::

  b: Matrix DoubleFloat:=matrix[ [1/(i+j+1$DoubleFloat) for j in 0..9] for i in 0..9];
                     Type: Matrix DoubleFloat

The result given by hardware floats is correct only to four
significant digits of precision.  In the jargon of numerical analysis,
the Hilbert matrix is said to be *ill-conditioned*. ::

  determinant b
    2.1643677945721411E-53
                     Type: DoubleFloat
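
The loss of accuracy can be quantified as a relative error against the
exact determinant d computed earlier.  This is only a sketch: the name
dd is introduced here for illustration, and the output is omitted
because the low-order digits of a hardware float result may differ
from system to system. ::

  dd := d :: DoubleFloat
  abs((determinant b - dd) / dd)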

Now repeat the computation at a higher precision using Float. ::

  digits 40 
    20
                     Type: PositiveInteger

  c: Matrix Float := matrix [ [1/(i+j+1$Float) for j in 0..9] for i in 0..9];
                     Type: Matrix Float

  determinant c
    0.21641 79226 43149 18690 60594 98362 26174 36159 E -52
                     Type: Float

Reset digits to its default value. ::

  digits 20
    40
                     Type: PositiveInteger

See Also:

* ``)help DoubleFloat``
* ``)show Float``