Float

FriCAS provides two kinds of floating point numbers. The domain Float implements a model of arbitrary precision floating point numbers. The domain DoubleFloat is intended to make hardware floating point arithmetic available in FriCAS. The actual model of floating point that DoubleFloat provides is system-dependent. For example, on the IBM System/370 FriCAS uses IBM double precision, which has fourteen hexadecimal digits of precision, or roughly sixteen decimal digits.

Arbitrary precision floats allow the user to specify the precision at which arithmetic operations are computed. Although this is an attractive facility, it comes at a cost: arbitrary-precision floating-point arithmetic typically takes twenty to two hundred times as long as hardware floating point.

Introduction to Float

Scientific notation is supported for input and output of floating point numbers. A floating point number is written as a string of digits containing a decimal point, optionally followed by the letter “E” and an exponent.

We begin by doing some calculations using arbitrary precision floats. The default precision is twenty decimal digits.

1.234
  1.234
                    Type: Float

A decimal base for the exponent is assumed, so the number 1.234E2 denotes 1.234 x 10^2.

1.234E2
  123.4
                    Type: Float
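The rule that the number after “E” is a power of ten can be made concrete with a small Python sketch. It computes the exact rational value that a literal denotes. The helper name parse_float_literal is hypothetical and is not part of FriCAS; it only mirrors the input syntax described above.

```python
from fractions import Fraction


def parse_float_literal(text: str) -> Fraction:
    """Exact value of a literal such as "1.234E2" (hypothetical helper).

    The mantissa must contain a decimal point; an optional "E" (or "e")
    introduces a signed base-10 exponent.
    """
    mantissa, _, exponent = text.upper().partition("E")
    if "." not in mantissa:
        raise ValueError(f"no decimal point in {text!r}")
    whole, _, frac = mantissa.partition(".")
    # "1.234" becomes the integer 1234 with 3 digits after the point
    digits = int(whole + frac)
    power = (int(exponent) if exponent else 0) - len(frac)
    return digits * Fraction(10) ** power


print(parse_float_literal("1.234E2"))   # 617/5, i.e. 123.4
```

Working with exact fractions keeps the sketch free of the binary rounding that a real floating point type then applies.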

The normal arithmetic operations are available for floating point numbers.

sqrt(1.2 + 2.3 / 3.4 ** 4.5)
  1.0996972790 671286226
                    Type: Float

Conversion Functions

You can use conversion to go back and forth between Integer, Fraction Integer and Float, as appropriate.

i := 3 :: Float
  3.0
                    Type: Float

i :: Integer
  3
                    Type: Integer

i :: Fraction Integer
  3
                    Type: Fraction Integer

Since you are explicitly asking for a conversion, you must take responsibility for any loss of exactness.

r := 3/7 :: Float
  0.4285714285 7142857143
                    Type: Float

r :: Fraction Integer
  3
  -
  7
                    Type: Fraction Integer
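The binary value stored for 3/7 is not exactly 3/7. A conversion that prints 3/7 must therefore recover a nearby simple fraction. One way to do that is to search for the closest fraction whose denominator stays below a bound. This Python sketch uses the standard fractions module, and it is not necessarily the method FriCAS uses. The bound 10**6 is an arbitrary choice.

```python
from fractions import Fraction

r = 3 / 7                      # a hardware (binary) float, like DoubleFloat
exact = Fraction(r)            # the exact binary value that r stores
nearby = exact.limit_denominator(10**6)  # simplest fraction close to r

print(exact == Fraction(3, 7))  # False: 3/7 has no finite binary expansion
print(nearby)                   # 3/7
```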

The next conversion cannot be performed, because 3/7 is not an integer: use truncate or round if that is what you intend.

r :: Integer
 Cannot convert from type Float to Integer for value
 0.4285714285 7142857143
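An exact conversion fails when the value is not integral. An explicit truncation, by contrast, always succeeds. This Python sketch shows the difference. The function to_integer is a hypothetical stand-in for the :: Integer conversion, not FriCAS code.

```python
import math


def to_integer(x: float) -> int:
    """Exact conversion: succeed only when x is already integral (sketch)."""
    if not x.is_integer():
        raise ValueError(f"Cannot convert {x!r} to an integer exactly")
    return int(x)


print(to_integer(3.0))           # 3
print(math.trunc(3 / 7))         # 0: explicit truncation instead
try:
    to_integer(3 / 7)
except ValueError as err:
    print(err)
```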

The operation truncate removes the fractional part of a Float, rounding toward zero, and round rounds to the nearest integral Float.

truncate 3.6
  3.0
                    Type: Float

round 3.6
  4.0
                    Type: Float

truncate(-3.6)
  - 3.0
                    Type: Float

round(-3.6)
  - 4.0
                    Type: Float
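The behaviour on negative arguments is the part most easily gotten wrong. This Python sketch implements both operations and returns integral floats, as FriCAS returns integral Floats. The tie rule, rounding halves away from zero, is an assumption, because the examples above do not show a tie. Python's built-in round, by contrast, rounds ties to even.

```python
import math
from fractions import Fraction


def truncate(x: float) -> float:
    """Drop the fractional part, rounding toward zero."""
    return float(math.trunc(x))


def round_half_away(x: float) -> float:
    """Round to the nearest integer; ties go away from zero (assumed rule)."""
    # Fraction(x) is the exact value of x, so adding 1/2 introduces no error
    magnitude = math.floor(abs(Fraction(x)) + Fraction(1, 2))
    return float(magnitude) if x >= 0 else -float(magnitude)


for v in (3.6, -3.6):
    print(v, truncate(v), round_half_away(v))
```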

The operation fractionPart computes the fractional part of x, that is, x - truncate x.

fractionPart 3.6
  0.6
                    Type: Float
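The definition x - truncate x carries over directly. This Python sketch uses 53-bit hardware doubles, which are comparable to DoubleFloat rather than Float. With these doubles, 3.6 - 3 is not exactly the double nearest 0.6. FriCAS presumably displays 0.6 because it rounds the shown value to the current number of digits.

```python
import math


def fraction_part(x: float) -> float:
    """x - truncate(x); the result keeps the sign of x."""
    return x - math.trunc(x)


print(fraction_part(3.6))    # 0.6000000000000001 with IEEE doubles
print(fraction_part(-3.6))   # negative, since truncation is toward zero
```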

The operation digits sets the precision, in decimal digits, used for subsequent Float computations. It returns the previous precision.

digits 40
  20
                    Type: PositiveInteger

sqrt 0.2
  0.4472135954 9995793928 1834733746 2552470881
                    Type: Float
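Python's decimal module offers a similar "set the precision, get the old one back" workflow. This sketch wraps it in a hypothetical digits helper. One difference matters: decimal works in base 10, whereas Float uses a binary mantissa. The two agree here only because the square root is correctly rounded to 40 decimal digits.

```python
from decimal import Decimal, getcontext


def digits(n: int) -> int:
    """Set the working precision to n decimal digits; return the old value.

    Hypothetical analogue of FriCAS's digits, built on Python's decimal module.
    """
    ctx = getcontext()
    previous, ctx.prec = ctx.prec, n
    return previous


old = digits(40)
print(old)                    # 28, the decimal module's default
print(Decimal("0.2").sqrt())  # 0.4472135954999579392818347337462552470881
digits(old)                   # restore the previous precision
```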

pi()$Float
  3.1415926535 8979323846 2643383279 502884197
                    Type: Float

The precision is only limited by the computer memory available. Calculations at 500 or more digits of precision are not difficult.

digits 500
  40
                    Type: PositiveInteger

pi()$Float
  3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164
    0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 5058223172
    5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 4428810975
    6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482
    1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 9171536436
    7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 5759591953
    0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381
    830119491
                    Type: Float
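With unbounded integers, precision is limited only by memory and time. Hundreds of digits of pi come from nothing more than integer arithmetic. This Python sketch uses Machin's formula, pi = 16 arctan(1/5) - 4 arctan(1/239), in scaled fixed point with a few guard digits. It is a standard textbook method, not the algorithm behind pi()$Float. It returns the leading digits, truncated, with no decimal point.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Approximately scale * arctan(1/x), using the Taylor series in integers."""
    total = term = scale // x          # first term: 1/x
    x2 = x * x
    n, sign = 1, -1
    while term:
        term //= x2                    # 1/x**(2n+1), scaled
        total += sign * (term // (2 * n + 1))
        sign, n = -sign, n + 1
    return total


def pi_digits(n: int, guard: int = 10) -> str:
    """The first n digits of pi as a string, e.g. "31415..." (truncated)."""
    scale = 10 ** (n - 1 + guard)      # extra guard digits absorb rounding
    pi_scaled = 16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)
    return str(pi_scaled // 10 ** guard)


print(pi_digits(50))
```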

Reset digits to its default value.

digits 20
  500
                    Type: PositiveInteger

Numbers of type Float are represented as a record of two integers, the mantissa and the exponent, where the exponent has base 2. That is, the floating point number (m,e) represents the number m x 2^e. A consequence of using a binary base is that most decimal numbers, such as 0.1, cannot be represented exactly.
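The (m,e) model can be sketched in Python with exact rationals. The sketch rounds a rational number to the nearest m x 2^e whose mantissa has a fixed number of bits. The function names and the 64-bit mantissa are illustrative assumptions, and the sketch does not model the rounding details of FriCAS's Float. It does show why 0.1 cannot be stored exactly while 3/4 can.

```python
from fractions import Fraction


def to_binary_float(x: Fraction, bits: int) -> tuple[int, int]:
    """Round x to the nearest m * 2**e whose mantissa m has `bits` bits."""
    if x == 0:
        return 0, 0
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Rough estimate of e, then adjust so 2**(bits-1) <= x / 2**e < 2**bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    while x / Fraction(2) ** e >= 2 ** bits:
        e += 1
    while x / Fraction(2) ** e < 2 ** (bits - 1):
        e -= 1
    m = round(x / Fraction(2) ** e)   # nearest integer, ties to even
    if m == 2 ** bits:                # rounding carried into an extra bit
        m, e = m // 2, e + 1
    return sign * m, e


def value(m: int, e: int) -> Fraction:
    """The exact number represented by the pair (m, e)."""
    return m * Fraction(2) ** e


m, e = to_binary_float(Fraction(1, 10), 64)
print(value(m, e) == Fraction(1, 10))   # False: 0.1 is not exact in binary
m, e = to_binary_float(Fraction(3, 4), 64)
print(value(m, e) == Fraction(3, 4))    # True: 3/4 = 3 x 2^-2
```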

Output Functions

A number of operations exist for specifying how numbers of type Float are to be displayed. By default, spaces are inserted every ten digits in the output for readability. Note that you cannot include spaces in the input form of a floating point number, though you can use underscores.

Output spacing can be modified with the outputSpacing operation. The following command turns spacing off, then computes x := sqrt 0.2 and displays it.

outputSpacing 0; x := sqrt 0.2
  0.44721359549995793928
                    Type: Float

This inserts a space every five digits.

outputSpacing 5; x
  0.44721 35954 99957 93928
                    Type: Float
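The spacing rule is simple string processing. This Python sketch groups the fractional digits of an already formatted number, and an argument of 0 means no spaces, as with outputSpacing 0. The helper name space_digits is hypothetical. Grouping only the digits after the decimal point is an assumption, based on the outputs shown on this page.

```python
def space_digits(text: str, every: int) -> str:
    """Insert a space every `every` digits after the decimal point."""
    if every == 0 or "." not in text:
        return text
    whole, frac = text.split(".", 1)
    groups = [frac[i:i + every] for i in range(0, len(frac), every)]
    return whole + "." + " ".join(groups)


x = "0.44721359549995793928"
print(space_digits(x, 5))   # 0.44721 35954 99957 93928
print(space_digits(x, 10))  # 0.4472135954 9995793928
```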

By default, the system displays floats in either fixed format or scientific format, depending on the magnitude of the number.

y := x/10**10
  0.44721 35954 99957 93928 E -10
                    Type: Float

A particular format may be requested with the operations outputFloating and outputFixed.

outputFloating(); x
  0.44721 35954 99957 93928 E 0
                    Type: Float

outputFixed(); y
  0.00000 00000 44721 35954 99957 93928
                    Type: Float

Additionally, both operations accept an argument n that specifies how many digits to display after the decimal point.

outputFloating 2; y
  0.45 E -10
                    Type: Float

outputFixed 2; x
  0.45
                    Type: Float
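The two display modes can be sketched with Python's decimal module. Fixed format rounds to n places after the point. Floating format writes the number as 0.ddd E k, with the mantissa between 0.1 and 1. The helper names are hypothetical, and the half-up tie rule is an assumption because the examples do not show a tie. Only the 0.ddd E k layout is mimicked, not FriCAS's digit spacing.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_fixed(x: Decimal, n: int) -> str:
    """x with exactly n digits after the decimal point."""
    return str(x.quantize(Decimal(1).scaleb(-n), rounding=ROUND_HALF_UP))


def format_floating(x: Decimal, n: int) -> str:
    """x as "0.ddd E k" with n mantissa digits, 0.1 <= |0.ddd| < 1."""
    if x == 0:
        return "0." + "0" * n + " E 0"
    k = x.adjusted() + 1              # x = 0.ddd... x 10**k
    step = Decimal(1).scaleb(-n)      # 10**-n, the last kept digit
    mantissa = x.scaleb(-k).quantize(step, rounding=ROUND_HALF_UP)
    if abs(mantissa) >= 1:            # e.g. 0.996 to 2 digits gives 1.00
        mantissa = (mantissa / 10).quantize(step, rounding=ROUND_HALF_UP)
        k += 1
    return str(mantissa) + " E " + str(k)


x = Decimal("0.44721359549995793928")
y = Decimal("0.000000000044721359549995793928")   # x / 10**10
print(format_floating(y, 2))   # 0.45 E -10
print(format_fixed(x, 2))      # 0.45
```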

This resets the output printing to the default behavior.

outputGeneral()
                    Type: Void

Example: Determinant of a Hilbert Matrix

Consider the problem of computing the determinant of a 10 by 10 Hilbert matrix. With the indices i and j counted from 0, the (i,j)-th entry of a Hilbert matrix is 1/(i+j+1).

First do the computation using rational numbers to obtain the exact result.

a: Matrix Fraction Integer:=matrix[ [1/(i+j+1) for j in 0..9] for i in 0..9]
       +    1   1   1   1   1   1   1   1    1+
       |1   -   -   -   -   -   -   -   -   --|
       |    2   3   4   5   6   7   8   9   10|
       |                                      |
       |1   1   1   1   1   1   1   1    1   1|
       |-   -   -   -   -   -   -   -   --  --|
       |2   3   4   5   6   7   8   9   10  11|
       |                                      |
       |1   1   1   1   1   1   1    1   1   1|
       |-   -   -   -   -   -   -   --  --  --|
       |3   4   5   6   7   8   9   10  11  12|
       |                                      |
       |1   1   1   1   1   1    1   1   1   1|
       |-   -   -   -   -   -   --  --  --  --|
       |4   5   6   7   8   9   10  11  12  13|
       |                                      |
       |1   1   1   1   1    1   1   1   1   1|
       |-   -   -   -   -   --  --  --  --  --|
       |5   6   7   8   9   10  11  12  13  14|
       |                                      |
       |1   1   1   1    1   1   1   1   1   1|
       |-   -   -   -   --  --  --  --  --  --|
       |6   7   8   9   10  11  12  13  14  15|
       |                                      |
       |1   1   1    1   1   1   1   1   1   1|
       |-   -   -   --  --  --  --  --  --  --|
       |7   8   9   10  11  12  13  14  15  16|
       |                                      |
       |1   1    1   1   1   1   1   1   1   1|
       |-   -   --  --  --  --  --  --  --  --|
       |8   9   10  11  12  13  14  15  16  17|
       |                                      |
       |1    1   1   1   1   1   1   1   1   1|
       |-   --  --  --  --  --  --  --  --  --|
       |9   10  11  12  13  14  15  16  17  18|
       |                                      |
       | 1   1   1   1   1   1   1   1   1   1|
       |--  --  --  --  --  --  --  --  --  --|
       +10  11  12  13  14  15  16  17  18  19+
                   Type: Matrix Fraction Integer

This version of determinant uses Gaussian elimination.

d:= determinant a
                            1
  -----------------------------------------------------
  46206893947914691316295628839036278726983680000000000
                   Type: Fraction Integer

d :: Float
  0.21641 79226 43149 18691 E -52
                   Type: Float

Now use hardware floats. Note that a semicolon (;) is used to prevent the display of the matrix.

b: Matrix DoubleFloat:=matrix[ [1/(i+j+1$DoubleFloat) for j in 0..9] for i in 0..9];
                   Type: Matrix DoubleFloat

The result given by hardware floats is correct only to four significant digits of precision. In the jargon of numerical analysis, the Hilbert matrix is said to be ill-conditioned.

determinant b
  2.1643677945721411E-53
                   Type: DoubleFloat

Now repeat the computation at a higher precision using Float.

digits 40
  20
                   Type: PositiveInteger

c: Matrix Float := matrix [ [1/(i+j+1$Float) for j in 0..9] for i in 0..9];
                   Type: Matrix Float

determinant c
  0.21641 79226 43149 18690 60594 98362 26174 36159 E -52
                   Type: Float

Reset digits to its default value.

digits 20
  40
                   Type: PositiveInteger
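The whole Hilbert experiment can be reproduced in Python as a sketch. One elimination routine runs on three kinds of numbers:

- exact rationals (fractions.Fraction);
- hardware doubles (float, comparable to DoubleFloat);
- 40-digit decimals (decimal.Decimal, standing in for Float with digits 40).

The routine is a generic textbook Gaussian elimination with partial pivoting, not FriCAS's implementation. Because decimal is base 10, the 40-digit result will not match FriCAS's binary Float digit for digit.

```python
from decimal import Decimal, localcontext
from fractions import Fraction


def determinant(rows):
    """Gaussian elimination with partial pivoting; works for Fraction,
    float and Decimal entries alike."""
    a = [list(r) for r in rows]
    n = len(a)
    det = a[0][0] * 0 + 1             # the number 1 in the entry type
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0:
            return det * 0            # singular matrix
        if p != k:
            a[k], a[p] = a[p], a[k]
            det = -det                # a row swap flips the sign
        det *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return det


def hilbert(n, one):
    """n-by-n Hilbert matrix with entries 1/(i+j+1), i and j from 0."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


exact = determinant(hilbert(10, Fraction(1)))
hardware = determinant(hilbert(10, 1.0))
with localcontext() as ctx:
    ctx.prec = 40                     # analogous to digits 40
    high = determinant(hilbert(10, Decimal(1)))

print(exact)      # 1/46206893947914691316295628839036278726983680000000000
print(hardware)   # near 2.164e-53, but only the leading digits are reliable
print(high)
```

Exact rationals give the true answer but grow large; doubles are fast but lose most of their sixteen digits to the ill-conditioning; extra working precision recovers accuracy at a cost in speed.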

See Also:

  • )help DoubleFloat
  • )show Float
