$NetBSD: softfloat.txt,v 1.2 2006/11/24 19:46:58 christos Exp $\r
\r
SoftFloat Release 2a General Documentation\r
\r
John R. Hauser\r
1998 December 13\r
\r
\r
-------------------------------------------------------------------------------\r
Introduction\r
\r
SoftFloat is a software implementation of floating-point that conforms to\r
the IEC/IEEE Standard for Binary Floating-Point Arithmetic.  As many as four\r
formats are supported:  single precision, double precision, extended double\r
precision, and quadruple precision.  All operations required by the standard\r
are implemented, except for conversions to and from decimal.\r
\r
This document gives information about the types defined and the routines\r
implemented by SoftFloat.  It does not attempt to define or explain the\r
IEC/IEEE Floating-Point Standard.  Details about the standard are available\r
elsewhere.\r
\r
\r
-------------------------------------------------------------------------------\r
Limitations\r
\r
SoftFloat is written in C and is designed to work with other C code.  The\r
SoftFloat header files assume an ISO/ANSI-style C compiler.  No attempt\r
has been made to accommodate compilers that are not ISO-conformant.  In\r
particular, the distributed header files will not be acceptable to any\r
compiler that does not recognize function prototypes.\r
\r
Support for the extended double-precision and quadruple-precision formats\r
depends on a C compiler that implements 64-bit integer arithmetic.  If the\r
largest integer format supported by the C compiler is 32 bits, SoftFloat is\r
limited to only single and double precisions.  When that is the case, all\r
references in this document to the extended double precision, quadruple\r
precision, and 64-bit integers should be ignored.\r
\r
\r
-------------------------------------------------------------------------------\r
Contents\r
\r
    Introduction\r
    Limitations\r
    Contents\r
    Legal Notice\r
    Types and Functions\r
    Rounding Modes\r
    Extended Double-Precision Rounding Precision\r
    Exceptions and Exception Flags\r
    Function Details\r
        Conversion Functions\r
        Standard Arithmetic Functions\r
        Remainder Functions\r
        Round-to-Integer Functions\r
        Comparison Functions\r
        Signaling NaN Test Functions\r
        Raise-Exception Function\r
    Contact Information\r
\r
\r
\r
-------------------------------------------------------------------------------\r
Legal Notice\r
\r
SoftFloat was written by John R. Hauser.  This work was made possible in\r
part by the International Computer Science Institute, located at Suite 600,\r
1947 Center Street, Berkeley, California 94704.  Funding was partially\r
provided by the National Science Foundation under grant MIP-9311980.  The\r
original version of this code was written as part of a project to build\r
a fixed-point vector processor in collaboration with the University of\r
California at Berkeley, overseen by Profs. Nelson Morgan and John Wawrzynek.\r
\r
THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE.  Although reasonable effort\r
has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT\r
TIMES RESULT IN INCORRECT BEHAVIOR.  USE OF THIS SOFTWARE IS RESTRICTED TO\r
PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY\r
AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.\r
\r
\r
-------------------------------------------------------------------------------\r
Types and Functions\r
\r
When 64-bit integers are supported by the compiler, the `softfloat.h' header\r
file defines four types:  `float32' (single precision), `float64' (double\r
precision), `floatx80' (extended double precision), and `float128'\r
(quadruple precision).  The `float32' and `float64' types are defined in\r
terms of 32-bit and 64-bit integer types, respectively, while the `float128'\r
type is defined as a structure of two 64-bit integers, taking into account\r
the byte order of the particular machine being used.  The `floatx80' type\r
is defined as a structure containing one 16-bit and one 64-bit integer, with\r
the machine's byte order again determining the order of the `high' and `low'\r
fields.\r
\r
When 64-bit integers are _not_ supported by the compiler, the `softfloat.h'\r
header file defines only two types:  `float32' and `float64'.  Because\r
ISO/ANSI C guarantees at least one built-in integer type of 32 bits,\r
the `float32' type is identified with an appropriate integer type.  The\r
`float64' type is defined as a structure of two 32-bit integers, with the\r
machine's byte order determining the order of the fields.\r
\r
In either case, the types in `softfloat.h' are defined such that if a system\r
implements the usual C `float' and `double' types according to the IEC/IEEE\r
Standard, then the `float32' and `float64' types should be indistinguishable\r
in memory from the native `float' and `double' types.  (On the other hand,\r
when `float32' or `float64' values are placed in processor registers by\r
the compiler, the type of registers used may differ from those used for the\r
native `float' and `double' types.)\r
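
The memory-layout claim above can be checked with a short sketch.  The
typedef below is only an assumed stand-in for the `float32' definition in
`softfloat.h', and the two helper names are hypothetical.  The sketch also
assumes that the native `float' is an IEC/IEEE single-precision value.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the `float32' typedef in `softfloat.h'. */
typedef uint32_t float32;

/* Hypothetical helper: copy the bytes of a native float into a float32
   without any numeric conversion. */
static float32 float32_from_native( float f )
{
    float32 a;
    memcpy( &a, &f, sizeof a );
    return a;
}

/* Hypothetical helper: copy the bytes of a float32 back into a native
   float, again without any numeric conversion. */
static float native_from_float32( float32 a )
{
    float f;
    memcpy( &f, &a, sizeof f );
    return f;
}
```

Under these assumptions, `float32_from_native( 1.0f )' yields the bit
pattern 0x3F800000, which is the single-precision encoding of 1.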
\r
SoftFloat implements the following arithmetic operations:\r
\r
-- Conversions among all the floating-point formats, and also between\r
   integers (32-bit and 64-bit) and any of the floating-point formats.\r
\r
-- The usual add, subtract, multiply, divide, and square root operations\r
   for all floating-point formats.\r
\r
-- For each format, the floating-point remainder operation defined by the\r
   IEC/IEEE Standard.\r
\r
-- For each floating-point format, a ``round to integer'' operation that\r
   rounds to the nearest integer value in the same format.  (The floating-\r
   point formats can hold integer values, of course.)\r
\r
-- Comparisons between two values in the same floating-point format.\r
\r
The only functions required by the IEC/IEEE Standard that are not provided\r
are conversions to and from decimal.\r
\r
\r
-------------------------------------------------------------------------------\r
Rounding Modes\r
\r
All four rounding modes prescribed by the IEC/IEEE Standard are implemented\r
for all operations that require rounding.  The rounding mode is selected\r
by the global variable `float_rounding_mode'.  This variable may be set\r
to one of the values `float_round_nearest_even', `float_round_to_zero',\r
`float_round_down', or `float_round_up'.  The rounding mode is initialized\r
to nearest/even.\r
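
As an illustration of how the four modes differ, the following sketch
rounds the exact ratio n/d to an integer under each mode.  It is not
SoftFloat code:  the enumeration names and the function are hypothetical,
and plain integer arithmetic stands in for the floating-point case.

```c
/* Hypothetical mode names; these are not the SoftFloat constants. */
enum {
    sketch_round_nearest_even,
    sketch_round_to_zero,
    sketch_round_down,
    sketch_round_up
};

/* Rounds the exact ratio n/d (with d > 0) to an integer. */
static long sketch_round_ratio( long n, long d, int mode )
{
    long q = n / d;   /* C division truncates toward zero */
    long r = n % d;   /* zero, or the same sign as n */
    long away = ( n < 0 ) ? q - 1 : q + 1;   /* next integer away from zero */
    long twice;

    if ( r == 0 ) return q;   /* already an integer in every mode */
    switch ( mode ) {
     case sketch_round_to_zero:
        return q;
     case sketch_round_down:
        return ( n < 0 ) ? away : q;
     case sketch_round_up:
        return ( n < 0 ) ? q : away;
     default:   /* nearest, with ties going to the even integer */
        twice = 2 * ( ( r < 0 ) ? - r : r );
        if ( twice > d ) return away;
        if ( twice < d ) return q;
        return ( q % 2 == 0 ) ? q : away;
    }
}
```

For the ratio -5/2, for example, the sketch gives -2 for nearest/even, -2
for round to zero, -3 for round down, and -2 for round up.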
\r
\r
-------------------------------------------------------------------------------\r
Extended Double-Precision Rounding Precision\r
\r
For extended double precision (`floatx80') only, the rounding precision\r
of the standard arithmetic operations is controlled by the global variable\r
`floatx80_rounding_precision'.  The operations affected are:\r
\r
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt\r
\r
When `floatx80_rounding_precision' is set to its default value of 80, these\r
operations are rounded (as usual) to the full precision of the extended\r
double-precision format.  Setting `floatx80_rounding_precision' to 32\r
or to 64 causes the operations listed to be rounded to reduced precision\r
equivalent to single precision (`float32') or to double precision\r
(`float64'), respectively.  When rounding to reduced precision, additional\r
bits in the result significand beyond the rounding point are set to zero.\r
The consequences of setting `floatx80_rounding_precision' to a value other
than 32, 64, or 80 are not specified.  Operations other than the ones listed
above are not affected by `floatx80_rounding_precision'.\r
\r
\r
-------------------------------------------------------------------------------\r
Exceptions and Exception Flags\r
\r
All five exception flags required by the IEC/IEEE Standard are\r
implemented.  Each flag is stored as a unique bit in the global variable\r
`float_exception_flags'.  The positions of the exception flag bits within\r
this variable are determined by the bit masks `float_flag_inexact',\r
`float_flag_underflow', `float_flag_overflow', `float_flag_divbyzero', and\r
`float_flag_invalid'.  The exception flags variable is initialized to all 0,\r
meaning no exceptions.\r
\r
An individual exception flag can be cleared with the statement\r
\r
    float_exception_flags &= ~ float_flag_<exception>;\r
\r
where `<exception>' is the appropriate name.  To raise a floating-point\r
exception, the SoftFloat function `float_raise' should be used (see below).\r
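
A minimal sketch of raising, testing, and clearing flags follows.  So that
it compiles on its own, it declares stand-ins for the names in
`softfloat.h'.  The mask values are illustrative assumptions only, the
stand-in `float_raise' merely sets flags and never traps, and the two
helper functions are hypothetical.

```c
/* Assumption: illustrative mask values.  The real values come from
   `softfloat.h' and may differ from system to system. */
enum {
    float_flag_inexact   =  1,
    float_flag_underflow =  2,
    float_flag_overflow  =  4,
    float_flag_divbyzero =  8,
    float_flag_invalid   = 16
};

/* Stand-in for the SoftFloat global; initialized to no exceptions. */
static int float_exception_flags = 0;

/* Stand-in for `float_raise' that only sets the requested flags. */
static void float_raise( int flags )
{
    float_exception_flags |= flags;
}

/* Hypothetical helper: returns 1 if the given flag is set, 0 otherwise. */
static int sketch_flag_is_set( int flag )
{
    return ( float_exception_flags & flag ) != 0;
}

/* Hypothetical helper: clears one flag with the statement shown above. */
static void sketch_clear_flag( int flag )
{
    float_exception_flags &= ~ flag;
}
```

Because the flags are sticky, a typical pattern is to clear a flag before
a computation and to test it afterward.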
\r
In the terminology of the IEC/IEEE Standard, SoftFloat can detect tininess\r
for underflow either before or after rounding.  The choice is made by\r
the global variable `float_detect_tininess', which can be set to either\r
`float_tininess_before_rounding' or `float_tininess_after_rounding'.\r
Detecting tininess after rounding is better because it results in fewer\r
spurious underflow signals.  The other option is provided for compatibility\r
with some systems.  Like most systems, SoftFloat always detects loss of\r
accuracy for underflow as an inexact result.\r
\r
\r
-------------------------------------------------------------------------------\r
Function Details\r
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Conversion Functions\r
\r
All conversions among the floating-point formats are supported, as are all\r
conversions between a floating-point format and 32-bit and 64-bit signed\r
integers.  The complete set of conversion functions is:\r
\r
   int32_to_float32      int64_to_float32\r
   int32_to_float64      int64_to_float64
   int32_to_floatx80     int64_to_floatx80\r
   int32_to_float128     int64_to_float128\r
\r
   float32_to_int32      float32_to_int64\r
   float64_to_int32      float64_to_int64
   floatx80_to_int32     floatx80_to_int64\r
   float128_to_int32     float128_to_int64\r
\r
   float32_to_float64    float32_to_floatx80   float32_to_float128\r
   float64_to_float32    float64_to_floatx80   float64_to_float128\r
   floatx80_to_float32   floatx80_to_float64   floatx80_to_float128\r
   float128_to_float32   float128_to_float64   float128_to_floatx80\r
\r
Each conversion function takes one operand of the appropriate type and\r
returns one result.  Conversions from a smaller to a larger floating-point\r
format are always exact and so require no rounding.  Conversions from 32-bit\r
integers to double precision and larger formats are also exact, and likewise\r
for conversions from 64-bit integers to extended double and quadruple\r
precisions.\r
\r
Conversions from floating-point to integer raise the invalid exception if\r
the source value cannot be rounded to a representable integer of the desired\r
size (32 or 64 bits).  If the floating-point operand is a NaN, the largest\r
positive integer is returned.  Otherwise, if the conversion overflows, the\r
largest integer with the same sign as the operand is returned.\r
\r
On conversions to integer, if the floating-point operand is not already an\r
integer value, the operand is rounded according to the current rounding\r
mode as specified by `float_rounding_mode'.  Because C (and perhaps other\r
languages) requires that conversions to integers be rounded toward zero, the
following functions are provided for improved speed and convenience:\r
\r
   float32_to_int32_round_to_zero    float32_to_int64_round_to_zero\r
   float64_to_int32_round_to_zero    float64_to_int64_round_to_zero\r
   floatx80_to_int32_round_to_zero   floatx80_to_int64_round_to_zero\r
   float128_to_int32_round_to_zero   float128_to_int64_round_to_zero\r
\r
These variant functions ignore `float_rounding_mode' and always round toward\r
zero.\r
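
The overflow and NaN rules above can be sketched for the single-precision
to 32-bit case.  The function below is a simplified illustration that works
on the raw bit pattern, not the SoftFloat source.  All names in it are
hypothetical, and it leaves out the inexact flag that a real conversion
would raise when fraction bits are discarded.

```c
#include <stdint.h>

/* Hypothetical stand-ins for `float_flag_invalid' and the flags variable. */
enum { sketch_flag_invalid = 1 };
static int sketch_exception_flags = 0;

/* Converts a single-precision bit pattern to a 32-bit signed integer,
   always rounding toward zero. */
static int32_t sketch_float32_to_int32_round_to_zero( uint32_t a )
{
    int      sign = (int) ( a >> 31 );
    int      exp  = (int) ( ( a >> 23 ) & 0xFF );
    uint32_t frac = a & 0x007FFFFF;
    uint32_t sig;
    int32_t  z;

    if ( exp >= 0x9E ) {   /* magnitude >= 2^31, an infinity, or a NaN */
        if ( a == 0xCF000000 ) return INT32_MIN;   /* exactly -2^31 */
        sketch_exception_flags |= sketch_flag_invalid;
        /* NaNs and positive overflow give the largest positive integer. */
        if ( ! sign || ( ( exp == 0xFF ) && frac ) ) return INT32_MAX;
        return INT32_MIN;
    }
    if ( exp < 0x7F ) return 0;   /* magnitude < 1 truncates to zero */
    sig = frac | 0x00800000;      /* restore the implicit leading 1 bit */
    /* The value is sig * 2^(exp - 0x96), so shift by the difference. */
    if ( exp >= 0x96 ) z = (int32_t) ( sig << ( exp - 0x96 ) );
    else z = (int32_t) ( sig >> ( 0x96 - exp ) );
    return sign ? - z : z;
}
```

For example, the pattern 0xC0300000 (-2.75) converts to -2, while a NaN
such as 0x7FC00000 sets the invalid flag and returns the largest positive
integer.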
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Standard Arithmetic Functions\r
\r
The following standard arithmetic functions are provided:\r
\r
   float32_add    float32_sub    float32_mul    float32_div    float32_sqrt\r
   float64_add    float64_sub    float64_mul    float64_div    float64_sqrt\r
   floatx80_add   floatx80_sub   floatx80_mul   floatx80_div   floatx80_sqrt\r
   float128_add   float128_sub   float128_mul   float128_div   float128_sqrt\r
\r
Each function takes two operands, except for `sqrt' which takes only one.\r
The operands and result are all of the same type.\r
\r
Rounding of the extended double-precision (`floatx80') functions is affected\r
by the `floatx80_rounding_precision' variable, as explained above in the\r
section _Extended_Double-Precision_Rounding_Precision_.\r
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Remainder Functions\r
\r
For each format, SoftFloat implements the remainder function according to\r
the IEC/IEEE Standard.  The remainder functions are:\r
\r
   float32_rem\r
   float64_rem\r
   floatx80_rem\r
   float128_rem\r
\r
Each remainder function takes two operands.  The operands and result are all\r
of the same type.  Given operands x and y, the remainder functions return\r
the value x - n*y, where n is the integer closest to x/y.  If x/y is exactly\r
halfway between two integers, n is the even integer closest to x/y.  The\r
remainder functions are always exact and so require no rounding.\r
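
The choice of n, including the halfway case, can be illustrated with
integer operands.  The function below is a hypothetical sketch, not part of
SoftFloat.  It assumes y > 0 and uses integer arithmetic in place of the
floating-point formats.

```c
/* Returns x - n*y for y > 0, where n is the integer closest to x/y and
   an exact tie selects the even n. */
static long sketch_rem( long x, long y )
{
    long n = x / y;        /* quotient truncated toward zero */
    long r = x - n * y;    /* zero, or the same sign as x, with |r| < y */
    long twice = 2 * ( ( r < 0 ) ? - r : r );

    /* Step n one further from zero when that is strictly closer to x/y,
       or when x/y is exactly halfway and the truncated n is odd. */
    if ( ( twice > y ) || ( ( twice == y ) && ( n % 2 != 0 ) ) ) {
        r += ( x < 0 ) ? y : - y;
    }
    return r;
}
```

With x = 5 and y = 2, x/y is 2.5, the even choice is n = 2, and the result
is 1.  With x = 7 and y = 2, x/y is 3.5, the even choice is n = 4, and the
result is -1.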
\r
Depending on the relative magnitudes of the operands, the remainder\r
functions can take considerably longer to execute than the other SoftFloat\r
functions.  This is inherent in the remainder operation itself and is not a\r
flaw in the SoftFloat implementation.\r
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Round-to-Integer Functions\r
\r
For each format, SoftFloat implements the round-to-integer function\r
specified by the IEC/IEEE Standard.  The functions are:\r
\r
   float32_round_to_int\r
   float64_round_to_int\r
   floatx80_round_to_int\r
   float128_round_to_int\r
\r
Each function takes a single floating-point operand and returns a result of\r
the same type.  (Note that the result is not an integer type.)  The operand\r
is rounded to an exact integer according to the current rounding mode, and\r
the resulting integer value is returned in the same floating-point format.\r
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Comparison Functions\r
\r
The following floating-point comparison functions are provided:\r
\r
   float32_eq    float32_le    float32_lt\r
   float64_eq    float64_le    float64_lt\r
   floatx80_eq   floatx80_le   floatx80_lt\r
   float128_eq   float128_le   float128_lt\r
\r
Each function takes two operands of the same type and returns a 1 or 0\r
representing either _true_ or _false_.  The abbreviation `eq' stands for\r
``equal'' (=); `le' stands for ``less than or equal'' (<=); and `lt' stands\r
for ``less than'' (<).\r
\r
The standard greater-than (>), greater-than-or-equal (>=), and not-equal\r
(!=) functions are easily obtained using the functions provided.  The\r
not-equal function is just the logical complement of the equal function.\r
The greater-than-or-equal function is identical to the less-than-or-equal\r
function with the operands reversed; and the greater-than function can be\r
obtained from the less-than function in the same way.\r
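
The three derived comparisons can be written as thin wrappers.  In the
sketch below, the `eq', `le', and `lt' stand-ins use the native `float'
type only so that the example compiles on its own.  All of the names are
hypothetical, and in real use the wrappers would call `float32_eq',
`float32_le', and `float32_lt' instead.

```c
/* Assumption: native `float' stands in for a SoftFloat type here. */
typedef float sketch_float;

/* Stand-ins for the three comparison functions that are provided. */
static int sketch_eq( sketch_float a, sketch_float b ) { return a == b; }
static int sketch_le( sketch_float a, sketch_float b ) { return a <= b; }
static int sketch_lt( sketch_float a, sketch_float b ) { return a < b; }

/* Not-equal is the logical complement of equal. */
static int sketch_ne( sketch_float a, sketch_float b )
{
    return ! sketch_eq( a, b );
}

/* Greater-than-or-equal is less-than-or-equal with operands reversed. */
static int sketch_ge( sketch_float a, sketch_float b )
{
    return sketch_le( b, a );
}

/* Greater-than is less-than with operands reversed. */
static int sketch_gt( sketch_float a, sketch_float b )
{
    return sketch_lt( b, a );
}
```

When either operand is a NaN, the derived not-equal function returns 1,
and the derived greater-than and greater-than-or-equal functions return 0.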
\r
The IEC/IEEE Standard specifies that the less-than-or-equal and less-than\r
functions raise the invalid exception if either input is any kind of NaN.\r
The equal functions, on the other hand, are defined not to raise the invalid\r
exception on quiet NaNs.  For completeness, SoftFloat provides the following\r
additional functions:\r
\r
   float32_eq_signaling    float32_le_quiet    float32_lt_quiet\r
   float64_eq_signaling    float64_le_quiet    float64_lt_quiet\r
   floatx80_eq_signaling   floatx80_le_quiet   floatx80_lt_quiet\r
   float128_eq_signaling   float128_le_quiet   float128_lt_quiet\r
\r
The `signaling' equal functions are identical to the standard functions\r
except that the invalid exception is raised for any NaN input.  Likewise,\r
the `quiet' comparison functions are identical to their counterparts except\r
that the invalid exception is not raised for quiet NaNs.\r
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Signaling NaN Test Functions\r
\r
The following functions test whether a floating-point value is a signaling\r
NaN:\r
\r
   float32_is_signaling_nan\r
   float64_is_signaling_nan\r
   floatx80_is_signaling_nan\r
   float128_is_signaling_nan\r
\r
The functions take one operand and return 1 if the operand is a signaling\r
NaN and 0 otherwise.\r
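
The following sketch shows what such a test might look like for single
precision.  It assumes the common convention in which the most significant
fraction bit is 1 for a quiet NaN and 0 for a signaling NaN.  Some systems
use the opposite convention, so this is an illustration rather than the
definition used by any particular SoftFloat port.  The function name is
hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the single-precision bit pattern is a signaling NaN under
   the assumed convention (exponent all ones, top fraction bit clear,
   remaining fraction bits not all zero), and 0 otherwise. */
static int sketch_float32_is_signaling_nan( uint32_t a )
{
    /* Bits 30..22 hold the 8 exponent bits followed by the top fraction
       bit; 0x1FE means exponent 0xFF with the top fraction bit clear. */
    return ( ( ( a >> 22 ) & 0x1FF ) == 0x1FE )
               && ( ( a & 0x003FFFFF ) != 0 );
}
```

Under this convention, 0x7FA00000 is a signaling NaN, 0x7FC00000 is a
quiet NaN, and 0x7F800000 is an infinity rather than a NaN.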
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
Raise-Exception Function\r
\r
SoftFloat provides a function for raising floating-point exceptions:\r
\r
    float_raise\r
\r
The function takes a mask indicating the set of exceptions to raise.  No\r
result is returned.  In addition to setting the specified exception flags,\r
this function may cause a trap or abort appropriate for the current system.\r
\r
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\r
\r
\r
-------------------------------------------------------------------------------\r
Contact Information\r
\r
At the time of this writing, the most up-to-date information about\r
SoftFloat and the latest release can be found at the Web page
`http://HTTP.CS.Berkeley.EDU/~jhauser/arithmetic/SoftFloat.html'.
\r
\r
372	\r