[/
Copyright 2015 Paul A. Bristow.
Copyright 2015 John Maddock.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]

[section:root_comparison Comparison of Root Finding Algorithms]

[section:cbrt_comparison Comparison of Cube Root Finding Algorithms]

In the table below, the cube root of 28 was computed for three __fundamental_types floating-point types,
and one __multiprecision type __cpp_bin_float using 50 decimal digit precision, using four algorithms.

The 'exact' answer was computed using a 100 decimal digit type:

   cpp_bin_float_100 full_answer ("3.036588971875662519420809578505669635581453977248111123242141654169177268411884961770250390838097895");

Times were measured with __boost_timer using `class cpu_timer`.

* ['Its] is the number of iterations taken to find the root.
* ['Times] is the CPU time taken, in arbitrary units.
* ['Norm] is the time normalized to that of the quickest algorithm (which thus has value 1.00).
* ['Dis] is the distance, in bits, from the nearest representation of the 'exact' root.
Distance from the 'exact' answer is measured with the function __float_distance.
One or two bits distance means that all results are effectively 'correct'.
Zero means 'exact' - the nearest __representable value for the floating-point type.

The cube-root function is simple, making it a somewhat contrived example for root-finding,
but it does allow us to investigate some of the factors controlling efficiency that can be extrapolated to
more complex functions.

The program used was [@../../example/root_finding_algorithms.cpp root_finding_algorithms.cpp].
100000 evaluations of each floating-point type and algorithm were used, and the CPU times were
judged from repeat runs to have an uncertainty of 10%. Comparing MSVC for `double` and `long double`
(which are identical on this platform) may give a guide to the uncertainty of the timing.

The requested precision was set as follows:

[table
[[Function][Precision Requested]]
[[TOMS748][numeric_limits<T>::digits - 2]]
[[Newton][floor(numeric_limits<T>::digits * 0.6)]]
[[Halley][floor(numeric_limits<T>::digits * 0.4)]]
[[Schr'''&#xf6;'''der][floor(numeric_limits<T>::digits * 0.4)]]
]

* The C++ Standard cube root function [@http://en.cppreference.com/w/cpp/numeric/math/cbrt std::cbrt]
is only defined for built-in or fundamental types,
so cannot be used with any user-defined floating-point type like __multiprecision.
This, and the fact that the cube function is so well-behaved,
allows the implementer to use many tricks to achieve a fast computation.
On some platforms, `std::cbrt` appeared several times as quick as the more general `boost::math::cbrt`;
on other platforms / compiler options `boost::math::cbrt` is noticeably faster. In general, the results are highly
dependent on the code-generation / processor-architecture compiler options used. One can
assume that the standard library will have been compiled with options ['nearly] optimal for the platform
it was installed on, whereas the user has more choice over the options used for Boost.Math. Pick something
too general or conservative and performance suffers, while selecting options that use the latest
instruction-set opcodes speeds things up noticeably.

* Two compilers in optimise mode were compared: GCC 4.9.1 using the NetBeans IDE
and Microsoft Visual Studio 2013 (Update 1) on the same hardware.
The number of iterations seemed consistent, but the relative run-times were surprisingly different.

* `boost::math::cbrt` allows use with ['any user-defined floating-point type], conveniently
__multiprecision. It too can take some advantage of the good behaviour of the cube function,
compared to the more general implementation in the nth root-finding examples. For example,
it uses a polynomial approximation to generate a better guess than dividing the exponent by three,
and can avoid the complex checks in __newton required to prevent the
search going wildly off-track. For a known precision, it may also be possible to
fix the number of iterations, allowing inlining and loop unrolling. It also
algebraically simplifies the Halley steps, leading to a big reduction in the
number of floating-point operations required compared to a "black box" implementation
that calculates the derivatives separately and then combines them in the Halley code.
Typically, it was found that computation using type `double`
took a few times longer when using the various root-finding algorithms directly rather
than the hand-coded/optimized `cbrt` routine.

* The importance of getting a good guess can be seen by the iteration count for the multiprecision case:
here we "cheat" a little and use the cube root calculated to double precision as the initial guess.
The limitation of this tactic is that the range of possible (exponent) values may be less than that of the multiprecision type.

* For __fundamental_types, there was little to choose between the three derivative methods,
but for __cpp_bin_float, __newton was twice as fast. Note that the cube root is an extreme
test case: the cost of calling the functor is so cheap that the runtimes are largely
dominated by the complexity of the iteration code.

* Compiling with optimisation halved computation times, and any differences between algorithms
became nearly negligible. The optimisation speed-up of __TOMS748 was especially noticeable.

* Using a multiprecision type like `cpp_bin_float_50` for a precision of 50 decimal digits
took a lot longer, as expected, because most of the computation
uses software rather than 64-bit floating-point hardware.
Speeds are often more than 50 times slower.

* Using `cpp_bin_float_50`, __TOMS748 was much slower, showing the benefit of using derivatives.
__newton was found to be twice as quick as either of the second-derivative methods:
this is an extreme case though, as the function and its derivatives are so cheap to compute that we're
really measuring the complexity of the boilerplate root-finding code.

* For multiprecision types, only one or two extra ['iterations] are needed to get the remaining 35 digits, whatever the algorithm used.
(The time taken was of course much greater for these types.)

* Using a 100 decimal-digit type only doubled the time and required only a few more iterations,
so the cost of extra precision is mainly the underlying cost of computing more digits,
not in the way the algorithm works. This confirms previous observations using __NTL high-precision types.

[include root_comparison_tables_msvc.qbk]
[include root_comparison_tables_gcc.qbk]

[endsect] [/section:cbrt_comparison Comparison of Cube Root Finding Algorithms]

[section:root_n_comparison Comparison of Nth-root Finding Algorithms]

A second example compares four generalized nth-root finding algorithms for various n-th roots (5, 7 and 13)
of a single value 28.0, for four floating-point types: `float`, `double`,
`long double` and the __multiprecision type `cpp_bin_float_50`.
In each case the target accuracy was set using our "recommended" accuracy limits
(or at least limits that make a good starting point - likely to give
close to full accuracy without resorting to unnecessary iterations).

[table
[[Function][Precision Requested]]
[[TOMS748][numeric_limits<T>::digits - 2]]
[[Newton][floor(numeric_limits<T>::digits * 0.6)]]
[[Halley][floor(numeric_limits<T>::digits * 0.4)]]
[[Schr'''&#xf6;'''der][floor(numeric_limits<T>::digits * 0.4)]]
]
Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code
[@../../example/root_n_finding_algorithms.cpp root_n_finding_algorithms.cpp].

The timing uncertainty (especially using MSVC) is at least 5% of the normalized time 'Norm'.

The 'best' and 'worst' algorithms are highlighted in blue and red.
More than one result can be 'best' when normalized times are indistinguishable
within the uncertainty.

[/include roots_table_100_msvc.qbk]
[/include roots_table_75_msvc.qbk]

[/include roots_table_75_msvc_X86.qbk]
[/include roots_table_100_msvc_X86.qbk]

[/include roots_table_100_msvc_AVX.qbk]
[/include roots_table_75_msvc_AVX.qbk]

[/include roots_table_75_msvc_X86_SSE2.qbk]
[/include roots_table_100_msvc_X86_SSE2.qbk]

[/include roots_table_100_gcc_X64_SSE2.qbk]
[/include roots_table_75_gcc_X64_SSE2.qbk]

[/include type_info_table_100_msvc.qbk]
[/include type_info_table_75_msvc.qbk]

[include roots_table_100_msvc_X86_SSE2.qbk]
[include roots_table_100_msvc_X64_AVX.qbk]
[include roots_table_100_gcc_X64_SSE2.qbk]

Some tentative conclusions can be drawn from this limited exercise.

* Perhaps surprisingly, there is little difference between the various algorithms for __fundamental_types floating-point types.
Using the first derivative (__newton) is usually the best; while its improvement over the derivative-free
__TOMS748 is considerable in the number of iterations, the gain in execution time is small. This reflects the fact that the function
we are finding the root for is trivial to evaluate, so runtimes are dominated by the time taken by the boilerplate code
in each method.

* The extra cost of evaluating the second derivatives (__halley or __schroder) is usually too much for any net benefit:
as with the cube root, these functors are so cheap to evaluate that the runtime is largely dominated by the
complexity of the root-finding method.

* For a __multiprecision floating-point type, __newton is the clear winner, with a several-fold gain over __TOMS748,
and again no improvement from the second-derivative algorithms.

* The run-time of 50 decimal-digit __multiprecision is about 30-fold greater than that of `double`.

* The column 'dis' shows the number of bits distance from the correct result.
The Newton-Raphson algorithm shows a bit or two better accuracy than __TOMS748.

* The goodness of the 'guess' is especially crucial for __multiprecision.
Separate experiments show that evaluating the 'guess' using `double` allows
convergence to the final exact result in one or two iterations.
So in this contrived example, rather than crudely dividing the exponent by N for a 'guess',
it would be far better to use a `pow<double>` (or, if more precision is required,
a `pow<long double>`) function to estimate a 'guess'.
The limitation of this tactic is that the range of possible (exponent) values may be less than that of the multiprecision type.

* Using the floating-point extension __SSE2 gave a modest ten-percent speed-up.

* Using MSVC, there was some improvement using 64-bit, markedly so for __multiprecision.

* The GCC compiler 4.9.1 using 64-bit was at least five-fold faster than 32-bit,
apparently reflecting better optimization.

Clearly, your mileage [*will vary], but in summary, __newton seems the first choice of algorithm,
and effort to find a good 'guess' the first speed-up target, especially for __multiprecision.
And of course, compiler optimisation is crucial for speed.

[endsect] [/section:root_n_comparison Comparison of Nth-root Finding Algorithms]

[section:elliptic_comparison Comparison of Elliptic Integral Root Finding Algorithms]

A third example compares four root finding algorithms for locating
the second radius of an ellipse with first radius 28 and arc length 300,
for four floating-point types: `float`, `double`,
`long double` and the __multiprecision type `cpp_bin_float_50`.

Which is to say we're solving:

[pre 4xE(sqrt(1 - 28[super 2] / x[super 2])) - 300 = 0]

In each case the target accuracy was set using our "recommended" accuracy limits
(or at least limits that make a good starting point - likely to give
close to full accuracy without resorting to unnecessary iterations).

[table
[[Function][Precision Requested]]
[[TOMS748][numeric_limits<T>::digits - 2]]
[[Newton][floor(numeric_limits<T>::digits * 0.6)]]
[[Halley][floor(numeric_limits<T>::digits * 0.4)]]
[[Schr'''&#xf6;'''der][floor(numeric_limits<T>::digits * 0.4)]]
]
Tests used Microsoft Visual Studio 2013 (Update 1) and GCC 4.9.1 using source code
[@../../example/root_elliptic_finding.cpp root_elliptic_finding.cpp].

The timing uncertainty (especially using MSVC) is at least 5% of the normalized time 'Norm'.

The 'best' and 'worst' algorithms are highlighted in blue and red.
More than one result can be 'best' when normalized times are indistinguishable
within the uncertainty.

[include elliptic_table_100_msvc_X86_SSE2.qbk]
[include elliptic_table_100_msvc_X64_AVX.qbk]
[include elliptic_table_100_gcc_X64_SSE2.qbk]

Remarks:

* The function being solved is now moderately expensive to call, and twice as expensive to call
when obtaining the derivative as when not. Consequently there is very little improvement in moving
from a derivative-free method to Newton iteration. However, once you have calculated the first derivative
the second comes almost for free; consequently the third-order method (Halley) does much the best.
* Of the two second-derivative methods, Halley does best, as would be expected: the Schroder method offers better
guarantees of ['quadratic] convergence, while Halley relies on a smooth function with a single root to
give ['cubic] convergence. It's not entirely clear why Schroder iteration often does worse than Newton.

[endsect][/section:elliptic_comparison Comparison of Elliptic Integral Root Finding Algorithms]

[endsect] [/section:root_comparison Comparison of Root Finding Algorithms]