Higher accuracy order in differentiation-by-integration

In this text, explicit forms of several higher precision order kernel functions (to be used in the differentiation-by-integration procedure) are given for particular derivative orders. A system of linear equations is formulated which allows one to construct kernels with arbitrary precision for an arbitrary derivative order. A study on a real computer is performed, and it is shown that numerical differentiation based on these kernels performs much better (with respect to errors) than the same procedure based on the usual Legendre-polynomial kernels. The presented results may have implications for numerical implementations of the differentiation-by-integration method.


Introduction
A derivative, if it exists, can be expressed in an alternative way (1), where k is an appropriate kernel function and n indicates the derivative order. The method is known as "differentiation by integration" (DbI); an example was first published by Cioranescu [1]. Later, Lanczos [2] studied a version of (1) with a particular kernel, and his name became associated with this type of differentiation. DbI has some interesting features because it generalizes the ordinary derivative in several respects. It may be used in situations where the latter does not exist, or, when generalized to higher orders, it can (in some cases) be seen as a way to define a fractional-order derivative [3]. DbI also attracts attention for its potential applications in numerical analysis [4,5,6,7,8].
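The best-known special case of this procedure is the Lanczos rule for the first derivative, f'(x) ≈ (3/(2h^3)) ∫_{-h}^{h} t f(x+t) dt. A minimal Python sketch (the function name and the quadrature choice are ours, not from the text):

```python
import numpy as np

def lanczos_derivative(f, x, h, nodes=20):
    """First derivative via the Lanczos differentiation-by-integration rule
    f'(x) ~ (3 / (2*h**3)) * int_{-h}^{h} t * f(x + t) dt,
    with the integral evaluated by Gauss-Legendre quadrature."""
    t, w = np.polynomial.legendre.leggauss(nodes)  # nodes/weights on [-1, 1]
    t, w = h * t, h * w                            # rescale to [-h, h]
    return 3.0 / (2.0 * h ** 3) * np.sum(w * t * f(x + t))

# Example: f = sin at x = 1; the exact derivative is cos(1) ~ 0.5403.
approx = lanczos_derivative(np.sin, 1.0, 1e-2)
```

The discretization error of this rule scales as O(h^2), which is exactly the behavior the higher precision order kernels discussed later are designed to improve.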
In published texts [9,10,11,12,13,14] two dominant ideas can be seen:
• The least-squares property: the derivative as defined by the Lanczos formula (1)-(2) behaves as the slope parameter of a line which is adjusted, in the least-squares sense, to the function f on the interval [−h, h]. The idea was further generalized to higher derivatives in [10], where the kernel functions are based on the Legendre polynomials. This property is very closely linked to the Savitzky-Golay filter [15] and the related way of numerically computing a derivative from discrete data sets.
• Orthogonality approach: if one expands a function f into a Taylor series, one can search for a function k orthogonal to desired powers t^n so as to make all terms vanish except those we are interested in. For instance, to extract the Dth derivative one can use a polynomial orthogonal to all t^n for n < D (e.g. the Legendre polynomial L_D). If one then multiplies the resulting expression by an appropriate power of h, the limit h → 0 eliminates all higher terms n > D and isolates the one with the Dth derivative. This approach allows for far-reaching generalizations (as presented in [13]) but remains limited to analytic functions f.
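The orthogonality approach can be sketched in code. The sketch below assumes the standard normalization of the Legendre-kernel formula: the constant c_D follows from the known moment ∫_{-1}^{1} L_D(t) t^D dt = 2^(D+1) (D!)^2 / (2D+1)!; the function names are ours.

```python
import numpy as np
from math import factorial

def legendre_dbi_derivative(f, x, D, h, nodes=40):
    """D-th derivative via a Legendre-polynomial kernel,
    f^(D)(x) ~ (1/h^D) * int_{-1}^{1} c_D * L_D(t) * f(x + h*t) dt.
    The constant c_D = (2D+1)! / (2^(D+1) * D!) normalizes the t^D moment
    so that the Taylor term with f^(D) is extracted with weight 1."""
    t, w = np.polynomial.legendre.leggauss(nodes)      # quadrature on [-1, 1]
    c_D = factorial(2 * D + 1) / (2 ** (D + 1) * factorial(D))
    kernel = c_D * np.polynomial.legendre.Legendre.basis(D)(t)
    return np.sum(w * kernel * f(x + h * t)) / h ** D

# Orthogonality kills all t^n terms with n < D.  Moments of the D = 2 kernel:
t, w = np.polynomial.legendre.leggauss(40)
k2 = (factorial(5) / (2 ** 3 * factorial(2))) * np.polynomial.legendre.Legendre.basis(2)(t)
moments = [np.sum(w * k2 * t ** n) for n in range(3)]  # ~ [0, 0, 2] = [0, 0, D!]

# Second derivative of exp at 0 (exact value: 1)
d2 = legendre_dbi_derivative(np.exp, 0.0, 2, 1e-2)
```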
A universal procedure for constructing a kernel function for the first derivative was presented in [5]; a full generalization to all derivative orders was given in [16]. A function k is a valid kernel function for computing the Dth derivative if and only if it satisfies condition (4), where ω is a weight function satisfying (5) and (6), assuming that all appearing integrals and derivatives exist. The Dth derivative is then computed as in (7). In the standard finite-difference approach, a large emphasis is given (in relation to numerical applications) to the accuracy order in the discretization-parameter (h) expansion. It is known that higher-order approaches perform much better than lower-order ones. A systematic study of these effects is somewhat missing with respect to the DbI methods. Higher accuracy orders might break the "least-squares" behavior, but lead (as demonstrated later) to largely improved overall numerical precision. In the existing literature, formulas with higher accuracy order can be found (e.g. formula 4.1 in [13]); however, explicit kernel-function forms which could be directly used as a recipe for an immediate implementation are missing. One also lacks a numerical study of the performance of such higher-order methods. The aim of this text is to
• provide explicit expressions for kernel functions for the first few orders in precision and derivative (Sec. 2),
• give an explicit formulation of a linear system which, when solved, leads to kernel functions for any derivative order and any order in precision (Sec. 2), and
• perform a numerical study of the higher accuracy order approach (Sec. 3).
We will briefly discuss the presented results in Sec. 4 and close the text by providing a summary and conclusion (Sec. 5).

Higher order formulas
Different strategies can be adopted when constructing higher precision order kernels. The orthogonality approach is certainly an option: by an appropriate choice of a kernel k one can set to zero all h^n terms in (3) up to a desired value of n < N, with the exception of the term containing the derivative one is interested in. Yet, we prefer to choose a different strategy which stems from the basic principles (4), (5) and (6). We adopt the most natural choice and search for kernels in the form of least-degree polynomials. This minimalist approach is enabled by a "brute force" computation which provides us with full control over the coefficients and leads to explicit relations between them. A general polynomial respecting (6) is written as in (8), where N is a normalization, D is the derivative order and C_{a,b} represents the binomial coefficient. Formally appearing non-integer powers in the last line are canceled by the 1 + (−1)^j term. The two sums can be multiplied, with H indicating a step function. The condition (5) implies the normalization (11), where the symbol ⌊·⌋ stands for the floor function. For the needs of formula (4) one has to compute higher-order derivatives (12). Expanding the function f into a Taylor series and plugging (12) into (7) using (4), one gets an expansion in which one controls the accuracy order by adjusting the coefficients in front of the powers of h. The series formally contains also negative powers (for m < D); formula (7), however, implies that their coefficients are identically zero. Furthermore, formula (7) also implies that the overall coefficient in front of the constant term (h^0) is f^{(D)}. The interesting terms are thus those with m > D.
An important observation is that the odd powers of h are controlled only by the odd-indexed coefficients a_1, a_3, a_5, .... It can be verified in an explicit way that the index of Ω has the same parity as the power of h (the symbol ∼ means conservation of index- and power-related properties), where the factor 1 + (−1)^{m+n} allows only for even-valued sums of the two numbers, m + n = 2i. The expression for Ω_Z is, in turn, given only by terms whose coefficients a_i have an index of the same parity as Z: if k is even, then 1 + k is odd and 1 + (−1)^{1+k} is zero for all related a_k terms. Therefore Ω_{2i+1} depends only on the odd-indexed a_i coefficients. This being established, in what follows we will study only even-index-based weight functions, i.e. even functions of x on the interval [−1, 1]. The even-indexed coefficients, controlling the even powers of h, cannot all be set to zero because one needs to fulfill the normalization condition (5).
We are now in the situation where we need to adjust the even-indexed coefficients a_0, a_2, a_4, ... so as to make chosen c_i coefficients vanish. Let us denote by h^{2i} the highest even-power h term which is meant to be set to zero (together with all lower terms). To achieve that, we need to match i (even-indexed) coefficients and formulate i equations. In a compact form the jth equation can be written as (13), with 1 ≤ j ≤ i, where the dependence of α_n on the coefficients is assumed in all expressions. Here, without loss of generality, we set the first coefficient to one, a_0 = 1. Initially a general number, N (as an unspecified number of polynomial terms) can be cut to 2i (it comprises all coefficients, odd-indexed included) and is thus determined only by the accuracy order. Once the system is solved and the a_i coefficients are found, one computes the normalization N from (11) and constructs the weight function (8). The kernel function is given as its Dth derivative (4). We will denote by ω_D^{2i} those functions which provide the Dth derivative with order of precision O(h^{2i+2}). This notation reflects the fact that the highest even-power term which is annihilated is h^{2i}. Since the next term is also zero (because it has an odd power), the first nonzero term corresponds to the power 2i + 2. Solving the system (13), one gets explicit example results. The functions are depicted in Fig. 1.
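The paper's system (13) is built on the weight-function ansatz (8). As a simplified illustration of the same moment-matching principle, one can instead impose the conditions directly on a plain polynomial kernel, ∫_{-1}^{1} k(t) t^n dt = D!·δ_{nD} for n = 0..M, and solve the resulting linear system. This is a sketch of the idea under that simplifying assumption, not the paper's exact construction:

```python
import numpy as np
from math import factorial

def moment_matched_kernel(D, M):
    """Coefficients b_0..b_M of a polynomial kernel k(t) = sum_j b_j t^j
    on [-1, 1], fixed by the moment conditions
        int_{-1}^{1} k(t) t^n dt = D! * delta_{nD},   n = 0..M.
    Killing more moments (larger M) raises the accuracy order in h."""
    A = np.array([[(1.0 + (-1.0) ** (n + j)) / (n + j + 1)  # = int t^(n+j) dt
                   for j in range(M + 1)] for n in range(M + 1)])
    rhs = np.zeros(M + 1)
    rhs[D] = factorial(D)
    return np.linalg.solve(A, rhs)

def dbi_derivative(f, x, D, h, b, nodes=40):
    """f^(D)(x) ~ (1/h^D) * int_{-1}^{1} k(t) f(x + h*t) dt."""
    t, w = np.polynomial.legendre.leggauss(nodes)
    return np.sum(w * np.polyval(b[::-1], t) * f(x + h * t)) / h ** D

# Low-order (M = 2, which reproduces the Lanczos kernel k = 1.5*t)
# versus higher order (M = 5) for the first derivative of sin at x = 1:
h = 0.1
err_lo = abs(dbi_derivative(np.sin, 1.0, 1, h, moment_matched_kernel(1, 2)) - np.cos(1.0))
err_hi = abs(dbi_derivative(np.sin, 1.0, 1, h, moment_matched_kernel(1, 5)) - np.cos(1.0))
```

Already at h = 0.1 the higher-order kernel improves the error by several orders of magnitude, which mirrors the behavior reported in Sec. 3.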

Numerical investigations
The topic under consideration is, from the perspective of a numerical study, rather large and multidimensional. One could be interested in the dependence of the results on the
• size of the discretization parameter h,
• precision order of the kernel function,
• derivative order of the kernel function, or
• choice of the test functions.
To reduce the complexity, we opt for our purposes only for the O(h^6) precision kernels k_D^{2i=4}. Next, the optimal size of the discretization parameter is analyzed. For that we select three test functions with a fixed point of differentiation x_0: sin(x) with x_0 = 1; exp(x) with x_0 = π; and ln(x) with x_0 = 1/2.
Using these settings we scan 8 orders of magnitude of the discretization parameter, 10^{−8} ≤ h ≤ 10^{−1}, changing h in a geometric progression with factor 10.
For comparison purposes we add results from the least-squares (LS) approach, which is based on the Legendre-polynomial kernels.
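Such a scan is straightforward to sketch. The example below uses the first-derivative least-squares kernel k(t) = (3/2) t (the Lanczos/Legendre case), not the paper's HO kernels, and is meant only to illustrate the procedure:

```python
import numpy as np

def ls_first_derivative(f, x, h, nodes=30):
    """First derivative with the least-squares (Lanczos/Legendre) kernel
    k(t) = (3/2)*t:  f'(x) ~ (1/h) * int_{-1}^{1} k(t) * f(x + h*t) dt."""
    t, w = np.polynomial.legendre.leggauss(nodes)
    return np.sum(w * 1.5 * t * f(x + h * t)) / h

# Scan 8 orders of magnitude of h for f = sin at x0 = 1 (exact: cos(1)).
hs = 10.0 ** -np.arange(1, 9)                  # 1e-1 .. 1e-8
errs = np.array([abs(ls_first_derivative(np.sin, 1.0, h) - np.cos(1.0))
                 for h in hs])
h_opt = hs[int(np.argmin(errs))]               # h minimizing the total error
```

For very small h the round-off contribution dominates and the total error grows again, so the scan exhibits an interior optimum rather than a monotone decrease.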
As seen from the summary Table 1, the optimal h for the higher precision order (HO) methods is h ∼ 10^{−1}, 10^{−2}; for the LS methods it is systematically smaller, h ∼ 10^{−3}, 10^{−4}. The important observation is that the HO kernels significantly outperform the LS kernels (if each method uses its own optimal value of h). If the HO kernels are used with an h-size that is optimal for the LS methods, the HO results remain competitive.
Fixing the discretization parameter size to the near-optimal values h_HO = 10^{−2} and h_LS = 10^{−3}, we study the error behavior on whole intervals: [−π, π] for sin(x), [−2, 2] for exp(x) and [0.1, 2] for ln(x). The results are shown in Fig. 2. Analyzing them, one can draw a straightforward conclusion: when interested in absolute errors, the HO differentiation provides results which are several orders of magnitude more precise than those of the LS methods, and, when it comes to implementing a numerical differentiation-by-integration method on a computer, the approach based on the HO kernels is clearly the preferred option.

Discussion
An interesting lesson can be learnt from the previous observations: if a function-related quantity Q is to be extracted in a numerical way from the function behavior on the interval [−h, h], where h is small enough and can play the role of a power-expansion parameter, then the "least-squares" optimum need not be the "Q-extraction" optimum. The results suggest that a more effective way might consist in expanding Q into powers of h, Q = q_0 + q_1 h + q_2 h^2 + ..., and finding a method which reaches a higher precision with respect to this expansion.
One should also address the question of numerical (i.e. round-off) errors. This text deals with the so-called discretization error, which can be made arbitrarily small and can be handled with full mathematical rigor. However, on a real computer the use of a (very) high-accuracy approach would fail: as seen from Fig. 1, higher-order weight functions become more sharply peaked near zero, which can, in some sense, be interpreted as an effective shrinking of the interval [−h, h]. If the integral becomes small, numerical errors grow. In addition, higher-order weight functions (and thus kernel functions too) become more and more oscillatory, which further worsens numerical uncertainties. One can also notice that with increasing precision order the kernel functions become numerically large.
Numerical errors are difficult to handle: a reliable mathematical treatment of the floating-point arithmetic in computer registers is presumably a very hard task. In principle one cannot exclude a different behavior of numerical errors for the HO and LS approaches, but such differences (if existing and significant) are hard to estimate in an exact way. Rather than trying to quantify this issue, a trustworthy (we believe) conclusion can be deduced from the numerical tests in Sec. 3. Whatever the behavior of the numerical errors is, the HO differentiation provides, in the studied examples, a significantly smaller overall (total) error. It is reasonable to assume that this is the standard behavior of the HO method when compared to the LS approach.

Table 1: Absolute error for 4 derivative orders (D), two kernel-function types (higher order HO vs. least-squares LS) and three test functions with fixed test points. The optimal value of the discretization parameter h is tuned for one of the methods (in bold); the result of the other method is also shown.

Summary and conclusion
By using straightforward calculations, we constructed a system of linear equations (13) which leads to arbitrary precision order kernel functions for an arbitrary derivative in the DbI procedure. Numerical tests show that these kernels significantly outperform (with respect to precision) the standard least-squares-based kernels, which have been the most widely used and discussed up to now. If a DbI method is to be implemented on a computer, the results provide strong evidence for preferring higher precision order kernels. The text also opens a general question of extracting a function-related quantity from the behavior of a function on a small interval, and promotes a method of higher precision order linked to a small-parameter (interval length) expansion.

Appendix
List of higher accuracy order weight and kernel functions. Kernel functions k are to be used in formula (7); the notation from the end of Sec. 2 is adopted.
• First derivative

Figure 2: Absolute errors of the HO approach (full blue line) and LS-based methods (dotted red line) for 3 functions (in columns) and 4 derivative orders (rows).