Density Results by Deep Neural Network Operators with Integer Weights

Abstract. In the present paper, a new family of multi-layer (deep) neural network (NN) operators is introduced. Density results are established in the space of continuous functions on [−1, 1], with respect to the uniform norm. First, the case of the operators with two layers is considered in detail; then the definition and the corresponding density results are extended to the general case of multi-layer operators. All the above definitions allow us to prove approximation results by a constructive approach, in the sense that, for any given f, all the weights, the thresholds, and the coefficients of the deep NN operators can be explicitly determined. Finally, examples of activation functions are provided, together with graphical examples. The main motivation of this work is to provide the multi-layer counterpart of the well-known (shallow) NN operators, according to what is done in applications with the construction of deep neural models.


Introduction
The study of deep neural networks (NNs) currently represents one of the most studied topics (see, e.g., [25,39,40]), in view of its possible implications for several application areas ([26,28]), including artificial intelligence and machine learning. A very complete overview of the topic can be found, e.g., in [35].
In particular, it has been known since the end of the 1980s that one of the main tasks that can be performed by NNs is the approximation of functions. In this regard, a wide number of articles can be found in the literature, in particular concerning the approximation by one-layer (shallow) NNs. One of the pioneering works in this sense is due to Cybenko [20], who established a uniform approximation result in the case of NNs activated by sigmoidal functions. Cybenko's approximation theorem was proved by resorting to non-constructive arguments, exploiting the well-known Hahn-Banach theorem of functional analysis. More precisely, he proved that the vector space generated by one-layer NNs activated by continuous and non-decreasing sigmoidal functions is dense, with respect to the uniform topology, in the space of continuous functions defined on the multivariate set [0, 1]^n. In other words, he showed that a continuous function of several variables can be uniformly approximated to any degree of accuracy by the superposition of certain (univariate) sigmoidal functions, using a sufficiently high number of artificial neurons. The latter fact is known in the literature as the so-called "universal approximation property", in the spirit of the well-known Kolmogorov approximation theorem (see, e.g., [27]).
Later on, several authors studied the above problem; a common feature of many of these works is that the problem is faced by non-constructive techniques. Among them we can quote, e.g., [5,31,32,33].
However, especially for the applications, it can be important to have constructive approximation algorithms at one's disposal; this has motivated several lines of research in this direction, mainly concerning shallow NNs (see, e.g., [11,29] for some older papers, or, e.g., [8,9,21] for some more recent ones).
For instance, the theory of neural network (NN) operators, introduced with the work of Cardaliaguet and Euvrard [10], is one of the possible ways to approach the above problem. The main limitation of the theory proposed in [10] was that it could be applied only in the case of bell-shaped activation functions with compact support. This limitation was overcome in [8], where NN operators (of one variable) activated by the logistic function have been considered. Subsequently, in [16,17] the theory of NN operators has been formulated (for functions of one and several variables) for a wide class of sigmoidal activation functions (including, of course, the logistic function) satisfying suitable assumptions.
For the above reasons, in the present paper we introduce a family of multi-layer (deep) NN operators, and we provide constructive approximation results. More precisely, we provide density theorems for the family of multi-layer (deep) NN operators in the space of continuous functions on the interval [−1, 1].
The main motivation of this work is to provide the multi-layer counterpart of the well-known (shallow) NN operators, according to what is done in applications with the construction of deep neural models.
From the mathematical point of view, the family of operators introduced here consists of the nested application of families of positive linear operators, whose approximation performance is then evaluated. This is done by exploiting the "density approach", as typically occurs with NN-type approximation (see, e.g., [22,23]); it is due to this property that one can usually claim that NNs are universal approximators.
The proposed approach can be called "constructive" in the sense that, for any given f, all the weights, the thresholds, and the coefficients of the NN operators can be explicitly determined.
For the sake of clarity, in the paper we first propose the theory in the case of two-layer operators (see Section 3), and then we propose its multi-layer generalization (see Section 4).
The above results are proved in the case of sigmoidal activation functions σ; all the required assumptions on σ, together with some preliminary considerations, are recalled in Section 2. However, besides sigmoidal activation functions, we also show (see Section 5) that deep NN operators based upon the well-known ReLU (Rectified Linear Unit, [38]) activation function, and also upon RePUs (rectified power unit functions, [30]), are included in the present theory. Finally, graphical examples are provided for the purpose of illustration.

Preliminaries
A measurable function σ : R → R is called a sigmoidal function if:

lim_{x→−∞} σ(x) = 0 and lim_{x→+∞} σ(x) = 1.

From now on, we always consider non-decreasing sigmoidal functions σ satisfying the following assumptions:

(Σ1) σ(x) − 1/2 is an odd function;

(Σ2) σ ∈ C^2(R), with σ concave for x ≥ 0;

(Σ3) σ(x) = O(|x|^{−α}) as x → −∞, for some α > 0.

Now, we can recall the definition of the density function (see, e.g., [16]):

ϕ_σ(x) := (1/2) [σ(x + 1) − σ(x − 1)], x ∈ R.

Note that, based on the above properties, it is not difficult to see that 0 ≤ σ′(x) ≤ σ′(0), x ∈ R; then it turns out that σ (and obviously also ϕ_σ) is Lipschitz continuous on R. Further, it is well-known (see [16]) that under the above assumptions the function ϕ_σ turns out to be non-negative and even, and it satisfies the following property:

ϕ_σ(x) is non-decreasing for x < 0 and non-increasing for x ≥ 0, (2.1)

with ϕ_σ(1) > 0, and ϕ_σ(x) = O(|x|^{−α}), as x → ±∞, where α is the positive constant of condition (Σ3). Hence, it turns out that:

Σ_{k∈Z} ϕ_σ(x − k) = 1, for every x ∈ R.

Moreover, based on the above assumptions, it is not difficult to see that (see [16] again):

Σ_{k=−n}^{n} ϕ_σ(nx − k) ≥ ϕ_σ(1) > 0, (2.2)

for every x ∈ [−1, 1] and n ∈ N_+.
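To make the role of ϕ_σ concrete, the following sketch numerically spot-checks the properties recalled above; the logistic function is taken as a hypothetical concrete choice of σ, and the check is an illustration, not part of the theory.

```python
import math

def sigma(x):
    # A concrete (hypothetical) choice of sigmoidal function: the logistic sigmoid.
    return 1.0 / (1.0 + math.exp(-x))

def phi(x):
    # Density function: phi_sigma(x) = (sigma(x + 1) - sigma(x - 1)) / 2.
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))

# Spot-checks of the properties of phi_sigma recalled above.
assert phi(0.7) >= 0.0                           # non-negativity
assert abs(phi(0.7) - phi(-0.7)) < 1e-12         # phi_sigma is even
assert phi(1.0) > 0.0                            # phi_sigma(1) > 0
assert phi(0.2) >= phi(0.9)                      # non-increasing for x >= 0
# Partition of unity: the integer translates of phi_sigma sum to 1
# (here checked numerically at the point u = 0.3, truncating the series).
assert abs(sum(phi(0.3 - k) for k in range(-60, 61)) - 1.0) < 1e-6
```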
Remark 1. Note that, if we remove condition (Σ2) on σ, and we assume directly that ϕ_σ satisfies (2.1) together with ϕ_σ(1) > 0, the theory still holds; see, e.g., [16]. The consequence of the above observation is that we can apply the theory to C^2 sigmoidal functions as well as to non-smooth ones, provided that the corresponding ϕ_σ satisfies (2.1) and ϕ_σ(1) > 0.
Now, we can define the discrete absolute moments of ϕ_σ, as follows:

m_ν(ϕ_σ) := sup_{u∈R} Σ_{k∈Z} ϕ_σ(u − k) |u − k|^ν, ν ≥ 0.

It is well-known (see, e.g., [7]) that, under the above assumptions, and assuming in addition that the parameter α > 1, it turns out that:

m_0(ϕ_σ) < +∞.

Now, we recall the (shallow) neural network (NN) operators, introduced in [16].
Let σ be a sigmoidal function as assumed above. We define the (shallow) NN operators by:

F_n f(x) := [Σ_{k=−n}^{n} f(k/n) ϕ_σ(nx − k)] / [Σ_{k=−n}^{n} ϕ_σ(nx − k)], x ∈ I := [−1, 1], n ∈ N_+,

where f : I → R is a bounded function. Notice that the operators F_n are well-defined; this is a consequence of the previous properties of the function ϕ_σ (in particular, of the lower bound (2.2)). Further, it is well-known ([16]) that the following convergence theorem holds.

Theorem 1. Let f ∈ C(I). Then:

lim_{n→+∞} ∥F_n f − f∥_∞ = 0.
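As an illustration of the operators F_n and of the convergence stated in Theorem 1, here is a minimal numerical sketch; the logistic σ and the test function f are illustrative assumptions, not part of the theory.

```python
import math

def phi(x):
    # Density function for the logistic sigmoid (an illustrative choice of sigma).
    s = lambda t: 1.0 / (1.0 + math.exp(-t))
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def F(n, f, x):
    # Shallow NN operator F_n: a normalized combination of the samples
    # f(k/n), k = -n, ..., n, weighted by phi(n*x - k).
    num = sum(f(k / n) * phi(n * x - k) for k in range(-n, n + 1))
    den = sum(phi(n * x - k) for k in range(-n, n + 1))
    return num / den

# The maximum error on a grid of I = [-1, 1] shrinks as n grows
# (a numerical illustration, not a proof).
f = lambda x: x * x - 1.0 + math.sin(x)
grid = [i / 50.0 for i in range(-50, 51)]
err = lambda n: max(abs(F(n, f, x) - f(x)) for x in grid)
assert err(40) < err(5)
```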

Two-layer NN operators: approximation results
We now introduce the following family of deep NN operators.

Definition 1. Let σ be a sigmoidal function, assumed as in Section 2. We introduce the two-layer (deep) neural network operators as follows:

D^2_{(n_1,n_2)} f(x) := [Σ_{k=−n_1}^{n_1} f(k/n_1) ϕ_σ(n_1 (F_{n_2}θ)(x) − k)] / [Σ_{k=−n_1}^{n_1} ϕ_σ(n_1 (F_{n_2}θ)(x) − k)], x ∈ I,

where θ(x) := x, x ∈ I, and n_1, n_2 ∈ N_+; that is, the outer network F_{n_1} f is evaluated at the output of an inner network F_{n_2}θ approximating the identity.

Note that the operators D^2_{(n_1,n_2)} f turn out to be well-defined, e.g., for bounded functions. Indeed, since (F_{n_2}θ)(x) ∈ I for every x ∈ I, using (2.2) we have:

|D^2_{(n_1,n_2)} f(x)| ≤ ∥f∥_∞,

for every x ∈ I.

Remark 2. We can observe that the two-layer NN operators are in fact deep NNs, organized in two layers. This can simply be seen by observing that the above operators are constructed by the mathematical composition of two one-layer NNs. In practice, two simple NNs have been nested to get a more complex architecture composed of (2n_1 + 1) + (2n_2 + 1) artificial neurons. Since the structure of the considered operators is slightly different from that of classical NNs, below we provide a comparison of D^2_{(n_1,n_2)} f (for any fixed f) with classical feed-forward neural network models. Indeed, one- and two-layer classical feed-forward NNs activated by a sigmoidal function σ are defined (from the mathematical point of view) by:

N_1(x) := Σ_{k_1} a_{k_1} σ(w_{k_1} x − θ_{k_1}), and N_2(x) := Σ_{k_2} a_{k_2} σ(w_{k_2} N_1(x) − θ_{k_2}),

respectively. In the above definitions, w_{k_1} and w_{k_2} are the weights, θ_{k_1} and θ_{k_2} are the thresholds, and finally a_{k_1} and a_{k_2} are the coefficients of the networks. Comparing N_2 with D^2_{(n_1,n_2)} f we can observe what follows.

(i) In the inner layer of D^2_{(n_1,n_2)} f (with 2n_2 + 1 neurons, which can be compared with N_1), we have the network:

(F_{n_2}θ)(x) = Σ_{j=−n_2}^{n_2} (j/n_2) ϕ_σ(n_2 x − j) / Σ_{i=−n_2}^{n_2} ϕ_σ(n_2 x − i),

where the integer parameter n_2 represents the (constant) weights, the parameters j (with j = −n_2, ..., n_2) provide the threshold values (or biases), while the values:

(j/n_2) / Σ_{i=−n_2}^{n_2} ϕ_σ(n_2 x − i), j = −n_2, ..., n_2, (3.1)

are the coefficients of the inner network.
(ii) Concerning the second layer of D^2_{(n_1,n_2)} f, with 2n_1 + 1 neurons, we have that the integer parameter n_1 represents the weights, k (with k = −n_1, ..., n_1) are the threshold values, and finally:

f(k/n_1) / Σ_{h=−n_1}^{n_1} ϕ_σ(n_1 (F_{n_2}θ)(x) − h), k = −n_1, ..., n_1, (3.2)

are the coefficients.
(iii) The activation functions of the operators D^2_{(n_1,n_2)} f are defined using sigmoidal functions, as happens for N_1 and N_2.
We can note that the denominators in the definition of the above operators (which are crucial from the mathematical point of view) can be seen (see equations (3.1) and (3.2)) as part of the coefficients of both layers of D^2_{(n_1,n_2)}. This can be interpreted as the presence of a further connection among the various neurons of each layer. More precisely, the above denominators show that the input layer has a double (direct) connection with both of the involved layers of the network.
Obviously, in view of the above comparison, we can finally observe that the NN operators D^2_{(n_1,n_2)} cannot be viewed as usual two-layer NNs, since the coefficients stated in (3.1) and (3.2) are not constants, as instead happens for N_1 and N_2. Now, we can prove the following density theorem for the family of two-layer NN operators in the space C(I) of all real-valued continuous functions on I, with respect to the usual norm ∥ · ∥_∞.
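One plausible reading of Definition 1, consistent with the comparison in Remark 2 and with the role of θ(x) := x in the proof below, is that the inner network F_{n_2}θ approximates the identity and its output feeds the outer network F_{n_1} f. Under this assumption, and again taking the logistic σ and an illustrative test function, the two-layer operator can be sketched numerically as follows.

```python
import math

def phi(x):
    # Density function for the logistic sigmoid (illustrative choice of sigma).
    s = lambda t: 1.0 / (1.0 + math.exp(-t))
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def F(n, g, x):
    # One-layer operator F_n g evaluated at the point x.
    num = sum(g(k / n) * phi(n * x - k) for k in range(-n, n + 1))
    den = sum(phi(n * x - k) for k in range(-n, n + 1))
    return num / den

def D2(n1, n2, f, x):
    # Two-layer operator (assumed reading): the inner network F_{n2} theta
    # approximates the identity theta(u) = u; its output feeds F_{n1} f.
    theta = lambda u: u
    return F(n1, f, F(n2, theta, x))

# Crude sanity check of the uniform error on a grid of I = [-1, 1].
f = lambda x: x * x - 1.0 + math.sin(x)
grid = [i / 25.0 for i in range(-25, 26)]
err = max(abs(D2(10, 10, f, x) - f(x)) for x in grid)
assert err < 1.0
```

Note that (F_{n_2}θ)(x) is a normalized positive combination of the nodes k/n_2 ∈ [−1, 1], so the inner output stays in I and the outer operator is well-defined.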
Theorem 2. Let σ be a sigmoidal function satisfying assumption (Σ3) with α > 1, and let f ∈ C(I) be fixed. For any ε > 0 there exist n_1, n_2 ∈ N_+ such that:

∥D^2_{(n_1,n_2)} f − f∥_∞ < ε.

Proof. Let ε > 0. Then, using Theorem 1, we know that, in correspondence to ε/2, there exists n_1 ∈ N_+ sufficiently large such that:

∥F_{n_1} f − f∥_∞ < ε/2. (3.3)

Now, we can also choose n_2 ∈ N_+ such that:

∥F_{n_2} θ − θ∥_∞ < δ, (3.4)

where the function θ(x) := x, x ∈ I, and δ > 0 is sufficiently small (depending on ε, n_1, and the Lipschitz constant of ϕ_σ). Now, for every fixed x ∈ I we have:

|D^2_{(n_1,n_2)} f(x) − f(x)| ≤ |F_{n_1} f((F_{n_2}θ)(x)) − F_{n_1} f(x)| + |F_{n_1} f(x) − f(x)| =: I_1 + I_2.

Concerning I_1, recalling that ϕ_σ is Lipschitz continuous with Lipschitz constant |σ′(0)|, we can estimate the difference of the two quotients defining F_{n_1} f at the points (F_{n_2}θ)(x) and x, obtaining I_1 < ε/2, provided that δ in (3.4) is suitably small. Hence the proof follows by observing that also I_2 < ε/2, in view of (3.3). ⊓⊔

Remark 3. Note that the density approach is one of the most common when one deals with NN-type approximation. In this sense, we can refer, e.g., to the Cybenko approximation theorem [20] (inspired by the well-known Kolmogorov representation theorem).

Multi-layer NN operators
In Section 3, we introduced and studied a family of deep NN operators with two layers. In this section, the above definition and the corresponding density results will be extended to the more general context of multi-layer NN operators. The idea is to extend the above definition by proceeding by induction on the number of layers.
Definition 2. Let σ be a sigmoidal function, assumed as in Section 2. We define the m-layer (deep) NN operators, m ∈ N, m ≥ 2, recursively as follows:

D^m_{n^{[m]}} f(x) := D^{m−1}_{n^{[m−1]}} f((F_{n_m} θ)(x)), x ∈ I, m ≥ 3,

where n^{[m]} := (n_1, n_2, ..., n_m) ∈ N_+^m, n^{[m−1]} := (n_1, ..., n_{m−1}), θ(x) := x, and D^2_{n^{[2]}} := D^2_{(n_1,n_2)} is the two-layer operator of Definition 1. Clearly, for m = 2 the above deep NN operators coincide with those considered in Section 3.
Further, we can observe that the multi-layer NN operators are well-defined for every m ≥ 2, for any bounded function f. Indeed, it is easy to see that:

|D^m_{n^{[m]}} f(x)| ≤ ∥f∥_∞,

for every x ∈ I and n^{[m]} ∈ N_+^m. We can now prove the density result stated in Theorem 3 below.

Proof. We proceed by induction on m ≥ 2. In the case m = 2, the density result immediately follows from Theorem 2. We now suppose that the claim holds for m − 1 ≥ 2. Let now f ∈ C(I) and ε > 0 be fixed. In view of Theorem 1, we can choose n ∈ N_+ such that ∥F_n θ − θ∥_∞ is sufficiently small (depending on ε and on the Lipschitz constant of ϕ_σ), where θ(x) := x. Further, by the inductive assumption, in correspondence to ε/2 there exists n^{[m−1]} = (n_1, ..., n_{m−1}) ∈ N_+^{m−1} such that:

∥D^{m−1}_{n^{[m−1]}} f − f∥_∞ < ε/2.

Now, we set n^{[m]} := (n_1, ..., n_{m−1}, n). Thus, in the case of m-layer NN operators we can write what follows. Let x ∈ I be fixed; then:

|D^m_{n^{[m]}} f(x) − f(x)| ≤ |D^{m−1}_{n^{[m−1]}} f((F_n θ)(x)) − D^{m−1}_{n^{[m−1]}} f(x)| + |D^{m−1}_{n^{[m−1]}} f(x) − f(x)|.

Using the fact that ϕ_σ is Lipschitz continuous with Lipschitz constant |σ′(0)|, we immediately have that the first term on the right-hand side is smaller than ε/2, while the second one is smaller than ε/2 by the inductive assumption. This completes the proof. ⊓⊔
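Under the same reading as before (each additional layer inserting a further identity-approximating network F_n θ, with F_{n_1} f applied last), the m-layer construction can be sketched as follows; the logistic σ and the test function are illustrative assumptions.

```python
import math

def phi(x):
    # Density function for the logistic sigmoid (illustrative choice of sigma).
    s = lambda t: 1.0 / (1.0 + math.exp(-t))
    return 0.5 * (s(x + 1.0) - s(x - 1.0))

def F(n, g, x):
    # One-layer operator F_n g at the point x.
    num = sum(g(k / n) * phi(n * x - k) for k in range(-n, n + 1))
    den = sum(phi(n * x - k) for k in range(-n, n + 1))
    return num / den

def D(ns, f, x):
    # m-layer operator for ns = (n_1, ..., n_m), under the assumed recursion:
    # the innermost layers approximate the identity theta(u) = u,
    # and the sample-carrying layer F_{n_1} f is applied last.
    theta = lambda u: u
    y = x
    for n in reversed(ns[1:]):   # F_{n_m} theta first, then F_{n_{m-1}} theta, ...
        y = F(n, theta, y)
    return F(ns[0], f, y)

f = lambda x: x * x - 1.0 + math.sin(x)
grid = [i / 25.0 for i in range(-25, 26)]
err2 = max(abs(D((10, 10), f, x) - f(x)) for x in grid)
err3 = max(abs(D((10, 10, 10), f, x) - f(x)) for x in grid)
assert err2 < 1.0 and err3 < 1.0
```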

Activation functions and graphical examples
As a first example, we study in detail the case of the deep NN operators activated by the well-known logistic function:

σ_ℓ(x) := (1 + e^{−x})^{−1}, x ∈ R.

It is well-known (see, e.g., [16,19]) that the logistic function is a Lipschitz continuous function satisfying (Σ1), (Σ2) and (Σ3). In particular, due to its exponential decay to zero as x → −∞, σ_ℓ fulfills condition (Σ3) for every α > 0. Based on the above considerations, we deduce that, in the case of the logistic function, the density results established in the previous sections hold; hence we can formulate Corollary 1 below.

As a second example, we consider the sigmoidal functions σ_{M_d}(x) (introduced in [18]) that are associated with the central B-splines ([6,12]):

M_d(x) := (1/(d − 1)!) Σ_{i=0}^{d} (−1)^i (d choose i) (d/2 + x − i)_+^{d−1}, x ∈ R, d ∈ N_+.

The sigmoidal functions σ_{M_d}(x) ([18]) are defined by the following integral formula:

σ_{M_d}(x) := ∫_{−∞}^{x} M_d(t) dt, x ∈ R,

and the corresponding density functions assume the following expression:

ϕ_{σ_{M_d}}(x) = (1/2) [σ_{M_d}(x + 1) − σ_{M_d}(x − 1)] = (1/2) ∫_{x−1}^{x+1} M_d(t) dt, x ∈ R.

In [18] it has been proved that σ_{M_d}(x) satisfies all the assumptions required in Section 2. In particular, since the central B-spline M_d has compact support, we have σ_{M_d}(x) = 0 for every x ≤ −T, where T > 0 is a suitable constant. This shows that assumption (Σ3) is satisfied by σ_{M_d}(x) for every α > 0.
As a consequence, Corollary 2 below can be formulated. Finally, we also observe that, in the case d = 1:

σ_{M_1}(x) = 0 for x ≤ −1/2, σ_{M_1}(x) = x + 1/2 for −1/2 < x < 1/2, and σ_{M_1}(x) = 1 for x ≥ 1/2.

The function σ_{M_1}(x) is also known as the ramp function; see, e.g., [9,13,14]. Now, if we recall the definition of the well-known ReLU activation function (see, e.g., [21]):

ψ_ReLU(x) := (x)_+ = max{x, 0}, x ∈ R,

it turns out that:

σ_{M_1}(x) = ψ_ReLU(x + 1/2) − ψ_ReLU(x − 1/2), x ∈ R;

then, the corresponding density function can be expressed in terms of the ReLU activation function:

ϕ_{σ_{M_1}}(x) = (1/2) [ψ_ReLU(x + 3/2) − ψ_ReLU(x + 1/2) − ψ_ReLU(x − 1/2) + ψ_ReLU(x − 3/2)], x ∈ R.

As a consequence of the above relations, the deep NN operators D^{σ_{M_1}, m}_{n^{[m]}} can be considered as deep NNs activated by the above linear combination of ReLU functions. For more details concerning the usefulness of ψ_ReLU, see, e.g., [38].
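The relation between σ_{M_1} and the ReLU function can be verified numerically; the short sketch below assumes M_1 = χ_{[−1/2, 1/2]} (the order-1 central B-spline), so that σ_{M_1} is the ramp function.

```python
def relu(x):
    # psi_ReLU(x) = (x)_+ = max(x, 0).
    return max(x, 0.0)

def ramp(x):
    # sigma_{M_1}: the ramp function, i.e. the integral of the order-1
    # central B-spline M_1 = chi_[-1/2, 1/2] (assumed normalization).
    if x <= -0.5:
        return 0.0
    if x >= 0.5:
        return 1.0
    return x + 0.5

# sigma_{M_1}(x) = psi_ReLU(x + 1/2) - psi_ReLU(x - 1/2):
for x in [-2.0, -0.5, -0.3, 0.0, 0.4, 0.5, 3.0]:
    assert abs(ramp(x) - (relu(x + 0.5) - relu(x - 0.5))) < 1e-12

# ... and the density function phi_{sigma_{M_1}} as a combination of ReLUs:
def phi_ramp(x):
    return 0.5 * (ramp(x + 1.0) - ramp(x - 1.0))

for x in [-2.0, -1.0, -0.4, 0.0, 0.7, 1.5, 2.0]:
    lhs = phi_ramp(x)
    rhs = 0.5 * (relu(x + 1.5) - relu(x + 0.5) - relu(x - 0.5) + relu(x - 1.5))
    assert abs(lhs - rhs) < 1e-12
```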
Note that central B-splines, and consequently the corresponding σ_{M_d}, can easily be expressed in terms of powers of ψ_ReLU. In the theory of NNs, powers of the ReLU function are known by the name of rectified power unit functions (RePUs). Hence, reasoning in a similar way to the case of the ReLU function, it is clear that the above results also hold in the case of deep NN operators activated by suitable combinations of RePU activation functions (see, e.g., [30]).
Finally, we provide the following graphical examples with the main purpose of illustrating the approximation performance of the deep NN operators. For instance, here we consider the continuous function f(x) = x^2 − 1 + sin x on the interval [−1, 1]; approximations of f by means of the 2-layer and 3-layer NN operators activated by the logistic function are shown in Figure 1.

Conclusions
The multi-layer (deep) neural network operators introduced and studied in the present paper allow us to establish constructive approximation results by a family of deep neural networks. The present theory deals with the approximation of functions of one variable. It is well-known that the theory of artificial neural networks is mainly a multivariate theory; indeed, the present results pave the way for the introduction of the corresponding version of the deep NN operators for approximating functions of several variables. This will be done in a future work, following the strategy depicted in [17].
In recent years, the theory of multi-layer NNs has been deeply studied (see, e.g., [34]), in view of its wide importance in both theoretical and applied fields. Concerning very recent approximation results (which can be interpreted as density theorems) for the above tools, one can see, e.g., [36]. In the latter paper, the authors established the order of approximation for deep (two- or multi-layer) NNs activated by ReLU functions, in the case of multivariate Hölder continuous functions. One of the main aspects arising from the above result is that the approximation error depends not only on the number of the considered neurons, but also on the depth of the net, i.e., on the number of considered layers. Indeed, increasing the number of layers in the NNs improves the accuracy of the approximation.
In view of the importance covered by the above topic, in the present paper we decided to introduce a multi-layer (deep) version of the so-called NN operators. Here, only the density problem has been treated; however, the considered results are not limited to the case of the ReLU function, but also hold for a class of activation functions including RePUs and sigmoidal functions. Obviously, the role that the additional layers of the NN operators play in the degree of accuracy of the achieved approximations will be the object of a future investigation. Actually, the graphical results shown in the previous section seem to suggest that, also in this case, increasing the number of layers improves the corresponding order of approximation. Finally, the problem of best approximation will also be considered in detail in a future study.

Theorem 3.
Let σ be a sigmoidal function satisfying assumption (Σ3) with α > 1. Further, let m ∈ N, m ≥ 2, and f ∈ C(I) be fixed. Then, for every ε > 0 there exists n^{[m]} ∈ N_+^m such that:

∥D^m_{n^{[m]}} f − f∥_∞ < ε.

Corollary 1.
Let σ_ℓ be the logistic function, m ∈ N, m ≥ 2, and f ∈ C(I) be fixed. Then, denoting by D^{σ_ℓ, m}_{n^{[m]}} the m-layer NN operators activated by σ_ℓ, for every ε > 0 there exists n^{[m]} ∈ N_+^m such that:

∥D^{σ_ℓ, m}_{n^{[m]}} f − f∥_∞ < ε.

Corollary 2.
Let σ_{M_d}, d ∈ N_+, m ∈ N, m ≥ 2, and f ∈ C(I) be fixed. Then, denoting by D^{σ_{M_d}, m}_{n^{[m]}} the m-layer NN operators activated by σ_{M_d}, for every ε > 0 there exists n^{[m]} ∈ N_+^m such that:

∥D^{σ_{M_d}, m}_{n^{[m]}} f − f∥_∞ < ε.

Figure 1.
On the left: the plots of the function f (red line) and of the 2-layer NN operators D^2_{(10,10)} f (blue dots) activated by the logistic function. On the right: the plots of the function f (red line) and of the 3-layer NN operators D^3_{(10,10,10)} f (blue dots) activated by the logistic function.