Multidimensional wavelet neural networks based on polynomial powers of sigmoid: a framework to image verification

Wavelet functions have been used as the activation function in feed forward neural networks. A substantial body of research has been produced in the wavelet neural network area, and successful algorithms and applications have been developed and reported in the literature. However, most of these reports impose many restrictions on the classical back propagation algorithm, such as low dimensionality, tensor products of wavelets, parameter initialization and, in general, one-dimensional output.


Introduction
Wavelet functions have been successfully used in many problems as the activation function of feed forward neural networks. There are claims in Marar (1997) that many fundamental biological properties can emerge from the wavelet transform. A substantial body of research has been produced in the wavelet neural network area, and successful algorithms and applications have been developed and reported in the literature: Zhang and Benveniste (1992); Marar (1997); Oussar and Dreyfus (2000); Chen and Hewit (2000); Zhang and San (2004); Fan and Wang (2005); Zhang and Pu (2006); Chen et al. (2006); Avci (2007); Jiang et al. (2007); Misra et al. (2007).
However, most of the aforementioned reports impose many restrictions on the classical back propagation algorithm, such as low dimensionality, tensor products of wavelets, parameter initialization and, in general, one-dimensional output.
In order to remove some of these restrictions, we develop a robust three-layer multidimensional PPS-wavelet network strongly similar to the classical multilayer perceptron. The great advantage of this new approach is that PPS-wavelets offer a free choice of the function used in the hidden layer, without the need to develop a new learning algorithm. This is a very interesting property for the design of new wavelet neural network architectures. This paper is organized as follows. Section "Function approximation" covers basic theoretical aspects of function approximation. Section "Wavelet functions" introduces the sigmoidal wavelet function. Section "Polynomial powers of Sigmoid" presents the framework used in this research. Section "Human face verification" deals with the application to the face verification problem.

Function approximation
Multilayer perceptron networks (MLP) have been intensely studied as efficient tools for arbitrary function approximation. Among the developments achieved in the theory of function approximation using MLPs, the work carried out by Hecht-Nielsen resulted in an improved version of the superposition theorem defined by Sprecher, in Hecht-Nielsen (1987). Gallant and White showed in 1988 that a feed forward network with one hidden layer of processing units using the cosine squasher as the activation function corresponds to a special case of Fourier networks that can approximate the Fourier series of a given function. Cybenko developed a rigorous demonstration that an MLP with only one hidden layer of processing elements is sufficient to approximate any continuous function with support in a hypercube, in Cybenko (1989).
The theorem applies directly to MLPs. Sigmoid, radial basis and wavelet functions are common choices for the network construction since they satisfy the conditions imposed by the theorem. The function approximation theorem provides the mathematical basis that supports the approximation of any continuous arbitrary function. Furthermore, it establishes, in the case of the MLP, that a network composed of a single hidden layer of neurons is sufficient to compute, in a given problem, a mapping from the input space to the output space based on a set of training examples. However, the theorem does not provide any insight about the solutions with respect to training speed and ease of implementation. The choice of activation functions and of the learning algorithm defines which particular network is used. In any situation, the neurons operate as a set of functions that generate an arbitrary basis for function approximation, defined from the information extracted from the input-output pairs. For training a feed forward network, the back propagation algorithm is one of the most frequently employed in practical applications and can be seen as an optimization procedure.

Wavelet functions
Two categories of wavelet functions, namely orthogonal wavelets and wavelet frames (non-orthogonal wavelets), were developed separately from different interests. An orthogonal basis is a family of wavelets that are linearly independent and mutually orthogonal, which eliminates redundancy in the representation. However, orthogonal wavelet bases are difficult to construct because the wavelet family must satisfy stringent criteria, in Daubechies (1992); Chui (1992). Because of these difficulties, orthogonality is a serious drawback for the application of wavelets to function approximation and process modeling, in Oussar and Dreyfus (2000). Conversely, wavelet frames are constructed by simple operations of translation and dilation of a single fixed function, called the mother wavelet, which must satisfy conditions less stringent than the orthogonality conditions. Let φ be a mother wavelet; each member φ_j of the family is given by the relation φ_j(x) = φ((x − t_j)/d_j), where the t_j are translation factors and the d_j are dilation factors, both in R. The family of functions generated in this way is denoted ℧. A family ℧ is said to be a frame of L²(R) if there exist two constants c > 0 and C < ∞ such that, for any square integrable function f, the inequalities c‖f‖² ≤ Σ_j |⟨φ_j, f⟩|² ≤ C‖f‖² hold, where φ_j ∈ ℧, ‖f‖ denotes the norm of the function f and ⟨φ_j, f⟩ the inner product of functions. Families of wavelet frames of L²(R) are universal approximators, in Zhang and Benveniste (1992); Pati and Krishnaprasad (1993). In this work, we show that wavelet frames allow a practical implementation of multidimensional wavelets. This is important when considering problems of large input and output dimension. For the modeling of multivariable processes, such as biologically plausible artificial neural networks, multidimensional wavelets must be defined. In the present work, we use multidimensional wavelets constructed as linear combinations of sigmoid powers, denominated Polynomial Powers of Sigmoid wavelets (PPS-wavelets).

Sigmoidal wavelet functions
In Funahashi (1989) it is shown that, for s(x) a bounded, monotonically increasing function different from the constant function and any 0 < α < ∞, the function created by the combination of sigmoids described in Equation 1 satisfies g(x) ∈ L¹(R); in particular, the sigmoid function satisfies this property. Using the property from Equation 1, Pati and Krishnaprasad (1993) suggest the construction of wavelets based on the addition and subtraction of translated sigmoids, which they denominate sigmoidal wavelets. The same article shows a process of construction of sigmoidal wavelets by substituting the function s(x) by Υ(qx) in Equation 1. Thus, Equation 2 is the wavelet function created in Pati and Krishnaprasad (1993),
where r > 0. In terms of the sigmoid function, ψ(x) in Equation 2 is given by Equation 3, where q > 0 is a constant that controls the slope of the sigmoid function and α, r ∈ R, α, r > 0.
Pati and Krishnaprasad demonstrated that the function ψ(x) satisfies the admissibility condition for wavelets, in Daubechies (1992); Chui (1992). The Fourier transform of the function ψ(x) is given by Equation 4. In particular, for analysis and practical applications we adopt, as an example, the family of sigmoidal wavelets generated by the parameters q = 2 and α = r, so that Equation 3 can be rewritten in the form of Equation 5, where m = α + r. Partially following this line of research, we present in the next section a technique for the construction of wavelets based on linear combinations of sigmoid powers.

Polynomial powers of Sigmoid
The Polynomial Powers of Sigmoid (PPS) is a class of functions that has been used in recent years to solve a wide range of problems related to image and signal processing, in Marar (1997). Let Υ: R → [0,1] be the sigmoid function defined by Υ(x) = 1/(1 + e^(−x)). The n-th power of the sigmoid is the function Υ^n(x). Let θ be the set of all such power functions, defined by (6). An important aspect is that these power functions still keep the S-shaped form. Observing the form created by the power functions of the sigmoid, suppose that the n-th power of the sigmoid function can be represented in the polynomial form (7), where a_0, a_1, a_2, …, a_n are integer values. The expansion of the sigmoid powers can be viewed as the lines of a Pascal's triangle. The set of functions written as linear combinations of polynomial powers of sigmoid is defined as the PPS functions. The degree of a PPS is given by the largest power among its sigmoid terms.

DATJournal v.1 n.2 2016
This limit tends to the second derivative of the function, which is given in PPS terms by an expression in which we denominate φ_2(x) the first wavelet of the sigmoid function. The remaining derivatives, from the second onward, we consider valid by the derivative property of the Fourier transform, in Marar (1997). The successive differentiation of sigmoid functions allows assembling a family of polynomial wavelet functions. Among the many applications of this family of PPS-wavelets, a special one is that these functions can be used as activation functions in artificial neurons. The following results correspond to the analytical functions for the elements φ_3(x) and φ_4(x), which are represented by:

Estimating the coefficients of PPS-wavelets
Considering j as the number of wavelets to be defined, the algorithm below calculates a matrix of integer values that estimates the coefficients of the PPS-wavelets.
Step 1: Initialization. The initial values are considered only auxiliary variables; the matrix of values associated with the wavelet construction process is obtained from the second row onward. Step 2: Calculate the coefficient of the PPS term of highest degree. Step 3: Calculate the coefficients of the remaining terms of the polynomial. Step 4: Calculate the coefficient of the first power variable. It is important to notice that steps 2 and 3 are cascaded by an inherent dependence on the variable n. By proceeding in this way, a family of polynomial wavelets is generated.
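As an illustration of how such a coefficient table can be produced, the sketch below is an assumption about the underlying recurrence, not the paper's literal steps: it exploits the identity Υ'(x) = Υ(x) − Υ(x)², so differentiating any polynomial in Υ yields another polynomial in Υ with integer coefficients.

```python
def sigmoid_derivative_coeffs(j):
    """Rows of integer coefficients: rows[n][k] is the coefficient of
    s(x)^k in the n-th derivative of the sigmoid s(x).

    Illustrative reconstruction, using s'(x) = s(x) - s(x)**2, so that
    d/dx s^k = k*s^k - k*s^(k+1).
    """
    rows = [[0, 1]]  # 0th derivative: s itself, coefficient 1 on s^1
    for _ in range(j):
        prev = rows[-1]
        nxt = [0] * (len(prev) + 1)
        for k, a in enumerate(prev):
            if a:
                nxt[k] += a * k        # k*s^k term
                nxt[k + 1] -= a * k    # -k*s^(k+1) term
        rows.append(nxt)
    return rows
```

For example, the second row reproduces s' = s − s² and the third gives s'' = s − 3s² + 2s³, the S-shaped and wavelet-shaped combinations discussed above.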

PPS Wavelet neural network
Let us consider the canonical structure of the multidimensional PPS-wavelet neural network (PPS-WNN), as shown in Figure 2.
For the PPS-WNN in Figure 2, when an input pattern X = (x_1, x_2, …, x_m)^T is applied to the network, the output of the i-th neuron of the output layer represents a function approximation problem, i.e., f: R^m → [0,1]^n, given by the network equation in which p is the number of hidden neurons, Υ(⋅) is the sigmoid function, φ(⋅) is the PPS-wavelet, w^(2) are the weights between the hidden layer and the output layer, w^(1) are the weights between the input and the hidden layer, d are the dilation factors and t the translation factors of the PPS-wavelet, and b^(1) and b^(2) are the bias terms of the hidden layer and the output layer, respectively.
The PPS-WNN contains PPS-wavelets as the activation function in the hidden layer (Figure 3) and the sigmoid function as the activation function in the output layer (Figure 4). The output of the j-th PPS-wavelet hidden neuron (Figure 3) and the output of the i-th output layer neuron (Figure 4) follow from these definitions. The adaptive parameters of the PPS-WNN consist of all weights, biases, translation and dilation terms. The sole purpose of the training phase is to determine the "optimum" setting of the weights, biases, translation and dilation terms so as to minimize the difference between the network output and the target output. This difference is referred to as the training error, where n is the dimension of the output space and s is the number of training input patterns.
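Since the network equations are not reproduced here, the following sketch gives one plausible reading of the forward pass just described; the placement of the translation and dilation terms in the hidden pre-activation is an assumption, φ_2 is taken as the sigmoid-power combination s − s², and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pps_wavelet2(x):
    # assumed phi_2: a linear combination of sigmoid powers, s - s^2
    s = sigmoid(x)
    return s - s**2

def forward(X, W1, b1, t, d, W2, b2, phi=pps_wavelet2):
    """One forward pass of a three-layer PPS-WNN (illustrative).

    Hidden layer: PPS-wavelet of the translated/dilated affine input.
    Output layer: sigmoid neurons, so outputs lie in [0, 1]^n.
    """
    z = (W1 @ X + b1 - t) / d   # assumed placement of t (translation), d (dilation)
    h = phi(z)                  # hidden activations
    return sigmoid(W2 @ h + b2)
```

A usage example: with m = 3 inputs, p = 4 hidden neurons and n = 2 outputs, `forward(np.zeros(3), W1, b1, t, d, W2, b2)` returns a vector in (0, 1)².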
The most popular and successful learning method for training multilayer perceptrons is the back propagation algorithm. The algorithm employs an iterative gradient descent method of minimization which minimizes the mean squared error (L2 norm) between the desired output (y_i) and the network output (o_i). From Equations (18) and (19), we can deduce the partial derivatives of the error with respect to each PPS-wavelet neural network parameter, which are given by:

Algorithm for the PPS-wavelet neural network
In this section, the learning algorithm for the PPS-wavelet neural network is built on the back propagation method. The initialization procedure attributes random values on [0,1] to the parameters. Improvements in the initialization process have been proposed through the selection of basic PPS-wavelet functions, in de Queiroz and Marar (2007).
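A minimal sketch of the initialization step described above, with random values on [0,1]; keeping the dilation factors away from zero is our own precaution, not part of the source, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(m, p, n):
    """Random initialization on [0, 1] of all adaptive parameters of a
    PPS-WNN with m inputs, p hidden neurons and n outputs (illustrative)."""
    return {
        "W1": rng.random((p, m)),   # input-to-hidden weights
        "b1": rng.random(p),        # hidden biases
        "t":  rng.random(p),        # translation factors
        "d":  rng.random(p) + 0.5,  # dilation factors, shifted off zero (our choice)
        "W2": rng.random((n, p)),   # hidden-to-output weights
        "b2": rng.random(n),        # output biases
    }
```

For the face verification architecture used later (15 inputs, 16 hidden neurons, 2 outputs), `init_params(15, 16, 2)` produces parameter arrays of the matching shapes.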

Human face verification
Systems based on biometric characteristics, such as the face, fingerprints, hand geometry, iris pattern and others, have been studied with attention. Face verification is one of the most important of these techniques because it enables nonintrusive systems, which means that people can be computationally identified without their knowledge. This way, computers can be an effective tool in the search for missing children, suspects or people wanted by the law. Mathematically speaking, the human face verification problem can be formulated as a function approximation problem and, from the viewpoint of artificial neural networks, as the problem of searching for a mapping that establishes a relationship from an input space to an output space through a process of network learning.
This study presents a system for the detection and extraction of faces based on the approach presented in Lin and Fan (2001), which consists of finding isosceles triangles in an image, since the mouth and eyes form that geometric figure when linked by lines. For these regions to be determined, the images must be converted into binary images; the vertices of the triangles are then found and a rectangle is cut out around them, so that its size can be normalized and the area fed into a second part of the system that analyzes whether or not it is a real face. Three different approaches are tested here for this analysis: a weighting mask used to score the region, proposed by Lin and Fan (2001), a classical MLP with back propagation (MLP-BP), and the PPS-wavelet neural network.

Image treatment
First the image is read in order to allocate a matrix in which each cell indicates the brightness level of the corresponding pixel; then it is converted into a binary matrix by means of a threshold parameter T, because the objects of interest in our case are darker than the background. This stage changes brightness levels greater than T to 1 (white) and the remaining levels to 0 (black). In most cases, due to noise and distortion in the input image, the binary transformation can produce a fragmented image and isolated pixels. Morphological operations, opening followed by closing, are applied with the purpose of solving or minimizing this problem, by Gonzalez and Woods (2002). Figure 8 shows the result of these operations.
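The treatment described above can be sketched as follows, assuming a grey-level image scaled to [0,1]; the default 4-connected structuring element of `scipy.ndimage` is an assumption, since the paper does not specify one.

```python
import numpy as np
from scipy import ndimage

def binarize_and_clean(gray, T=0.38):
    """Threshold a grey-level image and clean it morphologically.

    Pixels brighter than T become 1 (white), the rest 0 (black);
    opening followed by closing removes isolated pixels and small gaps.
    """
    binary = gray > T
    opened = ndimage.binary_opening(binary)   # removes isolated pixels
    cleaned = ndimage.binary_closing(opened)  # fills small holes
    return cleaned.astype(np.uint8)
```

With the default cross-shaped structuring element, an isolated bright pixel is removed by the opening while the interior of a larger bright region survives.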

Segmentation of potential face regions
After binarization, the task is to find the centers of three 4-connected components that meet the following characteristics: ¶ they form the vertices of an isosceles triangle, by Lin and Fan (2001); ¶ the Euclidean distance between the eyes must be 90-100% of the distance between the mouth and the central point between the eyes, by Lin and Fan (2001); ¶ the triangle base is at the top of the image.
The last restriction does not allow finding upside-down faces, but it significantly reduces the number of triangles in each image, thus reducing the processing time of the following stages. For example, consider the number of triangles found in Figure 5. The opening and closing operations are vital, since it is impossible to determine the triangles without this image treatment. The mean processing time to find the results presented was 4 seconds; on the other hand, 8 hours were insufficient in an attempt to find the same results for Figure 5 (C) using a Pentium 4 with a 2.4 GHz processor.
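A hedged sketch of the geometric test implied by these criteria, with hypothetical centroid inputs; the exact tolerance handling is not given in the paper, so the 90-100% band is applied literally here.

```python
import math

def is_potential_face(eye_l, eye_r, mouth, tol=(0.90, 1.00)):
    """Check the isosceles-triangle criteria on three component centroids.

    eye_l, eye_r, mouth: (x, y) centers of 4-connected components,
    with y growing downward as in image coordinates (our convention).
    """
    d_eyes = math.dist(eye_l, eye_r)
    mid = ((eye_l[0] + eye_r[0]) / 2, (eye_l[1] + eye_r[1]) / 2)
    d_mouth = math.dist(mid, mouth)
    # eye distance must be 90-100% of the mouth-to-eye-midpoint distance
    ratio_ok = tol[0] * d_mouth <= d_eyes <= tol[1] * d_mouth
    # triangle base (the eyes) must be at the top, i.e. above the mouth
    upright = mouth[1] > max(eye_l[1], eye_r[1])
    return ratio_ok and upright
```

For instance, eyes at (0, 0) and (10, 0) with a mouth at (5, 10.5) satisfy both criteria, while the same mouth above the eyes is rejected as an upside-down face.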

Normalization of potential facial regions
The potential face regions selected in the previous section are allowed to have different sizes. All regions must therefore be normalized to a size of 60x60 pixels by the bicubic interpolation technique, because every potential region needs to present the same amount of information for comparison. Normalization of a potential region also reduces the effects of variation in distance and location.
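A minimal sketch of this normalization, using cubic spline interpolation from `scipy.ndimage` as a stand-in for the bicubic technique mentioned above; the function name is illustrative.

```python
import numpy as np
from scipy.ndimage import zoom

def normalize_region(region, size=(60, 60)):
    """Rescale a potential face region to a fixed size.

    Uses order-3 (cubic) spline interpolation as an approximation of
    the bicubic interpolation described in the text.
    """
    fy = size[0] / region.shape[0]
    fx = size[1] / region.shape[1]
    return zoom(region.astype(float), (fy, fx), order=3)
```

Every region, whatever its original size, thus arrives at the classifier as a 60x60 array.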

Face's pattern recognition
The purpose of this stage is to decide whether a potential face region in an image (the region extracted in the first part of the process) actually contains a face. To perform this verification, two methods were applied: the weighting mask function, described by Lin and Fan (2001), and the PPS-wavelet neural network.

The weighting mask function
The weighting mask function, according to the authors, is based on the following idea: if the normalized potential region really contains a face, it should have high similarity to the mask formed from 10 binary training faces (see Mask generation). Every normalized potential facial region is fed into the weighting mask function, which computes the similarity between the region and the mask. The computed value is then used to decide whether a potential region contains a face or not.

Mask generation
The mask was created using 10 images. The first five are pictures of females and the others are pictures of males. All of them were manually segmented, binarized, normalized and morphologically treated (opening and closing), and then the sums of the corresponding cells of the images were stored in an eleventh matrix. Finally, that matrix was binarized with another threshold T, for which values lower than or equal to T were replaced by 0, and the others by 1. The best result was obtained with T = 4: at lower values the areas of the eyes and mouth become too big, while at higher values these areas almost disappear, and in both cases determining the triangles is considerably difficult. The algorithm used to decide whether a potential region (R) contains a real face is based on the idea that the binary image of a face is highly similar to that of the mask.
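The mask construction described above can be sketched as follows; the function and parameter names are ours.

```python
import numpy as np

def build_mask(binary_faces, T=4):
    """Build the binary mask from ten {0,1} face images.

    The per-cell sums are accumulated into one matrix, then values
    lower than or equal to T become 0 and the others 1 (T = 4 is the
    value reported to work best in the text).
    """
    acc = np.sum(binary_faces, axis=0)  # the "eleventh matrix" of sums
    return (acc > T).astype(np.uint8)
```

A cell is kept in the mask only when more than four of the ten training faces are white there, which is what makes the eye and mouth areas neither too big nor vanishing.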
A set of experimental results demonstrates that the face threshold value p should be set in the range 3400 ≤ p ≤ 6800, by Marar et al. (2004).

PPS-wavelet neural network
In order to demonstrate the efficiency of the proposed model, two PPS-WNNs, one with the activation function φ_2(⋅) and the other with φ_5(⋅) in the hidden layer, were implemented to analyze whether a potential face region really contains a face. However, the raw face data, 60x60 pixels, cannot be used directly for training the networks because the features are deeply hidden. Therefore, we used the Principal Component Analysis (PCA) method to create a face space that represents all the faces using a small set of components, in Marar (1997). For this purpose we consider the first 15 components as the extracted features, or face space. In this case study, 100 manually segmented faces (50 women and 50 men) plus 40 non-face random images were used for network training. The PPS-WNN and classical MLP-BP architectures were therefore designed and trained with 15 units in the input layer, 16 neurons in the hidden layer and 2 neurons in the output layer. In the output layer, a face is represented by the vector (1, 0) and a non-face by the vector (0, 1). As a test, we used the same regions (R) applied to the previous method.
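A compact sketch of the PCA feature extraction described above, via the SVD of the centered data matrix; the function name and return convention are illustrative, not the paper's.

```python
import numpy as np

def pca_features(faces, k=15):
    """Project flattened face images onto the first k principal components.

    faces: (n_samples, n_pixels) array, e.g. flattened 60x60 regions.
    Returns the k-dimensional features, the mean face, and the components.
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:k]
    return centered @ components.T, mean, components
```

With 60x60 regions flattened to 3600 pixels and k = 15, each region is reduced to the 15-dimensional feature vector fed to the networks' input layer.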

Face verification results
Several tests were performed to determine an ideal threshold value for the conversion of the images into binary images. On a scale from 0 (black) to 1 (white), 0.38 was empirically determined as a good value for most of the images, but for darker images 0.22 was a better value. The test was done using 100 images (50 male and 50 female) with two different threshold values from Department (2003). The results are shown in Table 1.

Conclusion
Neural networks and the wavelet transform have recently been seen as attractive tools for developing efficient solutions to many real world function approximation problems. The combination of neural networks and the wavelet transform gives rise to an interesting and powerful technique for function approximation referred to as wavenets. Function approximation is a very important task in environments where computation has to be based on extracting information from data samples of real world processes. Thus, mathematical modeling is a very important tool to support the development of the neural network area.