Knowledge discovery and data mining in pavement inverse analysis

This paper describes the use of data mining tools for predicting the non-linear layer moduli of asphalt road pavement structures based on the deflection profiles obtained from non-destructive deflection testing. The deflected shape of the pavement under vehicular loading is predominantly a function of the thickness of the pavement layers, the moduli of individual layers, and the magnitude of the load. The process of inverse analysis, more commonly referred to as backcalculation, is used to estimate the elastic (Young's) moduli of individual pavement layers based upon surface deflections. A comprehensive synthetic database of pavement response solutions was generated using an advanced non-linear pavement finite-element program. To overcome the limitations associated with conventional pavement moduli backcalculation, data mining tools such as support vector machines, neural networks, decision trees, and meta-algorithms like bagging were used to conduct asphalt pavement inverse analysis. The results successfully demonstrated the utility of such data mining tools for real-time non-destructive pavement analysis.


Introduction
Since the 1960s, nondestructive deflection testing has been used to assess the structural capacity and integrity of pavement sections. For the last two decades, the predominant form of deflection testing for both project-level and network-level pavement evaluation has been the falling weight deflectometer (FWD, Fig. 1). Typically, the FWD deflection measurements are used to estimate the in-situ elastic moduli of each pavement layer as material input parameters for rehabilitation and overlay design (Alavi et al., 2008). A conventional asphalt concrete (AC) pavement typically consists of three layers: a surface layer paved with AC mixture (known as the surface course or wearing course), a granular base made up of relatively high-quality aggregates (base course), and a subgrade layer made up of existing soil. Sometimes, an optional subbase layer composed of relatively low-quality aggregates is also included. The deflection of a pavement represents the combined system response of the pavement layers to an applied load. Based on this mechanical concept, the in-situ moduli of individual layers can be estimated from FWD measurements through appropriate analysis methods. This procedure is referred to as pavement modulus backcalculation. The backcalculation of the layer moduli of asphalt pavements has been recognized as a complex problem (Sharma and Das, 2008).
In recognition of the limitations of the current American Association of State Highway and Transportation Officials (AASHTO) pavement design guide, which is based on empirical regression techniques relating simple material characterizations, traffic characterization and measures of performance, mechanistic-empirical (M-E) pavement design and analysis approaches have been developed. For example, the new AASHTO pavement design guide, the Mechanistic-Empirical Pavement Design Guide (MEPDG), and its software were developed through the National Cooperative Highway Research Program (NCHRP) 1-37A project (NCHRP 2004).
The mechanistic part of M-E design is the application of engineering mechanics principles to calculate pavement responses (stresses, strains, and deflections) under loads for the prediction of pavement performance history. The empirical nature of M-E design stems from the fact that the laboratory-developed pavement performance models are adjusted or calibrated to the observed performance measurements (distress) from actual pavements. With the evolution and adoption of mechanistic-empirical pavement design, the need to obtain reliable material properties has increased. Further, when new materials are used in the rehabilitation design (such as for an asphalt concrete overlay), a combination of laboratory-measured properties for some layers and field-derived parameters for others may result. While the field-derived parameters may be valuable for characterizing the damaged in-situ condition, their values may seemingly conflict with the laboratory values for new materials.
The interpretation of FWD data to characterize material properties in a pavement structure is carried out using empirical equations or correlations, mechanistic-based approaches, or both. The mechanistic-based approaches under the umbrella of "backcalculation" refer to the calculation of the pavement layer properties that best describe the measured deflections, using layered elastic or finite element models to represent the pavement system. The FWD backcalculation procedure involves two calculation directions, namely forward and inverse. In the forward direction of analysis, theoretical deflections are computed under the applied load and the given pavement structure using assumed pavement layer moduli. In the inverse direction of analysis, these theoretical deflections are compared with measured deflections and the assumed moduli are then adjusted in an iterative or optimization procedure until the theoretical and measured deflection basins match acceptably well. The moduli derived in this way are considered representative of the pavement response to load, and can be used to calculate stresses or strains in the pavement structure for analysis purposes. This iterative or optimization-based solution of the inverse problem is not unique in most cases.
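The forward and inverse directions described above can be sketched in a few lines. This is a toy illustration only: it assumes a single homogeneous elastic half-space under a point load (Boussinesq surface deflection) instead of a layered elastic or finite element program, and the function names, Poisson's ratio, sensor offsets, and moduli are hypothetical values chosen for the example.

```python
import math

def forward_deflections(modulus, load, offsets, poisson=0.35):
    """Forward direction: Boussinesq surface deflections (m) of an elastic
    half-space with modulus (Pa) under a point load (N) at offsets (m)."""
    return [load * (1 - poisson ** 2) / (math.pi * modulus * r) for r in offsets]

def backcalculate(measured, load, offsets, seed_modulus, tol=1e-6, max_iter=50):
    """Inverse direction: adjust the assumed modulus until the computed
    basin matches the measured basin acceptably well."""
    modulus = seed_modulus
    for iteration in range(1, max_iter + 1):
        computed = forward_deflections(modulus, load, offsets)
        # Root-mean-square relative mismatch between the two basins.
        rms = math.sqrt(
            sum(((c - m) / m) ** 2 for c, m in zip(computed, measured)) / len(measured)
        )
        if rms < tol:
            break
        # Computed deflections that are too large mean the assumed modulus
        # is too low, so scale the modulus by the mean deflection ratio.
        ratio = sum(c / m for c, m in zip(computed, measured)) / len(measured)
        modulus *= ratio
    return modulus, iteration

offsets = [0.3, 0.6, 0.9, 1.2, 1.5]   # hypothetical sensor offsets, m
load = 40_000.0                       # N
true_modulus = 80e6                   # Pa, used only to synthesize "measured" data
measured = forward_deflections(true_modulus, load, offsets)
estimate, n_iter = backcalculate(measured, load, offsets, seed_modulus=30e6)
```

With one unknown the adjustment converges almost immediately; with several layers the same loop needs a multi-parameter optimizer, and different sets of moduli can then fit the basin about equally well.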
Although several traditional and non-traditional pavement backcalculation techniques have been proposed over the years (Gopalakrishnan et al. 2010), which are briefly reviewed later, researchers are always interested in exploring advanced techniques that have the potential of more accurately characterizing pavement system responses.

Objective and Scope
The primary objective of this paper is to introduce some of the advanced data mining tools to the pavement community and examine their usefulness in solving an inverse problem encountered in the non-destructive condition evaluation of existing pavements.
Conventional three-layered flexible (asphalt) pavements are considered in this paper although the overall methodology is applicable to other pavement types. A 2-D Finite Element (FE) flexible pavement response model is used to generate a comprehensive synthetic database of pavement surface deflections corresponding to a wide range of pavement layer moduli and thicknesses. Advanced data mining tools are used to develop pavement layer moduli prediction (backcalculation) models based on deflection and thickness inputs. The predictive models are then applied to actual FWD deflection data acquired in the field to demonstrate their validity and robustness for real-time non-destructive pavement structural evaluation. The overall proposed approach described in this paper is illustrated in Figure 2.

Traditional Approaches
A number of backcalculation approaches, with accompanying software programs, have been developed over the years to backcalculate material properties of flexible pavements from FWD data.
Backcalculation programs based on multilayer elastic theory are generally used for AC pavements. For rigid pavements, plate theory for a slab resting on a Winkler foundation or an elastic solid foundation is used. There is no widely accepted methodology for AC-overlaid PCC composite pavements on a Winkler foundation. The backcalculation programs WESDEF, BISDEF, and ELSDEF are based on the multilayer elastic analysis programs WESLEA, BISAR and ELSYM, respectively. These programs require the thickness, Poisson's ratio, and a seed modulus as inputs. The forward elastic layer program iterates from the given seed modulus until the calculated deflections match the observed deflections. Thus, the backcalculated modulus of a pavement layer is strongly affected by the seed modulus. Consequently, experienced engineers are required to use these backcalculation programs (Lytton, 1989).

Non-traditional Approaches
The use of a new class of computational intelligence paradigms, known as soft computing techniques, in the field of geomechanical and pavement engineering has steadily increased over the past decade owing to their ability to admit approximate reasoning, imprecision, uncertainty and partial truth (Gopalakrishnan et al., 2010). Since real-life infrastructure engineering decisions are made in ambiguous environments that require human expertise, the application of soft computing techniques has been an attractive option in pavement and geomechanical modeling.
The term "soft computing" applies to variants of and combinations under the four broad categories of evolutionary computing, artificial neural networks (ANNs), fuzzy logic, and Bayesian statistics. Although each one has its separate strengths, the complementary nature of these techniques when used in combination (hybrid) makes them a powerful alternative for solving complex problems where conventional mathematical methods fail.
Among the various soft computing techniques, interest in ANNs for pavement systems applications has increased over the past 15 years (Circular, 1999). Several studies have successfully used ANNs to predict the pavement layer moduli from falling weight deflectometer (FWD) deflection data (Gucunski and Krstic, 1996; Khazanovich and Roesler, 1997; Kim and Kim, 1998; Meier and Rix, 1994). The NCHRP 1-37A research project team in charge of developing the Mechanistic-Empirical Pavement Design Guide (MEPDG) incorporated ANN models (Ceylan, 2002) in preparing the MEPDG concrete pavement analysis package. Recently, data mining tools have been attracting attention among researchers in various fields for discovering knowledge and underlying relationships in simulated or actual data (Miradi, 2009).

Linear Regression
Linear regression is probably the oldest and most widely used predictive model; the term commonly denotes a regression that is linear in the unknown parameters used in the fit. The most common form of linear regression is least squares fitting (Weher, 1977).
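As a minimal sketch of least squares fitting, the closed-form solution for a single predictor is shown below. The function name and the sample data are illustrative assumptions, not values from this study.

```python
def fit_least_squares(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data lying exactly on y = 2x + 1.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_least_squares(x, y)
```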

Pace Regression
Pace regression evaluates the effect of each feature and uses clustering analysis to improve the statistical basis for estimating its contribution to the overall regression. It can be shown that pace regression is optimal when the number of coefficients tends to infinity. We use the version of pace regression described in Wang (2000) and Wang and Witten (2002).

Additive Regression
Additive regression is a meta-learner that enhances the performance of a regression-based classifier. Each iteration fits a model to the residuals left by the classifier on the previous iteration (Friedman, 1999). The predictions of the individual learners are added together to obtain the overall prediction. It is generally used with a decision stump as the base learner.
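The residual-fitting idea can be sketched as follows, with a one-feature decision stump as the base learner. This is an illustrative sketch and not the implementation used in this study; the shrinkage factor, the number of rounds, and the sample data are assumptions introduced for the example.

```python
def fit_stump(x, y):
    """Fit a single-split regression stump on a 1-D feature by choosing
    the threshold that minimizes the squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((yi - left_mean) ** 2 for yi in left)
               + sum((yi - right_mean) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def fit_additive(x, y, n_rounds=20, shrinkage=0.5):
    """Each round fits a stump to the residuals left by the previous
    rounds; the final prediction is the sum of the scaled stumps."""
    stumps = []
    residuals = list(y)
    for _ in range(n_rounds):
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        residuals = [r - shrinkage * stump(xi) for xi, r in zip(x, residuals)]
    return lambda xi: sum(shrinkage * s(xi) for s in stumps)

# Illustrative non-linear data: y = x squared.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [xi ** 2 for xi in x]
model = fit_additive(x, y)

mean_y = sum(y) / len(y)
sse_mean = sum((yi - mean_y) ** 2 for yi in y)
sse_model = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y))
```

On this data the summed stumps leave a smaller squared error than the constant mean predictor, because every round removes part of the remaining residual.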

Instance-based
This is a lazy learning technique that implements a nearest-neighbour classifier. It uses normalized Euclidean distance to find the training instance closest to the given test instance, and predicts the same class as this training instance (Aha and Kibler, 1991).
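A minimal sketch of nearest-neighbour prediction with normalized Euclidean distance follows. It assumes min-max normalization of each attribute, which is one common choice; the function names and the sample instances are hypothetical.

```python
import math

def make_scaler(rows):
    """Build a function that rescales each attribute to [0, 1] using the
    minimum and maximum seen in the training rows."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return lambda row: [(v - lo) / s for v, lo, s in zip(row, lows, spans)]

def nearest_neighbor_predict(train_x, train_y, query):
    """Predict the output of the closest training instance."""
    scale = make_scaler(train_x)
    scaled_train = [scale(row) for row in train_x]
    scaled_query = scale(query)
    distances = [math.dist(scaled_query, row) for row in scaled_train]
    return train_y[distances.index(min(distances))]

# Two attributes on very different scales; normalization keeps the
# first attribute from dominating the distance.
train_x = [[100.0, 0.1], [200.0, 0.9], [300.0, 0.5]]
train_y = [10.0, 20.0, 30.0]
prediction = nearest_neighbor_predict(train_x, train_y, [290.0, 0.45])
```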

Conjunctive Rule
This is a rule-based learner that can predict both numeric and nominal class labels. The goal of rule induction is to induce rules from data capturing all generalizable knowledge within it, while being as small as possible (Cohen, 1995).

Decision Table
A decision table typically constructs rules involving different combinations of attributes, which are selected using an attribute selection search method. The simple decision table majority classifier (Kohavi, 1995) has been shown to sometimes outperform state-of-the-art classifiers.

Decision stump
A decision stump (Witten and Frank, 2005) is a weak tree-based machine learning model consisting of a single-level decision tree with a categorical or numeric class label. Decision stumps are usually used in ensemble machine learning techniques.

Artificial Neural Networks (ANNs)
ANNs are networks of interconnected artificial neurons, and are commonly used for non-linear statistical data modeling to model complex relationships between inputs and outputs. Several good descriptions of neural networks are available (Bishop, 1995; Fausett, 1994).

Support Vector Machines
SVMs are based on the Structural Risk Minimization (SRM) principle from statistical learning theory. A detailed description of SVMs and SRM is available in Vapnik (1995). In their basic form, SVMs attempt to perform classification by constructing hyperplanes in a multidimensional space that separate the cases of different class labels. SVMs support both classification and regression tasks and can handle multiple continuous and nominal variables.

Reduced Error Pruning Trees
REPTree (Witten and Frank, 2005) is an implementation of a fast decision tree learner. REPTree builds a decision/regression tree using information gain/variance and prunes it using reduced-error pruning (with backfitting). It deals with missing values by splitting the corresponding instances into pieces.

M5 Model Trees
M5 Model Trees (Wang and Witten, 1997) are a reconstruction of Quinlan's M5 algorithm (Quinlan, 1992) for inducing trees of regression models, which combines a conventional decision tree with the option of linear regression functions at the nodes. It also uses the techniques used in CART (Breiman et al., 1984) to effectively deal with enumerated attributes and missing values.

Random SubSpace
The Random Subspace classifier (Ho, 1998) constructs a decision tree based classifier that consists of multiple trees. It tries to achieve a balance between overfitting and achieving maximum accuracy. The algorithm maintains the highest accuracy on training data and improves on generalization accuracy as it grows in complexity.

Bagging
Bagging (Breiman, 1996) is a meta-algorithm to improve the stability of classification and regression algorithms by reducing variance. Bagging is usually applied to decision tree models to boost their performance. It involves generating a number of new training sets (called bootstrap modules) from the original set by sampling uniformly with replacement. The bootstrap modules are then used to generate models whose predictions are averaged to generate the final prediction.
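The two steps of bagging, resampling with replacement and averaging, can be sketched as below. The base learner here is a deliberately simple slope-through-the-origin fit standing in for a decision or model tree; the function names, the number of models, the seed, and the data are assumptions made for illustration.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly with replacement."""
    return [rng.choice(data) for _ in data]

def fit_slope(sample):
    """Toy base learner: least squares line through the origin."""
    sxx = sum(x * x for x, _ in sample)
    sxy = sum(x * y for x, y in sample)
    slope = sxy / sxx
    return lambda x: slope * x

def fit_bagged(data, fit, n_models=25, seed=0):
    """Fit one model per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    models = [fit(bootstrap_sample(data, rng)) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)

# Illustrative data: y = 2x with small alternating noise.
data = [(float(x), 2.0 * x + (0.1 if x % 2 else -0.1)) for x in range(1, 7)]
bagged = fit_bagged(data, fit_slope)
prediction = bagged(5.0)
```

Averaging over resampled training sets smooths out the dependence of any single model on particular training instances, which is the variance reduction referred to above.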

Theoretic Database Development
The synthetic data used in conducting pavement inverse analysis with data mining in this study were generated from a two-dimensional axi-symmetric pavement FE software developed at the University of Illinois at Urbana-Champaign (Raad and Figueroa, 1980). It incorporates stress-sensitive geo-material models and has been reported to provide a more realistic representation of the flexible pavement structure and its response to loading. Numerous research studies have analyzed and validated this FE model's AC pavement structural response prediction for highway and airfield pavements (Thompson and Elliott, 1985; Garg et al., 1998).
The AC surface layer was treated as a linear elastic material with Young's modulus, Eac, and Poisson's ratio, μ. Stress-dependent elastic models along with Mohr-Coulomb failure criteria were applied for the unbound aggregate base and fine-grained soil subgrade layers. The (stress-hardening) K-θ model (Hicks and Monismith, 1971) was used for the base layer: $E_R = K_b \theta^n$, where $E_R$ is the resilient modulus (psi), $\theta$ is the bulk stress (psi), and $K_b$ and $n$ are statistical parameters. Based on extensive testing of unbound aggregate materials, Rada and Witczak (1981) proposed the following relationship between $K_b$ and $n$: $\log_{10}(K_b) = 4.657 - 1.807n$. The (stress-softening) bilinear model (Thompson and Robnett, 1979) was used for the subgrade layer.
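The two base-layer relationships in the preceding paragraph can be evaluated directly. In the sketch below the function names and the example values of n and bulk stress are assumptions chosen only to show the arithmetic.

```python
def base_k_from_n(n):
    """Rada and Witczak relation: log10(Kb) = 4.657 - 1.807 n (Kb in psi)."""
    return 10.0 ** (4.657 - 1.807 * n)

def base_resilient_modulus(bulk_stress_psi, n):
    """K-theta model: E_R = Kb * theta**n, with E_R and theta in psi."""
    return base_k_from_n(n) * bulk_stress_psi ** n

n_example = 0.5          # assumed exponent, for illustration only
theta_example = 20.0     # assumed bulk stress, psi
kb_example = base_k_from_n(n_example)
er_example = base_resilient_modulus(theta_example, n_example)
```

Because Kb is tied to n through the regression, a single parameter is enough to describe the base layer, and the modulus rises with bulk stress for any positive n (stress hardening).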
The asphalt concrete modulus Eac, the granular base K-θ model parameter Kb, and the subgrade soil modulus at the break-point deviator stress of the bilinear model, Eri, were used as the layer stiffness inputs for all the conventional flexible pavement FE simulations. The 40-kN (9-kip) wheel load was applied as a uniform pressure of 550 kPa (80 psi) over a circular area of radius 152 mm (6 in.). The thickness and moduli ranges used in the database generation are provided elsewhere (Ceylan et al., 2007).
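The stated load, pressure, and radius are mutually consistent, as the short check below shows. The function name is an assumption; the numbers are the ones given in the text.

```python
import math

def contact_radius(load_n, pressure_pa):
    """Radius (m) of the circular area over which a load (N) acts as a
    uniform pressure (Pa): load = pressure * pi * radius**2."""
    return math.sqrt(load_n / (math.pi * pressure_pa))

radius_m = contact_radius(40_000.0, 550_000.0)   # 40 kN at 550 kPa
radius_in = radius_m / 0.0254                    # metres to inches
```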
A total of 30,000 FE runs were conducted by randomly choosing the pavement layer thicknesses and input variables within selected ranges to generate a knowledge database for inverse analysis using data mining tools. All the datasets were normalized within the range of 0.1 to 0.9 to facilitate learning. A scatterplot for each pair of variables (pavement layer thickness, surface deflections and layer moduli values) from the synthetic database used in data mining is displayed in a matrix arrangement and compiled in Figure 3.
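One way to normalize a variable to the range of 0.1 to 0.9 is linear min-max scaling, sketched below. The paper does not state which scaling was used, so the linear form, the function names, and the sample values are assumptions.

```python
def scale_to_range(values, low=0.1, high=0.9):
    """Linearly map the minimum of values to low and the maximum to high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [low + (high - low) * (v - v_min) / span for v in values]

def unscale(scaled_value, v_min, v_max, low=0.1, high=0.9):
    """Invert the mapping, e.g. to report a predicted modulus in its
    original units."""
    return v_min + (scaled_value - low) * (v_max - v_min) / (high - low)

raw = [0.0, 5.0, 10.0]            # illustrative values
scaled = scale_to_range(raw)
restored = [unscale(s, min(raw), max(raw)) for s in scaled]
```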

Experiments and Discussion of Results: Theoretic Data
A suite of data mining tools discussed in a previous section was employed in the experimental runs using theoretic data. The goal was to identify the best-performing predictive models that could be applied to the actual field FWD data for real-time inverse analysis of pavements. The following variables define the inputs and outputs in the knowledge discovery and data mining process:
Inputs: surface deflections (D0, D12, D24, D36, D48, and D60); AC layer thickness (Tac); and base layer thickness (Tb)
Outputs: modulus of the AC surface layer (Eac); modulus of the base layer (Kb); and modulus of the subgrade layer (Eri)
Thus, data mining based backcalculation models were developed with eight input parameters and one output parameter per model. However, the unbound aggregate base layer modulus could not be predicted using just the eight inputs (deflections and thicknesses). Therefore, in the development of the Kb backcalculation model, the predicted Eac and Eri were used as additional inputs along with the six FWD deflections and the thicknesses of the AC surface and base layers. The results for both scenarios are discussed later in the paper.
The data were divided randomly into two subsets, a training subset and a testing subset, in such a way that both are representative of the same statistical population. Both datasets were normalized within the range of 0.1 to 0.9 for input and output values to facilitate the training process. The training data subset was used for model learning and the testing data subset was used to examine the statistical accuracy of the developed models. Further, 5-fold cross-validation was employed to increase the robustness of prediction accuracy and avoid any over-training. The R (Team, 2011) and WEKA (Hall et al., 2009) software toolkits were used in this study for data mining.
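The 5-fold cross-validation mentioned above partitions the data so that every pattern is used for testing exactly once. A minimal index-splitting sketch is given below; the function name, the seed, and the sample size are assumptions, and the toolkits used in the study provide their own implementations.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, k=5))
```

A model is trained on each training part and scored on the matching test part, and the k scores are then averaged.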
Quantitative assessments of how closely the models could predict the actual outputs are used to evaluate the models' predictive performance. A multicriteria assessment with various goodness-of-fit statistics was performed using all the data vectors to test the accuracy of the trained models. The criteria employed for evaluating the models' predictive performance are the coefficient of correlation (R), Mean Absolute Error (MAE), and Root-Mean-Squared Error (RMSE) between the actual and predicted values. The definitions of these evaluation criteria, written here in their standard forms, are as follows:
$$R = \frac{\sum_{i=1}^{n}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2\,\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$$
where $y_i$ and $\hat{y}_i$ are the target and predicted modulus values for pattern $i$ of $n$ patterns, and $\bar{y}$ and $\bar{\hat{y}}$ are their respective means.
The values of performance statistics for the developed data mining based inverse prediction models are summarized in Figures 4 to 7 for Eac, Eri, and Kb. It is observed that excellent performance is achieved using REPTree and M5 model trees as the underlying regression algorithms with the Bagging meta-learner for all three pavement layer moduli. Among the three pavement layers, the prediction accuracy for Kb is the worst, as expected, even after including Eac and Eri as additional inputs. This is further confirmed by the prediction error histograms plotted in Figure 8 for Eac, Kb (using Eac and Eri as additional inputs), and Eri using, for instance, Bagging_M5P (the Bagging meta-learning technique with M5 model trees as the base learner). The Bagging_M5P predictor was chosen as the best-performing data mining predictive technique to be used in the real-time pavement inverse analysis described in the next section.
Bagging_M5P models constructed on the theoretic data were applied to the actual FWD data acquired from an airport flexible pavement test section at the U.S. National Airport Pavement Test Facility (NAPTF). The selected test section is a typical conventional granular base flexible pavement resting on a medium-strength subgrade. It consists of a 127-mm (5-in.) thick AC surface course and a 200-mm (8-in.) thick crushed stone granular base over a 307-mm (12-in.) thick granular subbase on top of the subgrade. For this analysis, the granular base and subbase layer thicknesses were combined.
A clayey material known as Dupont Clay (DPC) was used for the subgrade (target California Bearing Ratio of 8). The naturally occurring sandy-soil material at the full-scale test site underlies the subgrade layer. Detailed information related to the NAPTF flexible test sections, material properties, and analysis of NDT data can be found in Gopalakrishnan (2004). The FWD data referenced in this paper are accessible for download at the Federal Aviation Administration (FAA) Airport Technology Website: www.airporttech.tc.faa.gov.
Nondestructive tests using the FWD equipment were conducted on the selected test section prior to traffic testing to verify the uniformity of pavement and subgrade construction and strength. Surface deflection basins from FWD tests conducted on June 14, 1999 (pavement temperature = 21.2 °C) at a nominal force amplitude of 40 kN (9 kip) were used in this study.
For the sake of comparison, WESDEF (Cauwelaert et al., 1989), a traditional pavement inverse analysis program, was also used for backcalculating the pavement layer moduli from field FWD data. The WESDEF backcalculation program uses the WESLEA multi-layer elastic analysis program. It utilizes an iterative procedure to obtain a set of moduli that, when used in linear-elastic calculations, will produce deflections similar to the measured values. The program has the ability to backcalculate moduli values using deflections with depth, such as those obtained using Multi-Depth Deflectometers (MDDs), as well as with surface deflections. The material type, entered for each layer in the pavement structure, is used to establish the default seed modulus, minimum and maximum moduli, the Poisson's ratio, and the interface slip values.
In WESDEF, the modulus for the stiff layer was set to 6.9 GPa (1,000,000 psi) with a Poisson's ratio of 0.50. The pavement layer moduli predicted by the Bagging_M5P predictor based on field data are plotted together with those predicted by WESDEF in Figure 9. In general, the Bagging_M5P moduli predictions are consistent with those predicted by WESDEF. Note that WESDEF assumes the subgrade to be linear elastic and requires seed moduli values to start the optimization process, while Bagging_M5P considers the non-linear stress-dependent subgrade properties and employs knowledge discovery and data mining principles to find the solutions.
Irrespective of the high prediction accuracy of any developed backcalculation model, there are some major factors that can lead to erroneous results in pavement backcalculation (Irwin, 2002; Quintas and Killingsworth, 1998). For instance, major cracks in the pavement, or testing near a pavement edge, can cause the deflection data to depart drastically from the assumed conditions. Pavements with cracks or various discontinuities and other such features are ill-suited for any backcalculation analysis or moduli determination. Also, layer thicknesses are not uniform in the field, nor are materials in the layers completely homogeneous. The spatial and seasonal variations of pavement layer properties in the field should also be considered.

Figure 1. Close-up of Truck-mounted Falling Weight Deflectometer (FWD) used in Non-Destructive Testing (NDT) of Pavements

Figure 2. Schematic of overall proposed approach to real-time flexible pavement inverse analysis with knowledge discovery and data mining

Figure 3. Scatterplot matrix of input and output variables in the theoretic database
The evaluation criteria are computed from the target and predicted modulus values corresponding to n patterns. R is a measure of correlation between the predicted and the target values and therefore indicates the accuracy of the fitted model (higher R equates to higher accuracy). The MAE and RMSE measure the magnitude of the prediction error; relatively smaller magnitudes indicate better prediction accuracy.
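The three criteria discussed above can be computed as sketched below, assuming their standard definitions (Pearson correlation for R). The function names and the sample target and predicted values are illustrative and are not results from this study.

```python
import math

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def correlation(actual, predicted):
    """Pearson coefficient of correlation between two sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Illustrative values: every prediction is offset by +0.5.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
r = correlation(actual, predicted)
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

The example also shows why more than one criterion is reported: a constant offset leaves R at 1 while MAE and RMSE are non-zero.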

Figure 4. Summary of asphalt layer moduli (Eac) prediction performance with data mining techniques using theoretic deflection basins

Figure 7. Summary of base layer moduli (Kb) prediction performance with data mining techniques (including Eac and Eri as additional inputs) using theoretic deflection basins

Figure 9. Asphalt pavement moduli predictions with Bagging_M5P using actual FWD deflection basins acquired in the field: (a) Eac; and (b) Eri