Asymptotic properties of estimators in regression models. Baran Sándor


Asymptotic properties of estimators in regression models

Doctoral (PhD) dissertation

Baran Sándor

Debreceni Egyetem, Debrecen, 2000

I prepared this dissertation within the Probability Theory and Mathematical Statistics subprogram of the Mathematics doctoral program of the University of Debrecen between 1995 and 2000, and I hereby submit it in candidacy for the doctoral (PhD) degree of the University of Debrecen.

Debrecen, August 3, 2000.  ..........................  Baran Sándor, candidate

I certify that Baran Sándor, doctoral candidate, carried out his work between 1995 and 2000 under my supervision within the doctoral subprogram named above. The dissertation is based on the candidate's independent work, and he contributed decisively to the results with his own creative activity. I recommend the dissertation for acceptance.

Debrecen, August 3, 2000.  ..........................  Dr. Fazekas István, supervisor

- We looked for answers one and all, years flew like minutes to amaze us, by the spilt dewdrops of your blood be merciful to us Prince Jesus.

George Faludy: Danse Macabre 1

Acknowledgement

Here I would like to thank all the people who have, either directly or indirectly, contributed to my dissertation. First of all, to my wife, Ágnes, and my daughter, Zsuzsanna, for their love, patience and understanding. To my parents for having brought me up and encouraging me. To my supervisor, Dr. István Fazekas, who acquainted me with the beauty of statistics and helped my progress with his instructions and useful advice. To my professors, Dr. Mátyás Arató, Dr. Péter Major and Dr. Gyula Pap, who widened my scope of vision and from whom I have learned and still learn a lot. To Dr. Alexander Kukush, who taught me new techniques and drew my attention to several new results. To my colleague, Dr. Márton Ispány, who had been a patient room-mate for four years. To Dr. Csaba Schneider for the above citation. Finally, to my friends Anikó Ézsiás, Márta Pintér, Boglárka Tóth, Attila Bérczes, József Gáll and László Erdei.

1 Translated by Robin Skelton (see [6]).

- A Titkot űztük mindahányan, s az évek szálltak mint a percek, véred kiontott harmatával irgalmazz nékünk Jézus Herceg! Faludy György: A haláltánc-ballada 2 Köszönetnyilvánítás Itt szeretnék köszönetet mondani mindazoknak, akik direkt vagy indirekt módon hozzájárultak a disszertációm elkészítéséhez. Először is feleségemnek Ágnesnek és lányomnak Zsuzsannának, a szeretetükért, türelmükért és megértésükért. Szüleimnek, hogy felneveltek és bátorítottak. Témavezetőmnek, Dr. Fazekas Istvánnak, aki megismertetett a statisztika szépségeivel, előrehaladásomat pedig útmutatásával és hasznos tanácsaival segítette. Tanáraimnak, Dr. Arató Mátyásnak, Dr. Major Péternek és Dr. Pap Gyulának, akik szélesítették látókörömet és akiktől sokat tanultam és tanulok ma is. Dr. Alexander Kukushnak, aki új technikákra tanított meg és számos újdonságra hívta fel a figyelmemet. Kollégámnak, Dr. Ispány Mártonnak, mert négy évig türelmes szobatársam volt. Dr. Schneider Csabának a fenti idézet angol fordításáért. Végül pedig barátaimnak, Ézsiás Anikónak, Pintér Mártának, Tóth Boglárkának, Bérczes Attilának, Gáll Józsefnek és Erdei Lászlónak. 2 Lásd [5].


Contents

Preface  1

1 Introduction and preliminary results  5
1.1 Measurement error models and estimators  5
1.1.1 Functional and structural models  5
1.1.2 OLS estimator for linear structural models  6
1.1.3 The SIMEX estimator  7
1.1.4 Regression calibration and semiparametric methods  8
1.1.5 The Lee-Sepanski estimator  9
1.1.6 The deconvolution method  10
1.2 Auxiliary results  12
1.2.1 Mixing  12
1.2.2 Uniform law of large numbers  15

2 A consistent estimator for the linear model  17
2.1 Introduction  17
2.2 The model and the naive estimator  17
2.3 A new estimator for the parameter β  21
2.4 Consistency of the new estimator  25
2.5 Simulation results  29

3 An estimator based on validation data  35
3.1 Introduction  35
3.2 The model and the estimator  36
3.3 Asymptotic properties of temporal models  37
3.4 Results for spatial models  43
3.5 Infill asymptotics for spatial models  45
3.6 Simulation results  50

4 A new estimator in functional models  57
4.1 Introduction  57
4.2 The model and the estimator  57
4.3 Examples  59
4.4 Consistency and asymptotic normality  62
4.5 Asymptotic properties of spatial models  71
4.6 Results for the polynomial model  73
4.7 Infill asymptotics for spatial models  75
4.8 Simulation results  78

Summary  85
Összefoglaló (Hungarian summary)  89
Bibliography  94
A List of Publications  99
B Conference talks  103

List of Figures

1 Illustration of additive linear errors-in-variables model  2
3.1 The histograms of the estimates of the parameters in Case 1 of Example 3.6.1 for small sample  51
3.2 The histograms of the estimates of the parameters in Case 1 of Example 3.6.1 for large sample  52
3.3 The histograms of the estimates of the parameters in Case 2 of Example 3.6.1  53
3.4 The histograms of the estimates of the parameters in Case 1 of Example 3.6.2  54
3.5 The histograms of the estimates of the parameters in Case 2 of Example 3.6.2  55
4.1 The means of the estimates of the parameters in the polynomial model  78
4.2 The mean square errors of the estimates of the parameters in the polynomial model  79
4.3 The variances of the estimates of the parameters in the polynomial model  79
4.4 The means of the estimates of the parameters in the trigonometric polynomial model  80
4.5 The mean square errors of the estimates of the parameters in the trigonometric polynomial model  81
4.6 The variances of the estimates of the parameters in the trigonometric polynomial model  81


List of Notations

g(x, β)  regression function
y  response variable of the regression (observed)
ξ  explanatory variable of the regression (usually not observed)
x  observed explanatory variable
δ  error term of the regression
ε  measurement error
η  primary data for the process η
η  validation data for the process η
Θ  parameter set
T  parameter set where the random processes are defined
P  location of the primary data
V  location of the validation data
β_0  true value of the unknown parameter β
N  set of natural numbers
Z  set of integers
Z^d  d-dimensional integer lattice
R  real line
R^d  d-dimensional Euclidean space
‖·‖  Euclidean norm in R^d
|·|_1  l_1 norm in R^d
I  identity matrix
A′  transpose of the matrix A
λ_min(A)  the smallest eigenvalue of the matrix A
c  different constants
|Λ|  cardinality of the set Λ
vol(T)  volume of the set T
C(A, R^r)  the set of R^r-valued continuous functions defined on A
(Ω, F, P)  the underlying probability space
P*(A)  outer probability of a set A
ω  elementary event
E  expectation

E(ζ | η)  conditional expectation of ζ given η
‖·‖_p  the norm on L_p(Ω, F, P)
o_P(1)  quantity converging to zero in probability
N(m, D)  normal distribution with mean (vector) m and covariance (matrix) D
μ_ζ  the mean of a random variable ζ
σ_ζ^2  the variance of a random variable ζ

Preface

In statistical work one often encounters non-standard situations in which the explanatory variables of a regression model (e.g. location, temperature, pressure) cannot be observed precisely. This usually yields false estimates and misleading results. As an illustration consider the following simple statistical problem. Given the linear regression model

y = β_0^(1) + β_0^(2) ξ + δ,

where ξ and δ are independent and have normal distributions with means μ_ξ and 0 and variances σ_ξ^2 and σ_δ^2, respectively. On the basis of independent observations on y and ξ we want to estimate the unknown regression parameter β_0 = (β_0^(1), β_0^(2)). It is well known that the ordinary least squares estimator in this case provides a consistent estimate of β_0. Now assume that instead of ξ we observe only x, which measures ξ with error, e.g.

x = ξ + ε,

where ε is independent of ξ and δ and has normal distribution with zero mean and variance σ_ε^2. It is also a classical result that in this situation the ordinary least squares estimator provides a consistent estimate not of β_0 but of (β_0^(1), λβ_0^(2)), where

λ = σ_ξ^2 / (σ_ξ^2 + σ_ε^2) < 1.

In Figure 1, using a short simulation, we illustrate this attenuation of the regression line towards 0 caused by the measurement error.

Models of the above type, i.e. models in which the explanatory variables (and sometimes the response variables, too) can only be observed with error, are called measurement error or errors-in-variables models. These models have been studied for about half a century. A summary of results for the linear model, including some applications, is given in Fuller's textbook [2], while for the nonlinear model the monograph of Carroll, Ruppert and Stefanski [8] gives a good overview of the existing methods.

This dissertation consists of four chapters. The first contains an introduction to measurement error models and gives an overview of the estimation methods. It

also indicates the links between the earlier results and the new results of the author. A separate section contains those auxiliary results that are used in the proofs of some of the theorems of the next three chapters. At the end of each chapter simulation results are provided to illustrate the performance of the estimators in different concrete situations.

Figure 1: Illustration of the additive linear errors-in-variables model. The true (ξ, y) data are shown together with the solid line, the least squares fit to these data; + are the observed data (x, y), and the dashed line is the least squares fit to them. Here μ_ξ = 1, σ_ξ^2 = σ_δ^2 = 2, β_0 = (1, 2) and σ_ε^2 = 0.8.

In Chapter 2 a new estimator of the unknown parameter of the linear model is considered, based on the Fourier transform of a certain weight function. This type of estimator was introduced by An, Hickernell and Zhu [2] for traditional linear regression models, but their estimator turns out not to be consistent if the explanatory variables are measured with errors. Besides proving this fact, we suggest an appropriate modification of the estimator considered in [2] and prove its strong consistency.

In Chapter 3 we generalize an estimator of Lee and Sepanski [3] to the case of strong mixing error terms. We prove the consistency and asymptotic normality of the estimator both for temporal and for spatial observations. The properties of the estimator are also described under infill asymptotics.

In Chapter 4 a new estimator of the unknown parameter of the nonlinear measurement error model is introduced. This estimator is a modification of the one considered by Fazekas and Kukush [9] for the case when both the measurement error and the error of the regression are mixing and not necessarily independent. The consistency and asymptotic normality of the estimator are verified both for temporal and for spatial observations. We also study the properties of the estimator under infill asymptotics, but only in the special case when the two error terms are mixing but independent of each other.
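The attenuation effect described in the Preface is easy to reproduce numerically. Below is a short, self-contained simulation sketch (not the author's original code; the parameter values follow the Preface example):

```python
import numpy as np

# Simulation sketch of the attenuation effect from the Preface example.
# Parameters follow the text: mu_xi = 1, var_xi = var_delta = 2,
# beta_0 = (1, 2), var_eps = 0.8, so lambda = 2 / 2.8 ~ 0.714.
rng = np.random.default_rng(0)
n = 20_000
xi = rng.normal(1.0, np.sqrt(2.0), n)       # true explanatory variable
delta = rng.normal(0.0, np.sqrt(2.0), n)    # regression error
eps = rng.normal(0.0, np.sqrt(0.8), n)      # measurement error
y = 1.0 + 2.0 * xi + delta                  # y = beta^(1) + beta^(2) xi + delta
x = xi + eps                                # observed surrogate of xi

def ols_slope(u, v):
    """Slope of the least squares line of v on u."""
    return np.cov(u, v, bias=True)[0, 1] / np.var(u)

lam = 2.0 / (2.0 + 0.8)                     # attenuation factor lambda
print(ols_slope(xi, y))                     # close to beta^(2) = 2
print(ols_slope(x, y))                      # close to lam * 2 ~ 1.43
```

The regression of y on the error-contaminated x recovers roughly λβ_0^(2) rather than β_0^(2), exactly as the classical result predicts.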


Chapter 1

Introduction and preliminary results

1.1 Measurement error models and estimators

1.1.1 Functional and structural models

When one wants to speak about measurement error models, the starting point should be an underlying regression model for the dependent variables y_i in terms of the predictors ξ_i, i ∈ T. Here T is the set of points where y_i and ξ_i are defined (e.g. time in the case of temporal models and location in the case of spatial models). In the next three chapters of this dissertation, which contain the author's results, we will deal with the regression model

y_i = g(ξ_i, β_0) + δ_i,  i ∈ T,  (1.1.1)

where y_i is the observed response variable, ξ_i is the vector of explanatory variables and δ_i is the error term of the regression. As we will deal only with parametric estimation, the function g will always be assumed to be known. We remark that this is not always the case, see e.g. [32]. By β_0 we denote the true value of the unknown parameter to be estimated.

In the literature one can find many other types of regression models. One that is quite often mentioned in the errors-in-variables context is the quasilikelihood and variance function model given by

E(y_i | ξ_i = u) = g_m(u, β_0),  (1.1.2)
var(y_i | ξ_i = u) = σ_0^2 g_v(u, β_0, θ_0),  (1.1.3)

where g_m and g_v are known functions and β_0, θ_0 and σ_0 denote the true values of the unknown parameters β, θ and σ to be estimated.

The distinguishing feature between classical regression models and measurement error models is that in the latter, instead of ξ_i, we can only observe x_i, which measures ξ_i with error. These observations, together with the corresponding observations on y_i, form the so-called primary data. Usually the presence of an additive measurement error is assumed, i.e.

x_i = ξ_i + ε_i,  i ∈ T,  (1.1.4)

where ε_i is a random error term. We mention that there are some papers dealing with multiplicative measurement error, but significantly less work has been done in this area. E.g. for the linear regression model Hwang [27] derived a consistent estimator of the parameters, and in a more recent work of Iturria, Carroll and Firth [28] two general estimation methods are proposed for polynomial regression.

According to [2] we can distinguish between two classes of errors-in-variables models: functional models, where the explanatory variables ξ_i are non-random, and structural models, where they are considered as random vectors. This is the classical approach. However, we remark that in [8] a slightly different classification is proposed. There a model is called functional (or one speaks about functional modeling) if the explanatory variables ξ_i are either fixed or random, but in the latter case no or minimal assumptions are made on their distribution, and structural if the estimators are based on the distribution of the ξ_i's (e.g. likelihood methods). In this sense all the models studied in Chapters 2, 3 and 4 can be considered functional, while according to the traditional approach only the model of Chapter 4 belongs to the class of functional models. In what follows we will always use the traditional classification.

1.1.2 OLS estimator for linear structural models

Consider first the linear structural model with additive error term

y_i = ξ_i′ β_0 + δ_i,  x_i = ξ_i + ε_i,  i ∈ N,

where ξ_i, δ_i and ε_i are independent copies of ξ, δ and ε, respectively.
Assume that the error terms δ and ε have zero mean and are orthogonal to ξ, i.e. E(δξ) = 0 and E(εξ) = 0. Moreover, suppose that all the random variables under consideration have finite second moments. Here ξ′β_0 is the orthogonal projection, in the L_2 sense, of y onto the linear subspace spanned by the coordinates of ξ. Thus,

E(ξξ′) β_0 = E(ξy).  (1.1.5)

Assume that we have n independent observations on y and ξ and let Y = (y_1, ..., y_n)′ and Ξ = (ξ_1, ..., ξ_n)′. If we replace in (1.1.5) the expectations with the appropriate

sample means, we obtain the ordinary least squares (OLS) estimator β̂_n of β_0, which is the solution of

(1/n) Ξ′Ξ β̂_n = (1/n) Ξ′Y.  (1.1.6)

The strong law of large numbers implies that if E(ξξ′) is a regular matrix then β̂_n estimates β_0 in a strongly consistent way.

However, if we can observe only y and x, we can rewrite our model in the following form: y = x′β̃ + δ̃, where x′β̃ is the orthogonal projection of y onto the subspace spanned by the coordinates of x. Thus, if in (1.1.6) we substitute X = (x_1, ..., x_n)′ for Ξ, where the x_i's are independent observations of x (this is the so-called naive approach), we obtain a strongly consistent estimator of β̃, denoted by β̃_n. Usually β̃ does not coincide with the true parameter vector β_0, as can be seen from the argument below. By the orthogonality of y − x′β̃ and x,

0 = E[x(y − x′β̃)] = E[x(ξ′β_0 + δ − x′β̃)] = E(ξξ′) β_0 − (E(ξξ′) + E(εε′)) β̃ + E(δε).

This explains the attenuation of the regression line towards 0 in the example of the Preface. A similar result can be proved for the functional model, too, in the case when (1/n) Ξ′Ξ has a regular limit as n → ∞.

Despite the inconsistency of the OLS estimator of β_0 in the linear measurement error model, Gallo in his Ph.D. dissertation showed that certain linear combinations of this estimator consistently estimate the corresponding combinations of the true parameter (see [22]). His results are extended in [24], where beyond the consistency the authors determine when these linear combinations of the OLS estimator are asymptotically normally distributed.

To be able to correct the bias caused by the measurement error one needs either some prior information about the distribution of x or ξ, or additional data sets that allow us to estimate these distributions. We mention here two types of data sets: validation data, which means that we have observations directly on ξ and on the corresponding y or x; and replication data, that is, we have replicates of x.
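The orthogonality argument above can be checked numerically: setting E(δε) = 0, the naive limit β̃ solves (E(ξξ′) + E(εε′)) β̃ = E(ξξ′) β_0. The following sketch compares this moment-equation solution with the naive OLS estimate on a large simulated sample (the covariance matrices are illustrative choices, not taken from the dissertation):

```python
import numpy as np

# Numerical check of the naive OLS limit: beta_tilde solves
#   (E xi xi' + E eps eps') beta_tilde = E xi xi' beta_0,
# since delta is generated independently of eps, so E(delta eps) = 0.
rng = np.random.default_rng(1)
n = 200_000
beta0 = np.array([1.0, -2.0])
S_xi = np.array([[2.0, 0.5], [0.5, 1.0]])    # E xi xi' (zero-mean xi)
S_eps = np.array([[0.5, 0.1], [0.1, 0.3]])   # E eps eps'

xi = rng.multivariate_normal([0.0, 0.0], S_xi, n)
eps = rng.multivariate_normal([0.0, 0.0], S_eps, n)
delta = rng.normal(0.0, 1.0, n)
y = xi @ beta0 + delta
x = xi + eps

beta_naive = np.linalg.solve(x.T @ x / n, x.T @ y / n)    # naive OLS, (1.1.6) with X
beta_tilde = np.linalg.solve(S_xi + S_eps, S_xi @ beta0)  # theoretical limit
print(beta_naive, beta_tilde)    # close to each other, but far from beta0
```

The naive estimate agrees with β̃ to within sampling error while clearly missing β_0, confirming that the bias is a population-level phenomenon rather than a small-sample effect.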
The latter is often the case in chemical or biological studies. If we assume (1.1.4), replication data can be used to estimate the variance of the measurement error.

1.1.3 The SIMEX estimator

For structural models Cook and Stefanski [3] proposed a general method for the case when the variance of the error is known or at least well estimated. Their

method, called SIMEX for simulation and extrapolation, uses computer simulation to determine the effect of the measurement error on an estimator. The easiest way to show the basic idea behind SIMEX is via OLS estimation in the linear regression model considered in the Preface. To recall it,

y = β_0^(1) + β_0^(2) ξ + δ,  x = ξ + ε,

where ξ, ε and δ are independent normal with means μ_ξ, 0, 0 and variances σ_ξ^2, σ_ε^2 and σ_δ^2, respectively. The OLS estimator β̂_n^(2) of β_0^(2) based on n independent observations on x and y converges to

σ_ξ^2 β_0^(2) / (σ_ξ^2 + σ_ε^2)

as n → ∞. Suppose that besides the data set used to calculate the OLS estimate of the slope parameter β_0^(2) we have M additional data sets with increasing measurement error variances, say (1 + λ_m)σ_ε^2, where 0 = λ_0 < λ_1 < ⋯ < λ_M. These data sets can be obtained using computer simulation (simulation step, see e.g. [8, 3]). The OLS estimator β̂_{n,m}^(2) calculated from the m-th data set consistently estimates

σ_ξ^2 β_0^(2) / (σ_ξ^2 + (1 + λ_m)σ_ε^2).

Now one can plot λ_m versus β̂_{n,m}^(2), m = 1, 2, ..., M, and fit a regression function G(λ) to these points. Extrapolation back to λ = −1 yields the SIMEX estimator of β_0^(2) (extrapolation step). In [7] the asymptotic distribution of SIMEX is derived, while in [39] it is applied to estimation in generalized linear mixed models for clustered data, when one of the predictors has an additive measurement error.

1.1.4 Regression calibration and semiparametric methods

Another general approach is the so-called regression calibration method, suggested by Carroll and Stefanski [9] and Gleser [23]. The idea is quite simple: one has to replace ξ by an estimate κ(x) of E(ξ | x) and then perform a standard statistical analysis. The regression function E(ξ | x) can be estimated using e.g. validation or replication data. Validation data sets contain either observations on (ξ, x) or on (ξ*, x), where ξ* is an unbiased estimate of ξ.
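Returning to the SIMEX procedure of the previous subsection, its simulation and extrapolation steps can be sketched in a few lines. Here a quadratic extrapolant G(λ) is an illustrative assumption (the text leaves the choice of G open); sample sizes and the λ-grid are likewise illustrative:

```python
import numpy as np

# Minimal SIMEX sketch for the slope of the Preface model, assuming the
# measurement error variance var_eps is known. Remeasured data sets with
# inflated error variance (1 + lambda_m) * var_eps are generated by adding
# extra simulated noise; a quadratic G(lambda) is then extrapolated
# back to lambda = -1.
rng = np.random.default_rng(1)
n, var_eps = 100_000, 0.8
xi = rng.normal(1.0, np.sqrt(2.0), n)
y = 1.0 + 2.0 * xi + rng.normal(0.0, np.sqrt(2.0), n)
x = xi + rng.normal(0.0, np.sqrt(var_eps), n)

def slope(u, v):
    return np.cov(u, v, bias=True)[0, 1] / np.var(u)

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
slopes = [slope(x + rng.normal(0.0, np.sqrt(lam * var_eps), n), y)
          for lam in lambdas]                      # simulation step
G = np.polynomial.Polynomial.fit(lambdas, slopes, deg=2)
beta_simex = G(-1.0)                               # extrapolation step
print(beta_simex)    # much closer to 2 than the naive slope(x, y) ~ 1.43
```

A quadratic G removes most, but not all, of the attenuation bias; with the exact rational extrapolant σ_ξ^2 β^(2) / (σ_ξ^2 + (1 + λ)σ_ε^2) the true slope would be recovered exactly at λ = −1.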
Both in [9] and [23] the estimation is done in a parametric way, e.g. E(ξ | x) is approximated by the best linear predictor of ξ given x. An alternative approach is due to Sepanski, Knickerbocker and Carroll [37], who proposed to estimate E(ξ | x) by using nonparametric kernel regression of ξ (or ξ*)

on x based on validation or replication data. They stated results for two types of models. The first is the logistic regression model, where the response y is a binary random variable that takes the values 0 and 1, and

P(y = 1 | ξ = u) = F(β_0^(1) + β_0^(2) u),  (1.1.7)

with F(v) = (1 + exp(v))^(−1). When there is no measurement error in the model, the maximum likelihood estimator of the true parameter vector β_0 = (β_0^(1), β_0^(2)) based on n independent observations on y and ξ (denoted by y_i and ξ_i, respectively) is the solution of

Σ_{i=1}^n (1, ξ_i)′ (y_i − F(β^(1) + β^(2) ξ_i)) = 0.  (1.1.8)

The second is the quasilikelihood and variance function model (1.1.2)-(1.1.3) (that was also the model considered by Carroll and Stefanski in [9]), where now the true values of the parameters θ and σ are assumed to be known. The quasilikelihood estimator of the parameter β based on n independent observations on y and ξ solves

Σ_{i=1}^n [(y_i − g_m(ξ_i, β)) / g_v(ξ_i, β, θ_0)] ∂g_m(ξ_i, β)/∂β = 0.  (1.1.9)

The authors proved that if in (1.1.8) and (1.1.9) ξ_i is replaced by κ(x_i), where the x_i's are independent observations on x and κ(x) denotes the nonparametric kernel regression of ξ (or ξ*) given x, the estimators obtained in this way converge to certain constants that are in both cases very close to the original parameter values. They also showed the asymptotic normality of these estimators.

Instead of replacing ξ by an estimate of E(ξ | x), Sepanski and Carroll [36] suggested estimating the conditional mean and variance of the dependent variable y given x by kernel regression based on validation data. They used quasilikelihood and variance function techniques based on the primary data to get the estimates of the unknown parameters. The estimators obtained in this way turn out to be asymptotically normal. A similar idea appeared earlier in [10], where the authors described a semiparametric estimation method in logistic measurement error models.
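A minimal sketch of kernel-based regression calibration, assuming a Nadaraya-Watson estimate of E(ξ | x) computed from a validation sample (the Gaussian kernel, bandwidth, sample sizes and parameters are illustrative choices, not taken from [37]):

```python
import numpy as np

# Regression calibration with a nonparametric kernel estimate of E(xi | x):
# kappa(x) is a Nadaraya-Watson regression of xi on x over validation data,
# and the primary-data OLS then uses kappa(x) in place of the unobserved xi.
rng = np.random.default_rng(2)

def draw(n):
    xi = rng.normal(1.0, np.sqrt(2.0), n)
    x = xi + rng.normal(0.0, np.sqrt(0.8), n)
    y = 1.0 + 2.0 * xi + rng.normal(0.0, 1.0, n)
    return xi, x, y

xi_v, x_v, _ = draw(2_000)     # validation data: (xi, x) pairs observed
_, x_p, y_p = draw(4_000)      # primary data: only (x, y) observed

def kappa(x_new, h=0.3):
    """Nadaraya-Watson estimate of E(xi | x = x_new) from validation data."""
    w = np.exp(-0.5 * ((x_new[:, None] - x_v[None, :]) / h) ** 2)
    return (w @ xi_v) / w.sum(axis=1)

A = np.column_stack([np.ones(len(x_p)), kappa(x_p)])
beta_cal = np.linalg.lstsq(A, y_p, rcond=None)[0]
print(beta_cal)    # slope close to 2; the naive slope would be ~ 1.43
```

Substituting κ(x) for ξ removes most of the attenuation; the small remaining bias comes from the smoothing error of the kernel estimate.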
For scalar explanatory variables they gave a representation for an optimal bandwidth.

1.1.5 The Lee-Sepanski estimator

For the model (1.1.1) Lee and Sepanski [3] showed an approach that is computationally simpler than the semiparametric methods mentioned above. They assumed that g(ξ, β) can somehow be linearized, i.e. it can be approximated by z(x)′γ(β), where the vector z(x) consists of finite order polynomials or other functions of x. Hence, using validation data, which in [3] consist of m independent observations of the pair

(ξ, x), one can determine the ordinary least squares estimate γ̂_m(β) of γ(β). The estimator β̂_{n,m} proposed by Lee and Sepanski minimizes

Σ_{i=1}^n (y_i − z(x_i)′ γ̂_m(β))^2,  (1.1.10)

where the pairs (y_i, x_i), i = 1, 2, ..., n, which are independent observations of (y, x), form the primary data. We remark that the expression in (1.1.10) is exactly the objective function of the OLS estimator of β in a model of the form y = z(x)′γ̂_m(β) + δ. The authors proved the consistency and asymptotic normality of β̂_{n,m} when both n and m tend to infinity. They also formulated similar results for the case when the response y is also observed with error.

In Chapter 3 of this dissertation the results of [3] are extended to dependent data. We concentrate on the case when measurement error is present only in the explanatory variables. It is assumed that both the primary and the validation data consist of observations of an underlying stochastic process or random field. This process (or field) is supposed to satisfy weak dependence conditions; more precisely, it should be strong mixing (see Subsection 1.2.1 of this chapter). Consistency and asymptotic normality of the estimator are proved both for temporal and for spatial data. It is also shown that the estimator proposed by Lee and Sepanski turns out to be inconsistent in the case of infill asymptotics, i.e. when more and more spatial observations are taken from a fixed domain (see e.g. [30]). These results were published in [8].

1.1.6 The deconvolution method

Finally, we present a general method for constructing consistent estimators in general errors-in-variables models (both functional and structural) with additive measurement error (models of the form (1.1.1) and (1.1.4)). The roots of this method can be found in the paper of Stefanski [38], who considered nonlinear structural models with i.i.d. normal error terms. His idea is the following. Let us consider the model (1.1.1),
(1.1.4) with T = N, and for simplicity we will assume that the error terms ε_i, δ_i are i.i.d. and the ξ_i are non-random explanatory variables. Suppose that if we can observe y_i and ξ_i, i = 1, 2, ..., n, we have an estimator β̄ of β_0 that minimizes a function

Φ(β) = (1/n) Σ_{i=1}^n φ(y_i, ξ_i, β).

Let m_i denote Eφ(y_i, ξ_i, β_0). If we assume that there exists a function ψ such that

E(ψ(y_i, x_i, β) | y_i) = φ(y_i, ξ_i, β),  (1.1.11)

then this ψ is unbiased, i.e. Eψ(y_i, x_i, β_0) = m_i. One can then expect that the estimator β̂ of β_0 based on the observations y_i and x_i, i = 1, 2, ..., n, and defined as the

minimizer of (1/n) Σ_{i=1}^n ψ(y_i, x_i, β) has the same asymptotic properties (e.g. consistency, asymptotic normality) as β̄ in the traditional regression model. We remark that in the structural case, when y_i, ξ_i and x_i are considered as independent observations of certain random vectors y, ξ and x, respectively, instead of (1.1.11) one has to demand

E(ψ(y, x, β) | y, ξ) = φ(y, ξ, β).  (1.1.12)

Functions ψ satisfying (1.1.11) (or (1.1.12)) can be found e.g. by solving the following deconvolution equation:

∫ ψ(y, ξ + z, β) p(z) dz = φ(y, ξ, β),  (1.1.13)

for all y, ξ and β, where p denotes the density function of ε. That is why this method is often called the deconvolution method. In [38] Stefanski also proposed some approximate solutions of (1.1.13) in the case when ε follows the normal law.

The idea above can also be used when in the classical regression model we estimate β_0 by an M-estimator β̄ (this is the case considered in [38]), that is, β̄ is the solution of

Σ_{i=1}^n φ(y_i, ξ_i, β) = 0,  where Eφ(y_i, ξ_i, β_0) = 0 for all i.

Then, in the measurement error model (1.1.1), (1.1.4) one should estimate β_0 by the solution of

Σ_{i=1}^n ψ(y_i, x_i, β) = 0,

where ψ satisfies (1.1.11) in the functional and (1.1.12) in the structural case.

In Chapter 2 of this dissertation the deconvolution method is applied to the linear structural model and to the objective function developed by An, Hickernell and Zhu [2]. The new objective function obtained in this way yields a strongly consistent estimator of the unknown parameter. These results are included in [6].

As for the functional case, Kukush and Zwanzig [29] used the idea of Stefanski [38] to get a minimum contrast estimator in an implicit functional model with i.i.d. error terms. As a special case they found a strongly consistent estimator for the model (1.1.1), (1.1.4) in the case of one-dimensional x_i and y_i, derived from the ordinary least squares estimator.
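For the scalar linear model without intercept this program can be carried out in closed form, which gives a quick sanity check of the idea. With φ(y, ξ, β) = (y − βξ)^2 and ε independent of (y, ξ) with zero mean and known variance σ_ε^2, the function ψ(y, x, β) = (y − βx)^2 − β^2 σ_ε^2 satisfies (1.1.12), since E((y − βx)^2 | y, ξ) = (y − βξ)^2 + β^2 σ_ε^2. Minimizing Σψ gives the classical corrected estimator Σx_i y_i / (Σx_i^2 − nσ_ε^2). A short numerical sketch with illustrative parameters (this is not the estimator studied in Chapter 2):

```python
import numpy as np

# Corrected-score sketch: psi(y, x, b) = (y - b*x)^2 - b^2 * s2_eps has
# conditional expectation (y - b*xi)^2 given (y, xi), so its minimizer
#   beta_corr = sum(x*y) / (sum(x*x) - n*s2_eps)
# is consistent, while the naive OLS slope is attenuated.
rng = np.random.default_rng(3)
n, beta0, s2_eps = 50_000, 2.0, 0.8
xi = rng.normal(1.0, 1.0, n)
x = xi + rng.normal(0.0, np.sqrt(s2_eps), n)
y = beta0 * xi + rng.normal(0.0, 1.0, n)

beta_naive = (x @ y) / (x @ x)                # minimizes sum (y - b*x)^2
beta_corr = (x @ y) / (x @ x - n * s2_eps)    # minimizes sum psi(y, x, b)
print(beta_naive, beta_corr)                  # attenuated vs close to 2
```

Note that this particular ψ needs only the first two moments of ε, so normality of the measurement error is not essential in this special case.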
Their estimator was extended by Fazekas and Kukush [9] to vector-valued $x_i$ and dependent error terms $\delta_i$, and by Fazekas, Baran, Kukush and Lauridsen [7] to vector-valued $x_i$ and $y_i$ and strong mixing error terms $\varepsilon_i$ and $\delta_i$. These estimators are discussed in a more detailed form in Section 4.2 of Chapter 4.
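The corrected-score idea behind (1.1.1) and (1.1.3) can be made concrete in the simplest setting. The sketch below is an illustration only (it is not the dissertation's estimator, and all parameter values are hypothetical): for a scalar linear model with normal measurement error of known variance $D$, one checks that $\mathrm E\bigl(x(y-x\beta)\mid y,\xi\bigr)=\xi(y-\xi\beta)-D\beta$, so $\psi(y,x,\beta)=x(y-x\beta)+D\beta$ satisfies a condition of type (1.1.1) and its root is consistent, while the naive OLS slope is attenuated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, D = 20_000, 2.0, 1.0        # hypothetical values; D = Var(eps) known

xi  = rng.normal(0.0, 1.0, n)         # regressors (structural case, Var(xi) = 1)
eps = rng.normal(0.0, np.sqrt(D), n)  # measurement error
dlt = rng.normal(0.0, 0.5, n)         # regression error
y, x = beta0 * xi + dlt, xi + eps     # only (y_i, x_i) are observed

# Naive M-estimator: root of sum x_i (y_i - x_i b) = 0, i.e. OLS of y on x.
naive = np.sum(x * y) / np.sum(x * x)

# Corrected score psi(y, x, b) = x (y - x b) + D b: its conditional expectation
# equals the error-free score xi (y - xi b), so the root of
# sum [x_i (y_i - x_i b) + D b] = 0 estimates beta0 consistently.
corrected = np.sum(x * y) / (np.sum(x * x) - n * D)

print(naive, corrected)  # naive is attenuated towards beta0/2 here
```

With $\operatorname{Var}\xi=1$ and $D=1$ the naive slope converges to $\beta_0\operatorname{Var}\xi/(\operatorname{Var}\xi+D)=\beta_0/2$, while the corrected root converges to $\beta_0$; this is exactly the bias the deconvolution method removes.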

In [7], [9] and [29] the independence of $\delta_i$ and $\varepsilon_i$ for each $i$ is a key assumption. In Chapter 4 of this dissertation the situation of vector-valued $x_i$ and $y_i$ is examined when $\{\delta_i\}$ and $\{\varepsilon_i\}$ are mixing and not necessarily independent. Example 4.3.1 shows that this estimator is a generalization of the one suggested by Chan and Mak [] for the polynomial model with i.i.d. jointly normal error terms, and also of its extension to the same model with arbitrary i.i.d. pairs $(\varepsilon_i,\delta_i)$, $i\in\mathbb N$, proposed by Cheng and Schneeweis [2]. These results are contained in [5] and in [7].

1.2 Auxiliary results

In this section we give those auxiliary results, known from the literature, that are used in the next three chapters of this dissertation.

1.2.1 Mixing

Mixing processes. Let $\eta_i$, $i\in\mathbb N$, be a (vector-valued) stochastic process. First we give the definition of a measure of dependence between $\sigma$-algebras. Let $(\Omega,\mathcal F,\mathrm P)$ be the underlying probability space and let $\mathcal A$ and $\mathcal B$ be sub-$\sigma$-algebras of $\mathcal F$. Then the $\alpha$-mixing (or strong mixing) coefficient of $\mathcal A$ and $\mathcal B$ is
\[
\alpha(\mathcal A,\mathcal B)=\sup_{A\in\mathcal A,\;B\in\mathcal B}\bigl|\mathrm P(AB)-\mathrm P(A)\,\mathrm P(B)\bigr|.
\]
Let $\mathcal M_k^l$ denote the $\sigma$-algebra generated by $\{\eta_i : k\le i\le l\}$. The $\alpha$-mixing coefficient of $\eta_i$, $i\in\mathbb N$, is defined by
\[
\alpha_\eta(n)=\sup_{1\le k<\infty}\alpha\bigl(\mathcal M_1^k,\mathcal M_{k+n}^\infty\bigr).
\]

Definition 1.2.1 A stochastic process $\eta_i$, $i\in\mathbb N$, is called $\alpha$-mixing (or strong mixing) if $\lim_{n\to\infty}\alpha_\eta(n)=0$.

Let $j(t)=2\min\{k\in\mathbb N : 2k\ge t\}$. The following condition on the mixing coefficient $\alpha_\eta(n)$ will be appropriate for our purposes:
\[
b(\alpha_\eta,t,a)=\sum_{k=1}^\infty\alpha_\eta^{a/(j(t)+a)}(k)\,(k+1)^{j(t)-2}<\infty, \tag{1.2.1}
\]
where $a>0$.
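As a toy illustration of the definition of $\alpha(\mathcal A,\mathcal B)$ (not part of the dissertation; the joint distribution below is made up), one can compute the strong mixing coefficient of the $\sigma$-algebras generated by two $\{0,1\}$-valued random variables directly from the definition, taking the supremum over the finitely many events:

```python
import numpy as np

# Hypothetical joint pmf of two {0,1}-valued variables X, Y (rows: X, cols: Y).
p = np.array([[0.30, 0.20],
              [0.10, 0.40]])

def alpha(p):
    """alpha(sigma(X), sigma(Y)) = sup_{A,B} |P(AB) - P(A) P(B)|,
    the supremum running over the four events of each sigma-algebra."""
    px, py = p.sum(axis=1), p.sum(axis=0)
    best = 0.0
    for A in ([], [0], [1], [0, 1]):          # events {X in A}
        for B in ([], [0], [1], [0, 1]):      # events {Y in B}
            pab = sum(p[i, j] for i in A for j in B)
            best = max(best, abs(pab - px[A].sum() * py[B].sum()))
    return best

print(alpha(p))                                   # 0.1 for the pmf above
print(alpha(np.outer([0.5, 0.5], [0.4, 0.6])))    # 0.0: independent variables
```

For independent variables every term $\mathrm P(AB)-\mathrm P(A)\mathrm P(B)$ vanishes, so the coefficient is $0$, in line with the intuition that $\alpha$ measures departure from independence.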

To prove the consistency of the estimators defined in Chapters 3 and 4 we have to handle the moments of partial sums of mixing processes. Lemma 1.2.2 below gives the analogues of the Rosenthal inequalities for strong mixing processes.

Lemma 1.2.2 Let $\eta_i$, $i\in\mathbb N$, be a centered stochastic process (i.e. $\mathrm E\,\eta_i=0$, $i\in\mathbb N$) that satisfies condition (1.2.1) and let $a$ be a positive constant. Then
\[
\mathrm E\Bigl|\sum_{i=1}^n\eta_i\Bigr|^t\le c(t)\,b(\alpha_\eta,t,a)\sum_{i=1}^n\bigl(\|\eta_i\|_{t+a}\bigr)^t
\]
if $1\le t\le2$, and
\[
\mathrm E\Bigl|\sum_{i=1}^n\eta_i\Bigr|^t\le c(t)\,b(\alpha_\eta,t,a)\max\Bigl\{\sum_{i=1}^n\bigl(\|\eta_i\|_{t+a}\bigr)^t,\ \Bigl(\sum_{i=1}^n\bigl(\|\eta_i\|_{2+a}\bigr)^2\Bigr)^{t/2}\Bigr\}
\]
if $2\le t$. In particular,
\[
\mathrm E\Bigl|\sum_{i=1}^n a_i\eta_i\Bigr|^t\le c(t)\,b(\alpha_\eta,t,a)\Bigl(\sum_{i=1}^n a_i^2\Bigr)^{t/2}\max_{1\le i\le n}\bigl(\|\eta_i\|_{t+a}\bigr)^t
\]
if $2\le t$. (Here $c(t)$ depends on $t$ (and on the dimension) but it does not depend on $n$.)

The proof can be found in [4, Theorem 2, p. 26]. To verify asymptotic normality we need a central limit theorem (CLT) for mixing processes. The following CLT can be found in [26, Corollary 1].

Theorem 1.2.3 Let $\eta_i$, $i\in\mathbb N$, be a centered (vector-valued) stochastic process, $S_n=\sum_{i=1}^n\eta_i$, and let $\Sigma_n$ denote the variance (matrix) of $S_n$. Assume that mixing condition (1.2.1) is satisfied for $t=2$ and, with the same $a>0$,
\[
\sup_{i\in\mathbb N}\|\eta_i\|_{2+a}<\infty. \tag{1.2.2}
\]
Then
\[
\limsup_{n\to\infty}\ \frac1n\sum_{i,j=1}^n\bigl|\operatorname{cov}(\eta_i,\eta_j)\bigr|<\infty.
\]
If, moreover,
\[
\lim_{n\to\infty}\frac1n\Sigma_n=\Sigma, \tag{1.2.3}
\]
where $\Sigma$ is a positive definite matrix, then $(\Sigma_n)^{-1/2}S_n\to\mathcal N(0,I)$ in distribution as $n\to\infty$.
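A quick numerical sanity check of the conclusion of Theorem 1.2.3 (an illustration only, not part of the text): a stationary AR(1) process with $|\phi|<1$ and Gaussian innovations is geometrically strong mixing, and for $\phi=0.5$ the normalized sums $n^{-1/2}S_n$ should have variance close to the long-run variance $1/(1-\phi)^2=4$.

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n, reps = 0.5, 500, 2000

vals = np.empty(reps)
for r in range(reps):
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    vals[r] = x.sum() / np.sqrt(n)   # n^{-1/2} S_n

# For the mixing CLT, Var(n^{-1/2} S_n) -> sigma^2 = 1 / (1 - phi)^2 = 4.
print(vals.mean(), vals.var())
```

The sample mean of the normalized sums is close to $0$ and their sample variance close to $4$, matching the limit $\mathcal N(0,\sigma^2)$ predicted by the theorem.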

Mixing random fields. We describe here the notion of $\alpha$-mixing for random fields in the sense of [4] and [25]. Let $\eta_s$, $s\in I\subset\mathbb Z^d$, be a random field. For $\Lambda\subset I$ let $\mathcal A_\Lambda$ denote the $\sigma$-algebra generated by $\{\eta_s : s\in\Lambda\}$. By $\varrho$ we denote the max distance in $\mathbb Z^d$, i.e. for $u=(u_1,\dots,u_d)$, $v=(v_1,\dots,v_d)\in\mathbb Z^d$,
\[
\varrho(u,v)=\max_{1\le i\le d}|u_i-v_i|.
\]
Let $\varrho(\Lambda_1,\Lambda_2)$ denote the distance of $\Lambda_1$ and $\Lambda_2$ for $\Lambda_1,\Lambda_2\subset\mathbb Z^d$ and let $|\Lambda|$ denote the cardinality of a set $\Lambda$. The $\alpha$-mixing coefficient of the random field $\eta_s$, $s\in I$, is defined in the following way:
\[
\alpha_\eta(n;u,v)=\sup\bigl\{|\mathrm P(AB)-\mathrm P(A)\mathrm P(B)| : A\in\mathcal A_{\Lambda_1},\ B\in\mathcal A_{\Lambda_2},\ |\Lambda_1|\le u,\ |\Lambda_2|\le v,\ \varrho(\Lambda_1,\Lambda_2)\ge n\bigr\}.
\]

Definition 1.2.4 A random field $\eta_s$, $s\in I\subset\mathbb Z^d$, is called $\alpha$-mixing if
\[
\lim_{n\to\infty}\alpha_\eta(n;u,v)=0
\]
for any integers $u,v\ge0$.

We need the following condition for the mixing coefficient $\alpha_\eta$. For a positive constant $a$,
\[
\sum_{i=1}^\infty i^{d-1}\bigl[\alpha_\eta(i;1,1)\bigr]^{a/(2+a)}<\infty. \tag{1.2.4}
\]
Lemma 1.2.5 below is the spatial version of Lemma 1.2.2.

Lemma 1.2.5 Let $\eta_s$, $s\in I\subset\mathbb Z^d$, be a centered random field that satisfies condition (1.2.4). Let $1<t\le2$, $a>0$, and assume that $\mathrm E|\eta_s|^{t+a}$ is finite for every $s\in I$. Then there exists a constant $K$ depending only on $t$ and on the mixing coefficients $\alpha_\eta(i;1,1)$ such that
\[
\mathrm E\Bigl|\sum_{s\in P}\eta_s\Bigr|^t\le K\sum_{s\in P}\bigl(\mathrm E|\eta_s|^{t+a}\bigr)^{t/(t+a)}
\]
for any finite set $P\subset I$. The proof can be found in [20].

The following theorem is a CLT for mixing random fields (see [4] for stationary random fields and [25] for the general case).

Theorem 1.2.6 (Guyon [25, Theorem 3.3.1]) Let $\eta_s$, $s\in I\subset\mathbb Z^d$, be a centered (vector-valued) random field and let $P_n$, $n\in\mathbb N$, be a strictly increasing sequence of finite subsets of $I$. Let $S_n=\sum_{s\in P_n}\eta_s$ and denote by $\Sigma_n$ the variance (matrix)

of $S_n$. Assume that mixing condition (1.2.4) is satisfied and, with the same $a>0$, $\{|\eta_s|^{2+a} : s\in I\}$ are uniformly integrable. Then
\[
\limsup_{n\to\infty}\ \frac1{|P_n|}\sum_{u,v\in P_n}\bigl|\operatorname{cov}(\eta_u,\eta_v)\bigr|<\infty.
\]
If, moreover,
\[
\sum_{i=1}^\infty i^{d-1}\alpha_\eta(i;u,v)<\infty\quad\text{if }u+v\le4, \tag{1.2.5}
\]
\[
\alpha_\eta(i;1,1)=o\bigl(i^{-d}\bigr) \tag{1.2.6}
\]
and
\[
\liminf_{n\to\infty}\lambda_{\min}\bigl(|P_n|^{-1}\Sigma_n\bigr)>0, \tag{1.2.7}
\]
where $\lambda_{\min}$ denotes the smallest eigenvalue of the matrix given as its argument, then $(\Sigma_n)^{-1/2}S_n\to\mathcal N(0,I)$ in distribution as $n\to\infty$.

We remark that in [25] only the uniform boundedness of the $(2+a)$th moments of $\{\eta_s\}$ was demanded. But actually the stronger condition of uniform integrability is needed in order to be able to reduce the CLT by truncation to the case of uniformly bounded random vectors. This condition obviously holds if we demand the uniform boundedness of the $(2+a+\lambda)$th moments of $\{\eta_s\}$ with some $\lambda>0$.

1.2.2 Uniform law of large numbers

To prove the uniform law of large numbers we will use the following lemma, the proof of which can be found e.g. in [9].

Lemma 1.2.7 Let $U_n(\beta)$, $n\in\mathbb N$, $\beta\in\Theta$, be random vectors, where $\Theta$ is a compact set, and assume that $\lim_{n\to\infty}U_n(\beta)=0$ in probability for each $\beta\in\Theta$. Suppose that for each $\varepsilon>0$
\[
\lim_{l\to0}\limsup_{n\to\infty}\mathrm P^*\Bigl\{\sup_{|\beta_1-\beta_2|\le l}\bigl|U_n(\beta_1)-U_n(\beta_2)\bigr|>\varepsilon\Bigr\}=0, \tag{1.2.8}
\]
where $\mathrm P^*(A)$ means the outer probability of a set $A\subset\Omega$. Then
\[
\lim_{n\to\infty}\sup_{\beta\in\Theta}\bigl|U_n(\beta)\bigr|=0
\]
in probability, that is, for any $\varepsilon>0$,
\[
\mathrm P^*\Bigl\{\sup_{\beta\in\Theta}\bigl|U_n(\beta)\bigr|>\varepsilon\Bigr\}\to0
\]
as $n\to\infty$.
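The kind of uniform convergence delivered by Lemma 1.2.7 can be seen in a toy i.i.d. case (an illustration only; the function and sample size are hypothetical). With $U_n(\beta)=\frac1n\sum_i\cos(\beta Z_i)-e^{-\beta^2/2}$ for $Z_i\sim\mathcal N(0,1)$, the pointwise LLN together with the smoothness of $\cos$ in $\beta$ (which is what the equicontinuity condition (1.2.8) encodes) yields $\sup_{\beta\in\Theta}|U_n(\beta)|\to0$ over a compact $\Theta$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z = rng.normal(size=n)                  # i.i.d. N(0,1) sample
betas = np.linspace(0.0, 3.0, 301)      # grid over the compact set Theta = [0, 3]

# U_n(beta) = mean cos(beta Z_i) - E cos(beta Z),  with E cos(beta Z) = exp(-beta^2/2)
U = np.cos(np.outer(betas, z)).mean(axis=1) - np.exp(-betas**2 / 2.0)
sup_dev = np.abs(U).max()

print(sup_dev)   # small uniformly over Theta, of order n^{-1/2}
```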

Lemma 1.2.8 below is very useful in cases when one wants to prove convergence results using a Taylor series expansion.

Lemma 1.2.8 Let $\Theta$ be a compact set, let $\eta_n(\beta)$, $\beta\in\Theta$, be a sequence of random variables, and let $a(\beta):\Theta\to\mathbb R^d$ be a function continuous at a point $\beta_0$ of $\Theta$. If $\hat\beta_n\to\beta_0$ in probability and $\eta_n(\beta)\to a(\beta)$ in probability uniformly in $\beta$ (as $n\to\infty$), then $\eta_n(\hat\beta_n)\to a(\beta_0)$ in probability as $n\to\infty$.

For the proof see e.g. [7].

Chapter 2

A consistent estimator for the linear structural model

2.1 Introduction

The linear regression model is one of the most frequently used models in statistics. For the classical model, when one can precisely observe the regressors, An, Hickernell and Zhu [2] introduced a new strongly consistent estimator of the unknown parameter, based on the Fourier transform of a symmetric weight function. In Section 2.2 of this chapter we prove that this estimator is not consistent if the explanatory variables are measured with error. In Section 2.3 an appropriate modification of the estimator considered in [2] is introduced using the deconvolution method (see Section 1.1). In Section 2.4 the strong consistency of this new estimator is verified. The main advantage of the estimator considered here is that it is more robust than the least squares type estimators; it does not even require the existence of the mean of the measurement error. The simulation results of Section 2.5 clearly show this property. The results of this chapter are published in [6].

2.2 The model and the naive estimator

Let us consider the following model:
\[
y_i=\xi_i^{\top}\beta_0+\delta_i, \tag{2.2.1}
\]
\[
x_i=\xi_i+\varepsilon_i,\qquad i\in\mathbb N, \tag{2.2.2}
\]
where the design points $\xi_1,\xi_2,\dots$ are i.i.d. copies of a random vector $\xi$, while $\varepsilon_i$ and $\delta_i$ are i.i.d. random error terms that have the same distribution as the random variables $\varepsilon$ and $\delta$, respectively. We suppose that $\mathrm E\,\delta=0$ and the three sequences $\{\xi_i\}$, $\{\varepsilon_i\}$ and

$\{\delta_i\}$ are mutually independent. $\beta_0$ is the true value of the unknown parameter $\beta$, to be estimated on the basis of observations on $y$ and $x$. Assume that the parameter set is $p$-dimensional: $\beta_0\in\Theta\subset\mathbb R^p$; $y_i$ and $\delta_i$ are scalar random variables, while $x_i$, $\xi_i$ and $\varepsilon_i$ are $p$-dimensional vectors.

For the traditional linear model
\[
y_i=\xi_i^{\top}\beta_0+\delta_i,\qquad i\in\mathbb N, \tag{2.2.3}
\]
where one can observe $y_i$ and $\xi_i$, An, Hickernell and Zhu [2] introduced a new strongly consistent estimator and they also proved its asymptotic normality. This estimator is the following. Let $w(t)$ be a continuous probability density kernel function on the real line that satisfies $w(t)=w(-t)\ge0$, $t>0$, and
\[
\int_{-\infty}^{\infty}|t|\,w(t)\,dt<\infty, \tag{2.2.4}
\]
and let $\phi_w(v)$ denote the Fourier transform of $w$, that is,
\[
\phi_w(v)=\int_{-\infty}^{\infty}e^{itv}w(t)\,dt.
\]
The estimator $\hat\beta_n$ of $\beta_0$ introduced in [2] is the maximum point of
\[
A_n(\beta)=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\phi_w\bigl(y_l-y_s-(\xi_l-\xi_s)^{\top}\beta\bigr). \tag{2.2.5}
\]
Here the kernel function $w(t)$ should be chosen in such a way that its Fourier transform has a closed form. The usual kernels are the densities of the normal distribution $\mathcal N(0,a^2)$, the uniform distribution on $[-a,a]$ and the symmetric exponential distribution, that is, $w(t)=\frac a2e^{-a|t|}$.

To adapt the above estimator to the model (2.2.1)--(2.2.2), it seems natural first to try the so-called naive approach, that is, to substitute $x_i$ for $\xi_i$ in (2.2.5) and take the maximum in $\beta$ of the expression obtained in this way. The following theorem shows that under certain mild conditions the naive estimator $\hat\beta_n^*=\arg\max_\beta A_n^*(\beta)$, where
\[
A_n^*(\beta)=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\phi_w\bigl(y_l-y_s-(x_l-x_s)^{\top}\beta\bigr), \tag{2.2.6}
\]
is not consistent.
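For the error-free model (2.2.3) the estimator (2.2.5) is easy to sketch numerically. The toy implementation below (simulated data; all constants are hypothetical) uses the $\mathcal N(0,a^2)$ kernel, whose Fourier transform is $\phi_w(v)=e^{-a^2v^2/2}$, and maximizes $A_n$ over a grid for a scalar parameter:

```python
import numpy as np

rng = np.random.default_rng(3)
n, beta0, a = 500, 1.5, 1.0

xi  = rng.normal(size=n)                 # observable regressors
dlt = rng.normal(0.0, 0.5, n)
y   = beta0 * xi + dlt

# Pairwise differences entering (2.2.5); phi_w(v) = exp(-a^2 v^2 / 2).
dy, dxi = np.subtract.outer(y, y), np.subtract.outer(xi, xi)

def A_n(beta):
    return np.exp(-a**2 * (dy - beta * dxi)**2 / 2.0).mean()

grid = np.linspace(0.0, 3.0, 301)
beta_hat = grid[np.argmax([A_n(b) for b in grid])]
print(beta_hat)   # the maximum point lies near beta0
```

Replacing $\xi_i$ by $x_i$ in this grid search gives exactly the naive criterion $A_n^*$, which the next theorem shows to be inconsistent in general.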

Theorem 2.2.1 Let $\phi_\varepsilon(v)$ and $\phi_\delta(v)$ denote the characteristic functions of $\varepsilon$ and $\delta$, respectively. Assume that
\[
\mathrm E|\xi|^2<\infty,\qquad \mathrm E|\varepsilon|^2<\infty, \tag{2.2.7}
\]
and the kernel function $w(t)$ satisfies
\[
\int_{-\infty}^{\infty}t^2w(t)\,dt<\infty. \tag{2.2.8}
\]
If
\[
\int_{-\infty}^{\infty}\Bigl(\phi_\varepsilon(-t\beta_0)\,\frac{\partial\phi_\varepsilon(t\beta_0)}{\partial v}-\phi_\varepsilon(t\beta_0)\,\frac{\partial\phi_\varepsilon(-t\beta_0)}{\partial v}\Bigr)\phi_\delta(t)\,\phi_\delta(-t)\,t\,w(t)\,dt\ne0, \tag{2.2.9}
\]
then $\hat\beta_n^*$ is not a consistent estimator of $\beta_0$.

Proof. We prove the theorem indirectly. Assume that $\hat\beta_n^*\to\beta_0$ in probability as $n\to\infty$. By the definition of $\hat\beta_n^*$,
\[
\frac{\partial A_n^*(\hat\beta_n^*)}{\partial\beta}=0.
\]
Using the Taylor series expansion of $\partial A_n^*(\beta)/\partial\beta$ around $\beta_0$ we get
\[
0=\frac{\partial A_n^*(\beta_0)}{\partial\beta}+\frac{\partial^2A_n^*(\tilde\beta_n)}{\partial\beta^2}\bigl(\hat\beta_n^*-\beta_0\bigr), \tag{2.2.10}
\]
where $\tilde\beta_n$ is a point between $\hat\beta_n^*$ and $\beta_0$ ($\tilde\beta_n$ can depend on the coordinates, too). Conditions (2.2.4) and (2.2.8) ensure that we can use the dominated convergence theorem and interchange differentiation and integration, so
\[
\frac{d\phi_w(v)}{dv}=\int_{-\infty}^{\infty}it\,e^{itv}w(t)\,dt,\qquad
\frac{d^2\phi_w(v)}{dv^2}=-\int_{-\infty}^{\infty}t^2e^{itv}w(t)\,dt.
\]
Hence
\[
\frac{\partial A_n^*(\beta)}{\partial\beta}=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\frac{d\phi_w}{dv}\bigl(y_l-y_s-(x_l-x_s)^{\top}\beta\bigr)(x_s-x_l) \tag{2.2.11}
\]
\[
=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\int_{-\infty}^{\infty}it\,e^{it(y_l-x_l^{\top}\beta)}e^{-it(y_s-x_s^{\top}\beta)}w(t)\,dt\,(x_s-x_l)
\]

and
\[
\frac{\partial^2A_n^*(\beta)}{\partial\beta^2}=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\frac{d^2\phi_w}{dv^2}\bigl(y_l-y_s-(x_l-x_s)^{\top}\beta\bigr)(x_s-x_l)(x_s-x_l)^{\top} \tag{2.2.12}
\]
\[
=-\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\int_{-\infty}^{\infty}t^2e^{it(y_l-x_l^{\top}\beta)}e^{-it(y_s-x_s^{\top}\beta)}w(t)\,dt\,(x_s-x_l)(x_s-x_l)^{\top}.
\]
First we prove that $\partial^2A_n^*(\beta)/\partial\beta^2$ is bounded in $L_1$ uniformly in $\beta$:
\[
\Bigl\|\frac{\partial^2A_n^*(\beta)}{\partial\beta^2}\Bigr\|\le\int_{-\infty}^{\infty}t^2w(t)\,dt\;\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\bigl\|(x_s-x_l)(x_s-x_l)^{\top}\bigr\|. \tag{2.2.13}
\]
Conditions (2.2.7) and (2.2.8) ensure that the expectation of the right-hand side of (2.2.13) is bounded, which implies the result we needed. Hence the second term of the right-hand side of (2.2.10) tends to $0$ in probability as $n\to\infty$.

Let us find the limit of $\partial A_n^*(\beta_0)/\partial\beta$. From the model equations (2.2.1)--(2.2.2) and (2.2.11) we get
\[
\frac{\partial A_n^*(\beta_0)}{\partial\beta}=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\int_{-\infty}^{\infty}it\,e^{it(\delta_l-\varepsilon_l^{\top}\beta_0)}e^{-it(\delta_s-\varepsilon_s^{\top}\beta_0)}w(t)\,dt\,(x_s-x_l)
\]
\[
=\int_{-\infty}^{\infty}it\Bigl(\frac1n\sum_{l=1}^ne^{it\delta_l}e^{-it\varepsilon_l^{\top}\beta_0}\Bigr)\Bigl(\frac1n\sum_{s=1}^ne^{-it\delta_s}e^{it\varepsilon_s^{\top}\beta_0}x_s\Bigr)w(t)\,dt
-\int_{-\infty}^{\infty}it\Bigl(\frac1n\sum_{l=1}^ne^{it\delta_l}e^{-it\varepsilon_l^{\top}\beta_0}x_l\Bigr)\Bigl(\frac1n\sum_{s=1}^ne^{-it\delta_s}e^{it\varepsilon_s^{\top}\beta_0}\Bigr)w(t)\,dt.
\]
The strong law of large numbers implies that the expressions in the parentheses tend to their expectations a.s. as $n\to\infty$. Using the independence of $\xi_i$, $\varepsilon_i$ and $\delta_i$ we obtain the limits below:
\[
\frac1n\sum_{l=1}^ne^{it\delta_l}e^{-it\varepsilon_l^{\top}\beta_0}\to\phi_\delta(t)\,\phi_\varepsilon(-t\beta_0),
\]
\[
\frac1n\sum_{s=1}^ne^{-it\delta_s}e^{it\varepsilon_s^{\top}\beta_0}x_s\to\phi_\delta(-t)\,\mathrm E\,x\,e^{it\beta_0^{\top}\varepsilon}=-i\,\phi_\delta(-t)\,\frac{\partial\phi_\varepsilon(t\beta_0)}{\partial v}+\phi_\delta(-t)\,\phi_\varepsilon(t\beta_0)\,\mathrm E\,\xi,
\]
a.s. as $n\to\infty$. Conditions (2.2.4) and (2.2.7) allow us to interchange the limit and

the integral, and this means that
\[
\frac{\partial A_n^*(\beta_0)}{\partial\beta}\to\int_{-\infty}^{\infty}t\,\phi_\delta(t)\,\phi_\delta(-t)\Bigl(\phi_\varepsilon(-t\beta_0)\,\frac{\partial\phi_\varepsilon(t\beta_0)}{\partial v}-\phi_\varepsilon(t\beta_0)\,\frac{\partial\phi_\varepsilon(-t\beta_0)}{\partial v}\Bigr)w(t)\,dt \tag{2.2.14}
\]
a.s. as $n\to\infty$. Condition (2.2.9) implies that the limit of the right-hand side of (2.2.10) does not equal zero, which is a contradiction. Hence $\hat\beta_n^*$ cannot be a consistent estimator of $\beta_0$.

The following example shows that, despite its complicated form, condition (2.2.9) is quite a natural one.

Example 2.2.2 Assume that $\varepsilon$ has a $p$-dimensional normal distribution with zero mean and covariance matrix $D$. Then
\[
\phi_\varepsilon(t\beta_0)=\phi_\varepsilon(-t\beta_0)=e^{-\frac{t^2}2\beta_0^{\top}D\beta_0}.
\]
Hence
\[
\frac{\partial\phi_\varepsilon(t\beta_0)}{\partial v}=-tD\beta_0\,e^{-\frac{t^2}2\beta_0^{\top}D\beta_0}.
\]
This means that the integral in (2.2.9) equals
\[
-2D\beta_0\int_{-\infty}^{\infty}t^2e^{-t^2\beta_0^{\top}D\beta_0}\phi_\delta(t)\,\phi_\delta(-t)\,w(t)\,dt,
\]
which cannot be equal to zero if $D\beta_0$ is nonzero. So, if $D\beta_0\ne0$ then $\hat\beta_n^*$ is not consistent. Conversely, as the variance of $\varepsilon^{\top}\beta_0$ equals $\beta_0^{\top}D\beta_0$, $D\beta_0=0$ means that $\varepsilon^{\top}\beta_0=0$ with probability one. Hence in this case the model (2.2.1)--(2.2.2) is equivalent to the traditional linear regression model $y_i=\xi_i^{\top}\beta_0+\varepsilon_i$, where $y_i$ and $\xi_i$ are observed, $i=1,2,\dots,n$. In this case, according to the results of [2], $\hat\beta_n^*$ is a consistent estimator of $\beta_0$.

2.3 A new estimator for the parameter $\beta$

To obtain a consistent estimator for the model (2.2.1)--(2.2.2) we assume the existence of auxiliary functions $\tilde\phi_w^{l,s}(v,\beta)$ such that
\[
\mathrm E\bigl(\tilde\phi_w^{l,s}(y_l-y_s-(x_l-x_s)^{\top}\beta,\beta)\mid y_l,y_s,\xi_l,\xi_s\bigr)=\phi_w\bigl(y_l-y_s-(\xi_l-\xi_s)^{\top}\beta\bigr), \tag{2.3.1}
\]

$l,s\in\mathbb N$. Now let the estimator $\tilde\beta=\tilde\beta_n$ of $\beta_0$ be the maximum point of
\[
\tilde A_n(\beta)=\frac1{n^2}\sum_{l=1}^n\sum_{s=1}^n\tilde\phi_w^{l,s}\bigl(y_l-y_s-(x_l-x_s)^{\top}\beta,\beta\bigr). \tag{2.3.2}
\]
First we have to find the exact form of the auxiliary functions $\tilde\phi_w^{l,s}(v,\beta)$. As the following proposition shows, in order to do this we have to know the characteristic function of the measurement error.

Proposition 2.3.1 Let $\phi_\varepsilon(v)$ denote the characteristic function of the measurement error $\varepsilon$ and assume that
\[
\int_{-\infty}^{\infty}\frac{w(t)}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,dt<\infty,\qquad\beta\in\Theta. \tag{2.3.3}
\]
Then
\[
\tilde\phi_w^{l,s}(v,\beta)=
\begin{cases}
\displaystyle\int_{-\infty}^{\infty}e^{itv}\,w(t)\,dt & \text{if }l=s,\\[8pt]
\displaystyle\int_{-\infty}^{\infty}\frac{e^{itv}}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,w(t)\,dt & \text{if }l\ne s.
\end{cases} \tag{2.3.4}
\]

Proof. By the independence of the three i.i.d. sequences $\xi_i$, $\varepsilon_i$ and $\delta_i$, condition (2.3.1) is equivalent to
\[
\mathrm E\,\tilde\phi_w^{l,s}\bigl(v-(\varepsilon_l-\varepsilon_s)^{\top}\beta,\beta\bigr)=\phi_w(v)=\int_{-\infty}^{\infty}e^{itv}w(t)\,dt,\qquad v\in\mathbb R. \tag{2.3.5}
\]
This means that we can search for $\tilde\phi_w^{l,s}(v,\beta)$ in the form
\[
\tilde\phi_w^{l,s}(v,\beta)=\int_{-\infty}^{\infty}\gamma^{l,s}(v,\beta,t)\,w(t)\,dt,
\]
where
\[
\mathrm E\,\gamma^{l,s}\bigl(v-(\varepsilon_l-\varepsilon_s)^{\top}\beta,\beta,t\bigr)=e^{itv},\qquad v\in\mathbb R.
\]
It is easy to see that
\[
\gamma^{l,s}(v,\beta,t)=
\begin{cases}
e^{itv} & \text{if }l=s,\\[4pt]
\dfrac{e^{itv}}{\phi_{\varepsilon_s-\varepsilon_l}(t\beta)}=\dfrac{e^{itv}}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)} & \text{if }l\ne s,
\end{cases} \tag{2.3.6}
\]
where $\phi_{\varepsilon_s-\varepsilon_l}(v)$ is the characteristic function of $\varepsilon_s-\varepsilon_l$. Hence (2.3.3) and (2.3.6) imply (2.3.4).

In the following examples we give the explicit form of the auxiliary functions $\tilde\phi_w^{l,s}(v,\beta)$ for different combinations of weight functions $w(t)$ and distributions of the measurement error. First we always give the form of the functions $\gamma^{l,s}(v,\beta,t)$ and then we calculate their weighted integral with respect to $w(t)$.

Example 2.3.2 Let $w(t)$ be the density function of the normal distribution $\mathcal N(0,a^2)$ and assume that $\varepsilon$ has a $p$-dimensional normal distribution with zero mean and covariance matrix $D$. Then
\[
\gamma^{l,s}(v,\beta,t)=\begin{cases}e^{itv}&\text{if }l=s,\\ e^{itv+t^2\beta^{\top}D\beta}&\text{if }l\ne s.\end{cases}
\]
An easy calculation shows that if $\Theta=\bigl\{\beta\in\mathbb R^p : \beta^{\top}D\beta<\frac1{2a^2}\bigr\}$, then
\[
\tilde\phi_w^{l,s}(v,\beta)=\begin{cases}e^{-\frac{v^2a^2}2}&\text{if }l=s,\\[6pt]\dfrac1{\sqrt{1-2a^2\beta^{\top}D\beta}}\,e^{-\frac{v^2a^2}{2(1-2a^2\beta^{\top}D\beta)}}&\text{if }l\ne s.\end{cases}
\]

Example 2.3.3 Let $w(t)$ be the density function of the symmetric exponential distribution with parameter $a>0$ and assume that the components of the measurement error $\varepsilon=(\varepsilon^{(1)},\varepsilon^{(2)},\dots,\varepsilon^{(p)})$ are independent and have exponential distribution with parameters $\lambda_1,\lambda_2,\dots,\lambda_p$, respectively. Hence
\[
\gamma^{l,s}(v,\beta,t)=\begin{cases}e^{itv}&\text{if }l=s,\\ e^{itv}\Bigl(1+\frac{t^2\beta_1^2}{\lambda_1^2}\Bigr)\cdots\Bigl(1+\frac{t^2\beta_p^2}{\lambda_p^2}\Bigr)&\text{if }l\ne s\end{cases}
=\begin{cases}e^{itv}&\text{if }l=s,\\ e^{itv}\bigl(\alpha_0+\alpha_1t^2+\dots+\alpha_pt^{2p}\bigr)&\text{if }l\ne s,\end{cases}
\]
where $\beta_i$ denotes the $i$th component of $\beta$. The coefficients $\alpha_i$ are the following functions of $\beta_1,\beta_2,\dots,\beta_p$ and $\lambda_1,\lambda_2,\dots,\lambda_p$:
\[
\alpha_0=1,\qquad\alpha_k=\sum_{1\le i_1<i_2<\dots<i_k\le p}a_{i_1}a_{i_2}\cdots a_{i_k},
\]
where $a_k=\beta_k^2/\lambda_k^2$, $k=1,2,\dots,p$. To obtain the closed form of $\tilde\phi_w^{l,s}(v,\beta)$ we need the Fourier transforms of functions of the form $t^{2k}\frac a2e^{-a|t|}$, $k=0,1,\dots,p$. A short calculation (using the characteristic function of the $\Gamma$-distribution, say) shows that
\[
\int_{-\infty}^{\infty}t^{2k}e^{itv}\,\frac a2\,e^{-a|t|}\,dt=\frac a2\,(2k)!\,\frac{(a-iv)^{2k+1}+(a+iv)^{2k+1}}{(a^2+v^2)^{2k+1}}.
\]
Hence
\[
\tilde\phi_w^{l,s}(v,\beta)=\begin{cases}\dfrac{a^2}{a^2+v^2}&\text{if }l=s,\\[6pt]\dfrac{a^2}{a^2+v^2}+\displaystyle\sum_{k=1}^p\alpha_k\,\frac a2\,(2k)!\,\frac{(a-iv)^{2k+1}+(a+iv)^{2k+1}}{(a^2+v^2)^{2k+1}}&\text{if }l\ne s.\end{cases}
\]
We remark that $\tilde\phi_w^{l,s}(v,\beta)$ is a real valued function.
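The closed form in Example 2.3.2 can be verified numerically in the scalar case ($p=1$, writing $c=\beta^{\top}D\beta$); the deterministic check below compares the defining integral of $\tilde\phi_w^{l,s}$ for $l\ne s$ with the stated formula at hypothetical values of $a$, $c$ and $v$:

```python
import numpy as np

a, c, v = 1.0, 0.2, 0.7          # c = beta' D beta must satisfy c < 1/(2 a^2)

# Left-hand side: int e^{itv} e^{t^2 c} w(t) dt with w the N(0, a^2) density
# (the imaginary part vanishes by symmetry, so the cosine transform suffices).
t = np.linspace(-12.0, 12.0, 240_001)
w = np.exp(-t**2 / (2 * a**2)) / (np.sqrt(2 * np.pi) * a)
numeric = (np.cos(t * v) * np.exp(c * t**2) * w).sum() * (t[1] - t[0])

# Right-hand side: the closed form from Example 2.3.2.
closed = np.exp(-a**2 * v**2 / (2 * (1 - 2 * a**2 * c))) / np.sqrt(1 - 2 * a**2 * c)

print(numeric, closed)   # the two values agree
```

The condition $c<1/(2a^2)$ is visible in the code: for larger $c$ the integrand $e^{ct^2}w(t)$ is no longer integrable and the auxiliary function does not exist.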

Example 2.3.4 Let $w(t)$ be the same as in Example 2.3.2, i.e. the density function of the normal distribution $\mathcal N(0,a^2)$, and assume that the components of the measurement error $\varepsilon=(\varepsilon^{(1)},\varepsilon^{(2)},\dots,\varepsilon^{(p)})$ are independent and have symmetric exponential distribution with parameters $\lambda_1,\lambda_2,\dots,\lambda_p$, respectively. Hence
\[
\gamma^{l,s}(v,\beta,t)=\begin{cases}e^{itv}&\text{if }l=s,\\ e^{itv}\Bigl(\frac{\lambda_1^2+t^2\beta_1^2}{\lambda_1^2}\Bigr)^2\cdots\Bigl(\frac{\lambda_p^2+t^2\beta_p^2}{\lambda_p^2}\Bigr)^2&\text{if }l\ne s\end{cases}
=\begin{cases}e^{itv}&\text{if }l=s,\\ e^{itv}\bigl(1+\kappa_1t^2+\dots+\kappa_{2p}t^{4p}\bigr)&\text{if }l\ne s,\end{cases}
\]
where $\beta_i$ denotes the $i$th component of $\beta$. The coefficients $\kappa_i$ are the following functions of $\beta_1,\dots,\beta_p$ and $\lambda_1,\dots,\lambda_p$:
\[
\kappa_i=\sum_{\substack{0\le j\le p\\ 0\le i-j\le p}}\alpha_j\,\alpha_{i-j},
\]
where the $\alpha_i$'s are the same as in Example 2.3.3. Using the characteristic function of the normal distribution we get
\[
\frac1{\sqrt{2\pi}\,a}\int_{-\infty}^{\infty}t^{2k}e^{-\frac{t^2}{2a^2}}e^{itv}\,dt=(-1)^k\frac{d^{2k}\bigl(e^{-\frac{a^2v^2}2}\bigr)}{dv^{2k}}
=\frac{2^{\frac{2k+1}2}a^{2k+1}}{\sqrt{2\pi}\,a}\,\Gamma\Bigl(\frac{2k+1}2\Bigr)\,\Phi\Bigl(\frac{2k+1}2,\frac12,-\frac{a^2v^2}2\Bigr),
\]
$k=0,1,\dots,2p$, where $\Gamma$ denotes the gamma function and $\Phi(\alpha,\gamma,v)$ is the Kummer function defined by the series (see [3]):
\[
\Phi(\alpha,\gamma,v)=1+\frac\alpha\gamma\,\frac v{1!}+\frac{\alpha(\alpha+1)}{\gamma(\gamma+1)}\,\frac{v^2}{2!}+\dots.
\]
Hence
\[
\tilde\phi_w^{l,s}(v,\beta)=\begin{cases}e^{-\frac{v^2a^2}2}&\text{if }l=s,\\[6pt] e^{-\frac{v^2a^2}2}+\displaystyle\sum_{k=1}^{2p}\kappa_k(-1)^k\frac{d^{2k}\bigl(e^{-\frac{a^2v^2}2}\bigr)}{dv^{2k}}&\text{if }l\ne s.\end{cases}
\]

Example 2.3.5 Similarly to Example 2.3.3, let $w(t)$ be the density function of the symmetric exponential distribution, but now assume that the components of the measurement error $\varepsilon=(\varepsilon^{(1)},\varepsilon^{(2)},\dots,\varepsilon^{(p)})$ are independent and have Cauchy distribution with parameters $(0,1)$. Hence
\[
\gamma^{l,s}(v,\beta,t)=\begin{cases}e^{itv}&\text{if }l=s,\\ e^{itv+2|t|\,\|\beta\|_1}&\text{if }l\ne s.\end{cases}
\]

An easy calculation shows (using the characteristic function of the symmetric exponential distribution) that if $\Theta=\bigl\{\beta\in\mathbb R^p : \|\beta\|_1<\frac a2\bigr\}$, then
\[
\tilde\phi_w^{l,s}(v,\beta)=\begin{cases}\dfrac{a^2}{a^2+v^2}&\text{if }l=s,\\[6pt]\dfrac{a\,(a-2\|\beta\|_1)}{(a-2\|\beta\|_1)^2+v^2}&\text{if }l\ne s.\end{cases}
\]

Example 2.3.6 Let $w(t)$ be the density function of the uniform distribution on $[-a,a]$ and assume that the components of the measurement error $\varepsilon=(\varepsilon^{(1)},\varepsilon^{(2)},\dots,\varepsilon^{(p)})$ are independent and have exponential distribution with parameters $\lambda_1,\lambda_2,\dots,\lambda_p$, respectively. As the characteristic function of the uniform distribution on $[-a,a]$ equals $\frac{\sin(av)}{av}$, from the general theory of Fourier transforms it follows that
\[
\frac1{2a}\int_{-a}^at^{2k}e^{itv}\,dt=(-1)^k\frac{d^{2k}}{dv^{2k}}\Bigl(\frac{\sin(av)}{av}\Bigr),\qquad k=0,1,\dots,p.
\]
Hence
\[
\tilde\phi_w^{l,s}(v,\beta)=\begin{cases}\dfrac{\sin(av)}{av}&\text{if }l=s,\\[6pt]\dfrac{\sin(av)}{av}+\displaystyle\sum_{k=1}^p\alpha_k(-1)^k\frac{d^{2k}}{dv^{2k}}\Bigl(\frac{\sin(av)}{av}\Bigr)&\text{if }l\ne s.\end{cases}
\]

Example 2.3.7 Again, let $w(t)$ be the density function of the uniform distribution on $[-a,a]$ and suppose that the components of the measurement error $\varepsilon=(\varepsilon^{(1)},\varepsilon^{(2)},\dots,\varepsilon^{(p)})$ are independent and have Cauchy distribution with parameters $(0,1)$. Then
\[
\tilde\phi_w^{l,s}(v,\beta)=\begin{cases}\dfrac{\sin(av)}{av}&\text{if }l=s,\\[6pt]\dfrac{e^{2a\|\beta\|_1}\bigl(v\sin(av)+2\|\beta\|_1\cos(av)\bigr)-2\|\beta\|_1}{a\bigl(4\|\beta\|_1^2+v^2\bigr)}&\text{if }l\ne s.\end{cases}
\]

2.4 Consistency of the new estimator

To prove the strong consistency of the estimator $\tilde\beta_n$ we adapt the ideas of [2] to the errors-in-variables case. To do this we need the lemma below, which is a form of the classical Glivenko--Cantelli theorem.

Lemma 2.4.1 Let $\mathrm P$ be a probability measure on $\mathbb R^d$ ($d\ge1$), let $\mathrm P_n$ be the empirical measure constructed by sampling from $\mathrm P$, and let $\mathcal D$ denote the set of the half planes of $\mathbb R^d$. Then
\[
\lim_{n\to\infty}\sup_{D\in\mathcal D}\bigl|\mathrm P_n(D)-\mathrm P(D)\bigr|=0\quad\text{a.s.}
\]
Proof. The proof is an obvious consequence of Lemma II.8 and Theorem II.4 of [34].
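For half lines in $\mathbb R$, Lemma 2.4.1 specializes to the classical Glivenko--Cantelli theorem, whose rate can be seen in a quick simulation (an illustration only, not part of the text): by the Dvoretzky--Kiefer--Wolfowitz inequality the supremum distance between the empirical and true distribution functions of an i.i.d. $\mathcal N(0,1)$ sample is of order $n^{-1/2}$.

```python
from math import erf

import numpy as np

rng = np.random.default_rng(4)
n = 100_000
z = np.sort(rng.normal(size=n))

# F_n jumps only at the order statistics, so the supremum of |F_n - F|
# over all half lines is attained there; Phi is computed via erf.
F = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
i = np.arange(1, n + 1)
sup_dev = np.maximum(np.abs(i / n - F), np.abs((i - 1) / n - F)).max()

print(sup_dev)   # of order n^{-1/2}, i.e. a few thousandths here
```

It is this uniform (over all half planes, hence over all $v$ and $\beta$ simultaneously) convergence that drives the proof of Theorem 2.4.2 below.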

Theorem 2.4.2 Suppose that $(y_i,x_i)$, $i=1,2,\dots,n$, fulfil the model equations (2.2.1)--(2.2.2), the parameter set $\Theta$ is compact, the density kernel $w(t)$ satisfies (2.2.4) and
\[
\sup_{\beta\in\Theta}\int_{-\infty}^{\infty}\frac{w(t)}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,dt<\infty. \tag{2.4.1}
\]
Assume that for the function
\[
\tilde\phi_w(v,\beta)=\int_{-\infty}^{\infty}\frac{e^{itv}}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,w(t)\,dt
\]
the condition
\[
\sup_{\beta\in\Theta}\int_{-\infty}^{\infty}\Bigl|\frac{\partial\tilde\phi_w(v,\beta)}{\partial v}\Bigr|\,dv<\infty \tag{2.4.2}
\]
holds. Then $\tilde\beta_n$ is a strongly consistent estimator of $\beta_0$.

Proof. Let $\tilde F_\beta(x)$ denote the distribution function of $y-x^{\top}\beta$, where $y=\xi^{\top}\beta_0+\delta$, and let
\[
\tilde A(\beta)=\int\int\tilde\phi_w(u-v,\beta)\,d\tilde F_\beta(u)\,d\tilde F_\beta(v).
\]
Using Fubini's theorem and the independence of $\xi$, $\varepsilon$ and $\delta$, $\tilde A(\beta)$ can be rewritten in the following form:
\[
\tilde A(\beta)=\int_{-\infty}^{\infty}\frac{\mathrm E\,e^{it(y-(\xi+\varepsilon)^{\top}\beta)}\;\mathrm E\,e^{-it(y-(\xi+\varepsilon)^{\top}\beta)}}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,w(t)\,dt
=\int_{-\infty}^{\infty}\mathrm E\,e^{it(y-\xi^{\top}\beta)}\;\mathrm E\,e^{-it(y-\xi^{\top}\beta)}\,w(t)\,dt
=\int\int\phi_w(u-v)\,dF_\beta(u)\,dF_\beta(v)=:A(\beta),
\]
where $\phi_w(v)$ is the Fourier transform of $w(t)$ and by $F_\beta(v)$ the distribution function of $y-\xi^{\top}\beta$ is denoted. Obviously, $A(\beta)$ is a continuous function on $\Theta$. An, Hickernell and Zhu [2] showed that
\[
A(\beta_0)=\sup_{\beta\in\Theta}A(\beta)>A(\alpha)\quad\text{for every }\alpha\ne\beta_0.
\]

Now let $\tilde F_{\beta,n}(v)$ denote the empirical distribution function of the sample $y_i-x_i^{\top}\beta$, $i=1,2,\dots,n$. It is easy to show that
\[
\tilde A_n(\beta)=\frac1{n^2}\sum_{\substack{l,s=1\\ l\ne s}}^n\int_{-\infty}^{\infty}\frac{e^{it(y_l-y_s-(x_l-x_s)^{\top}\beta)}}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,w(t)\,dt+\frac1n
\]
\[
=\int\int\tilde\phi_w(u-v,\beta)\,d\tilde F_{\beta,n}(u)\,d\tilde F_{\beta,n}(v)-\frac1n\int_{-\infty}^{\infty}\frac{w(t)}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,dt+\frac1n.
\]
Using the dominated convergence theorem, from (2.4.1) it follows that $\tilde A_n(\beta)$ is continuous on $\Theta$. Let
\[
\tilde F^*_\beta(v)=\int\tilde F_\beta(v+u)\,d\tilde F_\beta(u)\qquad\text{and}\qquad\tilde F^*_{\beta,n}(v)=\int\tilde F_{\beta,n}(v+u)\,d\tilde F_{\beta,n}(u).
\]
Hence
\[
\tilde A(\beta)=\int_{-\infty}^{\infty}\tilde\phi_w(v,\beta)\,d\tilde F^*_\beta(v)
\]
and
\[
\tilde A_n(\beta)=\int_{-\infty}^{\infty}\tilde\phi_w(v,\beta)\,d\tilde F^*_{\beta,n}(v)-\frac1n\int_{-\infty}^{\infty}\frac{w(t)}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,dt+\frac1n.
\]
Due to condition (2.4.2) we can use integration by parts, so using the Riemann--Lebesgue lemma (see e.g. [35, Theorem 7.5]) we have
\[
\tilde A(\beta)=-\int_{-\infty}^{\infty}\tilde F^*_\beta(v)\,\frac{\partial\tilde\phi_w(v,\beta)}{\partial v}\,dv \tag{2.4.3}
\]
and
\[
\tilde A_n(\beta)=-\int_{-\infty}^{\infty}\tilde F^*_{\beta,n}(v)\,\frac{\partial\tilde\phi_w(v,\beta)}{\partial v}\,dv-\frac1n\int_{-\infty}^{\infty}\frac{w(t)}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,dt+\frac1n. \tag{2.4.4}
\]
Let us consider the vector $\eta=(\delta,\xi^{\top},\varepsilon^{\top})^{\top}$. For a given $v\in\mathbb R$ and $\beta\in\Theta$ the event
\[
\{y-x^{\top}\beta<v\}=\{\delta+\xi^{\top}(\beta_0-\beta)-\varepsilon^{\top}\beta<v\}
\]

is equivalent to the event $\{\eta\in D(v,\beta)\}$, where
\[
D(v,\beta)=\bigl\{z\in\mathbb R^{2p+1} : (1,(\beta_0-\beta)^{\top},-\beta^{\top})\,z<v\bigr\}.
\]
As $D(v,\beta)$ is a half plane of $\mathbb R^{2p+1}$, Lemma 2.4.1 implies that
\[
\lim_{n\to\infty}\sup_{v,\beta}\bigl|\tilde F_{\beta,n}(v)-\tilde F_\beta(v)\bigr|=0\quad\text{a.s.}
\]
Hence
\[
\bigl|\tilde F^*_{\beta,n}(v)-\tilde F^*_\beta(v)\bigr|\le\Bigl|\int\bigl(\tilde F_{\beta,n}(v+u)-\tilde F_\beta(v+u)\bigr)\,d\tilde F_{\beta,n}(u)\Bigr|+\Bigl|\int\tilde F_\beta(v+u)\,d\bigl(\tilde F_{\beta,n}(u)-\tilde F_\beta(u)\bigr)\Bigr|
\]
\[
\le\sup_{v,\beta}\bigl|\tilde F_{\beta,n}(v)-\tilde F_\beta(v)\bigr|\int d\tilde F_{\beta,n}(u)+\Bigl|\int\bigl(\tilde F_{\beta,n}(u-v)-\tilde F_\beta(u-v)\bigr)\,d\tilde F_\beta(u)\Bigr|\le2\sup_{v,\beta}\bigl|\tilde F_{\beta,n}(v)-\tilde F_\beta(v)\bigr|,
\]
so
\[
\lim_{n\to\infty}\sup_{v,\beta}\bigl|\tilde F^*_{\beta,n}(v)-\tilde F^*_\beta(v)\bigr|=0\quad\text{a.s.}
\]
Therefore, using (2.4.1), (2.4.2), (2.4.3) and (2.4.4), we get
\[
\lim_{n\to\infty}\sup_\beta\bigl|\tilde A_n(\beta)-\tilde A(\beta)\bigr|
\le\lim_{n\to\infty}\sup_\beta\int_{-\infty}^{\infty}\bigl|\tilde F^*_{\beta,n}(v)-\tilde F^*_\beta(v)\bigr|\,\Bigl|\frac{\partial\tilde\phi_w(v,\beta)}{\partial v}\Bigr|\,dv
+\lim_{n\to\infty}\frac1n\sup_{\beta\in\Theta}\int_{-\infty}^{\infty}\frac{w(t)}{\phi_\varepsilon(t\beta)\,\phi_\varepsilon(-t\beta)}\,dt+\lim_{n\to\infty}\frac1n
\]
\[
\le\lim_{n\to\infty}\sup_{v,\beta}\bigl|\tilde F^*_{\beta,n}(v)-\tilde F^*_\beta(v)\bigr|\;\sup_\beta\int_{-\infty}^{\infty}\Bigl|\frac{\partial\tilde\phi_w(v,\beta)}{\partial v}\Bigr|\,dv=0
\]
a.s. This implies that $\tilde\beta_n\to\beta_0$ a.s. as $n\to\infty$.

We remark that $\tilde\phi_w(v,\beta)$ coincides with $\tilde\phi_w^{l,s}(v,\beta)$ in the case when $l\ne s$. In the following examples we show two cases when the conditions of Theorem 2.4.2 are fulfilled, so that $\tilde\beta_n$ is a strongly consistent estimator of $\beta_0$.