Department: Statistics
18 matches in the database.
These are records: 1 - 18.

1.
Belu, Alexandru C.
Multivariate Measures of Dependence for Random Variables and Levy Processes.
Degree: PhD, Statistics, 2012, Case Western Reserve University
Multipoint correlations and measures of association that are applicable to random variables lacking first or higher moments are becoming increasingly important. Increasing computational power, together with the proliferation of massive data, has spurred research into multipoint correlations in an attempt to measure the association between several random variables at once. At the same time, advances in stochastic processes and Levy tempered stable distributions have introduced the need for measures of dependence that can be applied to volatile distributions lacking first or second moments. In this dissertation, we present two such measures for studying the dependence of distributions without higher moments and multipoint correlations. We present a thorough overview of copulas, Levy processes and Levy copulas, and introduce the two measures in question, Schweizer and Wolff’s sigma and Szekely’s distance correlation. We extend these measures to the multivariate case and provide sample estimates for each. We apply these measures to various bivariate distributions with specific properties and examine their behavior in these cases. Finally, we introduce our own measure of dependence for Levy processes, setting the groundwork for establishing a measure of association between such processes.
Advisors/Committee Members: Woyczynski, Wojbor.
Subjects: Statistics
Keywords: copulas, Levy copulas, Levy processes, measures of dependence, Szekely's distance correlation, Schweizer and Wolff sigma, generating bivariate data, bivariate distributions, multivariate measures of association, multipoint correlations.
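Illustrative note: the abstract above names Szekely's distance correlation as one of its two measures. The sketch below shows the standard bivariate sample version (double-centered pairwise distances), assuming real-valued observations. It is not the multivariate extension developed in the dissertation, and the function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| distances, double-centered by row, column and grand means."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)    # squared sample distance covariance
    dvar_x = mean_product(a, a)   # squared sample distance variance of x
    dvar_y = mean_product(b, b)   # squared sample distance variance of y
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0                # convention when either sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# An exact linear relationship gives a value of (numerically) one.
print(distance_correlation([1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 5.0, 7.0, 9.0, 11.0]))
```

Because the statistic uses only pairwise distances, the sample value is always computable from the data, whatever moments the underlying distribution has.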

2.
Cahoy, Dexter Odchigue.
Fractional Poisson Process in Terms of Alpha-Stable Densities.
Degree: PhD, Statistics, 2007, Case Western Reserve University
The link between the fractional Poisson process (fPp) and the α-stable density is established by solving an integral equation. The result is then used to study properties of the fPp such as the asymptotic n-th arrival time, number-of-events distributions, covariance structure, stationarity and dependence of increments, self-similarity, and the intermittency property. Asymptotically normal parameter estimators and their variants are derived; their properties are studied and compared using synthetic data. An alternative fPp model is also proposed. Finally, the asymptotic distribution of a scaled fPp random variable is shown to be free of some parameters; formulae for integer-order, non-central moments are also derived.
Advisors/Committee Members: Woyczynski, Wojbor A.
Subjects: Statistics
Keywords: fPp; Nν; POISSON PROCESS

3.
Dey, Tanujit.
Prediction and Variable Selection.
Degree: PhD, Statistics, 2008, Case Western Reserve University
Variable selection in linear regression models is an important aspect of many scientific analyses. We review several frequentist model selection techniques in the introductory chapter. Model uncertainty is one of the serious issues related to the model selection problem. One way this issue can be resolved is by using a Bayesian technique called Bayesian model averaging (BMA). In Chapter 2, we discuss BMA techniques and illustrate the ideas with examples. An often-used BMA approach to model selection is based on the so-called highest posterior probability model. In Chapter 3, we discuss several asymptotic properties of this model selection technique. Under a spike and slab hierarchy, we find that the highest posterior model is total risk consistent for model selection, but that it also possesses some curious properties. The most important of these is a marked underfitting in finite samples, a phenomenon well noted in the literature for Bayesian Information Criterion (BIC) related procedures, but not often associated with highest posterior model selection. We employ a rescaling of the hierarchy and show that the resulting rescaled spike and slab models mitigate the effects of underfitting due to a perfect cancellation of a BIC-like penalty term. By drawing upon an equivalence between the highest posterior model and the median model, we consider the issue of how to calibrate rescaled spike and slab models by looking at their posterior inclusion probabilities. In Chapter 4, we describe a new spike and slab model for model space exploration and variable selection in linear regression models. Several theoretical features are discussed to motivate the approach. An R package, modelSampler, has been developed and applications are presented. In Chapter 5, we present a more stable variable selection technique. We also discuss the issue of model selection uncertainty. Numerical examples are provided. Chapter 6 discusses the issue of imputing missing values without biasing variable selection and prediction. Several methods are discussed with examples, and a new tree-based imputation technique is proposed.
Advisors/Committee Members: Ishwaran, Dr. Hemant.
Subjects: Statistics
Keywords: posterior model

4.
Ehrlinger, John M.
Regularization: Stagewise Regression and Bagging.
Degree: PhD, Statistics, 2011, Case Western Reserve University
Regularized stagewise regression and bagging are general-purpose machine learning methods which have received wide attention because of their good empirical results. Regularized stagewise regression uses a gradient descent optimization approach to minimize a loss function. In Chapter 1, we use geometric arguments to introduce regularized stagewise regression. We review gradient descent optimization in Chapter 2 and introduce functional gradient descent, a general-purpose regularized stagewise method which approximates the gradient with a set of basis functions. In Chapter 3, we present a linear regression implementation of functional gradient descent. We study this method in two distinct ways. In Chapter 4, we investigate the properties of the method using a critical point analysis. We explore the forward path construction and quantify the long active set cycling between new variable entries. We establish an exact closed-form expression for the number of steps before a discontinuity occurs, indicating that a new variable has been selected. We also suggest variants of the procedure that may be ultra-scalable and efficient in large problems. In Chapter 5, we use geometric arguments to explore the parameter estimation of regularized stagewise regression. Using an orthonormal basis, we develop a recursive relation to estimate stagewise parameters, which we extend to the original coordinate directions. We propose a novel stagewise-based variance estimation method that uses the smooth recursive parameter estimates. This method may be preferable in sparse, high-dimensional settings. We examine functional gradient descent for exponential families in Chapter 6, applying gradient descent to the logistic loss function and to an l2-loss function to develop two regularized stagewise methods for logistic regression. Bagging is a variance reduction technique which uses resampling to improve a predictor. In Chapter 7, we use a prediction error decomposition to investigate how bagging differs from the original predictor and from theoretical, ideal bagging. We examine how bagging reduces prediction error and when it can increase it. In Chapter 8, we investigate the aggregation of predictors. With an empirical study, we find that bagging improvements are primarily a consequence of randomizing the model selection. This result also indicates an enhancement to bagging which improves prediction error performance over other bagging procedures.
Advisors/Committee Members: Ishwaran, Hemant.
Subjects: Statistics
Keywords: regularization; linear regression; machine learning; feature selection; LARS; lasso; gradient descent; regularized stagewise; bagging; out-of-bag

5.
Fan, Yiying.
Covariance estimation and application to building a new control chart.
Degree: PhD, Statistics, 2010, Case Western Reserve University
Development of new methods of statistical process control (SPC) is extremely important for modern surveillance applications. Typical challenges in SPC are that the data are correlated and multivariate. In this dissertation we provide a new control chart based on an approximation of the joint tail probability $P\{S_t/\sqrt{\operatorname{var}(S_t)} \ge x,\ M_t \ge y\}$, where $M_t = \sup_{u \in [0,t]} Z_u$ and $S_t = \int Z_u\,du$ are the supremum and cumulative sum of a continuous-time process $\{Z_t\}$. This new control chart is motivated by combining the merits of the Shewhart and cumulative-sum (CUSUM) control charts for process monitoring. To construct control boundaries for any control chart, the covariance function of an in-control process must be known or estimated. We systematically discuss and provide solutions to covariance estimation for both parametric and nonparametric models, with or without stationarity, when there are either single or multiple realizations of the process. We use continuous Gaussian processes with covariance functions $r(s,t) = \cos(s-t)$ for $s, t \in [0, \pi/4]$ and $r(s,t) = \exp[-(s-t)^2]$ for $s, t \in [0, T]$, where $T > 0$, and discrete autoregressive moving average (ARMA) processes to evaluate the new control chart for both in-control and out-of-control performance in comparison to the standard Shewhart, CUSUM and exponentially weighted moving average (EWMA) control charts. It is shown through simulation that the new control chart is efficient and compares well to the standard control charts for both the in-control and out-of-control scenarios.
Advisors/Committee Members: Sun, Jiayang.
Subjects: Statistics
Keywords: covariance estimation; control chart; continuous Gaussian processes; stationarity; nonstationarity

6.
Fares, Souha A.
Cox-Ross-Rubinstein Option Pricing Model with Dependent Jump Sizes.
Degree: PhD, Statistics, 2011, Case Western Reserve University
Options are important derivative securities in the financial market, and option pricing theory is used in most areas of finance. Numerous researchers have contributed to the theory of option pricing. Cox, Ross and Rubinstein presented a discrete-time option pricing formula that yields, in the limit, the well-known Black-Scholes formula. Kan extended the CRR model by representing the changes in the stock price by the sequence of random variables X_t. She assumed the X_t's to be independent and introduced the multinomial model. In this thesis, we extend the CRR model by assuming a dependency between the jump sizes of the stock price. We have chosen this approach because of its relevance to the stock market. We show that the option price has an expression similar to that in the independent case. In addition, we introduce new limit theorems using the Fourier inversion method and the perturbation theory of linear operators. Finally, we describe a limit of the new option price.
Advisors/Committee Members: Woyczynski, Wojbor.
Subjects: Statistics

7.
Fridline, Mark M.
Almost Sure Confidence Intervals for the Correlation Coefficient.
Degree: PhD, Statistics, 2010, Case Western Reserve University
The dissertation develops a new estimation technique for the correlation coefficient. Although this method seems similar to a bootstrap method, it is largely based on sequential sampling and sampling without replacement. This dissertation emphasizes the features, advantages, and applications of the new procedure. It also explains the theoretical background needed to apply the method successfully.
Advisors/Committee Members: Denker, Manfred.
Subjects: Statistics
Keywords: ASCLT, Correlation Coefficient

8.
Konda, Sreenivas.
FITTING MODELS OF NONSTATIONARY TIME SERIES: AN APPLICATION TO EEG DATA.
Degree: PhD, Statistics, 2006, Case Western Reserve University
A computationally efficient algorithm is presented for fitting models to a nonstationary time series with an evolutionary (time-varying) spectral representation. We formally define a time-varying memory process and prove that this process satisfies the definition of local stationarity. Our procedure segments the nonstationary EEG time series into stationary or approximately stationary blocks, with and without overlapping, and then estimates the time-varying parameters using the local stationarity concept. Our estimation procedure does not make any assumptions about the distribution of innovations (data generating process). We also present a systematic procedure to separate the short-memory part from the nonstationary long-memory part of the test example time series using a simple frequency domain procedure. Our method is simple and efficient compared to the currently available procedures for analyzing EEG data. Using our procedure, we present a thorough analysis of the sleep EEG data of fullterm and preterm neonates. Several extensions of our method to multivariate time series are also proposed.
Advisors/Committee Members: Woyczynski, Wojbor A.
Subjects: Statistics
Keywords: Locally stationary time series; Long-memory; Time-varying spectrum; Alpha-stable processes.

9.
Li, Xiaosong.
Testing on the Common Mean of Normal Distributions Using Bayesian Method.
Degree: PhD, Statistics, 2011, Case Western Reserve University
Of all the problems in the statistical sciences, one of the oldest is inference on a common mean of several different normal populations with unknown and probably unequal variances. There are several ways to make inference on the common mean. The most common way is point estimation, which uses sample data to calculate a single value serving as a best guess for the unknown population mean; the second way is interval estimation, which constructs an interval of possible values of the unknown mean; and the third is to conduct a hypothesis test which takes as the null hypothesis that all populations have the same mean. The first two types of inference have been widely studied in the literature, but little attention has been paid to the third type. One of the reasons may be that the test statistics of the hypothesis test usually involve complicated sampling distributions which require computational resources out of reach of the ordinary researcher. With the fast development of new technology that is widely accessible on a personal computer, this is no longer an obstacle and more research has been done in this area. In their 2008 paper, Dr. Ching-Hui Chang and Dr. Nabendu Pal described several methods that can be used to test hypotheses concerning the common mean of several normal distributions with unknown variances. The methods they proposed are the likelihood ratio test (LRT), two tests based on the Graybill-Deal Estimator (GDE) and a test based on the maximum likelihood estimator (MLE). In this thesis, several procedures based on the Bayesian method are proposed, and simulation studies of the power and robustness of the newly proposed tests, the LRT and the GDE tests are performed and discussed. The new tests proposed in this thesis either assume that the posterior distribution of the common mean µ follows a specific distribution (t or normal) or make no distributional assumption and are based on slice sampling (Neal, 2003), the Highest Posterior Density (HPD) method (Berger, 1985) or a modified version of HPD.
Advisors/Committee Members: Williamson, Patricia.
Subjects: Statistics
Keywords: Meta-Analysis; Bayesian method
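Illustrative note: the abstract above refers to tests based on the Graybill-Deal Estimator of a common mean. As a minimal sketch, the standard form of that estimator weights each sample mean by its sample size divided by its sample variance. The function name and example data are hypothetical, and none of the thesis's Bayesian tests is reproduced here.

```python
from statistics import mean, variance


def graybill_deal(samples):
    """Graybill-Deal estimate of a common mean from several independent samples.

    Each sample mean is weighted by n_i / s_i^2, where s_i^2 is the unbiased
    sample variance of sample i.
    """
    weights = []
    means = []
    for sample in samples:
        if len(sample) < 2:
            raise ValueError("each sample needs at least two observations")
        s2 = variance(sample)  # unbiased sample variance
        if s2 == 0:
            raise ValueError("a sample with zero variance has undefined weight")
        weights.append(len(sample) / s2)
        means.append(mean(sample))
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)


# Two samples of equal size and equal variance: the estimate is the midpoint
# of the two sample means.
print(graybill_deal([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]))
```

Samples with smaller variance or more observations pull the estimate toward their own mean, which is why the estimator suits populations with unequal variances.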

10.
Ma, Junheng.
Contributions to Numerical Formal Concept Analysis, Bayesian Predictive Inference and Sample Size Determination.
Degree: PhD, Statistics, 2011, Case Western Reserve University
This dissertation contributes to three areas in Statistics: Numerical Formal Concept Analysis (nFCA), Bayesian predictive inference and sample size determination, and has applications beyond statistics. Formal concept analysis (FCA) is a powerful data analysis tool, popular in Computer Science (CS), to visualize binary data and its inherent structure. In the first part of this dissertation, Numerical Formal Concept Analysis (nFCA) is developed. It overcomes FCA's limitation and provides a new methodology for analyzing more general numerical data. It combines Statistics and Computer Science graphical visualization to provide a pair of nFCA graphs, the H-graph and the I-graph, which reveal the hierarchical clustering and inherent structure among the data. Compared with conventional statistical hierarchical clustering methods, nFCA provides a more intuitive and complete relational network among the data. nFCA performs better than the conventional hierarchical clustering methods in terms of the cophenetic correlation coefficient, which measures the consistency of a dendrogram with the original distance matrix. We have also applied nFCA to cardiovascular (CV) traits data. nFCA produces results consistent with the earlier discovery and provides a complete relational network among the CV traits. In the second part of this dissertation, Bayesian predictive inference is investigated for finite population quantities under informative sampling, i.e., unequal selection probabilities. Only limited information about the sample design is available, i.e., only the first-order selection probabilities corresponding to the sampled units are known. We have developed a full Bayesian approach to make inference for the parameters of the finite population and also predictive inference for the non-sampled units. Thus we can make inference for any characteristic of the finite population quantities. In addition, our methodology, using Markov chain Monte Carlo, avoids the necessity of using asymptotic approximations. Sample size determination is one of the most important practical tasks for statisticians. There has been extensive research to develop appropriate methodology for sample size determination, say, for continuous or ordered categorical outcome data. However, sample size determination for comparative studies with unordered categorical data remains largely untouched. In statistical terms, one is interested in finding the sample size needed to detect a specified difference between the parameters of two multinomial distributions. For this purpose, in the third part of this dissertation, we have developed a frequentist approach based on a chi-squared test to calculate the required sample size. Three improvements to the original frequentist approach (using bootstrap, minimum difference and asymptotic correction) have been proposed and investigated. In addition, using an extension of a posterior predictive p-value, we further develop a simulation-based Bayesian approach to determine the required sample size. The performance of these methods is evaluated via both a simulation study and a real application to Leukoplakia lesion data. Some asymptotic results are also provided.
Advisors/Committee Members: Sun, Jiayang.
Subjects: Statistics
Keywords: nFCA

11.
Papana, Ariadni.
Tools for Comprehensive Statistical Analysis of Microarray Data.
Degree: PhD, Statistics, 2008, Case Western Reserve University
DNA microarrays are a widely used technology for genome-wide analysis of mRNA levels under different experimental conditions. Monitoring developmental changes of human and non-human organisms via changes in gene expression can provide a way of unraveling biological processes at the cellular level. However, determining true genomic differences between samples can be difficult due to the tremendous amount of noise. Analysis of microarray data includes the detection of differentially expressing genes among experimental groups, high-dimensional variable selection, detection and stabilization of heterogeneity of variances, and unraveling the inter-relationship between genes. The focus of this thesis is the development of statistical methodology for comprehensive analysis of microarray data. Our main focus here is GeneChip Affymetrix expression arrays, a widely used technology for studying mRNA abundance. However, our methodology is applicable to all types of microarrays. Chapter 1 gives a brief introduction. In Chapters 2 and 3, two popular preprocessing methods for constructing gene expression measurements, the robust multi-array average and the MAS-5.0 Affymetrix algorithm, are studied and a new set of diagnostic tools for assessing the quality of microarray data is proposed. In Chapter 4, a classification and regression tree algorithm for variance stabilization and regularization of high-throughput genomic data is developed. Chapter 5 considers cross-validation (CV) and multi-fold cross-validation (MCV) for model selection and prediction error estimation. Computationally efficient expressions of CV and MCV are derived and used for the analysis of multigroup time course data. In Chapter 6, a non-parametric, data-adaptive gene hunting filter for multigroup temporal microarray data is proposed for the identification of differentially expressing profiles. Finally, in Chapter 7, local and global orthogonal smoothing via a rescaled spike and slab model is introduced.
Microarrays are instrumental in answering important biological or genetic questions. Successful quantification of gene expression, identification of genetic markers, and measurement of gene expression changes over a variety of conditions are facilitated by the use of microarrays. Comparisons of distinct biological groups help unravel how phenotypes associate with certain genotypes. Therefore, microarrays can be utilized for improving disease diagnosis and prognosis, for providing therapeutic choice, as well as for drug discovery.
Advisors/Committee Members: Ishwaran, Hemant.
Keywords: genes; MICROARRAY; Spike and Slab

12.
Piryatinska, Alexandra.
Inference for the Levy models and their application in medicine and statistical physics.
Degree: PhD, Statistics, 2005, Case Western Reserve University
Levy processes, that is, stochastic processes with time-homogeneous and independent increments, have found numerous applications in the physical sciences, economics and engineering. In this dissertation we study specific theoretical issues related to the multiscaling properties of some special classes of Levy processes and to the kinetic equations describing the time-evolution of statistical mechanical systems driven by certain Levy processes displaying, perhaps limited, fractal behavior. To be able to apply these models to real data, we also develop statistical parametric estimation procedures for them. These theoretical tools are then utilized in the analysis of EEG recordings for fullterm and preterm neonates. The issues of sleep stage separation and the long-memory property have also been investigated for this data set.
Advisors/Committee Members: Woyczynski, Wojbor A.
Subjects: Statistics
Keywords: Smoothly truncated Levy processes; parametric estimations; anomalous diffusion; time series

14.
Shi, Peipei.
ESTIMATION AND APPROXIMATION OF TEMPERED STABLE DISTRIBUTION.
Degree: PhD, Statistics, 2010, Case Western Reserve University
Tempered stable random variables have a LePage-like series representation, which was first introduced by Rosiński. In this dissertation, we study the accuracy of the Rosiński representation as determined by the convergence rates of the series. We also study estimators of parameters of certain tempered stable distributions and construct their confidence intervals. Finally, we present several simulation results for the Gamma-tempered random variable.
Advisors/Committee Members: Woyczynski, Wojbor.
Subjects: Statistics
Keywords: Tempered stable distribution, convergence rate, parameter estimation.

15.
Snyder, Scott Alan.
Design and Modeling of a Three-Dimensional Workspace.
Degree: PhD, Statistics, 2005, Case Western Reserve University
The FES Center, Cleveland, Ohio, conducts research into the use of implantable medical devices designed to expand a spinal cord injured user’s workspace and augment daily function. The research presented here develops and utilizes statistical techniques to estimate the workspace achieved when restoring arm control. The workspace properties of interest are quantified by an experimental protocol designed to collect data to evaluate the 3-D reachable workspace and the 3-D controllable, or functional, workspace. Non-parametric and parametric strategies are developed to model the reachable workspace. Within the parametric setting, superquadrics are used and confidence bounds for the shapes are presented. The controllable workspace is quantified by collecting spatial binary data, which are the success or failure of a particular task at locations within the reachable workspace. These data are modeled and checked for correspondence with the fitted model. Properties of the model are investigated. A result concerning residuals is presented, along with “jump maps”, a new technique for displaying variation across a map. In fitting models to spatial binary data, difficulties have been observed in properly capturing variance parameters from simulated datasets when the number of binary observations is not large. Alternative algorithms and models are presented that have competing advantages. A new, promising mixture prior distribution is developed and evaluated. Finally, sequential sampling strategies for binary spatial models are developed. These competing strategies are designed to select the locations where additional observations will be sampled. In a real-time experimental setting, it is necessary to have a strategy that minimizes the amount of computation time. A new strategy is presented that minimizes the amount of computation time spent refitting the model and searching for the next point(s) to sample.
Advisors/Committee Members: Sedransk, Joseph.
Keywords: Bayesian; spatial design; binary data; workspace; variance parameters

16.
Wang, Xiaofeng.
New Procedures for Data Mining and Measurement Error Models with Medical Imaging Applications.
Degree: PhD, Statistics, 2005, Case Western Reserve University
In this dissertation we provide analysis strategies for two research areas: spatial-temporal data mining and measurement error problems. Motivated by analyzing data from a "Neuromuscular Electrical Stimulation" experiment, we develop an efficient procedure for mining spatial-temporal data which combines the following modern and newly developed components: data segmentation and registration, statistical smoothing mapping for identifying "activated" regions, and a semiparametric model for detecting spatial-temporal similarities/trends from "large-p-small-n" data sets. For measurement error problems, we provide new density and regression estimators for nonparametric errors-in-variables models. The errors can be either homogeneous or nonhomogeneous. In contrast to most existing procedures, our new estimators are stable, easy to compute and do not depend on a Fourier transform. The asymptotics of the new estimators are investigated. Our procedures have the potential to become powerful new tools in image analysis and other fields.
Advisors/Committee Members: Sun, Jiayang.
Subjects: Statistics
Keywords: Spatial-temporal data; Medical imaging; Registration; Smoothing; Measurement error models; Deconvolution; Semiparametrics

17.
Xu, Yaomin.
New Clustering and Feature Selection Procedures with Applications to Gene Microarray Data.
Degree: PhD, Statistics, 2008, Case Western Reserve University
Statistical data mining is one of the most active research areas. In this thesis we develop two new data mining procedures and explore their applications to genetic data. The first procedure is called PfCluster - Profile Cluster Analysis. It is a clustering method designed for profiled genetic data. PfCluster is efficient and flexible in uncovering clusters determined by a new class of biologically meaningful distance metrics. A new internal quality measure of clusters, the coherence index, is developed to find coherent clusters. An efficient mechanism for choosing the threshold of coherent clusters is also derived and implemented. The threshold is based on the first and second order approximations to the true threshold under a null distribution for parallel clusters. PfCluster has been applied to simulated data and two real data examples: a biomarker LOH dataset and a microarray gene expression dataset. PfCluster is competitive with the correlation-based clustering procedures. The second procedure is called RPselection - Resampling based partitioning selection. It is a feature selection algorithm designed for microarray studies. It selects a subset of genes that maximizes a fitness score. The fitness score measures the relevance between the partition labels from a clustering result and an external class label derived from the clinical outcomes. The score is computed using a resampling procedure. The RPselection algorithm has been applied to simulated data and a real uveal melanoma gene expression dataset. RPselection outperforms gene-by-gene test-based feature selection procedures. Software development is an integral part of modern statistical research. Two software packages, pfclust and rpselect, are developed in this thesis based on our PfCluster method and RPselection algorithm. The packages pfclust and rpselect are implemented in R's object-oriented programming framework, and they can be easily customized and extended by users. The ideas in our two procedures can be generalized and applied to other data mining tasks. This thesis concludes with a discussion of connections between the two methods and related future research.
Advisors/Committee Members: Sun, Jiayang.
Subjects: Statistics
Keywords: Bioinformatics; coherence index; data mining; feature selection; gene expression pathway; gene profiling; informative gene; microarray data; profile cluster analysis; partitioning; regulatory network; statistical pattern recognition

18.
Zhang, Zhongfa.
Multiple Hypothesis Testing For Finite and Infinite Test.
Degree: PhD, Statistics, 2005, Case Western Reserve University
Multiple hypothesis testing is one of the most active research areas in statistics. The number of hypotheses can be finite or infinite. For multiple hypothesis testing, an overall error criterion must be properly defined and different test procedures must be developed. In this thesis, we investigate both finite and infinite hypothesis testing. Accordingly, the thesis is roughly divided into two parts. The first part of this thesis focuses on finite hypothesis testing. We study the False Discovery Rate (FDR), proposed by Benjamini and Hochberg in 1995, as an error criterion for a multiple testing procedure. We first attempt to find a functional relationship between FDR and the more familiar family-wise error rate (FWER) in order to study the practical aspects of the two criteria and to obtain a controlling procedure for one from that of the other. A few new theoretical results about FDR are then presented and, based on these results, a new and “suboptimal” FDR controlling procedure is proposed. The performance of the proposed procedure is compared with that of Benjamini and Hochberg’s (1995) and Storey et al’s (2003). The procedure is then applied to a microarray data set to illustrate its application in the bioinformatics area. The second part of this thesis involves testing the equality of two curves. This type of testing problem occurs often in functional data analysis. In this part, we develop test procedures for testing whether two curves measured with homoscedastic or heteroscedastic errors are equal. The method is applicable to a general class of curves that can be either specified up to some unknown parameters or only assumed to be smooth. The null distribution of the test statistic is derived and an approximation formula to estimate the p-value is developed for the cases where the homoscedastic or heteroscedastic variances are either known or unknown. Simulation experiments are conducted to show how our procedures perform in finite sample situations. Application to our motivating data example from an environmental study is illustrated. The two areas are actually related. We discuss their connections in the last chapter and propose questions for future research.
Advisors/Committee Members: Sun, Jiayang.
Keywords: m0; FDR; FWER
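Illustrative note: the abstract above takes the Benjamini and Hochberg (1995) procedure as its baseline for FDR control. The sketch below shows that standard step-up rule only, assuming independent or positively dependent p-values. It is not the new procedure proposed in the thesis, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up rule at FDR level q.

    Returns a list of booleans in the original order of p_values,
    True where the corresponding hypothesis is rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * q; stays 0 if none qualifies.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# The p-value 0.03 misses its own threshold (0.025) but is still rejected,
# because the larger p-value 0.035 meets its threshold (0.0375).
print(benjamini_hochberg([0.5, 0.03, 0.001, 0.035], q=0.05))
```

The step-up behaviour in the example is what distinguishes this rule from a simple per-rank comparison.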