Department: Biostatistics
14 matches in the database.
These are records: 1 - 14.

1.
Chen, Haiying.
Ranked set sampling for binary and ordered categorical variables with applications in health survey data.
Degree: PhD, Biostatistics, 2004, Ohio State University
▼ Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling. It involves preliminary ranking of the variable of interest to aid in sample selection. Although ranking processes for continuous variables have been studied extensively in the literature, the use of RSS in the case of a binary variable has not been investigated thoroughly. We investigate the application of RSS to estimation of a population proportion theoretically and empirically using a National Health and Nutrition Examination Survey III (NHANES III) data set. We propose the use of logistic regression to aid in the ranking of the binary variable of interest. Our results indicate that this use of logistic regression leads to substantial gains in precision for estimation of a population proportion. Further, we illustrate how data from one source can be used to construct the necessary logistic regression equation, which can, in turn, be used to estimate the relevant proportions in a second group of subjects for which the same predictor variables are available. The results indicate the extent to which the sample size required to achieve a desired precision is reduced. Balanced RSS, however, is not in general optimal in terms of variance reduction. We investigate the application of unbalanced RSS to estimation of a population proportion. In particular, Neyman allocation is shown to be optimal for this setting. Further, we provide methods to obtain estimators for the probabilities of success for the various judgment order statistics under either perfect or imperfect rankings so that Neyman allocation can be implemented. Finally, we extend the application of RSS, both balanced and unbalanced, to ordered categorical variables with the goal of estimating the probabilities of all categories. We use ordinal logistic regression to aid in the ranking of the ordinal variable of interest. 
We also propose an optimal allocation scheme and methods for implementing it under either perfect or imperfect rankings. The results indicate that the use of ordinal logistic regression in ranking leads to substantial gains in precision for estimation of population proportions.
Advisors/Committee Members: Stasny, Elizabeth.
Subjects: Statistics
Keywords: RSS; ranking; Neyman; set size; Neyman allocation; SRS
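The balanced RSS idea described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the dissertation's procedure: the population is synthetic, and the `score` field is a hypothetical stand-in for a probability predicted by a fitted logistic regression, used here solely to rank units within each set.

```python
import random
import statistics


def make_population(size, rng):
    """Synthetic population (assumption): each unit is (score, outcome)."""
    population = []
    for _ in range(size):
        score = rng.random()  # stand-in for a logistic-regression prediction
        outcome = 1 if rng.random() < score else 0
        population.append((score, outcome))
    return population


def rss_proportion(population, set_size, cycles, rng):
    """Balanced RSS estimate of the proportion with outcome == 1."""
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a set, rank it by score, and measure only one unit.
            candidates = rng.sample(population, set_size)
            candidates.sort(key=lambda unit: unit[0])
            measured.append(candidates[rank][1])
    return sum(measured) / len(measured)


def srs_proportion(population, n, rng):
    """Simple random sampling estimate with the same number of measurements."""
    sample = rng.sample(population, n)
    return sum(outcome for _, outcome in sample) / n


def compare_variances(reps=4000, set_size=3, cycles=10, seed=1):
    """Return (variance of RSS estimates, variance of SRS estimates)."""
    rng = random.Random(seed)
    population = make_population(5000, rng)
    n = set_size * cycles
    rss = [rss_proportion(population, set_size, cycles, rng) for _ in range(reps)]
    srs = [srs_proportion(population, n, rng) for _ in range(reps)]
    return statistics.pvariance(rss), statistics.pvariance(srs)
```

Calling `compare_variances()` returns the two empirical variances; in this synthetic setting the RSS variance should come out smaller because the ranking score is informative about the outcome.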

3.
Ding, Jie.
Monte Carlo Pedigree Disequilibrium Test with Missing Data and Population Structure.
Degree: PhD, Biostatistics, 2008, Ohio State University
▼ Family-based association testing is one way of mapping disease susceptibility genes by testing for association between marker genotypes and disease phenotypes in family data. Missing genotypes usually exist in real datasets. We proposed the Monte Carlo pedigree disequilibrium test (MCPDT) to test for association using general pedigree data with missing genotypes. It generates Monte Carlo samples of missing genotypes conditioned on observed genotypes and then calculates test statistics with the Monte Carlo samples. In a simulation study, it achieved better performance than other family-based association test methods. Since MCPDT uses estimates of population marker allele frequencies in the generation of Monte Carlo samples, population structure may generate bias in MCPDT statistics. To adjust for population structure in MCPDT, a Markov chain Monte Carlo algorithm was designed to infer the structure from pedigree data with multiple null markers, and the inferred structure was then used in MCPDT. Simulation studies were done to evaluate the performance of this method.
Advisors/Committee Members: Lin, Shili.
Subjects: Biostatistics

4.
Erich, Roger Alan.
Regression Modeling of Time to Event Data Using the Ornstein-Uhlenbeck Process.
Degree: PhD, Biostatistics, 2012, Ohio State University
▼ In this research, we develop innovative regression models for survival analysis that model time to event data using a latent health process which stabilizes around an equilibrium point, a characteristic often observed in biological systems. Regression modeling in survival analysis is typically accomplished using Cox regression, which requires the assumption of proportional hazards. An alternative model, which does not require proportional hazards, is the First Hitting Time (FHT) model, where a subject's health is modeled using a latent stochastic process. In this modeling framework, an event occurs once the process hits a predetermined boundary. The parameters of the process are related to covariates through generalized link functions, thereby providing regression coefficients with clinically meaningful interpretations. In this dissertation, we present an FHT model based on the Ornstein-Uhlenbeck (OU) process, a modified Wiener process which drifts from the starting value of the process toward a state of equilibrium or homeostasis present in many biological applications. We extend previous OU process models to allow the process to change according to covariate values. We also discuss extensions of our methodology to include random effects accounting for unmeasured covariates. In addition, we present a mixture model with a cure rate using the OU process to model the latent health status of those subjects susceptible to experiencing the event under study. We apply these methods to survival data collected on melanoma patients and to another survival data set pertaining to carcinoma of the oropharynx.
Advisors/Committee Members: Pennell, Michael.
Subjects: Biostatistics; Statistics
Keywords: cancer clinical trial; cure rate model; first hitting time model; Gaussian process; mixture model; nonproportional hazards; survival analysis; Ornstein-Uhlenbeck process; random effects model
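The first hitting time idea in this abstract, where an event occurs when a latent health process reaches a boundary, can be sketched with a simple path simulation. The following is an illustrative Euler-Maruyama discretization of an OU process, not the dissertation's model or fitting method. The function name, the parameter names (`x0`, `mu`, `theta`, `sigma`), the boundary at zero, and the step size are all assumptions made for the example.

```python
import math
import random


def ou_first_hitting_time(x0, mu, theta, sigma, boundary=0.0,
                          dt=0.01, horizon=50.0, rng=None):
    """Simulate dX = theta * (mu - X) dt + sigma dW and return the first
    time X <= boundary, or None if that does not happen before `horizon`.

    x0 is the starting health level, mu the equilibrium level, theta the
    speed of reversion toward mu, and sigma the noise scale (all hypothetical).
    """
    rng = rng or random.Random()
    x, t = x0, 0.0
    step_sd = sigma * math.sqrt(dt)
    while t < horizon:
        # Euler-Maruyama step: mean-reverting drift plus Gaussian noise.
        x += theta * (mu - x) * dt + step_sd * rng.gauss(0.0, 1.0)
        t += dt
        if x <= boundary:
            return t
    return None  # no event observed within the horizon
```

A path whose equilibrium lies above the boundary may never produce an event within the horizon, which is the kind of behaviour a cure rate component is meant to capture. In a regression setting, `mu` or `x0` would be tied to covariates through a link function, which this sketch leaves out.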

5.
Gibellato, Marilisa Gail.
Stochastic modeling of the sleep process.
Degree: PhD, Biostatistics, 2005, Ohio State University
▼ The structure of sleep varies with perturbing factors such as age, pathological processes, and pharmacological agents. This can be appreciated by simple observation of the sleep patterns of individuals and electroencephalogram data. Statistical models have been proposed to track these changes, but there remains a need for a comprehensive and informative description of the sleep process. In this dissertation, I describe the sleep process in two manners using data collected from groups of younger (20-25 years of age) and older (70-79 years of age) subjects. The first model of sleep is comprised of the two stages “sleep” and “wake”. I model the times spent asleep between wakeful periods as independent observations from a generalized gamma distribution (GGD) using a maximum likelihood estimation (MLE) procedure and justify usage of a reparameterization of the GGD and the observed information to estimate the variance of the MLEs. I next develop strategies for estimating and comparing underlying demographic group mean GGD parameters. The resulting analysis detects differences in the parameters of age and gender groups that serve as impetus to develop a “Sleep Index” based on the mean of the GGD fit to the data. The successive times of wakefulness are found to have a first order dependence structure. Although distributional fitting is limited due to censoring, I fit a GGD to the wake times greater than 2.5 minutes combined across all subjects. The wake and sleep processes are found to be independent of one another. The second model is a semi-Markov process including sleep stages 1, 2, 3, 4, wakefulness, and rapid eye movement (REM) sleep. The embedded Markov chains (MCs) are characterized and shown to be non-stationary across a subject's night of sleep. I then compare the embedded MCs for the various age and gender groups across nights using general linear models to detect differences in transition probabilities. 
This investigation provides a comprehensive picture of the sleep process from two perspectives and yields concrete information that can be used in clinical applications. These characterizations will likely produce other meaningful measures of sleep process perturbation.
Advisors/Committee Members: Nagaraja, Haikady N.
Subjects: Statistics
Keywords: Sleep and aging; Generalized Gamma Distribution; Maximum likelihood estimation; Semi-Markov process

6.
Gulati, Parul.
Testing for Differential Expression in Small Sample Microarray Experiments.
Degree: PhD, Biostatistics, 2010, Ohio State University
▼ A typical microarray experiment involves comparing expression levels of thousands of genes across groups or experimental conditions simultaneously. The cost of microarray chips is high and the sample sizes associated with microarray experiments are usually low. This situation creates challenges in processing and analyzing data. In this study, two important steps in microarray analysis, filtering and hypothesis testing, were assessed using simulation studies and real data. We propose a filtering approach that filters out non-expressed genes, as opposed to other filtering methods, which aim to filter out genes that are not differentially expressed before hypothesis testing. We compare the performance of this proposed method to two other commonly used filtering methods. We also develop a novel hypothesis testing procedure which provides better parameter estimates by taking into account the functional relationship between the variances of gene variances and gene expression levels. This relationship was ignored in the methods proposed in the literature for microarray analysis. We compare the proposed testing method with three other existing methods using simulated and spike-in data.
Advisors/Committee Members: Jarjoura, David.
Subjects: Biostatistics
Keywords: microarray; filtering; non-expressed

7.
Kelbick, Nicole DePriest.
Detecting underlying emotional sensitivity in bereaved children via a multivariate normal mixture distribution.
Degree: PhD, Biostatistics, 2003, Ohio State University
▼ A common theme in finite mixture problems involves a random sample taken from a population consisting of an unknown mixture of distributions. The goal is to identify the component distributions using information from the sample. A medical example might entail clinical test results from patients whose true disease status is unknown. Another example pertains to latent class models which attempt to relate observed data to an unseen variable whose possible outcomes correspond to classes of a population. Although mixture models are conceptually appealing, many obstacles arise during their application. Areas of difficulty include complicated likelihoods, lack of clearly defined hypotheses, cumbersome estimating equations and elusive asymptotic properties. Progress in the study of mixture models was hindered by these difficulties until the advent of adequate computational power and numerical methods. The topic of this thesis is motivated by a longitudinal study conducted at The Ohio State University that focused on the course of grief in children who experienced the loss of a parent. Researchers hypothesize that parental loss will have a greater psychological impact on some of the children, which will manifest itself over an extended period of time as an increase in the number of symptoms associated with behavioral, anxiety, mood and other psychological disorders. A mixture model approach is used to determine whether or not such a latent group of grieving children exists. Under the null hypothesis, the bereaved children are a homogeneous group and the data are assumed to have a multivariate normal distribution. The alternative hypothesis states that the data follow a mixture of two multivariate normal distributions. Data patterns are exploited to develop simple models for the variance and correlation structures. Mean models are formulated to test the statistical hypotheses of interest. 
This approach has the benefit of reducing the number of model parameters resulting in a simplified fitting process. A consequence of the nonexclusive relationship between the mixing distribution and the model parameters is that both play a fundamental role in the development and outcome of the statistical inference procedure. Results of model fitting are reported and conclusions based on the likelihood ratio test are discussed.
Advisors/Committee Members: Verducci, Joseph S.
Subjects: Statistics
Keywords: Mixture Models; Longitudinal Data; Moment Estimation

8.
Li, Dongmei.
Resampling-based Multiple Testing with Applications to Microarray Data Analysis.
Degree: PhD, Biostatistics, 2009, Ohio State University
▼ In microarray data analysis, resampling methods are widely used to discover significantly differentially expressed genes under different biological conditions when the distributions of test statistics are unknown. When sample size is small, however, simultaneous testing of thousands, or even millions, of null hypotheses in microarray data analysis brings challenges to the multiple hypothesis testing field. We study small sample behavior of three commonly used resampling methods, including permutation tests, post-pivot resampling methods, and pre-pivot resampling methods in multiple hypothesis testing. We show the model-based pre-pivot resampling methods have the largest maximum number of unique resampled test statistic values, which tend to produce more reliable P-values than the other two resampling methods. To avoid problems with the application of the three resampling methods in practice, we propose new conditions, based on the Partitioning Principle, to control the multiple testing error rates in fixed-effects general linear models. Meanwhile, from both theoretical results and simulation studies, we show the discrepancies between the true expected values of order statistics and the expected values of order statistics estimated by permutation in the Significance Analysis of Microarrays (SAM) procedure. Moreover, we show the conditions for SAM to control the expected number of false rejections in the permutation-based SAM procedure. We also propose a more powerful adaptive two-step procedure to control the expected number of false rejections with larger critical values than the Bonferroni procedure.
Advisors/Committee Members: Hsu, Jason.
Subjects: Biostatistics
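The small-sample concern raised in this abstract can be seen in a generic exact two-sample permutation test. This is a textbook-style sketch rather than any procedure from the dissertation; the function name and the choice of a difference-in-means statistic are assumptions made for illustration.

```python
from itertools import combinations


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Returns (p_value, number of distinct absolute statistic values).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    indices = range(len(pooled))

    def mean_diff(a_idx):
        a = [pooled[i] for i in a_idx]
        b = [pooled[i] for i in indices if i not in a_idx]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(tuple(range(n_a))))
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    stats = [abs(mean_diff(idx)) for idx in combinations(indices, n_a)]
    extreme = sum(1 for s in stats if s >= observed - 1e-12)
    distinct = len({round(s, 10) for s in stats})
    return extreme / len(stats), distinct
```

With three observations per group there are only 20 relabellings, so the smallest attainable two-sided P-value is 2/20 = 0.1 however far apart the groups are. This illustrates why the number of unique resampled statistic values matters when sample sizes are small.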

9.
Niu, Liang.
STATISTICAL MODELING AND ANALYSIS OF CHROMATIN INTERACTIONS.
Degree: PhD, Biostatistics, 2012, Ohio State University
▼ Chromatin interactions are of interest to researchers. A recent molecular technique, HiC, that uses formaldehyde cross-linking and paired-end sequencing, is able to detect genome-wide chromatin interactions. HiC can be modified to study the chromatin interactions mediated by a protein of interest. Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) is also a technique for this purpose. However, these methods may also generate noise (product from random collisions of DNA fragments) in addition to signal (product from real interactions). Here we proposed a mixture model to distinguish real chromatin interactions from random collisions for the data from such experiments. The model is cast into a Bayesian framework to make use of a special feature of the data and information on protein binding sites and gene promoters to reduce false positives. We also proposed models for detection of chromatin interactions with different intensities between two samples, e.g., a cancer sample and a normal sample. A simulation study demonstrated the good performance of the above models. The model for the one-sample study has also been applied to real datasets and achieved good results.
Advisors/Committee Members: Lin, Shili.
Subjects: Biostatistics

10.
Pan, Xueliang.
Using Structural Information in Modeling and Multiple Alignments for Phylogenetics.
Degree: PhD, Biostatistics, 2008, Ohio State University
▼ Phylogenetic studies are increasingly based on structural biological data and on statistical formalization. That leads to the study of improved models and of extracting the maximum information from sequence data. In this research, I have proposed to incorporate the structural information in two areas that relate to phylogenetic inference: one is to use a spatial dependent substitution model for likelihood calculation in phylogenetic inference; the other is to use a gap distance measure for MSA evaluation. While the first application is to use an improved substitution model in phylogenetic inference, the second one focuses on the quality of the MSA produced by different alignment procedures. The proposed spatial dependent model was based on our observation that the amino acids close to the functional core region tend to be conservative and those on the periphery are likely subject to mutation. So we proposed a substitution model with its rate for each amino acid dependent on its distance to the catalytic active center, or the functional core of the protein. The SD model has been implemented in the framework of a Bayesian hierarchical model; the posterior distribution of the model parameters and the phylogenetic inference were estimated simultaneously using the MCMC Metropolis-Hastings algorithm. The SD model has been applied to 11 enzymes that are primarily to central metabolism that are found in species from all Kingdoms. The SD model is much better than the currently available substitution models in terms of fitness consistently for all examples. Besides the modeling, we also use the structural information of the sequences for MSA evaluation. The fixed alignments used in phylogenetic studies are derived in advance of phylogenetic analysis. There are many different ways to construct these alignments. 
The gap measurement proposed here is based on the assumption of structural superposition, and it not only evaluates the alignment quality of those sequences with structural information, but also those sequences without structural information. This measurement can be used to select a better MSA for our phylogenetic analysis. Furthermore, it may lead to improvement of the sequence alignment.
Advisors/Committee Members: Pearl, Dennis.
Subjects: Biostatistics
Keywords: Phylogenetics; Structural Information; Substitution Model; Gap Distance Score

11.
Sun, Junfeng.
Stochastic models for compliance analysis and applications.
Degree: PhD, Biostatistics, 2005, Ohio State University
▼ Compliance is the extent to which a patient follows the prescribed regimen. Good compliance is crucial in maintaining the drug concentration in the body, and is thus very important in both clinical trials and medical practice. Even though many different compliance indices have been proposed in the literature, few studies have been devoted to the study of the compliance process. There is no published systematic study of the statistical properties of these compliance indices. We utilize the information-rich electronic event monitoring (EEM) data, build realistic stochastic models to describe them, and study the statistical properties of several clinically meaningful compliance indices. For discrete compliance data, we use stationary Markov chains to model the dependence structure and an empirical Bayes approach to account for the variation among patients. The indices based on discrete data are the percentage of compliant days and the percentage of doses taken. We also study several indices based on inter-dosing times: the therapeutic coverage, the delayed medication index, the premature medication index, the timing error, and the percentage of time in drug holidays. We apply Markov-dependent mixture models to describe the inter-dosing times. To construct a more biologically meaningful index of compliance, we combine the pharmacokinetic (PK) model of the drug with the inter-dosing times. We establish asymptotic normality of the various indices under the proposed models and construct hypothesis tests to compare the compliance levels of patients or different groups of patients. We illustrate our methodology through an analysis of a data set from an AIDS clinical trial.
Advisors/Committee Members: Nagaraja, Haikady N.
Subjects: Statistics
Keywords: inter-dosing; inter-dosing times

12.
Walters, Kimberly Ann.
The Use Of Post-Intervention Data From Waitlist Controls To Improve Estimation Of Treatment Effect In Longitudinal Randomized Controlled Trials.
Degree: PhD, Biostatistics, 2008, Ohio State University
▼ In medicine and public health research, the randomized delayed-intervention controlled trial (RDICT), also known as a wait-listed or stepped wedge design, is commonly used to study overt, slow-acting treatments in comparison to a control condition over time. Ten RDICT designs are specified as generalizations of the motivating example, a longitudinal psychology study of a psychoeducational intervention for children with bipolar disorder. These designs vary according to number of observation occasions, time between observations, and length of delay before the control group receives treatment. Two estimators of fixed effects in separate linear mixed effects (LME) models, θ1 and θ2, are proposed to measure treatment effect based on data from an RDICT design. The LME models have a piecewise linear mean structure, allowing phases for treatment, placebo, and leveling-off effects. The treatment effect is traditionally conceptualized as the difference in slopes between the immediate treatment (IT) and pre-intervention control groups, which we call θ1. Alternatively, in an RDICT design, the treatment effect can be the change in slope post-intervention in the delayed-treatment (DT) control group, called θ0. The full model, which allows these treatment effects to differ, produces the standard estimator, θ1. A reduced model, nested within the full one, forces the inter and intra treatment effects to be identical and produces the novel estimator, θ2. A simulation study was conducted to observe the relative efficiency of θ2 to θ1 as it varies over the 10 RDICT designs and 8 scenarios, which differ in size of treatment effect, intraclass correlation, and sample allocation to DT group. The best-performing and recommended RDICT design, called H2.5 with a DT:IT allocation ratio of 2:1, achieved a relative efficiency of 1.3 when the group-specific treatment effects are identical. 
The H2.5 design has the longest overall calendar duration of the 10 designs considered and is an extension of the design used in the motivating example study of childhood mood disorders.
Advisors/Committee Members: Verducci, Joseph.
Subjects: Behavioral sciences; Biostatistics; Design; Health; Mental health; Psychology; Public health; Statistics; Therapy
Keywords: longitudinal method; design; randomized controlled trials; treatment effect; intervention studies; repeated measures

13.
Yang, Jingyuan.
Likelihood Approaches for Detecting Imprinting and Maternal Effects in Family-Based Association Studies.
Degree: PhD, Biostatistics, 2010, Ohio State University
▼ Genomic imprinting and maternal effect are involved in many complex human diseases but have long been neglected in association studies. In this dissertation, we propose two likelihood approaches for detecting imprinting and maternal effects (LIME) simultaneously in family-based association studies. Since these two effects could cause similar parent-of-origin patterns in binary disease traits, it is important to incorporate both of them into the modeling to avoid lurking effects. Statistical methods that are developed to detect one of them while assuming the absence of the other will report false positives when the assumption is violated. Our first LIME approach (LIME-ped) is designed for general pedigrees with missing genotypes from prospective family-based association studies. LIME-ped formulates the probability of familial genotypes by introducing a novel concept called "conditional mating type" between marry-in founders and their non-founder spouses, and models the penetrance using a logit link. To deal with missing genotypes, LIME-ped enumerates possible unobserved genotypes and sums over the likelihoods of all compatible familial genotypes conditional on observed genotypes. Our simulation study demonstrates that: (1) LIME-ped has the correct type I error rate for testing for imprinting when maternal effect is present, or vice versa; (2) applying LIME-ped to pedigrees fully utilizes the data and achieves higher power than trimming down the pedigrees to nuclear families; (3) "filling in" the unobserved genotypes conditional on the genotypes of relatives augments the total information, leading to higher power for LIME-ped than simply excluding individuals with missing genotypes. The second LIME approach (LIME-mix) is designed for case-parent/control-parent triad studies. Since biological fathers are often hard to recruit in family-based studies, we also allow for case-mother/control-mother pairs arising from the triads with missing fathers. 
The approach is referred to as "LIME-mix", since the real data is a mixed sample of triads and pairs. We assume multiplicative relative risks due to variant allele effect, genomic imprinting, and maternal effect; and analytically derive the partial likelihood of the family counts with particular genotype combinations. LIME-mix can be applied to either rare or common diseases without restriction on the disease prevalence. Since the partial likelihood does not involve any nuisance parameters about mating types, LIME-mix makes no assumptions on the mating type frequencies, such as allelic exchangeability and mating symmetry, the latter of which is a necessary assumption universally made in many existing imprinting and/or maternal effects detection methods. The robustness of the proposed LIME-mix approach compared to two existing methods is demonstrated via simulation. The LIME approaches are applied to nuclear families, general pedigrees and case-control families from the Framingham Heart Study data. Several SNPs that have variant allele, imprinting and/or maternal effects are identified.
Advisors/Committee Members: Lin, Shili.
Subjects: Biostatistics

14.
Zhao, Yonggang.
The general linear model for censored data.
Degree: PhD, Biostatistics, 2003, Ohio State University
▼ In survival analysis, a linear model often provides an adequate approximation to the survival times and covariates after a suitable transformation. This dissertation is devoted to a systematic investigation of semiparametric regression methods for estimating the regression parameter in the context of linear regression without specifying the error distribution, where the response is right-censored. The method uses the random-sieve likelihood, which combines the benefits of semiparametric likelihood with estimating equations and constraints. A method of estimating the parameters is developed, and inferential procedures based on the asymptotic distributions of the estimated regression parameters and of the profile likelihood ratios are derived. The small sample operating characteristics of the proposed method are examined via simulations and illustrated on a data set from a study of ganglioside of primary brain tumors and a data set from a bone marrow transplant study. This dissertation proposes an estimation method as well as an inference procedure for a general linear model, allowing for right-censoring and an unspecified error distribution. The proposed methodology yields an easily interpreted regression estimate, and is especially useful when the proportionality assumption doesn't hold for Cox regression models. However, how to determine the best transformation of the response and how to select the best model from the class studied are left for future work.
Advisors/Committee Members: Pearl, Dennis.
Subjects: Statistics
Keywords: Proportional hazards model; Censoring; Sieve-Likelihood; Sieve; brain tumors