Biology is characterized by complex interactions between phenotypes, such as recursive and simultaneous relationships between substrates and enzymes in biochemical systems. Structural equation models (SEMs) can be used to study such relationships in multivariate analyses, e.g., with multiple traits in a quantitative genetics context. Nonetheless, the number of different recursive causal structures that can be used for fitting an SEM to multivariate data can be huge, even when only a few traits are considered. In recent applications of SEMs in mixed-model quantitative genetics settings, causal structures were preselected on the basis of prior biological knowledge alone. Therefore, the wide range of possible causal structures has not been properly explored. Alternatively, causal structure spaces can be explored with data-driven algorithms that search for structures compatible with the joint distribution of the variables under study. However, the search cannot be performed directly on the joint distribution of the phenotypes, because it may be confounded by genetic covariance among traits. In this article we propose to search for recursive causal structures among phenotypes using the inductive causation (IC) algorithm after adjusting the data for genetic effects. A standard multiple-trait model is fitted using Bayesian methods to obtain a posterior covariance matrix of phenotypes conditional on unobservable additive genetic effects, which is then used as input for the IC algorithm. As an illustrative example, the proposed methodology was applied to simulated data related to multiple traits measured on a set of inbred lines.
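The first step of the IC algorithm described above can be sketched in a few lines: two traits stay connected only if no conditioning set makes them independent. The sketch below is a rough illustration, not the authors' procedure. It assumes a single residual covariance matrix `R` (in the abstract this would come from the posterior of the multiple-trait model), uses partial correlations as the independence measure, and applies a hypothetical fixed threshold `tol` in place of whatever decision rule the authors applied. The names `partial_corr`, `ic_skeleton`, `lam` and `R` are illustrative, and the edge-orientation steps of IC are omitted.

```python
from itertools import combinations

import numpy as np


def partial_corr(cov, i, j, cond):
    """Partial correlation of traits i and j given the traits in cond."""
    idx = [i, j, *cond]
    # Invert the covariance of (i, j, cond); i and j sit at positions 0 and 1.
    prec = np.linalg.inv(cov[np.ix_(idx, idx)])
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])


def ic_skeleton(cov, tol=0.05):
    """Undirected skeleton: keep edge i-j unless some subset separates them.

    tol is an assumed cut-off below which a partial correlation is
    treated as zero; it is not taken from the source.
    """
    p = cov.shape[0]
    edges = set()
    for i, j in combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        separated = any(
            abs(partial_corr(cov, i, j, cond)) < tol
            for size in range(len(others) + 1)
            for cond in combinations(others, size)
        )
        if not separated:
            edges.add((i, j))
    return edges


# Hypothetical residual covariance implied by the chain y1 -> y2 -> y3
# with unit residual variances (made-up coefficients, for illustration only).
lam = np.array([[0.0, 0.0, 0.0],
                [0.8, 0.0, 0.0],
                [0.0, 0.5, 0.0]])
A = np.linalg.inv(np.eye(3) - lam)
R = A @ A.T

skeleton = ic_skeleton(R)
print(sorted(skeleton))
```

In this toy chain, y1 and y3 are marginally correlated but become independent once y2 is conditioned on, so the direct edge between them is dropped.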
Phenotypic traits may exert causal effects on one another. For example, on the one hand, high yield in dairy cows may increase the liability to certain diseases and, on the other hand, the incidence of a disease may affect yield negatively. Likewise, the transcriptome may be a function of the reproductive status in mammals and the latter may depend on other physiological variables. Knowledge of phenotype networks describing such interrelationships can be used to predict the behavior of complex systems, e.g., biological pathways underlying complex traits such as diseases, growth and reproduction. Structural equation models (SEMs) can be used to study recursive and simultaneous relationships among phenotypes in multivariate systems such as genetical genomics, systems biology, and multiple-trait models in quantitative genetics. Hence, SEMs can produce an interpretation of relationships among traits that differs from that obtained with traditional multiple-trait models, in which all relationships are represented by symmetric linear associations among random variables, such as covariances and correlations. In this review, we discuss the application of SEMs and related techniques for the study of multiple phenotypes. Two basic scenarios are considered: one pertaining to genetical genomics studies, in which QTL or molecular marker information is used to facilitate causal inference, and another related to quantitative genetic analysis in livestock, in which only phenotypic and pedigree information is available. Advantages and limitations of SEMs compared to traditional approaches commonly used for the analysis of multiple traits, as well as some indication of future research in this area, are presented in a concluding section.
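The contrast between a recursive SEM and a multiple-trait model can be made concrete with a small numerical sketch. It assumes the usual linear recursive form y = Λy + u + e, so that the covariances seen by a multiple-trait model are (I − Λ)⁻¹ G (I − Λ)⁻ᵀ and (I − Λ)⁻¹ R (I − Λ)⁻ᵀ. The function name `sem_to_mtm` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def sem_to_mtm(lam, g_direct, r_direct):
    """Map SEM (direct) covariances to the covariances a multiple-trait
    model would describe, assuming y = lam @ y + u + e."""
    t = lam.shape[0]
    a = np.linalg.inv(np.eye(t) - lam)  # reduced-form transformation
    return a @ g_direct @ a.T, a @ r_direct @ a.T


# Two traits with a single recursive effect: trait 1 -> trait 2 (made-up value).
lam = np.array([[0.0, 0.0],
                [0.5, 0.0]])
g_direct = np.diag([1.0, 1.0])  # direct genetic effects assumed uncorrelated
r_direct = np.diag([2.0, 2.0])  # residuals assumed uncorrelated

g_overall, r_overall = sem_to_mtm(lam, g_direct, r_direct)
print(g_overall)
print(r_overall)
```

Even though the direct effects are uncorrelated in this toy setting, the implied multiple-trait covariances have non-zero off-diagonal elements: the asymmetric causal effect of trait 1 on trait 2 shows up in the multiple-trait model only as a symmetric covariance.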
Structural equation models (SEMs) are multivariate specifications capable of conveying causal relationships among traits. Although these models offer insights into how phenotypic traits relate to each other, it is unclear whether and how they can improve multiple-trait selection. Here, we explored concepts involved in SEMs, seeking benefits that they could bring to breeding programs relative to the standard multitrait model (MTM) commonly used. Genetic effects pertaining to SEMs and MTMs have distinct meanings. In SEMs, they represent genetic effects acting directly on each trait, without mediation by other traits in the model; in MTMs they express overall genetic effects on each trait, equivalent to lumping together the direct and indirect genetic effects that SEMs distinguish. However, in breeding programs the goal is selecting candidates that produce offspring with the best phenotypes, regardless of how traits are causally associated, so overall additive genetic effects are what matter. Thus, no information is lost in standard settings by using MTM-based predictions, even if traits are indeed causally associated. Nonetheless, causal information allows predicting effects of external interventions. One may be interested in predictions for scenarios where interventions are performed, e.g., artificially defining the value of a trait, blocking causal associations, or modifying their magnitudes. We demonstrate that with information provided by SEMs, predictions for these scenarios are possible from data recorded under no interventions. In contrast, MTMs do not provide information for such predictions. As livestock and crop production involves interventions such as management practices, SEMs may be advantageous in many settings.
Structural equation models (SEMs) (Wright 1921; Haavelmo 1943) are multivariate models that account for causal associations between variables. They were adapted to the quantitative genetics mixed-effects model setting by Gianola and Sorensen (2004). These models can be viewed as extensions of the standard multiple-trait models (MTMs) (Henderson and Quaas 1976) that are capable of expressing functional networks among traits. Gianola and Sorensen also investigated statistical consequences of causal associations between two traits when they are studied in terms of MTM parameters, expressed as functions of SEM parameters. Additionally, these authors developed inference techniques by providing likelihood functions and posterior distributions for Bayesian analysis and addressed identifiability issues inherent to structural equation modeling. The work of Gianola and Sorensen (2004) was followed by several applications of SEMs to different species and traits, such as dairy goats (de los Campos et al.
Anterior cruciate ligament (ACL) rupture is a common condition that can be devastating and life changing, particularly in young adults. A non-contact mechanism is typical. Second ACL ruptures, through rupture of the contralateral ACL or of a graft repair, are also common. Risk of rupture is increased in females. ACL rupture is also common in dogs. Disease prevalence exceeds 5% in several dog breeds, ~100-fold higher than in human beings. We provide insight into the genetic etiology of ACL rupture by genome-wide association study (GWAS) in a high-risk breed using 98 case and 139 control Labrador Retrievers. We identified 129 single nucleotide polymorphisms (SNPs) within 99 risk loci. Associated loci (P<5E-04) explained approximately half of phenotypic variance in the ACL rupture trait. Two of these loci were located in uncharacterized or non-coding regions of the genome. A chromosome 24 locus containing nine genes with diverse functions met genome-wide significance (P = 3.63E-06). GWAS pathways were enriched for c-type lectins, a gene set that includes aggrecan, a gene set encoding antimicrobial proteins, and a gene set encoding membrane transport proteins with a variety of physiological functions. Genotypic risk estimated for each dog based on the risk contributed by each GWAS locus showed clear separation of ACL rupture cases and controls. Power analysis of the GWAS data set estimated that ~172 loci explain the genetic contribution to ACL rupture in the Labrador Retriever. Heritability was estimated at 0.48. We conclude that ACL rupture is a moderately heritable, highly polygenic complex trait. Our results implicate c-type lectin pathways in ACL homeostasis.
Background: Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage. Methods: We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests. Results: The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected. Conclusions: Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.
The prediction of total egg production (TEP) potential in poultry is an important task to aid optimized management decisions in commercial enterprises. The objective of the present study was to compare different modeling approaches for prediction of TEP in meat-type quail (Coturnix coturnix coturnix) using phenotypes such as weight, weight gain, egg production and egg quality measurements. Phenotypic data on 30 traits from two lines (L1, n=180; and L2, n=205) of quail were modeled to predict TEP. Prediction models included multiple linear regression and artificial neural networks (ANN). Moreover, a Bayesian network (BN) and a stepwise approach were used as variable selection methods. BN results showed that TEP is independent of other earlier expressed traits when conditioned on egg production from 35 to 80 days of age (EP1). In addition, the prediction accuracy was much lower when EP1 was not included in the model. The best predictive model was ANN after feature selection, showing prediction correlations of r=0.792 and r=0.714 for L1 and L2, respectively. In conclusion, machine learning methods may be useful, but reasonable prediction accuracies are obtained only when partial egg production measurements are included in the model.