Samβada in Uganda: landscape genomics study of traditional cattle breeds with a large SNP dataset

Authors and Affiliations:

Sylvie Stucki (1), Pablo Orozco-terWengel (2), Licia Colli (3,4), Fredrick Kabi (5), Charles Masembe (5), Vincent Muwanika (5), Riccardo Negrini (3,6), Michael W. Bruford (2), Stéphane Joost (1) and the NEXTGEN Consortium*

(1) Lab. of Geographic Information Systems (LaSIG), EPFL, Lausanne, Switzerland
(2) Organisms and Environment Division, Cardiff School of Bioscience, Cardiff University, Cardiff, UK
(3) Istituto di Zootecnica, Facoltà di Agraria, Università Cattolica del S. Cuore di Piacenza, Piacenza, Italy
(4) Centro di Ricerca BioDNA, Facoltà di Agraria, Università Cattolica del S. Cuore di Piacenza, Piacenza, Italy
(5) Institute of Environment & Natural Resources, Makerere University, Kampala, Uganda
(6) Associazione Italiana Allevatori, Rome, Italy

* EU-funded project

Email:

sylvie.stucki@epfl.ch

Abstract:

Introduction

Since its introduction [1], landscape genomics has developed quickly with the increasing availability of both molecular and topo-climatic data. Current challenges involve processing large numbers of models and disentangling selection from demography. Several methods address the latter, either by estimating a neutral model from population structure [2] or by simultaneously inferring environmental and demographic effects [3]. Here we present Samβada, an integrated software tool for landscape genomic analysis of large datasets. This tool was developed in the framework of NextGen with the objective of characterising traditional Ugandan cattle breeds using single nucleotide polymorphism (SNP) data.

Methods

Samβada uses logistic regressions to estimate the probability that an individual carries a specific genetic marker given the habitat that characterises its sampling site [4]. The genetic data are recoded as binary variables, and their association with the topo-climatic data is assessed with log-likelihood ratio (G) and/or Wald tests [5]. Models are ranked according to their scores to ease post-processing analyses.
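The following is a minimal sketch of the kind of test described above, not Samβada's actual implementation. It fits a univariate logistic regression by Newton-Raphson and derives the G and Wald scores, each compared with a chi-square distribution with one degree of freedom. The function names, the toy temperature values and the toy marker vector are all hypothetical and chosen only for illustration.

```python
import math


def _evaluate(b0, b1, x, y):
    """Return the score vector, information matrix entries and log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = loglik = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p             # score for the intercept
        g1 += (yi - p) * xi      # score for the slope
        h00 += w                 # Fisher information entries
        h01 += w * xi
        h11 += w * xi * xi
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return g0, g1, h00, h01, h11, loglik


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def score_association(env, marker, iterations=25):
    """G and Wald tests for one binary marker against one environmental variable.

    Assumes the marker is polymorphic (both 0 and 1 occur), the environmental
    variable is not constant, and the data are not perfectly separable.
    """
    n, n1 = len(marker), sum(marker)
    mean_env = sum(env) / n
    x = [e - mean_env for e in env]  # centring does not change slope, SE or likelihood

    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11, _ = _evaluate(b0, b1, x, marker)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: beta += I^-1 * score
        b1 += (h00 * g1 - h01 * g0) / det

    _, _, h00, h01, h11, loglik = _evaluate(b0, b1, x, marker)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))

    # Log-likelihood of the null model (intercept only)
    loglik0 = n1 * math.log(n1 / n) + (n - n1) * math.log((n - n1) / n)

    g_score = 2.0 * (loglik - loglik0)
    wald_score = (b1 / se_b1) ** 2
    return {
        "beta1": b1,
        "G": g_score,
        "p_G": chi2_sf_1df(g_score),
        "Wald": wald_score,
        "p_Wald": chi2_sf_1df(wald_score),
    }


# Toy data (hypothetical): temperature at ten sampling sites and
# presence (1) or absence (0) of one binary marker in each animal.
temperature = [18.0, 19.5, 20.1, 21.0, 22.3, 23.0, 24.2, 25.1, 26.0, 27.4]
marker = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
result = score_association(temperature, marker)
print(result)
```

Running this function once per marker and per environmental variable, then sorting the results by G or Wald score, gives a ranking of models in the sense used above.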

Large SNP panels and whole-genome sequences often require sharing the computational load. When requested, Samβada splits the molecular data to distribute processing and subsequently merges the results.
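The split-and-merge idea can be sketched as follows. This is only an illustration of the principle under our own assumptions: the helper names are hypothetical, the per-marker score is a placeholder, and the chunks are processed sequentially here, whereas in practice each chunk would be sent to a separate process or machine.

```python
def split_markers(marker_ids, n_chunks):
    """Split marker identifiers into n_chunks contiguous blocks of near-equal size."""
    size, extra = divmod(len(marker_ids), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(marker_ids[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Placeholder for the per-chunk analysis: returns (marker, score) pairs.

    The score used here (length of the identifier) is a dummy value that
    stands in for a model score such as G.
    """
    return [(marker_id, float(len(marker_id))) for marker_id in chunk]


def merge_results(partial_results):
    """Concatenate per-chunk results and rank all models by decreasing score."""
    merged = [row for part in partial_results for row in part]
    merged.sort(key=lambda row: row[1], reverse=True)
    return merged


marker_ids = ["marker_%d" % i for i in range(10)]   # hypothetical identifiers
chunks = split_markers(marker_ids, 3)
ranked = merge_results([process_chunk(chunk) for chunk in chunks])
print([len(chunk) for chunk in chunks])
```

Because each marker is modelled independently of the others, the chunks need no communication during processing, and the merge step reduces to concatenation and sorting.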

While global regression models assess the overall relationships in the data, spatial patterns of associations give information about local processes at work. Samβada can measure the level of spatial autocorrelation in both molecular and environmental datasets using local and global Moran’s I [6].
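As an illustration, global Moran's I can be written as I = (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2, where m is the mean of x and S0 is the sum of all weights w_ij. The sketch below computes it with binary distance-band weights. The weighting scheme, the function names and the toy coordinates are assumptions made for the example; they do not describe Samβada's own options, and the local statistic is not shown.

```python
import math


def distance_band_weights(coords, threshold):
    """Binary spatial weights: 1 if two distinct points lie within threshold."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= threshold:
                weights[i][j] = 1.0
    return weights


def global_morans_i(values, weights):
    """Global Moran's I of values for a given spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Toy example (hypothetical): five sites on a line, neighbours within 1.5 units.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
weights = distance_band_weights(coords, 1.5)

gradient = [1, 2, 3, 4, 5]      # smooth gradient: positive autocorrelation
alternating = [1, 5, 1, 5, 1]   # alternating values: negative autocorrelation

i_gradient = global_morans_i(gradient, weights)
i_alternating = global_morans_i(alternating, weights)
print(i_gradient, i_alternating)
```

The same function applies to a binary marker vector or to an environmental variable, which is how autocorrelation can be compared between the molecular and environmental datasets.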

Data

Blood and skin samples were collected from 102 Ugandan cattle along with their geographic coordinates. The samples were genotyped with the 800k BovineHD assay (Illumina Inc., San Diego, USA), yielding 2,113,358 binary markers for analysis. The environment was described with 73 variables: monthly values of temperature and precipitation from WorldClim [7], and slope and aspect derived from the digital elevation model SRTM3 [8].

Results

A total of 1549 significant models (G score) involving 323 loci were found (p=0.01 before Bonferroni correction). Fig. 1 shows the distribution of p-values for models involving maximum temperature in April, a variable commonly found to predict allele frequencies. Most associations were found on chromosomes 5, 14, 20 and X. The most significant model involves the SNP BovineHD0500019261 on chromosome 5 (Fig. 2). A bivariate LISA map presents the spatial association between this marker and the mean temperature in April (Fig. 3).

Discussion

High-density SNP assays make it possible to detect genomic regions potentially involved in local adaptation. In our study, loci potentially under selection are associated with latitude, and the most relevant local correlations were found in Uganda’s North and South. This might indicate a demographic effect, since cattle breeds differ between these regions, but it may also reflect local adaptation, as many environmental parameters are correlated with latitude. The SNP BovineHD0500019261 maps to the gene CHST11, which is involved in cartilage composition.

Our study shows that landscape genomics can handle large molecular datasets. However, the sample size (n=102) is a critical limitation when assessing model significance. Bonferroni correction might be too conservative for whole-genome sequencing data, and alternative approaches such as controlling the False Discovery Rate [9] might be considered.
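To make the contrast between the two corrections concrete, the sketch below applies Bonferroni and the Benjamini-Hochberg step-up procedure [9] to a small set of made-up p-values. The function names and p-values are hypothetical. The last lines compute a Bonferroni threshold for this study under the assumption, ours and not stated above, that one univariate model is fitted per marker and per environmental variable.

```python
def bonferroni_rejections(p_values, alpha):
    """Indices of tests with p <= alpha / m, where m is the number of tests."""
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]


def benjamini_hochberg_rejections(p_values, q):
    """Indices of tests rejected by the Benjamini-Hochberg procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the step-up line
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Made-up p-values for ten models
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

bonferroni = bonferroni_rejections(p_values, 0.05)
fdr = benjamini_hochberg_rejections(p_values, 0.05)
print(bonferroni, fdr)

# Assumption: univariate models only, one per marker and per variable
n_models = 2113358 * 73
study_threshold = 0.01 / n_models
print(n_models, study_threshold)
```

On these toy values Bonferroni retains only the smallest p-value while the Benjamini-Hochberg procedure retains the two smallest, which illustrates why the latter is less conservative when many models are tested.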

References:

[1] Luikart, G., England, P. R., Tallmon, D., Jordan, S., and Taberlet, P. (2003). The power and promise of population genomics: from genotyping to genome typing. Nature Reviews Genetics, 4(12), 981–994.

[2] Coop, G., Witonsky, D., Di Rienzo, A., and Pritchard, J. K. (2010). Using environmental correlations to identify loci underlying local adaptation. Genetics, 185(4), 1411–1423.

[3] Frichot, E., Schoville, S. D., Bouchard, G., and François, O. (2013). Testing for associations between loci and environmental gradients using latent factor mixed models. Molecular Biology and Evolution.

[4] Joost, S., Kalbermatten, M., and Bonin, A. (2008). Spatial Analysis Method (SAM): a software tool combining molecular and environmental data to identify candidate loci for selection. Molecular Ecology Resources, 8, 957–960.

[5] Dobson, A. J. and Barnett, A. G. (2008). An Introduction to Generalized Linear Models. Chapman & Hall, 3rd edition.

[6] Anselin, L. (1995). Local Indicators of Spatial Association - LISA. Geographical Analysis, 27(2), 93–115.

[7] Hijmans, R., Cameron, S., Parra, J., Jones, P., and Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25(15), 1965–1978.

[8] Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., and Alsdorf, D. (2007). The shuttle radar topography mission. Reviews of Geophysics, 45(2).

[9] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289–300.

Attachment:

figures-stucki.pdf