. 1 ABSTRACT
. 18 Hybrid zones provide an excellent opportunity for studying local adaptation, hybrid zone
. 19 dynamics, differential gene loss, and genetic mapping. The genus Zea, which contains the
. 20 major world crop maize, contains many diverse wild taxa collectively called teosinte. Zea
. 21 mays parviglumis, the lowland progenitor of maize, and Zea mays mexicana, its highland
. 22 sister subspecies, live parapatrically with a few regions of range overlap. In these regions,
. 23 putative hybrids have been identified in previous studies, but never deeply explored. Here
. 24 we use a published 983 SNP dataset across 2,793 Zea individuals to identify and confirm a
. 25 set of 112 Zea mays parviglumis and Zea mays mexicana hybrids, mostly clustered in three
. 26 allopatric, genetically distinct hybrid groups in Central Mexico. Here we show that these
. 27 hybrid groups have distinct environments that are intermediate between that of Zea mays
. 28 parviglumis and Zea mays mexicana. We also provide evidence that these individuals are true
. 29 hybrids and not ancestral to parviglumis and mexicana, or products of isolation by distance
. 30 using multiple methods. This work expands on previous work by demonstrating that these
. 31 plants are true hybrids, identifying hybrid groups and putative hybrid zones, and genetically
. 32 characterizing the hybrid groups. With the potential for local adaptation, variable hybrid
. 33 zone dynamics, and differential architectures of hybridization in all three populations, we
. 34 present Zea as a great, natural model system for studying hybridization and hybrid zones.
. 35 2 INTRODUCTION
. 36 Gene flow can homogenize populations across a species range, but can also serve as major
. 37 driver of evolutionary change, creating novel diversity that is then acted upon by selection
. 38 (Ellstrand (2014)). Hybridization is the creation of genetically distinct individuals through
2
. 39 gene flow between differentiated populations; hybrid zones are regions where these differen-
. 40 tiated populations come into secondary contact and reproduce (Barton & Hewitt (1985)).
. 41 Hybrid zones have been detected in a diverse set of taxa including, for example, house mouse
. 42 (Mus musculus Gim ́enez et al. (2016)), sunflower (genus Helianthus Rieseberg et al. (1999)),
. 43 Oxford ragwort (genus Senecio Brennan et al. (2009)), Heliconius butterflies (genus Helico-
. 44 nius Mallet et al. (1990)), and chickadees (genus Peocile Taylor et al. (2014)) The process
. 45 of hybridization can enhance adaptability to novel conditions by increasing heterozygosity,
. 46 producing new combinations of alleles and generating novel hybrid phenotypes (Abbott et al.
. 47 (2013)). Genotypes in hybrid zones can be the result of numerous generations of hybridiza-
. 48 tion and recombination and provide an opportunity for evolutionary studies of population
. 49 dynamics, the timing of secondary contact of parental species, and unequal allele retention
. 50 including variation in the genetic architecture of hybridization across replicate hybrid zones
. 51 (Barton & Hewitt (1985); Smith & O’brien (2005)).
52
. 53 Population dynamics in hybrid zones can vary substantially. For example, hybrid popula-
. 54 tions can constitute a neutral intergradation zone when hybrids are not at a fitness advantage
. 55 or disadvantage, a tension zone when hybrids have reduced fitness relative to their parents,
. 56 or a selection-dependent (i.e. dispersal-independent) zone when hybrids show an increase in
. 57 fitness within their natal environment (Barton & Hewitt (1985)). As deviations from neutral-
. 58 ity, tension and selection-dependent zones have received the most theoretical consideration.
. 59 (I’m not sure which of us wrote this last sentence, but it’s a sentence fragment and I’m not
. 60 sure what the point was) Factors such as relative migration rate and population density, dif-
. 61 ferential parental fitness and changing climate play a role in determining the size and shape
. 62 of tension zones (Barton (1979); Key (1968); Buggs (2007); Barton & Hewitt (1985)). A
3
. 63 parental population with greater dispersal, population density, or overall fitness tends to
. 64 push a tension zone away from itself by flooding the zone with its own genotypes (Barton
. 65 (1979)). In contrast, selection-dependent zones are expected to be wider than tension zones
. 66 due to increased fitness of hybrids (Barton & Hewitt (1985)) and are also more likely to
. 67 spread beyond the region where gene flow initiated. Such hybrid populations that become
. 68 physically isolated may ultimately speciate (Barton & Hewitt (1985) & Abbott et al. (2013)).
. 69 For example, Pinus densata (High Mountain Pine), Senecio squalidus (Oxford Ragwort), and
. 70 three species resulting from Helianthus annus x H. petiolaris (H. anomalus, H. deserticola,
. 71 & H. paradoxus) were all born out of hybrid zones and subsequently occupied novel habi-
. 72 tat, becoming isolated from parental populations (Abbott & Brennan (2014), Abbott et al.
. 73 (2008), Rieseberg et al. (1998), & Heiser et al. (1969)).
74
. 75 Hybrid zones can be further characterized based on time since secondary contact of
. 76 parental taxa. More recently developed zones will be comprised of hybrids with long tracts
. 77 of admixture linkage disequilibrium (LD), which will, over time, be broken up by recombina-
. 78 tion. The time since admixture of parental taxa can be estimated by evaluating admixture
. 79 LD blocks through a process called “admixture deconvolution” or “ancestry painting” (Liang
. 80 & Nielsen (2014)). The simplest model used to determine admixture timing assumes a single
. 81 admixture event with LD tract lengths modeled under exponential decay (Liang & Nielsen
. 82 (2014)). More complex models take into account multiple admixture events and source popu-
. 83 lations, changing rates of gene flow into hybrid zones, and hybrid diffusion across geographic
. 84 space (Hellenthal et al. (2014); Gravel (2012); Pool & Nielsen (2009); Sedghifar et al. (2015)).
85
86 Finally, replicate hybrid zones can be viewed as pools of novel allele combinations upon 4
. 87 which selection can act. Patterns of parental allele loss and retention in hybrid populations
. 88 can be characterized as genomic architectures of hybridization. Nonrandom architectures
. 89 across hybrid individuals and populations provide evidence that a hybrid zone is selection
. 90 dominated (i.e., dispersal independent; Barton & Hewitt (1985)). In the case where multiple
. 91 hybrid zones are created by similar parental populations and distributed across an environ-
. 92 mental gradient, different genomic architectures of hybridization have the potential to be
. 93 locally adaptive.
94
. 95 Here we establish teosinte (i.e., wild maize) as a promising system for the study of evo-
. 96 lutionary dynamics in hybrid zones and, in particular, adaptive architectures of hybridiza-
. 97 tion. The parental taxa of teosinte hybrids are the Mexican annual teosintes Zea mays
. 98 ssp. parviglumis (hereafter parviglumis) and Zea mays ssp. mexicana (hereafter mexicana).
. 99 Parviglumis, best known as the progenitor of domesticated maize (Zea mays ssp. mays;
. 100 Matsuoka et al. (2002)), is found in the lowlands of southwest Mexico (<1800m) should it
. 101 be ”Western Mexico” here?, and mexicana is distributed throughout the highlands of the
. 102 Mexican Central Plateau (1600-2700m; Hufford et al. (2012)). The two subspecies diverged
. 103 approximately 60,000 generations ago (Ross-Ibarra et al. (2009), & Hanson et al. (1996)) and
. 104 differ in morphological features that suggest local adaptation. Parviglumis plants are green
. 105 and glabrous, while mexicana individuals are more deeply pigmented and hairy. Differences
. 106 in pigment and pilosity between parviglumis and mexicana are thought to be associated
. 107 with adaptation across the altitudinal gradient in western Mexico (Wilkes (1967); Doebley
. 108 (1984)). Pigmentation has previously been associated with cold tolerance in maize (Chong &
. 109 Brawn (1969), & Doebley (1984)) and can be beneficial to high-altitude plants by improving
. 110 their absorption of radiant solar energy (Galinat (1967), Chong & Brawn (1969)). Trichome
5
. 111 abundance has been associated with plants in cold climates generally (Daubenmire (1947);
. 112 Bosabalidis & Sawidis (2014); Carlquist (1974)). In teosinte, Lauter et al. speculated that
. 113 macrohairs indurated with silica may form a boundary layer that reduces the loss of absorbed
. 114 radiant heat at high altitude (Lauter (2001) & Lauter et al. (2004)).
115
. 116 Putative hybrids between these taxa have been reported at intermediate altitudes in re-
. 117 gions of overlap in the distributions of parviglumis and mexicana (Fukunaga et al. (2005);
. 118 Van Heerwaarden et al. (2011); Pyhajarvi et al. (2013)). The location of hybrids at inter-
. 119 mediate altitudes is compelling since hybrid zones defined by altitude are often adaptive
. 120 given the substantial environmental variation spanning altitudinal gradients (e.g., tempera-
. 121 ture, atmospheric pressure, soil moisture, light intensity, and wind velocity (Korner (2007)).
. 122 Moreover, teosinte hybrids have been documented across a region spanning hundreds of kilo-
. 123 meters and several degrees of latitude in Mexico, which presents the opportunity for varying
. 124 adaptive architectures of hybridization. To explore this system more fully and assess the
. 125 evidence for multiple, independent hybrid zones we utilized a publicly available data set of
. 126 983 SNPs genotyped in 2,793 individuals (Fang et al. (2012)). We additionally generated
. 127 new data with 822 SNPs using the same genotyping platform across 239 individuals from
. 128 12 populations The markers were not identical. That’s not clear here. Is that okay?. With
. 129 these data we identify and confirm a set of 112 Zea mays parviglumis and Zea mays mexicana
. 130 hybrids, residing in three allopatric, genetically distinct hybrid groups in the Central Plateau,
. 131 the Central Balsas River Valley, and South Guerrero State of Mexico. The environments of
. 132 these hybrid groups were all shown to be distinct and intermediate between that of Zea mays
. 133 parviglumis and Zea mays mexicana. We present multiple independent sources of evidence
. 134 that these plants are true hybrids and neither ancestral to parviglumis and mexicana, nor
6
. 135 products of IBD. Finally, we present this Mexican annual teosinte system as a great, all
. 136 natural model system for studying hybridization and hybrid zones because of its potential
. 137 for isolated local adaptation, variable hybrid zone dynamics, and differential architectures of
. 138 hybridization in all three populations.
139
. 140 3 MATERIALS & METHODS
. 141 3.1 Generation of SNP Dataset
. 142 Our starting dataset was a publicly available collection of 983 SNPs genotyped in 2,793 in-
. 143 dividuals (Van Heerwaarden et al. (2010), Van Heerwaarden et al. (2011), & Fang et al.
. 144 (2012)). From this extensive dataset we subsampled only parviglumis, mexicana and maize
. 145 individuals relevant to our analyses. We retained only maize from Mexico since these lan-
. 146 draces were most likely to have experienced gene flow with parviglumis and mexicana. The
. 147 dataset was also filtered by first removing markers, then individuals, with ≥ 10% missing
. 148 data. After data filtering, we retained 967 SNPs genotyped in 1344 individuals.
. 149 3.2 Identifying Hybrids
. 150 To determine whether an individual is substantially admixed, we first ran a STRUCTURE
. 151 (version 2.3.4)(Pritchard et al. (2000)) analysis using all group size (k) values from two to
. 152 eight, with 5,000 iterations of burnin and 10,000 MCMC repetitions, on all 1,344 teosinte
. 153 and maize individuals in the published 967 SNP dataset. The curve of the likelihood asso-
. 154 ciated with increases in k-values had mostly leveled out before reaching k=8. We therefore
. 155 determined the highest group size did not need to be increased to find the best k-value. We
. 156 then used the resulting likelihood values with the web Application, “Structure Harvester”
7
. 157 (version 0.6.93) (Earl & vonHoldt (2011)), to determine the best fitting k based on the deltaK
. 158 statistic: the second order rate of change of the likelihood. The best k was determined to be
. 159 three. We defined hybrids as individuals with greatest attribution to the teosinte subspecies
. 160 they are annotated as and a significant (≥ 25%) attribution to the other teosinte. Teosinte
. 161 with ≥ 90% self attribution were considered high confidence, non-admixed individuals. One
. 162 plant resided among Central Balsas hybrids, had a parviglumis label, but had more mexicana
. 163 attribution than parviglumis and therefore did not meet our definition of hybrid. Rather than
. 164 make a subjective determination of the the status of this individual we removed it from the
. 165 dataset.
. 166 3.3 Environmental Data Extraction
. 167 A custom R script, utilizing the package ”dismo” (version 1.1-1) using the BIOCLIM algo-
. 168 rithm with 30 second resolution, was used to extract environmental data related to tempera-
. 169 ture and precipitation based on the geographic coordinates of our hybrid individuals within
. 170 hybrid groups as well as parviglumis and mexicana samples from our SNP dataset.
. 171 3.4 Fst X Fst Calculations
. 172 Wright’s Fst was determined with custom R and Python scripts using the R package “hierf-
. 173 stat” (version 0.04-10). 60 Scatterplots were created using a custom R script comparing Fst
. 174 values of each hybrid group (two at a time) vs each parent population (one at a time) for
. 175 each of 10 chromosomes. 60 corresponding density plots were also created using a custom
. 176 R script. The density plots show the distances of all points on each scatterplot to the y=x
. 177 line. For each point this was done by creating a virtual point reflected across the y=x line
. 178 by switching the x and y values. The Euclidean distance between the real and virtual points
8
. 179 was calculated, divided by two, then multiplied by -1 if the real point resides below the y=x
. 180 line.
. 181 3.5 Principle Component Analysis
. 182 The Principle Component analysis was done with a custom R script using the package
. 183 “prcomp” (from the “stats” package version 3.2.2). give credit to Joost script? I would
. 184 cite the paper where this script was first used and I would explain how the script customizes
. 185 the analysis beyond with prcomp can do Do you know what paper this was first used in?
. 186 3.6 STRUCTURE Analysis with Both Datasets
. 187 A STRUCTURE (version 2.3.4) analysis was performed using group sizes (k-values) from
. 188 two to eight with 100,000 iterations of burnin and 1,000,000 MCMC repetitions both for
. 189 the published 967 SNP dataset (Fang et al. (2012)) and for the generated 822 SNP dataset
. 190 separtely. Three was chosen as the best k for both datasets using the deltaK method (Earl &
. 191 vonHoldt (2011)). Also, using k ̄3 three randomly selected members of each each population
. 192 of high-confidence parviglumis, high-confidence mexicana, and Mexican maize from the 967
. 193 SNP dataset and each population from the 822 SNP dataset were used to test the effects
. 194 of subsampling on STRUCTURE results. Five biological replicates were ran using k ̄3, 3,000
. 195 iterations of burnin and 10,000 MCMC repetitions and the highest mean ln likelihood repe-
. 196 tition was chosen for presentation. Plots were generated using the program distruct (version
. 197 1.1) (Rosenberg et al. (2002)).
9
. 198 3.7 Inversion Analyses
. 199 Wright’s Fst (Wright (1952)) was calculated with custom R and Python scripts using the
. 200 R package “hierfstat” (version 0.04-10) comparing high-confidence parviglumis and mexi-
. 201 cana samples. Markers with an Fst greater than 0.7 were extracted using a custom Python
. 202 script. Four markers were then identified as high-confidence inv4m markers (PZD00030.4,
. 203 PZA01637.3, PZA01637.4, and PZD00030.1), meaning that they lie within the well known
. 204 inversion on chromosome Four, citation here, any ideas Matt? Where was it first identi-
. 205 fied?, and they have >0.95 Fst values. Using another custom Python script, these loci were
. 206 examined in all high-confidence parviglumis and mexicana samples and a parviglumis and
. 207 mexicana type were identified or each locus based on allele frequencies within each subspecies.
. 208 Each individual was then analyzed using a custom Python script, and had one of five possible
. 209 mutually exclusive inversion states assigned to it based on genotypes at these loci.
210
. 211 The frequency of the mexicana inversion types per sampling site was plotted against alti-
. 212 tude. This frequency was calculated by summing, for each sampling site, mexicana inversion
. 213 types weighted as two, parviglumis types weighted as zero, and heterozygotes types weighted
. 214 as one normalized by 2N where N is the sum of mexicana, parviglumis and heterozygous in-
. 215 version types in the sampling site. In order to be considered as having “mixed” hybrid status
. 216 a sampling site must have at least 20% minority hybrid status individuals. The highest Fst
. 217 SNP on inv4m between high confidence parviglumis and high confidence mexicana was also
. 218 determined in order to look at the inversion data in terms of allele frequencies.
219
10
. 220 3.8 ConStruct Analysis
. 221 The program conStruct (version beta) was used to generate a STRUCTURE-like q-matrix
. 222 explicitly controlling for isolation by distance. Because the package is not designed for high
. 223 numbers of individuals and cannot run with more individuals than SNPS, we randomly sub-
. 224 sampled one individual from each parviglumis, mexicana, and hybrid accession reducing the
. 225 data to 195 individuals. Using these individuals we ran the program under the spatial model
. 226 using default parameters. We ran conStruct at group size two with five replicates, then used
. 227 the output from the highest likelihood replicate to determine the percent attribution to the
. 228 alternative teosinte for each group, where the alternative teosinte is the mexicana annual
. 229 teosinte that does not match the individual’s taxonomic description. The groups used to
. 230 make the alternative attribution table were high-confidence parviglumis, ambiguous parvig-
. 231 lumis, high-confidence, high-confidence mexicana, ambiguous mexicana, all hybrids, Central
. 232 Plateau hybrids, Central Balsas hybrids, and South Guerrero hybrids.
233
. 234 4 RESULTS
. 235 4.1 Identifying Hybrids and Hybrid Groups
. 236 To identify hybrids, we used the program STRUCTURE with a subset of a publicly avail-
. 237 able SNP dataset (Van Heerwaarden et al. (2010), Van Heerwaarden et al. (2011), & Fang
. 238 et al. (2012)) that includes 967 SNPs genotyped in 1,344 individuals. STRUCTURE uses
. 239 a model-based approach to infer population structure and assign individuals to populations
. 240 probabilistically (Pritchard et al. (2000)). STRUCTURE results (Figure 4), suggested that
. 241 the population structure of maize and the Mexican annual teosintes, together, was best de-
11
. 242 scribed as having three groups: parviglumis, mexicana, and maize. This aligns perfectly with
. 243 the taxonomic descriptions assigned to these individuals. Within each taxonomic group, in-
. 244 dividuals were sorted from lowest to highest altitude. Many of the admixed individuals in
. 245 both teosinte subspecies were the individuals closest in altitude to the altitudinal distribution
. 246 of the other subspecies, and were therefore more likely to have come into contact with and
. 247 potentially admixed with the other subspecies (Figures 4 & 20).
248
. 249 We plotted all 1344 individuals on a map of Mexico (Figure 5). We also used the q-matrix
. 250 from STRUCTURE to make a map of Mexico with pie charts representing hybrid individuals
. 251 and their genome-wide attribution to the three taxonomic groups (Figure 6). Using this data
. 252 we identified 112 high-confidence hybrids, 634 high-confidence non-admixed parviglumis in-
. 253 dividuals, 95 high-confidence non-admixed mexicana individuals, as well as 176 parviglumis
. 254 and 51 mexicana individuals that showed evidence of admixture, but did not pass the thresh-
. 255 old to be considered high-confidence hybrids. Most high-confidence hybrids were among three
. 256 groups of clustered hybrid populations in Mexico: one in the Central Plateau (CPG), one in
. 257 the Central Balsas River Valley (CBG), and one in Southern Guerrero State (SGG) (Figure
. 258 5). Relatedly, the putative hybrid zones they reside in will be referred to as the CPZ, CBG,
. 259 and SGZ respectively. These hybrid groups varied substantially in the number of individuals
. 260 sampled: 4 in the CPG, 84 in the CBG, and 14 in the SGG. Note that this was not all
. 261 admixed individuals in these areas, but rather those found using our SNP dataset which also
. 262 passed our high threshold for being labeled high-confidence hybrids. The hybrid groups also
. 263 differed in their general pattern of attribution to parviglumis, mexicana and Mexican maize.
. 264 The CPG showed greatest attribution to mexicana, the SGG showed greatest attribution to
. 265 parviglumis, and the CBG showed greatest attribution to parviglumis, much attribution to
12
266 mexicana, and some to maize. 267
. 268 In addition to the CPG,CBG, and SGG, other hybrids were identified including those
. 269 from a sampling location East of the CBG and North of the SGG called Ahuacatitlan (Fig-
. 270 ure 6). This population was noteworthy as a previously studied mexicana-parviglumis hybrid
. 271 population (Fukunaga et al. (2005) & Pyhajarvi et al. (2013)). For this reason, we wanted
. 272 to include the population as part of either the CBG or SGG if it was appropriate. We also
. 273 needed a more rigorous method for determining hybrid groups than geographic ranges in
. 274 order to ensure that each group is genetically distinct and showed within-group genetic simi-
. 275 larity. In order to solve both problems concurrently, we ran a Principle Component Analysis
. 276 (PCA) with all parviglumis, mexicana, and hybrid samples.
277
. 278 4.2 Confirming Hybrid Grouping
. 279 The PCA showed clear clustering of all Zea subspecies as well as hybrids broadly and all
. 280 hybrid groups indicating common ancestry within each hybrid group and genetic differen-
. 281 tiation between groups (Figure 7). This validates both our threshold for identifying high
. 282 confidence hybrids as well as our grouping of these hybrids. As for the Ahuacatitlan pop-
. 283 ulation, we found they form their own cluster separate from the CBG and the SGG. We
. 284 therefore concluded that the Ahuacatitlan plants could not reasonably be considered a part
. 285 of any of the hybrid groups identified here. For this reason, together with the fact they
. 286 represented only one sampling location, we decided to consider the Ahuacatitlan samples as
. 287 part of the “other hybrids” category. PC1 distinguished between parviglumis and mexicana
. 288 with hybrids inhabiting the space between, while PC2 distinguished between the CBG and
13
. 289 parent samples. We also made a biplot of both samples and SNPs on PCs one and two and
. 290 found that among the top 10 SNPs, in terms of vector magnitude, were one SNP on the large
. 291 inversion on chromosome3 (citation here for chr3 inversion), five SNPs on inv4m (citation
. 292 here for inv4m), and one SNP located on the ZMM4 gene, a MADS-box gene which affects
. 293 flowering time and development (Danilevskaya et al. (2008)) (Figures 8 & 25). In fact, four
. 294 of the five SNPs on inv4m were the top four SNPs in the entire dataset by this measure
295
. 296 We wondered whether the CBG’s increased genetic differentiation from its parent sub-
. 297 species could be explained by its unique STRUCTURE attribution to maize (Figure 6). To
. 298 test this we reran the PCA with maize as part of the dataset. We found that when maize
. 299 was incorporated into the PCA (Figure 9) the CBG did not cluster with maize, and was not
. 300 closer to maize than all other teosinte samples. We therefore could not conclude that the
. 301 CBG’s place in the original PCA was due to genetic relatedness to maize. Here PC1 seemed
. 302 to explain the differentiation between teosinte and maize, and PC2 seemed to distinguish
. 303 between parviglumis/maize and mexicana with all hybrids clustering with teosinte relative
. 304 to PC1 and inhabiting the center with regards to PC2.
305
. 306 4.3 Confirming Hybrid Identification
. 307 In order to determine whether our hybrids are true hybrids, and not merely products of Iso-
. 308 lation by Distance (IBD) we performed a STRUCTURE analysis using k values from two to
. 309 eight on known parviglumis, mexicana, and hybrid populations from our generated 822 SNP
. 310 dataset and separately with our 967 SNP dataset. In both cases 100,000 iterations of burnin
. 311 and 1,000,000 MCMC repetitions were used in order to generate results of the highest quality.
14