Introduction

Mobile insect pests regularly colonize annual crops from alternative host plant sources1,2,3. Protecting crops from attack by these pests is difficult for pest managers due to the unpredictable nature of seasonal outbreaks, particularly when insecticide resistant genotypes are present4,5. For mobile insect pests, dispersal among crop and non-crop host plant resources influences both the seasonal dynamics and genetic background of pest populations, with direct consequences for pest management6,7,8,9. Molecular studies of population structure and gene flow can potentially provide insight into patterns of insect movement in agricultural landscapes10,11.

The diamondback moth, Plutella xylostella L., is the most destructive pest of brassicaceous crops worldwide12,13. It attacks Brassica vegetable crops throughout tropical and temperate regions14 and in recent decades has become a significant pest of canola crops in temperate regions4,15,16,17,18. The propensity of P. xylostella to evolve insecticide resistance rapidly and a lack of alternative control options has led it to evolve resistance to most pesticides4. Within Australia, P. xylostella has been a major pest of Brassica vegetable crops since the late 1800s19 and a sporadic but damaging pest of canola crops since the 1990s following a dramatic expansion of canola production18. Approximately 3 million hectares of canola is grown annually under a Mediterranean climate in southern Australia20, providing vast host resources for P. xylostella from crop planting in autumn to maturity in late spring. Intermittent outbreaks of P. xylostella in canola during spring cause substantial crop losses18,21. Commercial Brassica vegetable crops are grown continuously in horticultural areas surrounding the major urban centres in each Australian state, occupying a total area less than 1% of total canola plantings22. Other host plants for P. xylostella include Brassica forage crops grown during spring and summer as stock feed, and a diversity of introduced and native wild brassicaceous species distributed over vast areas and proliferated by rainfall. In Brassica vegetable crops, limited P. xylotella dispersal and intense insecticide use targeting this insect can lead to elevated levels of insecticide resistance in local P. xylostella populations6,22,23. In canola-growing areas, the highly seasonal availability of host plants compels P. xylostella to move regularly among crops and other brassicaceous host plants, which tends to homogenise levels of insecticide resistance24. Estimating gene flow among local populations of P. xylostella between host plant types and identifying source populations of P. xylostella that seasonally infest Australian canola are essential to facilitate forecasting of seasonal pest pressure and inform insecticide resistance management13,18,19.

Various molecular markers have been used to investigate population structure in P. xylostella, including allozymes, ISSRs, microsatellites and mtDNA. Plutella xylostella populations from different continents are clearly differentiated19,25,26, but a lack of genetic structure among populations within parts of Asia10,27,28, the USA29,30 and Australasia19,31 implies regular intermixing at an intra-continental level. Many population genetic studies of P. xylostella have been difficult to interpret due to limited sampling32 and few studies have sampled at a sufficient spatiotemporal scale or resolution to investigate movement at a landscape scale. However, two studies successfully identified the seasonal migration pathways of P. xylostella in China through extensive field sampling and analysis of both microsatellite markers and geographic variation in mtDNA haplotype frequencies10,28. Inferences from genetic data were corroborated by light trapping33.

Within Australia, Endersby et al.19 found no differentiation at six microsatellite loci among 17 populations across Australia and one from New Zealand, despite a sampling scale spanning > 5,000 km. These Australasian populations were clearly differentiated from populations collected in Asia and Africa19. Australian P. xylostella has low genetic diversity consistent with a founder effect19,26,31,34,35. Present levels of gene flow among Australian P. xylostella populations remain to be resolved because genetic homogeneity could reflect co-ancestry18, and because the statistical power of six microsatellites to detect weak population structure was uncertain. Furthermore, inconsistent patterns of population structure reported among P. xylostella collected from eastern Australia25,36 may reflect the presence of a cryptic species, Plutella australiana, among analysed samples35,37.

The revolution in massively parallel sequencing technologies38 and associated genotyping methods has facilitated genome-wide genetic marker sets and brought unprecedented resolution to questions of population structure39,40. Restriction-site-associated DNA sequencing (RAD-seq)41 enables sequencing of targeted short regions across the genome, allowing simultaneous discovery and genotyping of single nucleotide polymorphisms (SNPs) in model and non-model species42,43. The ability to sequence orthologous regions across multiple individuals at high sequencing coverage makes it possible to confidently genotype SNPs and generate high density markers for population genetic studies40,44. Microsatellites remain popular for population genetic studies due to high polymorphism45, but can be outperformed by large SNP panels in resolving population structure46,47, with several examples in insects48,49. RAD-seq has genotyped thousands of SNPs in P. xylostella50 and resolved species-level nuclear divergence between cryptic Australian Plutella species35, suggesting potential for this method to provide insight into the movement patterns of P. xylostella.

Here, we examined whether geographic, host plant-related or temporal population genetic structure exists among geographically distinct populations of P. xylostella in Australia. Samples were collected from canola crops, Brassica vegetable crops, Brassica forage crops and wild brassicaceous plant species throughout southern Australia and in two consecutive years to facilitate temporal comparisons. After molecular species identification, P. xylostella individuals were genotyped across genome-wide sites using RAD sequencing for population genetics analysis.

Results

Sample collection

Plutella species were collected from different Brassica host plants and locations throughout southern Australia in 2014 and 2015 (Fig. 1). After species identification using PCR-RFLP, 909 P. xylostella individuals from 60 locations, 32 in 2014 and 28 in 2015, were retained for analysis (Table 1). In total, 29 populations were collected from canola crops, 15 from Brassica vegetable crops, three from Brassica forage crops and 13 from brassicaceous weeds. Of these, 52 populations were collected in spring and seven in autumn. Seven locations were sampled in both 2014 and 2015 to facilitate a temporal analysis, of which five locations were Brassica vegetable crops from the major Brassica vegetable production areas in each Australian state (Fig. 1). Sex was determined for the 681 pupal individuals (82% of all individuals) but not larvae. The overall sex ratio was not different from 1:1 (364 males, 317 females, \(\chi ^2=3.2438\), p = 0.0717) and most populations had a reasonably balanced sex ratio (Table 1).

Figure 1
figure 1

Geographic locations of 59 P. xylostella populations collected in Australia in 2014 and 2015 and sequenced using RAD-seq. Collections from different Brassica host types are represented by different colours. Canola is grown in dryland cropping areas of southern Australia, represented in grey shading.

Table 1 Summary of P. xylostella collections from Australia.

Read filtering and variant calling

RAD-seq was performed for 909 P. xylostella individuals from 60 collection locations, including 15 individuals randomly selected from different library pools and sequenced as technical duplicates to check the robustness of genotype calls. Illumina sequencing yielded 2.36 billion raw sequence reads after de-multiplexing. Following read trimming and filtering, mapping, genotype calling and hard-filtering, we excluded 50 individuals with greater than 60% missing data, which was largely due to low sequencing depth (Supplementary Fig. S1), then excluded the 15 technical duplicates and a population with only two individuals remaining. Nine individuals with unusually high levels of polypmorphism and investigated using mtDNA amplicon sequencing were found to be contaminated and were excluded. Genotyping and hard-filtering steps were then repeated for the remaining 833 individuals across 59 population samples, including 434 individuals from 31 populations collected in 2014 and 399 individuals from 28 populations collected in 2015. Hard-filtering retained 590,086 confidently-called (GQ \(\geqslant\) 30) variant and invariant sites at a mean depth of 33.4 per individual, and a subset of 1,032 widely-dispersed (to avoid linkage bias) bi-allelic SNP variants at a mean depth of 34.0 per individual, for downstream analyses. In reference-aligned SNP datasets with read depth > 30, genotyping error rates are expected to be < 0.0151. The datasets for 2014 and 2015 were analysed separately.

For the 15 technical duplicates, the VCF output from HaplotypeCaller was hard-filtered using our parameters to retain the 30 samples and a set of 1,473 widely-dispersed bi-allelic SNP variants. Principal components analysis showed that sample pairs group closely together, indicating that genotype calls were highly consistent (Supplementary Fig. S2).

Genetic diversity statistics

Population genetic diversity was estimated using the 590,086 variant and invariant sites. The mean observed heterozygosity per population averaged 0.0092 ± 0.0002 SD (range = 0.0088, 0.0097) across the 59 populations and showed little variation across populations collected from different years (2014, n = 31 and 2015, n = 28), host plant types (canola, n = 30, Brassica vegetable crops, n = 15, Brassica forage crops, n = 2, and wild brassicas, n = 12) or seasons (autumn, n = 7 or spring, n = 52). In general, observed heterozygosity was lower than expected as shown by mostly positive \(F_{\text{IS}}\) values, suggesting some inbreeding (Tables 23, Supplementary Fig. S3). The population from Southend 2015 had reduced gene diversity and fewer private sites relative to other populations. Across the 1,032 SNPs, observed heterozygosity and gene diversity within each year showed reasonable agreement (Tables 23, Supplementary Fig. S3), indicating allele frequencies at these loci are in Hardy–Weinberg proportions. Again, for this marker set, the Southend 2015 population had the lowest genetic diversity among populations, contributing to a negative \(F_{\text{IS}}\) value.

Table 2 Population statistics for all 590,086 confidently-called variant and invariant sites, and a subset of 1,032 hard-filtered SNP loci, for 31 P. xylostella populations collected from Australia in 2014.
Table 3 Population statistics for all 590,086 confidently-called variant and invariant sites, and a subset of 1,032 hard-filtered SNP loci, for 28 P. xylostella populations collected from Australia in 2015.

Power analysis

The power analysis indicated that our SNP marker loci had a high level of statistical power to detect even weak population structure. The 1,032 SNP loci had 100% probability of detecting true \(F_{\text {ST}}\) values of 0.0027 or 0.0056 (Supplementary Table S1), corresponding to the estimated global \(F_{\text {ST}}\) values for the 2014 and 2015 datasets.

Population differentiation

The global estimates of \(F_{\text {ST}}\) calculated using 1,032 SNPs were not significantly different from zero in either 2014 (\(F_{\text {ST}}=0.0027\), 99% CL \(=\,-\) 0.0043, 0.0107) or 2015 (\(F_{\text {ST}}=0.0056\), 99% CL \(=\,-\) 0.0019, 0.0138), indicating a lack of genetic differentiation among populations within years. Pairwise \(F_{\text {ST}}\) values were generally very low, ranging from 0.0065 to 0.0178 (mean 0.0029 ± 0.0040 SD) in 2014 and − 0.0077 to 0.0344 (mean 0.0054 ± 0.0075 SD) in 2015 (Fig. 2). After correction for multiple comparisons (2014: n = 465 comparisons, 2015: n = 365 comparisons), no pairwise \(F_{\text {ST}}\) values were significant at the target \(\alpha =0.05\) level, indicating a lack of genetic differentiation among P. xylostella populations collected within a single year. The highest pairwise \(F_{\text {ST}}\) values were associated with the Southend 2015 population, ranging from 0.0221 to 0.0344 (mean 0.0265 ± 0.0035 SD, n = 27 comparisons), indicating allele frequencies in this population were the most divergent from other populations (Fig. 2). AMOVA analysis using 1,032 SNPs indicated a lack of any spatial, temporal or host-plant related genetic structure among populations (Table 4). In model A, where populations were divided into years and Brassica host types, > 99% of variance was found within populations with negligible variance among populations explained by year or host type. Similarly, in model B, where seven locations were sampled in both years, > 99% of variance was found within populations. These results precluded interpretation of whether there was more spatial or temporal variance among populations.

Figure 2
figure 2

Heat maps showing pairwise comparisons of genetic distance measured as Weir and Cockerham’s (1984) \(F_{\text {ST}}\) (top panels) and geographic distance in kilometres (bottom panels) among P. xylostella populations collected from Australia in 2014 (left panels) and 2015 (right panels). Within each year, populations on x and y-axes are sorted geographically from north-western to north-eastern Australia in an arc following the southern coast. Visual comparison of the \(F_{\text {ST}}\) and geographic distance heat maps within each year shows no congruence between genetic and geographic distance among population pairs in 2014 or 2015.

Under isolation by distance, geographic and genetic distances should be positively correlated52. Populations were collected across geographic distances of up to 3,756 km (Northampton WA/Bundaberg QLD) in 2014 (mean distance 1,323 ± 960 SD km) and 3,624 kilometres (Manjimup WA/Bundaberg QLD) in 2015 (mean distance 1,263 ± 917 SD km). Given the vast sampling scale, we expected higher \(F_{\text {ST}}\) values at greater geographic distances between population pairs, however heat maps revealed no such pattern (Fig. 2). Mantel tests confirmed a lack of genetic isolation by distance in both 2014 (Mantel’s r = 0.1136, p = 0.1316) and 2015 (Mantel’s r = − 0.0901, p = 0.8222) datasets, indicating that P. xylostella populations in close proximity or separated by thousands of kilometres were equally differentiated.

Table 4 Analysis of molecular variance under two hierarchical model structures.

Population structure was explored using two different individual-based clustering approaches. First, STRUCTURE analysis was performed using the widely-dispersed 1,032 SNPs and analysing 2014 and 2015 populations separately. We first determined the predicted optimal values for K, then examined bar plots for several K values to assess hierarchical population structure. In 2014, the data most likely formed two genotypic clusters, with the delta K method and mean likelihood value both producing an optimal at K = 2 (Supplemementary Fig. S4). At this K value, bar plots showed that most individuals shared nearly uniform ancestry across the major genotypic cluster regardless of geographic location (Fig. 3). A second genotypic cluster was largely associated with three individuals from Esperance, which showed 98.7%, 98.7% and 56.5% cluster assignment, while of the remaining 396 individuals, only 17 individuals were greater than 1% (1.0 to 9.3%) admixed across this cluster. At K = 3 and K = 4, no significant additional population structure was detected, with the additional genotypic clusters associated with two individuals from Boyup Brook and two individuals from Cocata (Supplementary Fig. S5).

Figure 3
figure 3

Proportional assignment to genotypic clusters, K, based on STRUCTURE analysis of P. xylostella individuals from Australia in 2014 and 2015. Individuals are represented by vertical bars and genotypic clusters are represented by different colours. Individuals collected each year were analysed separately and in both years the data most likely formed two genotypic clusters. Top panel: Analysis at K = 2 for 434 individuals collected from 31 locations in 2014. Bottom panel: Analysis at K = 2 for 399 individuals collected from 28 locations in 2015. Within years, bar plots show a high degree of genotypic admixture across individuals regardless of geographic location, as shown by sharing of blue-coloured bars, with a second genotypic cluster represented by red-coloured bars shared predominantly by several individuals at a single location.

In 2015, the delta K method produced an optimal at K = 2 and weaker secondary modes at K = 3 and K = 5 while the highest log-likelihood occurred at K = 5 (Supplementary Fig. S4). The modes at K = 3 and K = 5 indicate sub-structure in the data. At K = 2, most individuals shared nearly uniform ancestry across the major genotypic cluster regardless of geographic location (Fig. 3). The second genotypic cluster was predominantly associated with individuals from Southend, where 10 individuals showed 31.7 to 99.4% cluster assignment. At higher K values, further geographic structure was identified. At K = 3, two clusters were mainly associated with Southend (cluster A, 7 individuals with 26.1–98.6% assignment; cluster B: 10 individuals with 33.2–99.5% assignment) (Supplementary Fig. S5). At K = 4, the additional cluster was mainly associated with individuals collected from Walkers Beach in both autumn and spring 2015, showing a consistent pattern at both time points. At K = 5, the additional cluster was mostly represented by three individuals from Werombi. To further examine hierarchical structure, we reanalysed the 2015 data after removing Southend. This resulted in a weak delta K optimal at K = 3, but showed the same clustering pattern as the full 2015 dataset at K = 5 and is not presented.

Individual-based PCA analysis identified clustering patterns consistent with the STRUCTURE analysis. In both years, eigenvalues for the first principle component (PC) were not strongly different from eigenvalues for other PCs, indicating no clear axis of variance in the data, and individuals across different geographic populations clustered together to a high degree (Fig. 4). In both years, PCA identified the most divergent individuals consistent with those in the STRUCTURE analysis. In 2014, three individuals from Esperance and two individuals from Cocata clustered distinctly along separate PC axes. In 2015, two groups of individuals from Southend clustered distinctly along the two PCs axes, and three individuals from Werombi formed an identifiable cluster along the vertical PC axis.

Figure 4
figure 4

Principal components analysis of P. xylostella individuals collected from Australia in 2014 and 2015. Individuals are represented by small circles colour-coded by geographic population. Two populations with the most divergent individuals in each year are labelled.

Discussion

In successful invading species, colonizing populations often exhibit reduced genetic diversity compared to their population of origin53. Previous molecular studies found a lack of genetic structure19 or inconsistent patterns of genetic structure25,36 among Australian populations of P. xylostella, and low genetic diversity implied a bottleneck during colonization19,31. To elucidate the movement patterns of P. xylostella in Australian canola cropping systems, we performed a comprehensive study of genetic structure among P. xylostella populations from crop and non-crop brassicaceous host plants throughout southern Australia. The study design included extensive field sampling to reflect the dispersal ecology of the species, molecular species identification, minimisation of sex bias, and a powerful SNP marker set derived from RAD-seq.

Genome-wide analysis revealed a distinct lack of genetic structure among Australian P. xylostella populations, irrespective of geographic location, host plant, or sampling year. This pattern was temporally stable at seven locations sampled in both 2014 and 2015. Our findings based on SNPs were highly consistent with those based on six microsatellites19. Our SNP-based estimates of global \(F_{\text {ST}}\) within Australia were 0.0027 in 2014 and 0.0056 in 2015, compared to 0.0051 from microsatellites19. In both studies, > 99% of genetic variance occurred within populations with negligible variance explained by different locations or host plants. There was no evidence for isolation by distance, implying that any two populations, whether separated by distances of only several km or > 3,000 km, may be equally differentiated. Australian P. xylostella forms a single homogeneous population across bi-allelic neutral SNP markers.

Cluster analysis confirmed the overall lack of genetic divergence among populations. In both years, STRUCTURE analysis identified K = 2 as the most likely number of genotypic clusters. This is a common result among studies employing the delta K method54, because K = 1 cannot be obtained and because K = 2 often represents the top level of hierarchical population structure. At K = 2, a small number of divergent samples were identified at single geographic locations, including Esperance in 2014 and Southend in 2015. In 2015, at K values \(\geqslant\) 3, two genotypic clusters occurred predominantly within the Southend population. Do these admixture patterns reflect genetic isolation? STRUCTURE sorts groups into Hardy–Weinberg linkage populations under the assumption of independent loci55,56. Among all populations, Southend had the lowest gene diversity, the fewest private sites and the highest \(F_{\text {ST}}\) values in pairwise population comparisons. Notably, this population was collected from a small and isolated patch of sea rocket consisting only of several large plants. It is likely that cluster patterns reflect an artefact of sampling related individuals at Southend57,58. In hierarchical STRUCTURE analysis of 2014 and 2015 samples, additional genotype clusters at successively higher K values occurred in a small number of individuals from single locations as STRUCTURE simply grouped the next most related samples at each hierarchical level. These results highlight the need for caution when samples of related individuals are present, to avoid false inferences of population structure.

There is no evidence to suggest divergent samples represent interspecific hybrids of P. xylostella and the sympatric species, P. australiana. Although these species can hybridize in laboratory crosses, whether hybridization occurs in the wild is unknown35. Whole genome analysis of 29 Plutella individuals found no evidence for widespread introgression between these two species59. In our study, all Esperance and Southend individuals exhibited levels of heterozygosity across > 550,000 variant and invariant genome-wide loci that were similar to other P. xylostella. We would expect substantially higher heterozygosity if individuals were interspecific hybrids or if DNA samples were contaminated. PCA analysis of P. xylostella from Esperance 2014, Southend 2015 and five other locations, together with five P. australiana populations from Perry et al.35, showed clear species groupings and no evidence of introgressed individuals (Supplementary Fig. S6).

Genetic variation within a species is shaped by historical and contemporary evolutionary processes7,60. Because genetic homogeneity among Australian P. xylostella could reflect common ancestry18, present gene flow patterns are not clear from our data. Considering the vast size of the continent, it seems unlikely that P. xylostella forms a panmictic population in Australia in the sense that interbreeding is completely random. Saw et al.31 reported a small degree of sub-population structure among P. xylostella from 14 Australian locations based on geographic variation in frequencies of the dominant mtDNA haplotype. Insecticide resistance profiles of P. xylostella can vary spatially when the intensity of insecticide use differs across locations and host plant types, indicating that gene flow is often insufficient to overwhelm the effects of local selection on frequencies of resistance alleles6,29. Mo et al.23 found that P. xylostella moves over limited distances within actively growing Brassica vegetable crops where suitable host plants are continuously available. There is little propensity to emigrate during summer when crops are irrigated and the surrounding landscape is dry. Limited movement in these crops can lead to functional isolation of sub-populations, whereby resistance is selected by the insecticide spray regimes of individual farmers. Local selection, rather than spread of resistance alleles through gene flow, largely explained variation in levels of resistance to synthetic pyrethroids among Australian P. xylostella populations6.

Genetic differentiation indices are likely to over-estimate rates of gene flow, to the extent that genetic uniformity among P. xylostella reflects a genetic bottleneck and range expansion during colonization of Australia61. Although outgroups from other continents were unavailable for comparison, our recent data support the view that P. xylostella within Australia displays reduced genomic diversity compared to populations elsewhere. Australian P. xylostella, including 47 individuals from our study, were previously shown to exhibit 1.5-fold lower levels of heterozygosity across the nuclear genome relative to the endemic congeneric species, P. australiana35. Similarly, a study of microsatellite variation reported lower nucleotide diversity within P. xylostella populations from Australia than populations from Kenya and Indonesia19. Plutella xylostella exhibits lower mtDNA haplotype diversity within Australia26,31,34,35 than within parts of Asia, Africa, Europe and North America10,26,27,28,31,34,62. Analysis of mtDNA in 102 Australian P. xylostella individuals, including 44 individuals from our study, identified only five closely-related COI haplotypes (613 bp) and two dominant shared haplotypes35. Reduced diversity across the genome is strong evidence for a bottleneck63. By contrast, a recent global study of P. xylostella whole genomes reported unexpectedly high levels of SNP diversity within populations from the Oceania region, including Australia, New Zealand, Vanuatu and Samoa, compared to populations from putative historical source regions in the Americas, Europe, Africa and Asia64. Whether loci under selection may explain this pattern warrants further study. Plutella xylostella within Australia and New Zealand appears to have been founded by a small number of females derived from an ancestral lineage in southern Asia26,31,34,35,64.

It is likely that genetic homogeneity across the Australian P. xylostella distribution is maintained by some level of ongoing gene flow. RAD-seq markers revealed weak but significant genetic differentiation among field and laboratory-reared Australian P. xylostella populations (\(F_{0}\) to \(F_{6}\))50 but not wild populations (\(F_{0}\), this study), implying that intermixing prevents divergence. Neutral bi-allelic SNPs failed to reveal the scale, frequency and timing of gene flow. Even very few migrants per generation can eliminate genetic differentiation among populations65,66, especially where genetic diversity is low. If a small founding P. xylostella population originally colonized Australia31, its present distribution throughout Australia demonstrates past gene flow at a continental level. This is consistent with the wide distribution of two shared haplotypes and with P. xylostella being a migratory species4,13. Within Australia, there is ample indirect evidence of P. xylostella dispersal, including the seasonal widespread colonization of winter-grown canola crops18,67 and detection of moth flights in light trapping and pheromone trapping studies19,37,67,68,69. In canola-growing areas remote from Brassica vegetable production, the annual canola cropping cycle of crop planting, senescence and harvest forces P. xylostella to disperse regularly between crops and wild brassicaceous host plants in the landscape. This tends to homogenise the insecticide resistance profiles of P. xylostella across Australian canola-growing regions, and among canola crops and Brassica weeds within each region24.

Gene flow creates potential for the spread of resistance alleles within and among Australian Brassica cropping systems. For certain newer insecticide chemistries, elevated resistance levels occur in P. xylostella within intensively sprayed Brassica vegetable crops relative to insects from canola crops or weeds6. Emigration of resistant moths from these cropping areas could contribute to the risk of P. xylostella insecticide resistance in canola cropping systems. Conversely, seasonal flights of P. xylostella moths into Brassica vegetable crops during spring32,68, perhaps originating from senescing canola crops or weeds, could dilute resistance levels if immigrants are more susceptible, and provide opportunities for rotation strategies to manage resistance. Insecticide-resistant P. xylostella genotypes can persist locally in Australian canola-growing areas where summer-active brassicas occur67. Evidence for large-scale gene flow among Australian P. xylostella suggests insecticide resistance alleles arising in one location could readily spread to other locations. These alleles may be selected to a high frequency in local areas if insecticides are used repeatedly70.

Neutral genome-wide SNPs were uninformative in identifying the dispersal patterns of P. xylostella in Australia, confirming previous conclusions19. Whether larger marker sets derived from massively parallel sequencing can provide insights into seasonal migration of P. xylostella in other global regions, where the species displays higher genetic diversity, remains to be evaluated.

Methods

Sample collection

Immature life stages (larvae or pupae; rarely, eggs) of Plutella species were collected between March 2014 and December 2015 throughout agricultural areas of southern Australia (Fig. 1). The dryland cropping areas of Australia experience climatic conditions most likely to support year-round persistence of P. xylostella67,71, and therefore represent the gene pool. Four brassicaceous P. xylostella host plant types were sampled: canola crops, Brassica vegetable and forage crops, and wild brassicaceous species (Fig. 1). The wild species were wild radish, Raphanus raphanistrum, turnip weed, Rapistrum rugosum, sea rocket, Cakile maritima, Ward’s weed, Carrichtera annua, and mixed stands of sand rocket, Diplotaxis tenuifolia and wall rocket, D. muralis (Table 1). In both years, most sampling was temporally restricted to periods in autumn (March to April) and spring (September to October) to minimise potential for migration to affect genetic structure7. Spring sampling corresponds to population peaks of P. xylostella in crops while autumn sampling corresponds to population troughs at the end of the summer/autumn non-cropping period, when host plants and the insect are often locally rare. Several locations were sampled in both years to allow temporal comparisons. At each location, \(\geqslant\) 25 individuals were collected from multiple plants across a representative area to minimise sampling of related individuals, either using a sweep net in canola crops, Brassica forage crops and wild host plants, by hand in Brassica vegetable crops, or by beating plants over a collection tray for sea rocket. To eliminate parasitised individuals, each population was reared separately in a ventilated plastic container on leaves of the original host plant for 1–2 days and thereafter on cabbage leaves. Non-parasitised pupae or late-instar larvae were fresh frozen at − 80 °C.

DNA isolation and species identification

For each population, 16 individuals were sequenced where possible after removing parasitised individuals. To avoid biases due to sex-linked markers72, we visually determined the sex of individual pupae (but not larvae) by examining external genital morphology73 under a dissecting microscope, then male and female individuals were selected to achieve a balanced sex ratio within each population where possible. Genomic DNA was isolated by homogenising whole individuals using a TissueLyser II (Qiagen) followed by two phenol and one chloroform extractions according to Zraket et al.74. DNA was treated with RNase A, then precipitated and resuspended in TE buffer. To distinguish P. xylostella from P. australiana, species identification was performed using a PCR-RFLP assay35 and P. xylostella individuals were retained for analysis.

RAD-seq library preparation and sequencing

Libraries were prepared for restriction-site-associated DNA sequencing (RAD-seq) according to a protocol modified from Baird et al.41 as described in Perry et al.35. Genomic DNA was quantified using a Qubit 2.0 fluorometer (Invitrogen) and 200ng digested with 10 units of high fidelity Sbf1 in Cutsmart buffer (NEB) for 1 h at 37 °C, then heat inactivated at 80 °C for 20 min. One microlitre of P1 adapter (100nM) with a 6-base molecular identifier (MID) (top strand 5\(^{\prime }\)-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGxxxxxxTGCA-3\(^{\prime }\), bottom strand 5\(^{\prime }\)-[P]xxxxxxCTGTCTCTTATACACATCTGACGCTGCCGACGA-3\(^{\prime }\), x represents sites for MIDs) were then added using 0.5\(\upmu\)L T4 DNA ligase (Promega), 1nM ATP and Cutsmart buffer. Sixteen individuals with unique P1 adapters were pooled per library. To minimise sequencing biases or batch effects, individuals from each population were randomised across 2–4 (usually 4) libraries and each library was sequenced across 2–4 sequencing lanes. Library pools were sheared using a Bioruptor sonicator (Diagenode), ends repaired using a Quick Blunting Kit (NEB), adenine overhangs added then P2 adapters (top strand 5\(^{\prime }\)-[P]CTGTCTCTTATACACATCTCCAGAATAG-3\(^\prime\), bottom strand 5\(^\prime\)-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGT-3\(^{\prime }\)) ligated, then (majority of libraries) size-selected (300–700 bp) on agarose gel to remove primer dimer. DNA purification between steps was performed using a MinElute PCR purification kit (Qiagen). Library amplification was performed using KAPA HiFi Hotstart Readymix (Kapa Biosystems) and Nextera i7 and i5 indexed primers with PCR conditions as described in Perry et al.35: 95 °C for 3 min, two cycles of 98 °C for 20 s, 54 °C for 15 s, 72 °C for 1 min, then 15 cycles of 98 °C for 20 s, 65 °C for 15 s, 72 °C for 1 min followed by a final extension of 72 °C for 5 min. Libraries were size-selected (300–700 bp) on agarose gel and purified using a minElute Gel Extraction Kit (Qiagen). Illumina paired-end sequencing was performed across seven lanes using HiSeq2500 (100 bp) or NextSeq500 (75 bp) at the Australian Genome Research Facility (AGRF). Additionally, 16 individuals from a separate sequencing run as described in Perry et al.50 were included in downstream analysis.

Read filtering and variant calling

Sequence read quality was examined using FastQC75. As Nextseq reads had low quality base calls within restriction sites (a common problem when using fixed-length MIDs on this platform, which cause low sequence diversity and cluster signal in this region), we opted to remove restriction sites from all reads for downstream analysis. Sequence reads were de-multiplexed using RADtools version 1.2.442 allowing one base MID mismatch, then TRIMMOMATIC v0.3276 was used to remove restriction sites, adapter sequences, a thymine base from reverse reads introduced by the P2 adapter, and quality filter using the ILLUMINACLIP tool with parameters: TRAILING:10 SLIDINGWINDOW:4:15 MINLEN:40. Paired reads were aligned to the P. xylostella reference genome (accession number: GCF_000330985.1) using STAMPY version 1.0.2177 with –baq and –gatkcigarworkaround options and expected substitution rate set to 0.03 to reflect our expectations of sequence divergence from the reference strain. Duplicate reads were removed and individual sample BAM files merged using PICARD version 1.7178. Genotypes were jointly called for all individuals using the Genome Analysis Tool Kit version 3.3-079,80 HaplotypeCaller tool. We determined that base quality score recalibration using bootstrapped SNP databases was inappropriate for this dataset as it globally reduced quality scores. The variant call set was hard-filtered using VCFtools version 0.1.12a81. After iteratively testing multiple filtering parameter sets, we removed indels and retained confidently called bi-allelic SNPs (GQ \(\geqslant\) 30) genotyped in at least 80% of individuals with a minimum genotype depth of 5, minQ \(\geqslant\) 400, average site depth of 12–100, minimum minor allele frequency of 0.01 and in Hardy–Weinberg equilibrium at an alpha level of 0.05. To avoid closely-linked sites, we retained only SNPs separated by a minimum of 2,000 bp using the VCFtools—thin function. In order to estimate population-level genetic diversity, from the output of GATK HaplotypeCaller we generated a set of all confidently-called (GQ \(\geqslant\) 30) variant and invariant sites and hard filtered to remove sites within repetitive regions and retain sites genotyped in at least 80% of individuals with an average site depth of 12–100. The filtered VCFs were converted to other file formats for downstream analysis using PGDSpider version 2.1.1.282 and custom R scripts83.

Genetic diversity

The R package hierfstat84 was used to calculate within-population gene diversity (\(H_{\text {S}})\), observed heterozygosity (\(H_{\text {O}}\)) and the inbreeding coefficient (\(F_{\text{IS}}\)) according to Nei85. Population means for site depth and number of SNPs, indels and private sites were calculated using the –depth function and vcfstats module in VCFtools version 0.1.12a81.

Population differentiation

To examine population differentiation, a global estimate of \(F_{\text {ST}}\)86 with bootstrapped 99% confidence intervals (\(10^4\) bootstrap iterations) was calculated in R package diveRsity87. Pairwise \(F_{\text {ST}}\) values for all population pairs were calculated and significance of differentiation determined using exact G tests (\(10^4\) MCMC burnins, \(10^3\) batches, \(10^4\) iterations per batch) in GENEPOP v4.688 after correction for multiple comparisons using the Bonferroni–Holm correction method89,90. Isolation by distance among populations52 was investigated separately for 2014 and 2015 datasets. We used R91 to construct heat maps and visually inspected the congruence between pairwise matrices of untransformed geographic distances in kilometres and genetic distances, \(F_{\text {ST}}\), for corresponding population pairs. Significance of the regressions of pairwise linearized genetic distances92 onto log-transformed geographic distances was determined using a Mantel test with \(10^4\) permutations in R package ade4 version 1.7-693. Geographic distances were calculated using R package geosphere version 1.5-794. Analysis of molecular variance (AMOVA) was performed using the pegas implementation in R package poppr version 2.7.195. The data were analysed under two hierarchical model structures. In model A, all individuals were analysed together and populations were grouped into sampling years and Brassica host types. In Model B, a temporal analysis was performed for locations sampled in both 2014 and 2015, to investigate whether variance was greater among years within locations or vice versa.

Population structure

Two individual-based clustering approaches were used to investigate population structure. First, Bayesian clustering was implemented in the program STRUCTURE version 2.3.455. Variant data were converted from VCF to STRUCTURE file format using PDGSpider version 2.1.1.282. For all runs, we used a burnin length of \(5\times 10^5\) followed by a run length of 10\(^6\) MCMC iterations and performed fifteen independent runs for each K value, where K is the number of genotypic clusters, using a different random seed for each run, assuming the locprior model with correlated allele frequencies and \(\lambda\) set to 1. As preliminary runs showed that most structure was identified at low K values, we analysed K-values from 1 to 10 in both years. The optimal value of K was estimated using the delta K method96 implemented in STRUCTURE HARVESTER97 and inspection of the likelihood distribution for each model. Q-matrices were aligned using CLUMPP version 1.1.298 and visualised using DISTRUCT version 1.199. To further explore clustering, we performed individual-based principal components analysis (PCA) separately for 2014 and 2015 datasets using R package adegenet version 2.0.1100,101, using scaled and centred allele frequencies and imputing missing data by taking the mean of population allele frequencies.

Power analysis

The statistical power of the SNP marker set to detect population structure was assessed using POWSIM version 4.1102. This program allows the user to test the likelihood of loci of detecting genetic differentiation for pre-defined values of \(F_{\text {ST}}\). For the dataset, 1,000 simulations were performed over a range of \(F_{\text {ST}}\) values from 0.001 to 0.01 assuming an effective population size of 5,000. The number of subpopulations, sample sizes and allele frequencies from our data were used and the generations of drift varied to achieve the target \(F_{\text {ST}}\). As POWSIM currently handles a maximum of 30 populations, for the 2014 dataset the number of subpopulations was set to this value. The null hypothesis of genetic homogeneity was tested using Fisher’s exact test and a Chi-square test.

Accession codes

RAD sequences are available from the Sequence Read Archive under accession PRJNA471964.