Selection
We tested for both local and species-wide genomic signals of selection associated with the recent range expansion in C. anna. To identify genomic regions potentially under selection in the expanded range, we used an FST outlier approach. FST outlier scans are a common method for detecting selection: clusters of nearby loci with significantly different allele frequencies between populations often indicate selection. We compared the northern (WAS) and eastern (EAS) expansion regions to their nearest native range regions, Central California (BAY) and Southern California (PAC), respectively. We used the pFst tool in VCFlib (https://github.com/vcflib/vcflib) after creating a BCF file with ANGSD (-dobcf) and converting it to a VCF file with BCFtools accessed through Samtools. The pFst tool uses a likelihood ratio test to detect allele frequency differences between populations.
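To illustrate the idea behind a likelihood ratio test for allele frequency differences, the following is a minimal sketch only and is not the pFst implementation. It assumes hard allele counts per population (pFst can also work from genotype likelihoods), a simple binomial model, and a chi-square approximation with one degree of freedom; the function names and example counts are hypothetical.

```python
import math

def binom_loglik(alt, total, p):
    """Binomial log-likelihood of `alt` allele copies out of `total`,
    dropping the constant binomial coefficient (it cancels in the ratio)."""
    ll = 0.0
    if alt > 0:
        ll += alt * math.log(p)
    if total - alt > 0:
        ll += (total - alt) * math.log(1.0 - p)
    return ll

def allele_freq_lrt(alt1, n1, alt2, n2):
    """Likelihood ratio test of one shared allele frequency (null)
    versus a separate frequency per population (alternative).

    alt1, alt2: alternate allele counts; n1, n2: sampled chromosomes.
    Returns (test statistic, approximate p-value)."""
    p1 = alt1 / n1
    p2 = alt2 / n2
    p0 = (alt1 + alt2) / (n1 + n2)  # pooled frequency under the null
    ll_alt = binom_loglik(alt1, n1, p1) + binom_loglik(alt2, n2, p2)
    ll_null = binom_loglik(alt1, n1, p0) + binom_loglik(alt2, n2, p0)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical site: 2 of 40 alternate alleles in one population, 30 of 40 in the other.
stat, p_value = allele_freq_lrt(2, 40, 30, 40)
```

In a genome scan, a test of this kind would be applied at every variable site, and runs of adjacent low p-values would be inspected as candidate regions.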
Although the expected magnitude and direction of gene flow in C. anna are unknown, largely because of the species' enigmatic movement patterns, a previous study suggested high gene flow among three California populations. Another California hummingbird, Allen’s Hummingbird (S. sasin), was found to have high gene flow among mainland populations, potentially indicating high overall mobility in hummingbirds. If gene flow in C. anna is extremely high, signatures of selection caused by exposure to novel selective agents during range expansion might be present across the entire species rather than appearing as divergence between populations. We therefore used all samples to test for recent selective sweeps using SweeD v. 3.2.1. We first estimated minor allele frequencies at polymorphic sites using ANGSD (for parameter details see Table S1). We converted these into the allele count input required by SweeD by multiplying the minor allele frequency by the number of individuals sequenced at each site and rounding to the nearest integer. All sites were treated as folded. We ran SweeD separately for each chromosome, with a grid size equal to the length of the chromosome divided by 5000 (so that we tested a position every 5 kb).
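The two arithmetic steps described above (converting minor allele frequencies to counts, and choosing the grid size for each chromosome) can be sketched as follows. This is an illustrative sketch rather than the script used in the study: the function names are hypothetical, ties are assumed to round half up, and the grid size is assumed to be an integer division.

```python
import math

def maf_to_minor_count(maf, n_individuals):
    """Convert a minor allele frequency to an integer count by multiplying
    by the number of individuals sequenced at the site, as described in the
    text. Rounding half up is an assumption; the text only says 'nearest'."""
    return int(math.floor(maf * n_individuals + 0.5))

def grid_size(chrom_length_bp, spacing_bp=5000):
    """Number of grid positions for one chromosome so that positions are
    tested roughly every `spacing_bp` base pairs (at least one position)."""
    return max(1, chrom_length_bp // spacing_bp)

# Hypothetical examples.
count = maf_to_minor_count(0.25, 20)   # 0.25 * 20 = 5.0
grid = grid_size(1_000_000)            # 1,000,000 / 5,000 = 200 positions
```

Computing the grid size per chromosome keeps the spacing of tested positions comparable across chromosomes of very different lengths.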