# Raw-Data – HIV-1 Simulation Reveals Overestimation Bias in Within-Host Phylodynamic Migration Rate Estimates Under Selection

This archive contains the data used in the manuscript "HIV-1 Simulation Reveals Overestimation Bias in Within-Host Phylodynamic Migration Rate Estimates Under Selection". It can be used in conjunction with the code repository to reproduce the data analysis.

## Abstract

Phylodynamic methods are widely used to infer the population dynamics of viruses between and within hosts. For HIV-1, these methods have been used to estimate migration rates between different anatomical compartments within a host. These methods typically assume that the genomic regions used for reconstruction are evolving without selective pressure, even though other parts of the viral genome are known to experience strong selection. In this study, we investigate how selection affects phylodynamic migration rate estimates. To this end, we developed a novel agent-based simulation tool, `virolution`, to simulate the evolution of a virus within two anatomical compartments of a host. Using this tool, we generated viral sequences and genealogies assuming both neutral evolution and purifying selection that is concordant in both compartments. We found that, under the selection regime, migration rates are significantly overestimated with a stochastic mixture model and a structured coalescent model in the Bayesian inference framework BEAST2. Our results reveal that commonly used phylogeographic methods, which assume neutral evolution, can significantly bias migration rate estimates in selective regimes. This study underscores the need for assessing the robustness of phylodynamic analysis with respect to more realistic selection regimes.

## Data

The raw data used in the manuscript amounts to about 600 GB of uncompressed log files from the BEAST2 analysis and simulated sequences.
Additionally, several already processed datasets are stored as `csv` files and can be loaded with common data analysis libraries or tools.

## Overview

The archive contains:

- `archive_250731.tar.gz`: Raw data
- `out/cache/.csv`: Processed data
- `README.md`: Description

## Usage

Use this archive in conjunction with the code at [10.5281/zenodo.16744272](https://doi.org/10.5281/zenodo.16744272).
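
For a quick look at the data without the code repository, the following is a minimal sketch using only the Python standard library. It assumes that the processed files sit directly in an `out/cache/` directory with a `.csv` extension and have a header row. The function names `list_archive_members` and `load_processed_tables` are hypothetical helpers, not part of the published code. Listing the archive avoids extracting the roughly 600 GB of raw data just to inspect its layout.

```python
import csv
import tarfile
from pathlib import Path


def list_archive_members(archive_path, limit=20):
    """Return up to `limit` member names of a gzip-compressed tar archive.

    The archive is read sequentially and nothing is extracted to disk.
    """
    names = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def load_processed_tables(cache_dir):
    """Read every `*.csv` file in `cache_dir` into a list of row dictionaries.

    Returns a dict that maps the file stem to its rows. All values are kept
    as strings, because the column layout is not assumed here.
    """
    tables = {}
    for csv_path in sorted(Path(cache_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            tables[csv_path.stem] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    # Paths follow the names in the overview; adjust them to your local copy.
    archive = Path("archive_250731.tar.gz")
    if archive.exists():
        for name in list_archive_members(archive):
            print(name)

    cache = Path("out/cache")
    if cache.is_dir():
        for stem, rows in load_processed_tables(cache).items():
            print(stem, len(rows), "rows")
```

The same tables can be read with a data analysis library instead; the standard-library version is used here only to keep the sketch free of dependencies.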