Does Immigration Boost or Reduce Household Incomes?
Spatial Econometric Analysis of the Impact of Foreign-Born Populations on Median Household Income in the United States
1. Introduction
Immigration policy has been one of the most controversial topics in American society for years. Since 2022, public opinion has shifted toward supporting stricter immigration policies (Gallup, 2024). According to a Gallup poll conducted in June 2024, 55% of Americans want immigration levels to decrease, the highest figure since 2001. Public support for restrictive measures, such as expanding the border wall and deporting undocumented immigrants, has also risen. At the same time, as the country with the largest immigrant population in the world, the United States has a substantial share of foreign-born residents in every state. Figure 1 illustrates the percentage of the foreign-born population by state in 2023 (data source: U.S. Census Bureau).
Figure 1. Percentage of the Foreign-Born Population by State (2023)
Some researchers advocate more inclusive immigration policies because they recognize immigrants' economic contributions (Konard et al., 2020). Others favor stricter immigration policies because of concerns about job competition (Albert, 2021), the consumption of public resources, and potential crime. However, I found that most studies of the relationship between immigration and economic outcomes rely on non-spatial analysis, which ignores the possibility that a state's income is related to the incomes of its neighbors. I therefore conduct a spatial analysis to examine the relationship between the immigrant population share and household income.
My research examines whether the proportion of the foreign-born population in each state influences median household income, and whether this relationship exhibits spatial dependence and spatial spillover effects. By addressing these questions, I aim to provide data-driven evidence that policymakers can use to evaluate the economic effects of immigration policies. The study area comprises the 48 contiguous U.S. states and the District of Columbia; Hawaii and Alaska are excluded.
2. Background
Several studies have examined the impact of immigration on various economic factors in the United States using spatial econometric methods. For instance, Mussa et al. (2017) analyzed how immigration affects the U.S. housing market, employing a Spatial Durbin Model to capture both the direct and indirect effects of immigration inflows on rents and house prices. Their findings indicate that increased immigration leads to higher rents and house prices in specific metropolitan areas, as well as spillover effects in neighboring regions.
In addition to the proportion of immigrants, other factors, such as educational attainment, the share of the population aged 65 and over, the share of the rural population, the unemployment rate, and the crime rate, are also correlated with median household income. For instance, in 2023, households whose householder held a bachelor's degree or higher had a median income of $126,800, compared with $70,470 for households headed by high school graduates and $55,810 for those headed by someone with less than a high school education (U.S. Census Bureau, 2023). The proportion of the population aged 65 and older may also affect regional median household income: in 2023, households headed by individuals aged 65 to 74 had a median income of $61,830, lower than households headed by those aged 55 to 64 ($90,640) and 45 to 54 ($110,700) (U.S. Census Bureau, 2023). The unemployment rate is another factor, since unemployed individuals tend to be poorer than employed individuals, who more often come from wealthier backgrounds and have higher levels of education (Martinez et al., 2001). Regarding rural versus urban areas, rural communities in the United States had a poverty rate 2.4 percent higher than that of urban communities (USDA, 2017). Finally, areas with high crime rates often face greater public safety challenges, which are closely related to income.
3. Data
I selected median household income by state as the dependent variable because it represents the income of a typical household. The percentage of the foreign-born population is the core explanatory variable. Control variables include the percentage of the population aged 25 and over with a bachelor's degree or higher, the unemployment rate, the percentage of the population aged 65 and over, the rural population percentage, and the crime rate. The data were obtained from U.S. government agencies, including the U.S. Census Bureau, the Bureau of Labor Statistics, and the FBI. All variables refer to 2023 and cover the entire study area (the 48 contiguous states and the District of Columbia), ensuring consistency in both temporal and geographic scope.
Figure 2. Median Household Income in 2023 by State
4. Method
Figure 3 shows the workflow of this research. After collecting the data, I cleaned them, detected and verified outliers, and merged the datasets. I then conducted exploratory data analysis (EDA) to examine the distributions of the variables and the relationships among them, followed by exploratory spatial data analysis (ESDA) to reveal their spatial patterns and potential spatial autocorrelation.
Figure 3. Workflow of This Study
To establish a baseline, I estimated an Ordinary Least Squares (OLS) regression to analyze the linear relationships between the variables. To check for multicollinearity, I computed the Variance Inflation Factor (VIF) of each explanatory variable in the baseline OLS model. All VIF values fall below the commonly used threshold of 5, indicating no serious multicollinearity. Because the subsequent spatial regression models (SLM, SEM, SARAR, and SDM) use the same set of explanatory variables, multicollinearity is unlikely to distort their estimates either, although the SDM additionally includes spatially lagged explanatory variables that this check does not cover. I also used stepwise regression with the AIC and BIC criteria to select the combination of variables that best explains the data.
I then used a Residuals vs. Fitted plot to check whether the OLS residuals have a mean of zero and constant variance (homoscedasticity), and a Q-Q plot to check whether they are approximately normally distributed. Next, I applied Moran's I test to the residuals of the optimized OLS model to test for spatial autocorrelation. The test indicated significant spatial dependence, showing that spatial models are needed. I then used Lagrange Multiplier (LM) tests to choose an appropriate spatial specification, such as the Spatial Lag Model (SLM) or the Spatial Error Model (SEM). The LM test results indicate that the data may exhibit both a spatial lag effect in the dependent variable and spatial correlation in the error term. I therefore estimated a Spatial Autoregressive model with Autoregressive Disturbances (SARAR), which includes both a spatial lag of the dependent variable (ρ) and a spatially autoregressive error term (λ), to test whether each of these effects is significant.
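The VIF check described in this section can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code (the software used for the analysis is not stated): it assumes the explanatory variables are the columns of a matrix `X` without an intercept column, and the function name `vif` and the synthetic data are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (n x p, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing
    column j on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Usage with synthetic (hypothetical) data: column 2 nearly duplicates column 0
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 3))
X[:, 2] = X[:, 0] + 0.3 * rng.normal(size=49)
print(vif(X).round(2))  # columns 0 and 2 should show inflated values
```

Because each auxiliary regression includes an intercept, every VIF is at least 1; values above the threshold of 5 used in the text would flag a problematic variable.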
To explore whether the proportion of immigrants has spillover effects on household income in neighboring regions, I employed the Spatial Durbin Model (SDM).
All spatial models were constructed using an inverse distance weights matrix, with the distance threshold chosen to minimize the p-value of Moran's I so that the matrix captures the strongest spatial correlation available in the data.
Finally, I compared the results across all models. Using the Adjusted R², AIC, BIC, log-likelihood, and Moran's I statistic of the residuals, I assessed each model's performance and how well it accounts for the spatial structure of the data. Based on these results, I discuss how the foreign-born population percentage and other key variables affect median household income, providing empirical evidence to inform policymaking.
Spatial Weights Matrix
Figure 4. Spatial Weights Matrix Visualization Map
As shown in Figure 4, I constructed an inverse distance spatial weights matrix. Compared with adjacency-based matrices, an inverse distance matrix lets weights decline as distance increases, so it can capture spillover effects between both adjacent and non-adjacent states and more realistically reflect how the effects of immigration may diffuse across space. The distance threshold was set to 11.11, the value that minimizes the p-value of Moran's I, indicating that this threshold best captures spatial autocorrelation in the current data.
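A thresholded inverse distance matrix of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: it assumes planar (Euclidean) distances between state centroid coordinates, a distance decay of 1/d, and optional row standardization. The text does not say which coordinates, distance metric, or standardization were used, so the threshold 11.11 should be read in whatever units the original coordinates had. The function name is hypothetical.

```python
import numpy as np

def inverse_distance_weights(coords, threshold, power=1.0, row_standardize=True):
    """w_ij = 1 / d_ij**power when 0 < d_ij <= threshold, otherwise 0."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise Euclidean distances
    W = np.zeros_like(d)
    mask = (d > 0) & (d <= threshold)              # excludes each unit's own location
    W[mask] = 1.0 / d[mask] ** power
    if row_standardize:
        row_sums = W.sum(axis=1, keepdims=True)
        # units with no neighbour inside the threshold keep an all-zero row
        W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    return W

# Usage: four points on a line, with a cut-off of 5 coordinate units
coords = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [20.0, 0.0]])
W = inverse_distance_weights(coords, threshold=5.0)
print(W.round(3))
```

The threshold trades off coverage against noise: a small cut-off can leave units without neighbours (the fourth point above), while a very large one makes every state a weak neighbour of every other.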
5. Results and Discussion
5.1 Exploratory Data Analysis (EDA)
Figure 5. Histograms of the variables used in this study
Figure 5 illustrates the distributions of the variables. Median household income follows an approximately normal distribution, mostly ranging from $70,000 to $90,000. The proportion of the foreign-born population is generally low and right-skewed, with a few states exceeding 25%. The percentage of individuals with a bachelor's degree or higher has a bell-shaped distribution concentrated between 30% and 50%. The proportion of the population aged 65 and older mostly lies between 16% and 22%. Unemployment rates are narrowly distributed, ranging from 2% to 5%. The rural population percentage is right-skewed, with most states falling between 10% and 50%. Crime rates are also right-skewed, with a few states showing extreme values. After experimentation and careful consideration, I chose not to apply a log transformation, because using raw values makes the results easier to interpret from a policy perspective.
Figure 6. Scatter Plots of Median Household Income Against Key Explanatory Variables
The scatterplots in Figure 6 show the relationships between median household income and each explanatory variable. The percentage of the population with a bachelor's degree or higher shows a strong positive correlation with income, suggesting that education may have the strongest association with income. The proportion of foreign-born residents shows a weaker positive correlation. The rural population percentage is negatively correlated with income, suggesting that the level of urbanization may influence income. Relationships with the aging population, the unemployment rate, and the crime rate appear weaker or more dispersed and require further analysis to determine their effects.
5.2 Exploratory Spatial Data Analysis (ESDA)
Figure 7 illustrates the spatial distribution of median household income and the explanatory variables. Median household income is notably higher in the Northeast and the coastal West and lower in the South and Midwest. The proportion of the foreign-born population is much higher in coastal states, particularly California and New York, consistent with immigrants concentrating in economically developed regions. The percentage of individuals with a bachelor's degree or higher is highest in the Northeast and the West, matching these regions' higher median household incomes. The proportion of the population aged 65 and over is notably higher in the Southeast, reflecting a more aged population there. Unemployment rates are higher in parts of the West, such as Nevada, and lower in the Midwest. Rural population percentages are higher in the Midwest and South, corresponding to the lower income levels in these areas. Crime rates are elevated in parts of the South and West, such as Louisiana and New Mexico, potentially affecting the socioeconomic conditions of these states. These maps highlight the spatial heterogeneity of the variables and provide a foundation for the spatial econometric analysis that follows.
Figure 7. Spatial Distributions of Key Explanatory Variables
The global Moran's I statistic for median household income is 0.2797, indicating moderate positive spatial autocorrelation. The p-value (3.047e-10) is highly significant, so we can confidently reject the null hypothesis of no spatial autocorrelation.
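For readers who want to see how the global Moran's I statistic above is obtained, here is a minimal NumPy sketch. It is an illustration, not the author's code: it reports a z-score based on the normality assumption and a permutation p-value, whereas some software defaults to a randomization-based variance, so p-values can differ slightly. The function name and the chain-shaped weights matrix in the usage example are hypothetical.

```python
import numpy as np

def morans_i(y, W, n_perm=999, seed=0):
    """Global Moran's I, a normality-based z-score, and a one-sided permutation p-value."""
    y = np.asarray(y, dtype=float)
    n = y.size
    z = y - y.mean()
    S0 = W.sum()
    I = (n / S0) * (z @ W @ z) / (z @ z)
    # Expected value and variance under the normality assumption
    EI = -1.0 / (n - 1)
    S1 = 0.5 * ((W + W.T) ** 2).sum()
    S2 = ((W.sum(axis=0) + W.sum(axis=1)) ** 2).sum()
    var_I = (n**2 * S1 - n * S2 + 3 * S0**2) / ((n**2 - 1) * S0**2) - EI**2
    z_score = (I - EI) / np.sqrt(var_I)
    # Permutation test: reshuffle the values across locations
    rng = np.random.default_rng(seed)
    perm = np.empty(n_perm)
    for k in range(n_perm):
        zp = rng.permutation(z)
        perm[k] = (n / S0) * (zp @ W @ zp) / (zp @ zp)
    p_perm = (1 + np.sum(perm >= I)) / (n_perm + 1)
    return I, z_score, p_perm

# Usage: 30 units on a line, each linked to its immediate neighbours (row-standardized)
n = 30
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
trend = np.arange(n, dtype=float)      # smoothly varying values: strong positive autocorrelation
I, z, p = morans_i(trend, W)
print(round(I, 3), round(z, 2), p)
```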
Figure 8. Moran scatterplot of Median Household Income
The Moran scatterplot in Figure 8 shows that most points lie in the first quadrant (upper right, high-high) and the third quadrant (lower left, low-low), indicating clustering of high values and of low values, respectively. Points in the second quadrant (upper left, low-high) and the fourth quadrant (lower right, high-low) are relatively sparse, suggesting few spatial outliers. This further confirms the spatial clustering of median household income and motivates the use of spatial regression models, such as the spatial lag model (SLM) or the spatial error model (SEM), to explain its spatial structure.
5.3 Model Results
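The quadrant reading of the Moran scatterplot above can be made explicit with a short sketch: each unit's standardized value is plotted against the weighted average of its neighbours' standardized values. This assumes a row-standardized weights matrix (in which case the slope of the scatterplot equals Moran's I); the function name, labels, and toy data are hypothetical.

```python
import numpy as np

def moran_quadrants(y, W):
    """Assign each unit to a Moran scatterplot quadrant (row-standardized W assumed)."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()
    lag = W @ z                      # spatial lag of the standardized values (vertical axis)
    labels = np.where(z >= 0,
                      np.where(lag >= 0, "HH (Q1)", "HL (Q4)"),
                      np.where(lag >= 0, "LH (Q2)", "LL (Q3)"))
    slope = (z @ lag) / (z @ z)      # equals Moran's I when rows of W sum to 1
    return z, lag, labels, slope

# Usage: a low-income block next to a high-income block on a line of 8 units
y = np.array([1, 1, 1, 1, 5, 5, 5, 5], dtype=float)
n = y.size
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
z, lag, labels, slope = moran_quadrants(y, W)
print(labels, round(slope, 3))
```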
5.3.1 Optimized OLS Model
Figure 9. Result of the Optimized OLS Model
As shown in Figure 9, stepwise regression excluded the unemployment rate and the rural population percentage from the optimized OLS model because they were not significant and removing them improved the selection criterion. The results indicate that the proportion of the foreign-born population and the percentage of individuals with a bachelor's degree or higher are significantly positively associated with median household income, while the proportion of the population aged 65 and over and the crime rate are significantly negatively associated with it. The Adjusted R² of the optimized OLS model is 0.876, slightly higher than the 0.8759 of the original OLS model, and the AIC decreased from 844.54 to 839.02, indicating that the optimized model is simpler while retaining its explanatory power. These stepwise AIC values appear to follow a different convention (for example, one that omits additive constants) from the likelihood-based AIC values reported for the spatial models below: because the spatial models nest OLS, a likelihood-based OLS AIC cannot be more than 2 below theirs, so the two sets of values should not be compared directly.
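The stepwise selection above can be sketched as backward elimination on an information criterion. This is a hypothetical illustration, not the author's procedure: it only removes variables (the text does not say which direction was used), and it computes the criterion from the full Gaussian log-likelihood, counting σ² as a parameter. AIC conventions differ between tools; for example, R's `step()` reports AIC via `extractAIC()`, which for linear models is n·ln(RSS/n) + 2k and omits the constant n(1 + ln 2π), so its values are shifted relative to the full-likelihood AIC. Rankings within one convention are unaffected, but values from different conventions cannot be compared. Passing `penalty=log(n)` turns the AIC into the BIC.

```python
import numpy as np

def info_criterion(y, X, penalty=2.0):
    """-2 logL + penalty * (k + 1) for a Gaussian OLS fit (penalty=2: AIC, log(n): BIC)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return -2 * loglik + penalty * (k + 1)      # the +1 counts the error variance

def backward_stepwise(y, X, names, penalty=2.0):
    """Repeatedly drop the regressor whose removal lowers the criterion the most.
    Column 0 is assumed to be the intercept and is never dropped."""
    keep = list(range(X.shape[1]))
    best = info_criterion(y, X[:, keep], penalty)
    while len(keep) > 1:
        trials = [(info_criterion(y, X[:, [c for c in keep if c != j]], penalty), j)
                  for j in keep[1:]]
        score, j = min(trials)
        if score >= best:                       # no removal improves the criterion: stop
            break
        best, keep = score, [c for c in keep if c != j]
    return [names[c] for c in keep], best

# Usage with synthetic (hypothetical) data: x1 and x2 matter, x3 and x4 are noise
rng = np.random.default_rng(1)
n = 49
Z = rng.normal(size=(n, 4))
y = 2.0 + 3.0 * Z[:, 0] - 2.0 * Z[:, 1] + rng.normal(size=n)
X = np.column_stack([np.ones(n), Z])
names = ["const", "x1", "x2", "x3", "x4"]
print(backward_stepwise(y, X, names))                     # AIC
print(backward_stepwise(y, X, names, penalty=np.log(n)))  # BIC
```

Because BIC's per-parameter penalty (ln 49 ≈ 3.89) exceeds AIC's (2), backward elimination under BIC keeps at most as many variables as under AIC.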
Figure 10. Residuals vs Fitted Plot and Q-Q Plot of optimized OLS model
Figure 10 shows that the residuals of the optimized OLS model plotted against the fitted values are not completely random, suggesting some heteroscedasticity. The Q-Q plot suggests that the residuals may not fully satisfy the normality assumption.
Moran's I test on the residuals of the optimized OLS model gives a statistic of 0.2143 with a p-value of 6.424e-7, indicating significant spatial autocorrelation in the residuals. The OLS model therefore does not fully capture the spatial structure of the data, and spatial regression models are needed. LM tests are used to determine the appropriate spatial regression model.
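Moran's I for regression residuals has a different null distribution from Moran's I for raw data, because OLS residuals are correlated by construction. A minimal sketch of the residual test, using the Cliff-Ord moments under normality, follows; it is illustrative only, and the function name, the chain-shaped weights matrix, and the simulated data are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def residual_moran_test(y, X, W):
    """Moran's I of OLS residuals with moments under normality (Cliff-Ord).

    Returns (I, E[I], z, one-sided p-value for positive autocorrelation).
    """
    n, k = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # residual-maker matrix
    e = M @ y
    S0 = W.sum()
    I = (n / S0) * (e @ W @ e) / (e @ e)
    MW = M @ W
    EI = (n / S0) * np.trace(MW) / (n - k)
    var_I = ((n / S0) ** 2
             * (np.trace(MW @ M @ W.T) + np.trace(MW @ MW) + np.trace(MW) ** 2)
             / ((n - k) * (n - k + 2)) - EI ** 2)
    z = (I - EI) / sqrt(var_I)
    return I, EI, z, 0.5 * erfc(z / sqrt(2))           # upper-tail normal probability

# Usage: 40 units on a line (row-standardized neighbours) and one regressor
rng = np.random.default_rng(2)
n = 40
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
I, EI, z, p = residual_moran_test(y, X, W)
print(round(I, 4), round(EI, 4), round(z, 3), round(p, 4))
```

With an intercept-only design, E[I] reduces to the familiar -1/(n - 1) of the ordinary Moran's I test.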
Figure 11. Result of Lagrange Multiplier (LM) test
The LM test results in Figure 11 indicate significant spatial dependence in the data. The RSErr test (p = 0.0001612) points to spatial error dependence, and the RSLag test (p = 0.0006747) points to spatial lag dependence. The joint SARMA test (p = 9.92e-05) is also significant. Because both one-directional tests are significant, a SARAR model, which accounts for both spatial lag and spatial error effects, is a natural candidate specification.
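The LM (Rao score) diagnostics above can be computed from a single OLS fit. The following NumPy sketch follows the standard formulas of Anselin and coauthors; RSErr and RSLag in the text correspond to `LMerr` and `LMlag` here, and the robust variants are included for completeness even though the text does not report them. It is illustrative only: the function name, the chain-shaped weights matrix, and the simulated spatial lag data are hypothetical, and the regressors must not be spatially constant (otherwise `J - T` is zero).

```python
import numpy as np
from math import erfc, exp, sqrt

def lm_tests(y, X, W):
    """LM diagnostics for spatial error / lag dependence in OLS residuals."""
    n = len(y)
    P = np.linalg.solve(X.T @ X, X.T)          # (X'X)^{-1} X'
    beta = P @ y
    e = y - X @ beta
    s2 = (e @ e) / n                           # ML estimate of the error variance
    M = np.eye(n) - X @ P                      # residual-maker matrix
    T = np.trace(W.T @ W + W @ W)
    WXb = W @ (X @ beta)
    J = (WXb @ M @ WXb) / s2 + T               # n * J_{rho,beta} in Anselin's notation
    d_err = (e @ W @ e) / s2
    d_lag = (e @ W @ y) / s2
    stats = {
        "LMerr": d_err ** 2 / T,
        "LMlag": d_lag ** 2 / J,
        "RLMerr": (d_err - (T / J) * d_lag) ** 2 / (T * (1 - T / J)),
        "RLMlag": (d_lag - d_err) ** 2 / (J - T),
    }
    stats["SARMA"] = stats["RLMlag"] + stats["LMerr"]
    # chi-square upper tails: 1 df -> erfc(sqrt(x/2)); 2 df -> exp(-x/2)
    pvals = {k: erfc(sqrt(v / 2)) for k, v in stats.items() if k != "SARMA"}
    pvals["SARMA"] = exp(-stats["SARMA"] / 2)
    return stats, pvals

# Usage: simulate a spatial lag process y = (I - 0.6 W)^{-1} (X b + e) on 60 units
rng = np.random.default_rng(3)
n = 60
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))
stats, pvals = lm_tests(y, X, W)
for name in stats:
    print(f"{name}: {stats[name]:.3f} (p = {pvals[name]:.4g})")
```

A significant SARMA statistic on its own only says that at least one form of dependence is present; the robust statistics help separate the two.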
5.3.2 Spatial Error Model
The results of the Spatial Error Model (SEM) indicate that the proportion of the foreign-born population (471.971) and the percentage of individuals with a bachelor's degree or higher (1378.143) have significant positive effects on median household income, whereas the proportion of the population aged 65 and over (-819.98) and the crime rate (-16.56) have significant negative effects. The spatial error coefficient λ = 0.66858 is significant (p = 0.0035), indicating strong spatial dependence in the error term and suggesting the presence of omitted, spatially correlated variables. The model has an AIC of 964.1, and the significant Wald test (p < 0.001) confirms that the spatial error term contributes meaningfully to the model.
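To make the SEM estimation concrete, here is a minimal maximum likelihood sketch. It is not the author's implementation: it maximizes the concentrated log-likelihood by a simple grid search over λ instead of a numerical optimizer, and it does not compute standard errors, so it yields no p-values. The function names, the chain-shaped weights matrix, and the simulated data are hypothetical.

```python
import numpy as np

def sem_profile_loglik(lam, y, X, W):
    """Concentrated log-likelihood of y = X b + u, u = lam W u + e, at a given lam."""
    n = len(y)
    B = np.eye(n) - lam * W
    ys, Xs = B @ y, B @ X                   # spatially filtered data
    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    s2 = (resid @ resid) / n
    _, logdet = np.linalg.slogdet(B)        # log|I - lam W|, the Jacobian term
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1) + logdet, beta, s2

def fit_sem(y, X, W, grid=np.linspace(-0.95, 0.95, 381)):
    """Maximum likelihood by grid search over lambda (no standard errors)."""
    fits = [sem_profile_loglik(lam, y, X, W) for lam in grid]
    i = int(np.argmax([f[0] for f in fits]))
    loglik, beta, s2 = fits[i]
    k = X.shape[1] + 2                      # regression coefficients + lambda + sigma^2
    return {"lambda": float(grid[i]), "beta": beta, "sigma2": s2,
            "loglik": float(loglik), "aic": -2 * float(loglik) + 2 * k}

# Usage: simulate an SEM with lambda = 0.6 on 150 units along a line
rng = np.random.default_rng(4)
n = 150
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + u
res = fit_sem(y, X, W)
print(res["lambda"], res["beta"].round(3), round(res["aic"], 2))
```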
The Moran's I test for the residuals of the SEM shows a Moran's I statistic of 0.0116 with a p-value of 0.2523, indicating no significant spatial autocorrelation in the residuals. This result confirms that the SEM effectively accounts for the spatial dependence present in the data.
5.3.3 Spatial Lag Model
The SLM results show that the percentage of the foreign-born population (468.68) and the percentage of individuals with a bachelor's degree or higher (1262.23) have significant positive effects on median household income, whereas the percentage of the population aged 65 and over (-1268.12) and the crime rate (-15.75) have significant negative effects. The spatial lag coefficient (ρ = 0.35464, p < 0.01) is positive and significant, indicating that higher median household incomes in neighboring states are associated with a higher median household income in a given state. The model has a log-likelihood of -474.73 and an AIC of 963.46.
The Moran's I test for the SLM residuals yields a statistic of 0.0525 with a p-value of 0.06539, indicating no significant spatial autocorrelation in the residuals at the 5% level. This suggests that the SLM largely captures the spatial dependence in the data.
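The SLM can be estimated with a similar concentrated likelihood, with one extra trick: regressing both y and Wy on X once gives the residuals for any ρ as e(ρ) = e0 - ρ·eL, so only a one-dimensional search over ρ is required. The sketch below is illustrative, not the author's implementation; it uses a grid search, computes log|I - ρW| from the eigenvalues of W, assumes W is row-standardized (so |ρ| < 1 is a valid range), and omits standard errors. Names and simulated data are hypothetical.

```python
import numpy as np

def fit_slm(y, X, W, grid=np.linspace(-0.95, 0.95, 381)):
    """ML for y = rho W y + X b + e via the concentrated likelihood (grid search)."""
    n, p = X.shape
    Wy = W @ y
    P = np.linalg.solve(X.T @ X, X.T)          # (X'X)^{-1} X'
    b0, bL = P @ y, P @ Wy                     # two auxiliary OLS fits
    e0, eL = y - X @ b0, Wy - X @ bL
    eig = np.linalg.eigvals(W)                 # log|I - rho W| = sum log(1 - rho * eig)

    def loglik(rho):
        e = e0 - rho * eL
        s2 = (e @ e) / n
        logdet = np.sum(np.log(1 - rho * eig)).real
        return -0.5 * n * (np.log(2 * np.pi * s2) + 1) + logdet

    ll = np.array([loglik(r) for r in grid])
    i = int(np.argmax(ll))
    rho = float(grid[i])
    e = e0 - rho * eL
    return {"rho": rho, "beta": b0 - rho * bL, "sigma2": (e @ e) / n,
            "loglik": float(ll[i]), "aic": -2 * float(ll[i]) + 2 * (p + 2)}

# Usage: simulate y = (I - 0.5 W)^{-1} (X b + e) on 150 units along a line
rng = np.random.default_rng(5)
n = 150
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))
res = fit_slm(y, X, W)
print(res["rho"], res["beta"].round(3), round(res["loglik"], 2), round(res["aic"], 2))
```

The AIC parameter count is p + 2 (the coefficients plus ρ and σ²).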
5.3.4 SARAR Model
The SARAR model results indicate that the proportion of the foreign-born population (479.304) and the percentage of individuals with a bachelor's degree or higher (1322.620) are significantly and positively associated with median household income, whereas the percentage of the population aged 65 and over (-1051.749) and the crime rate (-15.997) are significantly and negatively associated with it. The spatial parameters, ρ = 0.24966 (p = 0.12199) and λ = 0.40731 (p = 0.20258), are not statistically significant, suggesting that when both terms are included together, neither the spatial lag nor the spatial error term is individually significant.
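The SARAR likelihood combines both filters: the data are first purged of the spatial lag (I - ρW) and then of the error process (I - λW), and the Jacobian includes both log-determinants. A minimal two-dimensional grid-search sketch follows; it is illustrative only, with hypothetical names and simulated data, no standard errors, and a row-standardized W assumed. When the same W is used for both terms, ρ and λ can be hard to separate, which is one possible reason both can be individually insignificant even when the LM tests were significant.

```python
import numpy as np

def fit_sarar(y, X, W, grid=np.linspace(-0.9, 0.9, 37)):
    """ML for y = rho W y + X b + u, u = lam W u + e, by a 2-D grid search."""
    n, p = X.shape
    eig = np.linalg.eigvals(W)
    logdet = lambda a: np.sum(np.log(1 - a * eig)).real   # log|I - a W|
    Wy, WX = W @ y, W @ X
    best = None
    for rho in grid:
        Ay = y - rho * Wy                  # remove the spatial lag of y
        WAy = W @ Ay
        for lam in grid:
            ys = Ay - lam * WAy            # then filter the error process
            Xs = X - lam * WX
            beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
            resid = ys - Xs @ beta
            s2 = (resid @ resid) / n
            ll = -0.5 * n * (np.log(2 * np.pi * s2) + 1) + logdet(rho) + logdet(lam)
            if best is None or ll > best[0]:
                best = (ll, rho, lam, beta, s2)
    ll, rho, lam, beta, s2 = best
    return {"rho": float(rho), "lambda": float(lam), "beta": beta, "sigma2": s2,
            "loglik": float(ll), "aic": -2 * float(ll) + 2 * (p + 3)}

# Usage: simulate rho = 0.4, lambda = 0.3 on 100 units along a line
rng = np.random.default_rng(6)
n = 100
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
u = np.linalg.solve(np.eye(n) - 0.3 * W, rng.normal(size=n))
y = np.linalg.solve(np.eye(n) - 0.4 * W, X @ np.array([1.0, 2.0]) + u)
res = fit_sarar(y, X, W)
print(res["rho"], res["lambda"], res["beta"].round(3), round(res["aic"], 2))
```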
The Moran's I test for the residuals of the SARAR model shows the Moran's I statistic is close to zero (-0.00006), with a p-value of 0.3344, which is not statistically significant. This indicates that the SARAR model has effectively captured the spatial dependency in the data.
5.3.5 Spatial Durbin Model
In the Spatial Durbin Model (SDM), the direct effects show that the percentage of the foreign-born population and the percentage of the population with a bachelor's degree or higher have significant positive effects on median household income, while the percentage of the population aged 65 and over and the crime rate have significant negative effects. However, none of the spatially lagged explanatory variables is statistically significant, so there is no clear evidence of spillover effects in this model. The spatial lag coefficient (ρ) is also insignificant (p = 0.3198), suggesting limited spatial dependence in the dependent variable once the lagged explanatory variables are included.
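In the SLM and SDM, the regression coefficients are not themselves marginal effects, because a change in one state feeds back through its neighbours via ρ. In particular, the coefficients θ on the lagged regressors WX are not the indirect (spillover) effects. A common summary, in the style of LeSage and Pace, averages the matrix of partial derivatives S_r = (I - ρW)^{-1}(β_r I + θ_r W) into direct, indirect, and total impacts. The sketch below computes these point estimates; it is illustrative, the function name and parameter values are hypothetical, and it omits the simulation-based inference usually used to test whether impacts are significant.

```python
import numpy as np

def spatial_impacts(W, rho, beta, theta=None):
    """Average direct, indirect and total impacts for SLM (theta=None) or SDM.

    For regressor r, S_r = (I - rho W)^{-1} (beta_r I + theta_r W):
    direct = mean of diag(S_r), total = mean row sum of S_r, indirect = total - direct.
    """
    n = W.shape[0]
    beta = np.atleast_1d(np.asarray(beta, dtype=float))
    theta = (np.zeros_like(beta) if theta is None
             else np.atleast_1d(np.asarray(theta, dtype=float)))
    A_inv = np.linalg.inv(np.eye(n) - rho * W)
    rows = []
    for b, t in zip(beta, theta):
        S = A_inv @ (b * np.eye(n) + t * W)
        direct = np.trace(S) / n
        total = S.sum() / n
        rows.append((direct, total - direct, total))
    return np.array(rows)   # one row per regressor: direct, indirect, total

# Usage with a hypothetical row-standardized W over 49 units on a line
n = 49
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = W[idx + 1, idx] = 1.0
W /= W.sum(axis=1, keepdims=True)
print(spatial_impacts(W, rho=0.3, beta=[1.0], theta=[0.5]).round(4))
```

With a row-standardized W, the SLM total impact simplifies to β/(1 - ρ), which offers a quick check on software output.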
Based on the Moran's I test for SDM residuals, the statistic is -0.0476, with a p-value of 0.7094. This indicates that the residuals of the SDM model do not exhibit significant spatial autocorrelation. Therefore, the SDM effectively captures the spatial structure in the data and resolves spatial dependence issues in the residuals.
5.3.6 Comparison of Model Results
Based on the results in Table 2, all models consistently show a significant positive association between the proportion of the foreign-born population and median household income, which suggests that immigration makes a positive economic contribution.
To evaluate the impact of the foreign-born population percentage on median household income in the United States, I compared the spatial and non-spatial models. The OLS model is straightforward and has a high Adjusted R² (0.876), but it fails to address spatial autocorrelation (Moran's I = 0.2143, p < 0.001) and therefore cannot fully capture the spatial structure of the data. The SLM and SEM address spatial dependence effectively: the SLM captures a spatial lag (ρ = 0.3546, p < 0.01) and the SEM captures spatial error dependence (λ = 0.6686, p < 0.01), and both improve on OLS in terms of AIC (963.46 and 964.1, respectively). The SARAR model incorporates both spatial lag and spatial error terms, but neither parameter is significant, so it adds complexity without a clear benefit. The SDM is the most comprehensive model, but its lagged explanatory variables are not statistically significant and its AIC is slightly higher (965.85), suggesting overfitting. Given the research focus, the SLM and SEM are the most appropriate models: they adequately capture spatial dependence without unnecessary complexity and provide robust results for policy implications.
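The AIC and BIC used in this comparison are simple functions of the maximized log-likelihood. The sketch below checks the reported SLM figures. It assumes k = 7 estimated parameters (intercept, four slopes, ρ, and σ²) and n = 49 spatial units (48 contiguous states plus DC); neither count is stated explicitly in the text. AIC values are comparable across models only when all of them are computed from the full log-likelihood on the same data.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2 logL + 2k."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: -2 logL + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)

# Reported SLM log-likelihood; k and n are assumptions (see lead-in)
ll_slm, k_slm, n_units = -474.73, 7, 49
print(round(aic(ll_slm, k_slm), 2))          # reproduces the reported AIC of 963.46
print(round(bic(ll_slm, k_slm, n_units), 2))
```

Because ln(49) ≈ 3.89 > 2, BIC penalizes each extra parameter, such as the SDM's lagged regressors, more heavily than AIC does.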
6. Conclusion
This study explores the impact of the foreign-born population percentage on median household income in the United States using spatial econometric models. In the SLM, each 1 percentage point increase in the foreign-born population share is associated with an increase of approximately $468.68 in median household income; because ρ is positive, the direct and total effects, which include feedback through neighboring states, are somewhat larger than this coefficient. A higher percentage of residents with a bachelor's degree or higher is also associated with significantly higher median household income, while an older population and higher crime rates have significant negative effects. Among the spatial models, the Spatial Lag Model (SLM) is the most appropriate, revealing spatial dependence in income levels across states. However, the Spatial Durbin Model (SDM) does not detect significant spillover effects, suggesting that the influence of the foreign-born population is primarily localized within states rather than extending to neighboring regions.
This study is limited by its reliance on 2023 data, which capture only a snapshot in time and cannot reveal dynamic trends or establish causal relationships; future research could use panel data to explore temporal effects. In addition, omitted variable bias may remain because of unobserved factors such as regional policies or cultural differences, suggesting a need for more comprehensive models or additional variables. Finally, the inverse distance weights matrix assumes that spatial interactions are purely geographic; alternative matrices based on economic or social connections could strengthen the robustness of future analyses.
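In an SLM, the coefficient on a regressor is not the full marginal effect, because a change in one state propagates to its neighbors and back through ρ. The worked calculation below shows the average total impact of the foreign-born share using the reported SLM estimates. It is illustrative, not a result reported by the study, and it assumes the weights matrix W is row-standardized, which the text does not state; otherwise the total impact must be computed from the full matrix (I - ρW)^{-1}.

```latex
% Assumes a row-standardized W, so that (I_n - \rho W)\mathbf{1} = (1 - \rho)\mathbf{1}
\begin{aligned}
S_r(W) &= (I_n - \rho W)^{-1}\beta_r \\
\text{total impact}_r &= \tfrac{1}{n}\,\mathbf{1}^\top S_r(W)\,\mathbf{1} = \frac{\beta_r}{1-\rho} \\
\frac{468.68}{1 - 0.35464} &= \frac{468.68}{0.64536} \approx 726.2
\end{aligned}
```

Under this assumption, a 1 percentage point increase in the foreign-born share in every state would be associated with roughly $726 higher median household income on average, of which the $468.68 coefficient is only the part before spatial feedback.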
In conclusion, the results of this study highlight the role of immigrants in boosting income. Policymakers should maintain liberal immigration policies rather than impose restrictions.