Socio-Economic Development Across Turkish Provinces
Interactive Dashboard with Regression, Cluster, and Convergence Analysis
Overview
Türkiye
Regional Comparison
Western provinces dominate GDP per capita, reflecting heavy industrial concentration in Marmara and Aegean regions. The East-West divide is evident: western and southern coastal regions show 2-3x higher GDP per capita than eastern and southeastern provinces. This pattern reflects:
- Industrial concentration: Manufacturing clusters in Istanbul, Kocaeli, and İzmir
- Service sector dominance: Finance, trade, and tourism in western metropolises
- Agricultural dependence: Eastern provinces rely more on lower-productivity agriculture
- Infrastructure quality: Better transportation and utilities in the west
Istanbul and Ankara are clear outliers, far exceeding most other provinces because of their roles as the economic and administrative capitals, respectively. Industrial and tourism cities such as Bursa, İzmir, and Antalya form a second tier, while rural and eastern provinces lag well behind.
Top & Bottom Provinces
Top 10 Provinces by GDP
Bottom 10 Provinces by GDP
Turkey Map - GDP per Capita
Regression Analysis
This section presents the results of an OLS regression analysis to identify factors that explain differences in GDP per capita across Turkish provinces.
Model Specification
The model is specified as:
\[ \begin{aligned} \text{GDP per capita}_i = \alpha &+ \beta_1 \text{Literacy}_i + \beta_2 \text{Unemployment}_i \\ &+ \beta_3 \text{Labor Force Participation}_i + \sum_{r} \gamma_r \text{Region}_r + \epsilon_i \end{aligned} \]
Why these regressors?
- Human capital (literacy): Proxy for education and skills
- Labor market (unemployment, participation): Reflect employment efficiency and labor supply
- Agglomeration (population density): Captures urbanization economies
- Geography (regional fixed effects): Controls for location-specific factors
Limitations and caveats:
- Omitted variables: Industry composition, foreign investment, natural resources, trade openness not included due to data constraints
- Endogeneity concerns: Education and infrastructure may be correlated with unobserved productivity shocks; causality cannot be firmly established from cross-sectional data
- Measurement error: GDP estimates at provincial level may not capture informal economy
- Spatial correlation: Neighboring provinces likely have correlated errors (not addressed in OLS)
Despite these limitations, the model provides valuable insights into correlates of provincial development.
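The specification above can be sketched in base R with `lm()`. This is a minimal illustration rather than the dashboard's actual code: the data frame `provinces`, its column names, and every value in it are invented stand-ins, and the four region labels are only a subset chosen for brevity.

```r
# Synthetic stand-in data: every value here is invented for illustration.
set.seed(1)
n <- 81
provinces <- data.frame(
  literacy      = runif(n, 94, 99),   # percent
  unemployment  = runif(n, 5, 20),    # percent
  participation = runif(n, 40, 60),   # percent
  region        = factor(rep(c("Marmara", "Aegean", "Central", "Eastern"),
                             length.out = n))
)
provinces$gdp_pc <- 100000 + 40000 * (provinces$literacy - 94) -
  1000 * provinces$unemployment + rnorm(n, sd = 20000)

# A factor regressor expands into region dummies (one level is the baseline),
# which is how the regional fixed effects enter the model.
fit <- lm(gdp_pc ~ literacy + unemployment + participation + region,
          data = provinces)
print(round(summary(fit)$coefficients, 3))
```

Because plain OLS ignores the spatial correlation noted in the caveats, the standard errors from such a fit should be read with caution.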
Regression Coefficients
Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1
What do these numbers mean in real terms?
Concrete Examples: The East-West Divide
- Literacy Rate: Coefficient = 47527.95
- A 10 percentage point increase in literacy is associated with ₺475,280 higher GDP per capita
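- Example: If Hakkari (literacy 96.5%) matched Kocaeli (literacy 98.5%), all else equal, predicted GDP per capita would be ~₺91,729 higher. This figure corresponds to an unrounded literacy gap of about 1.93 percentage points; the rounded rates shown here (a 2.0-point gap) would give ~₺95,056.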
- Unemployment Rate: Coefficient = -1108.05
- A 1 percentage point increase in unemployment correlates with ₺1,108 lower GDP per capita
- This reflects labor market inefficiency and lost productive capacity
Real Provincial Comparison (East vs. West):
| Province | GDP/Capita | Literacy | Unemployment | Classification |
|---|---|---|---|---|
| İstanbul | ₺802,669 | 98.5% | 8.8% | Western metropolis |
| Kocaeli | ₺788,873 | 98.5% | 8.7% | Industrial hub |
| Hakkari | ₺304,752 | 96.5% | 18.3% | Eastern rural |
The 2.6x GDP gap between Kocaeli and Hakkari reflects differences in:
- Human capital: 1.9 percentage point literacy gap
- Industrial structure: Kocaeli’s manufacturing vs. Hakkari’s agriculture
- Infrastructure: Better transportation, internet, healthcare in the west
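The arithmetic behind these figures can be reproduced in a few lines of R. The sketch below uses only numbers quoted in this section; the variable names are illustrative.

```r
beta_literacy <- 47527.95              # literacy coefficient (TRY per percentage point)

ten_pp_effect  <- beta_literacy * 10          # about 475,280
implied_gap_pp <- 91729 / beta_literacy       # gap implied by the Hakkari example
rounded_effect <- beta_literacy * (98.5 - 96.5)  # effect using the rounded rates
gdp_ratio      <- 788873 / 304752             # Kocaeli vs. Hakkari GDP per capita

print(round(c(ten_pp_effect  = ten_pp_effect,
              implied_gap_pp = implied_gap_pp,
              rounded_effect = rounded_effect,
              gdp_ratio      = gdp_ratio), 2))
```

The implied gap of roughly 1.93 points matches the "1.9 percentage point" gap cited for human capital, and the GDP ratio rounds to the 2.6x quoted above.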
Statistical significance: Coefficients marked *** or ** are significant at the 1% level or better; those marked * are significant at the 5% level; those marked . are only marginally significant (10% level); those without a marker are not statistically distinguishable from zero.
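The mapping from p-values to significance codes can be written out explicitly. The helper below, `p_to_stars()`, is a hypothetical name introduced for illustration; it applies the thresholds from the legend above using base R's `cut()`.

```r
# Map p-values to the significance codes used in the coefficient table.
p_to_stars <- function(p) {
  as.character(cut(
    p,
    breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
    labels = c("***", "**", "*", ".", " "),
    right = FALSE,          # intervals are [lower, upper), so p < 0.001 gets ***
    include.lowest = TRUE   # keeps p = 1 inside the last interval
  ))
}

p_values <- c(0.0005, 0.005, 0.03, 0.07, 0.5)
print(data.frame(p = p_values, code = p_to_stars(p_values)))
```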
Model Fit Statistics
How to interpret this visualization:
- Point estimate: The dot shows the coefficient value (effect size)
- Confidence intervals: Horizontal bars show 95% confidence intervals
- If the bar crosses zero → effect is not statistically significant
- If the bar doesn’t cross zero → effect is statistically significant
- Distance from zero: Larger absolute values → stronger effect on GDP per capita
- Direction: Right of zero → positive effect; Left of zero → negative effect
What to look for:
- Education variables should be far right (positive, significant)
- Unemployment should be far left (negative, significant)
- Regional dummies: Western regions should show positive coefficients (higher GDP), eastern regions negative (lower GDP)
- Infrastructure (internet, healthcare) should be positive
Variables whose confidence intervals cross zero have statistically insignificant effects: at the 5% level, we cannot distinguish them from “no effect.”
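The "does the bar cross zero" check can be done numerically with `confint()`. The sketch below uses invented data with one real predictor (`x1`) and one pure-noise predictor (`x2`); it is an illustration of the check, not the dashboard's model.

```r
set.seed(2)
n  <- 81
x1 <- rnorm(n)            # predictor with a real effect
x2 <- rnorm(n)            # predictor unrelated to the outcome
y  <- 2 * x1 + rnorm(n)

fit <- lm(y ~ x1 + x2)
ci  <- confint(fit, level = 0.95)

# An interval crosses zero when its lower bound is negative and its upper bound positive.
crosses_zero <- ci[, 1] < 0 & ci[, 2] > 0
print(data.frame(estimate = coef(fit), lower = ci[, 1], upper = ci[, 2],
                 crosses_zero = crosses_zero))
```

A coefficient plot is a picture of this table: the dot is `estimate` and the bar runs from `lower` to `upper`.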
Multicollinearity Check (VIF)
Multicollinearity occurs when independent variables are highly correlated with each other, making it difficult to isolate individual effects. This inflates standard errors and makes coefficients unstable.
Variance Inflation Factor (VIF) measures how much the variance of a coefficient is inflated due to correlation with other predictors:
- VIF < 5: ✅ No multicollinearity concern
- 5 ≤ VIF < 10: ⚠️ Moderate multicollinearity (monitor)
- VIF ≥ 10: ❌ Serious multicollinearity (consider removing variable)
Rule of thumb: If two variables have VIF > 10, consider dropping one or combining them.
✅ No multicollinearity concerns: All VIF values are below 5. The independent variables are sufficiently uncorrelated, and coefficient estimates are reliable.
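For numeric predictors, VIF can be computed from its definition, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the others. The function `vif_manual()` below is a hypothetical helper on invented data; packages such as car provide a `vif()` function that also handles categorical regressors like the region dummies.

```r
# VIF from its definition: regress each predictor on the remaining ones.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(3)
n <- 81
independent <- data.frame(a = rnorm(n), b = rnorm(n), d = rnorm(n))
# Add a column that is nearly a copy of `a` to provoke multicollinearity.
collinear <- transform(independent, e = a + rnorm(n, sd = 0.1))

print(round(vif_manual(independent), 2))  # values close to 1
print(round(vif_manual(collinear), 2))    # `a` and `e` far above 10
```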
Correlation Matrix
Strong positive correlations (> 0.7):
- Variables that move together (e.g., GDP and literacy often correlate)
Strong negative correlations (< -0.7):
- Variables that move in opposite directions (e.g., unemployment and GDP)
Implications:
- High correlation between predictors can cause multicollinearity
- However, correlation with the dependent variable (GDP) is desirable
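The 0.7 threshold can be applied programmatically to a correlation matrix. The sketch below builds three invented, deliberately correlated variables and lists the pairs whose absolute correlation exceeds 0.7; the names are placeholders for the dashboard's indicators.

```r
set.seed(4)
n <- 81
literacy     <- rnorm(n)
gdp          <- 0.9 * literacy + rnorm(n, sd = 0.3)
unemployment <- -0.9 * gdp + rnorm(n, sd = 0.3)
indicators   <- data.frame(gdp, literacy, unemployment)

R <- cor(indicators)

# Look only at the upper triangle so each pair is reported once.
idx <- which(abs(R) > 0.7 & upper.tri(R), arr.ind = TRUE)
strong <- data.frame(
  var1 = rownames(R)[idx[, "row"]],
  var2 = colnames(R)[idx[, "col"]],
  r    = round(R[idx], 2)
)
print(strong)
```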
Residual Analysis
Cluster Analysis
K-Means Clustering
Provinces are classified into 4 clusters based on the following socio-economic indicators:
- GDP per capita
- Literacy rate
- Unemployment rate
- Labor force participation
Why k-means?
- Simplicity and interpretability: Creates clear, non-overlapping groups of provinces
- Computational efficiency: Fast convergence on medium-sized datasets
- Assumption: Clusters are roughly spherical and of similar variance (reasonable for socio-economic data after standardization)
Distance metric: Euclidean distance on standardized variables (z-scores). Standardization ensures variables with different scales (e.g., GDP in thousands vs. percentages) contribute equally to cluster formation.
Number of clusters (k=4): Chosen using the elbow method (see plot below). k=4 represents a balance between parsimony and capturing heterogeneity:
- Fewer clusters (k=2,3): Oversimplify provincial diversity
- More clusters (k≥5): Marginal gains in within-cluster homogeneity; harder to interpret
Algorithm parameters:
- nstart = 25: Run 25 random initializations and keep best solution (mitigates local optima problem)
- seed = 42: For reproducibility
The k-means algorithm partitions provinces into clusters that minimize within-cluster variance while maximizing between-cluster variance.
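A minimal sketch of this procedure in base R follows, using the parameters stated above (k = 4, `nstart = 25`, seed 42). The indicator values are invented, so the resulting clusters carry no meaning; only the workflow is illustrated.

```r
# Synthetic stand-in for the four clustering indicators (values invented).
set.seed(42)
n <- 81
indicators <- data.frame(
  gdp_pc        = rlnorm(n, meanlog = log(400000), sdlog = 0.4),
  literacy      = runif(n, 94, 99),
  unemployment  = runif(n, 5, 20),
  participation = runif(n, 40, 60)
)

# Standardize to z-scores so every indicator contributes on the same scale.
z <- scale(indicators)

set.seed(42)
km <- kmeans(z, centers = 4, nstart = 25)

print(table(km$cluster))
# Cluster profiles in original units: mean of each indicator per cluster.
print(aggregate(indicators, by = list(cluster = km$cluster), FUN = mean))
```

The cluster numbers returned by `kmeans()` are arbitrary labels, so profiles should be identified by their indicator means rather than by number.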
Optimal Number of Clusters
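The elbow method compares the total within-cluster sum of squares (WSS) across candidate values of k and looks for the point where further increases in k bring little improvement. The sketch below runs on invented data, so the location of its elbow says nothing about the real provinces.

```r
set.seed(42)
n <- 81
indicators <- data.frame(
  gdp_pc        = rlnorm(n, meanlog = log(400000), sdlog = 0.4),
  literacy      = runif(n, 94, 99),
  unemployment  = runif(n, 5, 20),
  participation = runif(n, 40, 60)
)
z <- scale(indicators)

k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(z, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

# Percentage reduction in WSS gained by moving from k to k + 1.
drop_pct <- c(NA, -diff(wss) / wss[-length(wss)] * 100)
print(data.frame(k = k_values, wss = round(wss, 1), drop_pct = round(drop_pct, 1)))
# plot(k_values, wss, type = "b") would draw the usual elbow curve.
```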
Cluster Characteristics
Based on the cluster profiles above, each group can be interpreted as follows. K-means assigns cluster numbers arbitrarily, so the numbering in a given run need not follow this order: in the geographic breakdown further down, Cluster 1 contains only Eastern and Southeastern provinces, while Cluster 4 is dominated by Marmara. Match each profile to an estimated cluster by its indicator means, not by its number.
Cluster 1 - “Highly Developed Metropolises” (Expected: Istanbul, Ankara, İzmir, Kocaeli)
- Highest GDP per capita and education levels
- Dominated by industrialized western provinces and major urban centers
- Strong service sectors, manufacturing, and knowledge economy
- Best infrastructure (internet, healthcare)
- Economic engines of Turkey
Cluster 2 - “Mid-Tier Industrial/Tourism Provinces” (Expected: Antalya, Bursa, Eskişehir, Muğla)
- Above-average GDP but below metropolises
- Mix of secondary industrial cities and coastal tourism hubs
- Relatively high literacy and internet access
- Moderate unemployment, solid infrastructure
Cluster 3 - “Transitional/Rural Provinces” (Expected: Central Anatolia, some Black Sea)
- Below-average GDP per capita
- Lower education levels, particularly university graduates
- More reliant on agriculture and small-scale industry
- Higher unemployment, weaker infrastructure
- Catching up slowly but face structural challenges
Cluster 4 - “Underdeveloped Eastern Provinces” (Expected: Southeastern and Eastern Anatolia)
- Lowest GDP per capita and human development indicators
- Significantly lower literacy and education
- Limited infrastructure (internet, healthcare)
- High unemployment and informality
- Structural poverty and underdevelopment
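These expected profiles mirror Turkey’s geographic and historical development patterns (western industrialization vs. eastern agricultural tradition), although the estimated clusters follow regional boundaries only loosely, as the geographic breakdown below shows.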
Provinces by Cluster
Cluster Scatter Plot
Cluster Map: Geographic Distribution
Do clusters map to geographic regions?
Cluster 1: ⚠️ Moderately coherent
- Spans 2 geographic regions
- Dominant region: Southeastern (54.5% of cluster)
- Regions: Eastern, Southeastern
Cluster 2: ❌ Geographically fragmented
- Spans 5 geographic regions
- Dominant region: Central (23.5% of cluster)
- Regions: Mediterranean, Southeastern, Eastern, Black Sea, Central
Cluster 3: ❌ Geographically fragmented
- Spans 6 geographic regions
- Dominant region: Black Sea (39.3% of cluster)
- Regions: Aegean, Black Sea, Mediterranean, Marmara, Central, Eastern
Cluster 4: ❌ Geographically fragmented
- Spans 5 geographic regions
- Dominant region: Marmara (40% of cluster)
- Regions: Central, Mediterranean, Black Sea, Aegean, Marmara
Critical Analysis:
❌ Weak geographic clustering (avg 39.3% coherence): The map looks ‘patchy’, with clusters that are geographically fragmented. This could indicate:
- Poor variable selection: Variables may not capture regional differences
- Improper scaling: Unlikely here, since the variables are already standardized to z-scores
- Too many/few clusters: k=4 may not be optimal
Recommendation: Re-examine variable choice or add spatial constraints to clustering.
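The coherence figures above (number of regions spanned, dominant region, and its share) can be derived from a cluster-by-region cross-tabulation. The sketch below uses a tiny invented assignment table; the final line recomputes the reported 39.3% average from the four dominant-region shares quoted in this section.

```r
# Toy cluster assignments (invented) to show the calculation.
assignments <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 2, 2),
  region  = c("Eastern", "Eastern", "Southeastern",
              "Central", "Aegean", "Marmara", "Marmara")
)

tab    <- table(assignments$cluster, assignments$region)
shares <- prop.table(tab, margin = 1)        # row-wise shares within each cluster

n_regions       <- rowSums(tab > 0)          # regions spanned by each cluster
dominant_share  <- apply(shares, 1, max)
dominant_region <- colnames(shares)[apply(shares, 1, which.max)]

print(data.frame(cluster = rownames(tab), n_regions, dominant_region,
                 dominant_pct = round(100 * dominant_share, 1)))

# Average coherence from the shares reported above.
reported_avg <- mean(c(54.5, 23.5, 39.3, 40))
print(round(reported_avg, 1))
```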
Multi-dimensional View
Convergence Analysis
What is Beta Convergence?
Beta (β) convergence tests whether initially poorer regions grow faster than richer ones, leading to income equalization over time.
We test absolute β-convergence by estimating:
\[\text{Average Annual Growth Rate}_{i} = \alpha + \beta \cdot \ln(\text{Initial GDP}_{i,2010}) + \epsilon_i\]
Economic intuition:
- If β < 0 (negative): Poorer provinces grow faster → Convergence ✅
- Mechanism: Diminishing returns to capital (Solow model)
- Poor regions have lower capital-labor ratios → higher marginal returns → faster growth
- Technology diffusion from rich to poor regions
- If β > 0 (positive): Richer provinces grow faster → Divergence ❌
- Mechanism: Agglomeration economies, increasing returns, poverty traps
- Rich regions attract more investment and talent
- If β ≈ 0: No convergence or divergence → Persistent inequality
Are Turkish provinces catching up? The sign and significance of β will tell us.
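The test described above amounts to a single cross-sectional regression. The sketch below simulates provinces with a known negative β and recovers it with `lm()`; the 14-year horizon, the GDP levels, and the true β of -0.02 are all assumptions made for the simulation, not values from the dashboard.

```r
set.seed(5)
n       <- 81
T_years <- 14                      # assumed horizon, e.g. 2010 to 2024

gdp_start   <- exp(rnorm(n, mean = log(20000), sd = 0.4))
true_beta   <- -0.02               # poorer provinces grow faster by construction
growth_true <- 0.10 + true_beta * (log(gdp_start) - mean(log(gdp_start)))
gdp_end     <- gdp_start * exp(T_years * (growth_true + rnorm(n, sd = 0.005)))

# Average annual growth rate in log terms.
growth <- (log(gdp_end) - log(gdp_start)) / T_years

fit  <- lm(growth ~ log(gdp_start))
beta <- coef(fit)[["log(gdp_start)"]]
ci   <- confint(fit)["log(gdp_start)", ]

# Convergence needs the whole interval for beta to lie below zero.
verdict <- if (ci[[2]] < 0) "convergence" else
           if (ci[[1]] > 0) "divergence" else "no significant trend"
print(c(beta = round(beta, 4), lower = round(ci[[1]], 4), upper = round(ci[[2]], 4)))
print(verdict)
```

Basing the verdict on the confidence interval for β, rather than on its sign alone, avoids reading a small and noisy estimate as evidence in either direction.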
Results
❌ No Convergence Found
Beta coefficient: 0.0014
The regression coefficient is not significantly negative, suggesting provinces are not converging.
Convergence Plot
✅ No significant outliers detected: All provinces lie relatively close to the fitted regression line.
What does the plot show?
Slope: A negative slope would mean that provinces with lower initial GDP (left side) tend to have higher growth rates (upper part of plot), which is the visual signature of β-convergence. Here the fitted slope is slightly positive (β = 0.0014), so the plot shows no such pattern.
Statistical significance: The shaded band shows the uncertainty around the fitted regression line. Convergence is statistically supported only if the confidence interval for the slope β lies entirely below zero.
Economic story:
❌ No convergence detected: The coefficient β = 0.0014 is slightly positive and not significantly negative. Richer provinces are growing as fast as or faster than poorer ones. This suggests:
- Divergence or persistent inequality
- Agglomeration economies may dominate diminishing returns
- Poor regions face structural barriers (geography, institutions, human capital) preventing catch-up
What is Sigma Convergence?
Sigma (σ) convergence tests whether the dispersion (inequality) of income across regions is decreasing over time.
We measure σ-convergence by tracking dispersion metrics over time:
- Coefficient of Variation (CV): \(CV_t = \frac{\sigma_t}{\mu_t}\) (standard deviation / mean)
- Standard Deviation of Log GDP: \(\sigma(\ln GDP_t)\)
- Max/Min Ratio: Ratio of richest to poorest province
Economic intuition:
- If dispersion is decreasing over time → σ-convergence ✅
- Income inequality across provinces is shrinking
- The gap between rich and poor regions is closing
- If dispersion is increasing or stable → No σ-convergence ❌
- Regional inequality is growing or persistent
- Poor provinces are not catching up to rich ones
Key distinction from β-convergence:
- β-convergence asks: “Do poor provinces grow faster?”
- σ-convergence asks: “Is the income gap actually shrinking?”
β-convergence is necessary but not sufficient for σ-convergence. Even if poor provinces grow faster, random shocks can prevent inequality from declining.
Are Turkish provinces becoming more equal? The trend in CV and log-SD will tell us.
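The three dispersion metrics listed above are straightforward to compute. The function `dispersion()` and the two tiny GDP vectors below are hypothetical and chosen so the result is easy to verify by hand.

```r
# Dispersion metrics for one year's cross-section of GDP per capita.
dispersion <- function(gdp) {
  c(cv      = sd(gdp) / mean(gdp),   # coefficient of variation
    sd_log  = sd(log(gdp)),          # standard deviation of log GDP
    max_min = max(gdp) / min(gdp))   # richest-to-poorest ratio
}

gdp_early <- c(100, 200, 400)   # invented values
gdp_late  <- c(200, 300, 400)   # the poorest unit has gained the most

print(round(rbind(early = dispersion(gdp_early),
                  late  = dispersion(gdp_late)), 3))

# All three metrics are scale-invariant: doubling every value changes nothing.
print(all.equal(dispersion(gdp_early), dispersion(2 * gdp_early)))
```

The scale invariance is why growth that is proportional across provinces leaves these measures unchanged, even though absolute gaps in lira widen.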
Dispersion Over Time
Sigma Convergence Plot
What does the plot show?
⚠️ Stable inequality: The coefficient of variation (CV) has remained relatively stable (changed by 2.2%). This means:
- Regional income inequality is persistent
- No clear trend toward convergence or divergence
- Economic growth is proportional across regions, maintaining the status quo
Why might β and σ-convergence differ?
Even if poor provinces grow faster (β-convergence), shocks to individual provinces (natural disasters, policy changes, investment booms) can increase dispersion, preventing σ-convergence.
Evolution of GDP Distribution
What does this plot tell us?
Rightward shift: If the entire distribution moves right over time, all provinces are experiencing absolute growth in GDP per capita (the economy is growing).
Widening spread: If the distribution becomes more dispersed (flatter, wider), inequality is increasing (σ-divergence rather than σ-convergence).
Narrowing spread: If the distribution becomes more concentrated (taller, narrower), inequality is decreasing (σ-convergence).
Shape changes: If the distribution shifts from bimodal (two peaks) to unimodal (one peak), regional clusters are converging into a more homogeneous group.
What should we see if convergence is happening?
The distribution should become both rightward-shifted (growth) and more concentrated (narrower), with the left tail (poorest provinces) moving faster than the right tail (richest provinces).
Data
This section displays real-time data scraped from Wikipedia about Turkish provinces, demonstrating web scraping capabilities using the rvest package.
Source URLs:
- Main data: Provinces of Turkey - Wikipedia
Scraping approach:
- Extract HTML tables using rvest::read_html() and html_table()
- Intelligent column detection (handles varying Wikipedia table structures)
- Data cleaning: Remove citations, parse numbers from formatted text
- Cached to avoid repeated requests during rendering
Variables extracted:
- Province names and plate codes (1-81)
- Population, area, and population density
- Geographic regions
- Provincial capitals
- Establishment years (historical data)
Primary Data Sources
| Source | Description | Variables |
|---|---|---|
| TÜİK | Turkish Statistical Institute (Official) | GDP per capita, Literacy rates |
| SGK | Social Security Institution | Unemployment rates, Labor force participation |
| Wikipedia | Crowd-sourced encyclopedia (scraped) | Population data |
Web Scraping Implementation
This dashboard demonstrates automated data collection from Wikipedia using R’s rvest package:
Technical details:
# Core scraping workflow
library(rvest)
page <- read_html("https://tr.wikipedia.org/wiki/N%C3%BCfusuna_g%C3%B6re_T%C3%BCrkiye%27nin_illeri")
tables <- page %>% html_table(fill = TRUE)
# clean_and_process() is the dashboard's own cleaning helper (not shown here)
data <- tables[[1]] %>% clean_and_process()
Features implemented:
- Simple table extraction
- Data cleaning (parse formatted numbers, standardize names)
- Merge integration (combines scraped data with local CSVs)
Wikipedia URLs scraped:
1. Nüfusuna göre Türkiye’nin illeri (Provinces of Turkey by population) - Population data
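The cleaning step (removing citation markers and parsing formatted numbers) might look like the sketch below. `clean_number()` is a hypothetical helper, not the dashboard's own cleaning function, and it assumes the column holds whole-number counts such as population, since it discards every non-digit character including decimal separators.

```r
# Strip citation markers and separators from scraped count columns.
clean_number <- function(x) {
  x <- gsub("\\[[^]]*\\]", "", x)   # drop citation markers such as [1] or [a]
  x <- gsub("[^0-9]", "", x)        # keep digits only (removes . , and spaces)
  as.numeric(x)
}

raw <- c("15.840.900[1]", "1,234,567", " 86 076 ")   # invented examples
print(clean_number(raw))
```

Keeping digits only handles both the Turkish convention of dots as thousands separators and the English convention of commas, at the cost of being unsuitable for decimal columns such as density.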
Last Updated
- Analysis rendered: January 19, 2026 at 20:04 +03
- Primary data vintage: 2024 (TÜİK/SGK)