Socio-Economic Development Across Turkish Provinces

Interactive Dashboard with Regression, Cluster, and Convergence Analysis

Overview

Türkiye

Provinces

85.6M

Total Population

₺349,990

Avg GDP per Capita

97.2%

Avg Literacy Rate

8.4%

Avg Unemployment

Regional Comparison

Key Insights: Regional Disparities

Western provinces dominate GDP per capita, reflecting heavy industrial concentration in Marmara and Aegean regions. The East-West divide is evident: western and southern coastal regions show 2-3x higher GDP per capita than eastern and southeastern provinces. This pattern reflects:

Industrial concentration: Manufacturing clusters in Istanbul, Kocaeli, and İzmir
Service sector dominance: Finance, trade, and tourism in western metropolises
Agricultural dependence: Eastern provinces rely more on lower-productivity agriculture
Infrastructure quality: Better transportation and utilities in the west

Istanbul and Ankara are clear outliers, far exceeding most other provinces because of their roles as the economic and administrative capitals, respectively. Industrial and tourism centers such as Bursa, İzmir, and Antalya form a second tier, while rural and eastern provinces lag significantly behind.

Top & Bottom Provinces

Top 10 Provinces by GDP

Bottom 10 Provinces by GDP

Turkey Map - GDP per Capita

Regression Analysis

This section presents the results of an OLS regression analysis to identify factors that explain differences in GDP per capita across Turkish provinces.

Model Specification

The regression model estimates the determinants of GDP per capita across Turkish provinces:

\[ \begin{aligned} \text{GDP per capita}_i = \alpha &+ \beta_1 \text{Literacy}_i + \beta_2 \text{Unemployment}_i \\ &+ \beta_3 \text{Labor Force Participation}_i + \sum_{r} \gamma_r \text{Region}_r + \epsilon_i \end{aligned} \]
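As a rough illustration of how this specification could be estimated, the sketch below fits the same functional form with base R's lm(). The data are synthetic and the column names (gdp_pc, literacy, unemployment, participation, region) are assumptions made for the example, not the dashboard's actual variables.

```r
# Synthetic stand-in data: all names and values here are hypothetical.
set.seed(1)
n <- 81
provinces <- data.frame(
  literacy      = runif(n, 93, 99),   # percent
  unemployment  = runif(n, 4, 20),    # percent
  participation = runif(n, 40, 60),   # percent
  region        = factor(rep(paste0("R", 1:7), length.out = n))
)
provinces$gdp_pc <- 50000 + 3000 * provinces$literacy -
  1000 * provinces$unemployment + rnorm(n, sd = 40000)

# region is a factor, so lm() expands it into dummy variables
# and drops one level as the baseline (the gamma_r terms).
fit <- lm(gdp_pc ~ literacy + unemployment + participation + region,
          data = provinces)

# Estimates, standard errors, t-values, and p-values
print(round(summary(fit)$coefficients, 3))
```

With seven regions, the fit has ten coefficients: the intercept, three continuous regressors, and six regional dummies measured against the omitted region.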

Model Justification

Why these regressors?

Human capital (literacy): Proxy for education and skills
Labor market (unemployment, participation): Reflect employment efficiency and labor supply
Agglomeration (population density): Captures urbanization economies (population density does not appear in the specification above)
Geography (regional fixed effects): Controls for location-specific factors

Limitations and caveats:

Omitted variables: Industry composition, foreign investment, natural resources, trade openness not included due to data constraints
Endogeneity concerns: Literacy and labor market outcomes may be correlated with unobserved productivity shocks; causality cannot be firmly established from cross-sectional data
Measurement error: GDP estimates at provincial level may not capture informal economy
Spatial correlation: Neighboring provinces likely have correlated errors (not addressed in OLS)

Despite these limitations, the model provides valuable insights into correlates of provincial development.

Regression Coefficients

Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1

Economic Interpretation of Coefficients

What do these numbers mean in real terms?

Concrete Examples: The East-West Divide

Literacy Rate: Coefficient = 47527.95
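- A 10 percentage point increase in literacy is associated with ₺475,280 higher GDP per capita (an extrapolation, since the literacy rates in the comparison below differ by only about 2 points)
- Example: If Hakkari (literacy 96.5%) matched Kocaeli (literacy 98.5%), all else equal, predicted GDP per capita would be ~₺91,729 higher. This figure corresponds to a 1.93-point gap (91,729 / 47,527.95), so it appears to use unrounded literacy rates rather than the rounded 2.0-point difference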
Unemployment Rate: Coefficient = -1108.05
- A 1 percentage point increase in unemployment correlates with ₺1,108 lower GDP per capita
- This reflects labor market inefficiency and lost productive capacity

Real Provincial Comparison (East vs. West):

Province	GDP/Capita	Literacy	Unemployment	Classification
İstanbul	₺802,669	98.5%	8.8%	Western metropolis
Kocaeli	₺788,873	98.5%	8.7%	Industrial hub
Hakkari	₺304,752	96.5%	18.3%	Eastern rural

The 2.6x GDP gap between Kocaeli and Hakkari reflects differences in:
- Human capital: 1.9 percentage point literacy gap
- Industrial structure: Kocaeli’s manufacturing vs. Hakkari’s agriculture
- Infrastructure: Better transportation, internet, healthcare in the west
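To put the coefficients in proportion, the arithmetic below uses only the figures reported above. The 1.93-point literacy gap is an assumption inferred from the reported ₺91,729 estimate; the unemployment gap uses the rounded table values.

\[ \begin{aligned} \text{GDP gap} &= 788{,}873 - 304{,}752 = 484{,}121 \\ \text{Literacy term} &\approx 47{,}527.95 \times 1.93 \approx 91{,}729 \quad (\approx 19\% \text{ of the gap}) \\ \text{Unemployment term} &\approx 1{,}108.05 \times (18.3 - 8.7) \approx 10{,}637 \quad (\approx 2\% \text{ of the gap}) \end{aligned} \]

On these numbers, literacy and unemployment together account for roughly a fifth of the Kocaeli-Hakkari difference. The rest would fall to labor force participation, the regional effects, and the residual.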

Statistical significance: Coefficients marked *** or ** are significant at the 1% level or stricter; those marked * are significant at the 5% level; those marked . are only marginally significant (10% level); those without a marker are not statistically distinguishable from zero.

Model Fit Statistics

0.391

R²

0.314

Adjusted R²

5.08

F-statistic

2.73e-05

p-value
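As a consistency check, the reported fit statistics can be reproduced approximately from R². This assumes n = 81 provinces and k = 9 regressors (three continuous variables plus six regional dummies for seven regions); neither count is stated explicitly in this section.

\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - 0.609 \times \frac{80}{71} \approx 0.314 \]

\[ F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)} = \frac{0.391 / 9}{0.609 / 71} \approx 5.07 \]

Both values agree with the reported 0.314 and 5.08 up to rounding of R², which suggests that the estimated model contains nine regressors.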

Reading the Coefficient Plot

How to interpret this visualization:

Point estimate: The dot shows the coefficient value (effect size)
Confidence intervals: Horizontal bars show 95% confidence intervals
- If the bar crosses zero → effect is not statistically significant
- If the bar doesn’t cross zero → effect is statistically significant
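Distance from zero: Larger absolute values → larger estimated change in GDP per capita per unit of that variable (magnitudes are comparable across variables only when they share the same units)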
Direction: Right of zero → positive effect; Left of zero → negative effect

What to look for:

Literacy should lie to the right of zero (positive, significant)
Unemployment should lie to the left of zero (negative, significant)
Regional dummies: Western regions should show positive coefficients (higher GDP), eastern regions negative (lower GDP), each relative to the omitted baseline region
Infrastructure (internet, healthcare) would be expected to be positive, but no infrastructure variable appears in the model specification

Variables whose confidence intervals cross zero have statistically insignificant effects: we cannot distinguish them from “no effect.”
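The link between the plotted intervals and significance can be sketched in base R. The example uses synthetic data in which x1 has a real effect and x2 has none; the variable names are placeholders, not the dashboard's regressors.

```r
# Synthetic example: x1 truly affects y, x2 does not.
set.seed(2)
x1 <- rnorm(81)
x2 <- rnorm(81)
y  <- 2 * x1 + rnorm(81)

fit <- lm(y ~ x1 + x2)
ci  <- confint(fit, level = 0.95)   # 95% confidence intervals

res <- data.frame(
  estimate      = coef(fit),
  lower         = ci[, 1],
  upper         = ci[, 2],
  # An interval that lies entirely on one side of zero excludes zero
  excludes_zero = ci[, 1] > 0 | ci[, 2] < 0,
  row.names     = names(coef(fit))
)
print(round(res[, 1:3], 3))
print(res$excludes_zero)
```

For an OLS fit, a 95% interval excludes zero exactly when the coefficient's two-sided p-value is below 0.05, so the plot and the coefficient table convey the same decision.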

Multicollinearity Check (VIF)

What is Multicollinearity?

Multicollinearity occurs when independent variables are highly correlated with each other, making it difficult to isolate individual effects. This inflates standard errors and makes coefficients unstable.

Variance Inflation Factor (VIF) measures how much the variance of a coefficient is inflated due to correlation with other predictors:

VIF < 5: ✅ No multicollinearity concern
5 ≤ VIF < 10: ⚠️ Moderate multicollinearity (monitor)
VIF ≥ 10: ❌ Serious multicollinearity (consider removing variable)

Rule of thumb: If two variables have VIF > 10, consider dropping one or combining them.
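The definition can be made concrete with a small base R sketch. For each predictor, regress it on the remaining predictors and compute VIF = 1 / (1 − R²). The function name vif_manual and the synthetic data are assumptions for illustration; the dashboard may well use a packaged VIF routine instead.

```r
# VIF for each column of X: regress it on the other columns,
# then apply VIF_j = 1 / (1 - R_j^2).
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Synthetic predictors: participation is built to track unemployment closely.
set.seed(3)
X <- data.frame(literacy = rnorm(81), unemployment = rnorm(81))
X$participation <- -0.9 * X$unemployment + rnorm(81, sd = 0.3)

v <- vif_manual(X)
print(round(v, 2))
```

In this constructed example the two labor market variables inflate each other's variance, while the independent literacy column stays close to the minimum value of 1.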

Interpreting VIF Results

✅ No multicollinearity concerns: All VIF values are below 5. The independent variables are sufficiently uncorrelated, and coefficient estimates are reliable.

Correlation Matrix

Key Correlations to Note

Strong positive correlations (> 0.7):
- Variables that move together (e.g., GDP per capita and literacy, where the matrix shows a coefficient above 0.7)

Strong negative correlations (< -0.7):
- Variables that move in opposite directions (e.g., unemployment and GDP per capita, where the coefficient is below -0.7)

Implications:
- High correlation between predictors can cause multicollinearity
- However, correlation between a predictor and the dependent variable (GDP per capita) is desirable

Residual Analysis

Cluster Analysis

K-Means Clustering

Provinces are classified into 4 clusters based on the following socio-economic indicators:

GDP per capita
Literacy rate
Unemployment rate
Labor force participation

Clustering Methodology Justification

Why k-means?

Simplicity and interpretability: Creates clear, non-overlapping groups of provinces
Computational efficiency: Fast convergence on medium-sized datasets
Assumption: Clusters are roughly spherical and of similar variance (reasonable for socio-economic data after standardization)

Distance metric: Euclidean distance on standardized variables (z-scores). Standardization ensures variables with different scales (e.g., GDP in thousands vs. percentages) contribute equally to cluster formation.

Number of clusters (k=4): Chosen using the elbow method (see plot below). k=4 represents a balance between parsimony and capturing heterogeneity:
- Fewer clusters (k=2,3): Oversimplify provincial diversity
- More clusters (k≥5): Marginal gains in within-cluster homogeneity; harder to interpret

Algorithm parameters:
- nstart = 25: Run 25 random initializations and keep the best solution (mitigates the local optima problem)
- seed = 42: For reproducibility

The k-means algorithm partitions provinces into clusters that minimize the within-cluster sum of squares. Because the total sum of squares is fixed, this is equivalent to maximizing the between-cluster sum of squares.
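The steps described in this section (standardize, run k-means with nstart = 25 and seed 42, inspect the elbow) can be sketched in base R as follows. The indicator values are synthetic and the column names are assumptions; only the parameter choices come from the text.

```r
# Synthetic indicators for 81 provinces (hypothetical values and names).
set.seed(42)
n <- 81
indicators <- data.frame(
  gdp_pc        = rlnorm(n, meanlog = log(350000), sdlog = 0.4),
  literacy      = runif(n, 93, 99),
  unemployment  = runif(n, 4, 20),
  participation = runif(n, 40, 60)
)

# z-scores, so that every variable contributes on the same scale
z <- scale(indicators)

# k = 4 with 25 random starts; the best of the 25 solutions is kept
set.seed(42)
km <- kmeans(z, centers = 4, nstart = 25, iter.max = 50)
print(table(km$cluster))

# Elbow method: total within-cluster sum of squares for k = 1..8
wss <- sapply(1:8, function(k) {
  kmeans(z, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})
print(round(wss, 1))
```

Plotting wss against k gives the elbow curve. The identity totss = tot.withinss + betweenss holds for every fit, which is why minimizing one term maximizes the other.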

Optimal Number of Clusters

Cluster Characteristics

What Do These Clusters Mean?

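Based on the cluster profiles above, each group can be interpreted as follows. K-means assigns cluster numbers arbitrarily, so the numbering of these expected profiles need not match the numbering in the geographic coherence assessment, where Cluster 1 is concentrated in the Eastern and Southeastern regions.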

Cluster 1 - “Highly Developed Metropolises” (Expected: Istanbul, Ankara, İzmir, Kocaeli)
- Highest GDP per capita and education levels
- Dominated by industrialized western provinces and major urban centers
- Strong service sectors, manufacturing, and knowledge economy
- Best infrastructure (internet, healthcare)
- Economic engines of Turkey

Cluster 2 - “Mid-Tier Industrial/Tourism Provinces” (Expected: Antalya, Bursa, Eskişehir, Muğla)
- Above-average GDP but below metropolises
- Mix of secondary industrial cities and coastal tourism hubs
- Relatively high literacy and internet access
- Moderate unemployment, solid infrastructure

Cluster 3 - “Transitional/Rural Provinces” (Expected: Central Anatolia, some Black Sea)
- Below-average GDP per capita
- Lower education levels, particularly university graduates
- More reliant on agriculture and small-scale industry
- Higher unemployment, weaker infrastructure
- Catching up slowly but face structural challenges

Cluster 4 - “Underdeveloped Eastern Provinces” (Expected: Southeastern and Eastern Anatolia)
- Lowest GDP per capita and human development indicators
- Significantly lower literacy and education
- Limited infrastructure (internet, healthcare)
- High unemployment and informality
- Structural poverty and underdevelopment

The expected profiles follow Turkey’s geographic and historical development patterns (western industrialization vs. eastern agricultural tradition); the geographic coherence assessment below tests how closely the estimated clusters match them.

Provinces by Cluster

Cluster Scatter Plot

Cluster Map: Geographic Distribution

Geographic Coherence Assessment

Do clusters map to geographic regions?

Cluster 1: ⚠️ Moderately coherent
- Spans 2 geographic regions
- Dominant region: Southeastern (54.5% of cluster)
- Regions: Eastern, Southeastern

Cluster 2: ❌ Geographically fragmented
- Spans 5 geographic regions
- Dominant region: Central (23.5% of cluster)
- Regions: Mediterranean, Southeastern, Eastern, Black Sea, Central

Cluster 3: ❌ Geographically fragmented
- Spans 6 geographic regions
- Dominant region: Black Sea (39.3% of cluster)
- Regions: Aegean, Black Sea, Mediterranean, Marmara, Central, Eastern

Cluster 4: ❌ Geographically fragmented
- Spans 5 geographic regions
- Dominant region: Marmara (40% of cluster)
- Regions: Central, Mediterranean, Black Sea, Aegean, Marmara
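The coherence figures above can be computed from a cluster-by-region cross-tabulation. The sketch below uses a toy assignment of ten provinces (hypothetical data) to show the calculation, then checks the average of the four dominant-region shares reported above.

```r
# Toy cluster and region labels for ten provinces (hypothetical).
cluster <- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3)
region  <- c("Eastern", "Eastern", "Southeastern",
             "Central", "Black Sea", "Central", "Aegean",
             "Marmara", "Marmara", "Marmara")

tab   <- table(cluster, region)        # counts per cluster and region
share <- prop.table(tab, margin = 1)   # row shares within each cluster

coherence <- data.frame(
  cluster         = rownames(tab),
  n_regions       = as.integer(rowSums(tab > 0)),
  dominant_region = colnames(tab)[apply(tab, 1, which.max)],
  dominant_share  = round(100 * apply(share, 1, max), 1)
)
print(coherence)

# Average of the dominant-region shares reported for the four clusters
avg_reported <- mean(c(54.5, 23.5, 39.3, 40.0))
print(avg_reported)
```

The average of the reported shares is 39.325, which matches the 39.3% coherence figure quoted in the critical analysis.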

Critical Analysis:

❌ Weak geographic clustering (avg 39.3% coherence): The map looks ‘patchy’, with clusters that are geographically fragmented. This could indicate:
- Variable selection: The four indicators may not capture regional differences
- Scaling: Unlikely to be the cause here, since the variables were standardized to z-scores before clustering
- Too many/few clusters: k=4 may not be optimal

Recommendation: Re-examine variable choice or add spatial constraints to clustering.

Multi-dimensional View

Convergence Analysis

What is Beta Convergence?

Beta (β) convergence tests whether initially poorer regions grow faster than richer ones, leading to income equalization over time.

Conceptual Framework

We test absolute β-convergence by estimating:

\[\text{Average Annual Growth Rate}_{i} = \alpha + \beta \cdot \ln(\text{Initial GDP}_{i,2010}) + \epsilon_i\]

Economic intuition:

If β < 0 (negative): Poorer provinces grow faster → Convergence ✅
- Mechanism: Diminishing returns to capital (Solow model)
- Poor regions have lower capital-labor ratios → higher marginal returns → faster growth
- Technology diffusion from rich to poor regions
If β > 0 (positive): Richer provinces grow faster → Divergence ❌
- Mechanism: Agglomeration economies, increasing returns, poverty traps
- Rich regions attract more investment and talent
If β ≈ 0: No convergence or divergence → Persistent inequality
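A minimal sketch of the β-convergence regression in base R follows. The data are simulated with convergence deliberately built in, so the estimated slope is negative by construction; the 14-year span and all variable names are assumptions for illustration and do not reproduce the dashboard's result.

```r
# Simulated provinces: initial GDP per capita and a growth path in which
# poorer provinces grow faster (true slope = -0.02).
set.seed(7)
n        <- 81
years    <- 14   # assumed span, e.g. 2010 to 2024
gdp_2010 <- rlnorm(n, meanlog = log(20000), sdlog = 0.5)
g_true   <- 0.25 - 0.02 * log(gdp_2010) + rnorm(n, sd = 0.005)
gdp_end  <- gdp_2010 * exp(g_true * years)

# Average annual growth rate as a log difference per year
growth <- (log(gdp_end) - log(gdp_2010)) / years

# Absolute beta-convergence: growth regressed on log initial GDP
fit  <- lm(growth ~ log(gdp_2010))
beta <- coef(fit)[["log(gdp_2010)"]]
ci   <- confint(fit)["log(gdp_2010)", ]

print(beta)
print(ci)
# Convergence is supported only if the whole interval lies below zero
print(unname(ci[2] < 0))
```

The decision rule uses the confidence interval of the slope rather than its sign alone, in line with the requirement that β be significantly negative.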

Are Turkish provinces catching up? The sign and significance of β will tell us.

Results

❌ No Convergence Found

Beta coefficient: 0.0014

The regression coefficient is not significantly negative, suggesting provinces are not converging.

Convergence Plot

Outlier Analysis: Provinces with Unusual Growth Patterns

✅ No significant outliers detected: All provinces lie relatively close to the fitted regression line.

Interpreting the Beta Convergence Results

What does the plot show?

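Slope: A negative slope would mean that provinces with lower initial GDP (left side) tend to have higher growth rates (upper part of plot), which is the visual signature of β-convergence. Here the estimated slope is slightly positive (β = 0.0014), so the fitted line is nearly flat and shows no such pattern.
Statistical significance: The shaded band shows the uncertainty around the regression line. Whether convergence is statistically robust depends on the confidence interval for the slope β: convergence requires that interval to lie entirely below zero.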

Economic story:

❌ No convergence detected: The coefficient β = 0.0014 is slightly positive and not significantly negative. Richer provinces are growing as fast as or faster than poorer ones. This suggests:

Divergence or persistent inequality
Agglomeration economies may dominate diminishing returns
Poor regions face structural barriers (geography, institutions, human capital) preventing catch-up

What is Sigma Convergence?

Sigma (σ) convergence tests whether the dispersion (inequality) of income across regions is decreasing over time.

Conceptual Framework

We measure σ-convergence by tracking dispersion metrics over time:

Coefficient of Variation (CV): \(CV_t = \frac{\sigma_t}{\mu_t}\) (standard deviation / mean)
Standard Deviation of Log GDP: \(\sigma(\ln GDP_t)\)
Max/Min Ratio: Ratio of richest to poorest province
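The three dispersion metrics can be computed per year with a few lines of base R. The four-province panel below is a toy example with made-up values, chosen so that the gap narrows between the two years.

```r
# Dispersion metrics for one year's vector of GDP per capita
dispersion <- function(gdp) {
  c(cv      = sd(gdp) / mean(gdp),    # coefficient of variation
    sd_log  = sd(log(gdp)),           # standard deviation of log GDP
    max_min = max(gdp) / min(gdp))    # richest-to-poorest ratio
}

# Toy panel: four provinces observed in two years (hypothetical values)
panel <- data.frame(
  year   = rep(c(2010, 2024), each = 4),
  gdp_pc = c(100, 200, 300, 400,
             220, 300, 380, 460)
)

# One row per year, one column per metric
by_year <- t(sapply(split(panel$gdp_pc, panel$year), dispersion))
print(round(by_year, 3))
```

In this toy panel all three metrics fall between the two years, which is the pattern σ-convergence would produce.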

Economic intuition:

If dispersion is decreasing over time → σ-convergence ✅
- Income inequality across provinces is shrinking
- The gap between rich and poor regions is closing
If dispersion is increasing or stable → No σ-convergence ❌
- Regional inequality is growing or persistent
- Poor provinces are not catching up to rich ones

Key distinction from β-convergence:
- β-convergence asks: “Do poor provinces grow faster?”
- σ-convergence asks: “Is the income gap actually shrinking?”

β-convergence is necessary but not sufficient for σ-convergence. Even if poor provinces grow faster, random shocks can prevent inequality from declining.

Are Turkish provinces becoming more equal? The trend in CV and log-SD will tell us.

Dispersion Over Time

Sigma Convergence Plot

Interpreting the Sigma Convergence Results

What does the plot show?

⚠️ Stable inequality: The coefficient of variation (CV) has remained relatively stable (changed by 2.2%). This means:

Regional income inequality is persistent
No clear trend toward convergence or divergence
Economic growth is proportional across regions, maintaining the status quo

Why might β and σ-convergence differ?
Even if poor provinces grow faster (β-convergence), shocks to individual provinces (natural disasters, policy changes, investment booms) can increase dispersion, preventing σ-convergence.

Evolution of GDP Distribution

Key Insights from GDP Distribution Evolution

What does this plot tell us?

Rightward shift: If the entire distribution moves right over time, all provinces are experiencing absolute growth in GDP per capita (the economy is growing).
Widening spread: If the distribution becomes more dispersed (flatter, wider), inequality is increasing (σ-divergence).
Narrowing spread: If the distribution becomes more concentrated (taller, narrower), inequality is decreasing (σ-convergence).
Shape changes: If the distribution shifts from bimodal (two peaks) to unimodal (one peak), regional clusters are converging into a more homogeneous group.

What should we see if convergence is happening?
The distribution should become both rightward-shifted (growth) and more concentrated (narrower), with the left tail (poorest provinces) moving faster than the right tail (richest provinces).

Data

This section displays data about Turkish provinces scraped from Wikipedia when the dashboard is rendered (results are cached between renders), demonstrating web scraping with the rvest package.

Data Source & Methodology

Source URLs:
- Main data: Provinces of Turkey - Wikipedia

Scraping approach:
- Extract HTML tables using rvest::read_html() and html_table()
- Intelligent column detection (handles varying Wikipedia table structures)
- Data cleaning: Remove citations, parse numbers from formatted text
- Cached to avoid repeated requests during rendering

Variables extracted:
- Province names and plate codes (1-81)
- Population, area, and population density
- Geographic regions
- Provincial capitals
- Establishment years (historical data)

Primary Data Sources

Source	Description	Variables
TÜİK	Turkish Statistical Institute (Official)	GDP per capita, Literacy rates
SGK	Social Security Institution	Unemployment rates, Labor force participation
Wikipedia	Crowd-sourced encyclopedia (scraped)	Population data

Web Scraping Implementation

This dashboard demonstrates automated data collection from Wikipedia using R’s rvest package:

Technical details:

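# Core scraping workflow
library(rvest)  # also re-exports the %>% pipe
# Turkish Wikipedia page: provinces of Türkiye by population
page <- read_html("https://tr.wikipedia.org/wiki/N%C3%BCfusuna_g%C3%B6re_T%C3%BCrkiye%27nin_illeri")
# Parse every <table> on the page into a list of data frames
tables <- page %>% html_table(fill = TRUE)
# clean_and_process() is the dashboard's own helper; its definition is not shown here
data <- tables[[1]] %>% clean_and_process()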

Features implemented:
- Simple table extraction
- Data cleaning (parse formatted numbers, standardize names)
- Merge integration (combines scraped data with local CSVs)
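The number-parsing step might look something like the base R sketch below. The helper name parse_wiki_number is hypothetical, and the dashboard's own cleaning code is not shown in this section. The sketch assumes the scraped values are whole-number counts, so every non-digit character (thousands separators, spaces, units) can be dropped once citation markers have been removed.

```r
# Hypothetical helper: turn formatted Wikipedia cell text into numbers.
# Assumes integer counts; decimal values would need different handling.
parse_wiki_number <- function(x) {
  x <- gsub("\\[[^]]*\\]", "", x)   # drop citation markers such as [1] or [a]
  x <- gsub("[^0-9]", "", x)        # drop separators, spaces, and units
  as.numeric(x)                     # empty strings become NA
}

parsed <- parse_wiki_number(c("15.655.924[1]", "5 803 482", "1,234"))
print(parsed)
```

Citation markers are removed first so that the digits inside them are not mistaken for part of the value.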

Wikipedia URLs scraped:
1. Nüfusuna göre Türkiye’nin illeri (Turkish Wikipedia, “Provinces of Türkiye by population”) - Population data

Last Updated

Analysis rendered: January 19, 2026 at 20:04 +03
Primary data vintage: 2024 (TÜİK/SGK)