Etymological Origins of First Names in France

A century of cultural transformation measured through large-scale AI classification (1900-2024)

We classified all 48,516 unique first names in the French civil registry into 20 etymological origin categories using an LLM (Claude Haiku 4.5). The chart below shows how the share of each origin evolved from 1920 to 2024, with Monte Carlo projections to 2050. The shaded projection zone shows the 50% and 90% intervals of 10,000 simulated trajectories.

Data

The primary dataset is the Fichier des prénoms published by INSEE (2025 edition), covering all 87 million births registered in France from 1900 to 2024. It contains 711,069 rows, each giving the count for one (name, gender, year) combination, with counts rounded to the nearest multiple of 5 for privacy. After excluding the placeholder entry that pools rare names, 48,516 unique names remain.
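A minimal sketch of this cleaning step. It assumes the INSEE file is a semicolon-separated CSV with columns `sexe`, `preusuel`, `annais` and `nombre`, and that rare names are pooled under the literal label `_PRENOMS_RARES`; neither detail is stated on this page, so check both against the file's own documentation. The toy rows and their counts are invented.

```python
import csv
import io
from typing import Iterable, Mapping

# Assumed label of the placeholder row that pools rare names.
RARE_PLACEHOLDER = "_PRENOMS_RARES"


def summarize(rows: Iterable[Mapping[str, str]]) -> tuple[int, int, int]:
    """Return (row_count, unique_names, total_births), skipping the placeholder.

    The column names `preusuel` (name) and `nombre` (count) are assumptions.
    """
    row_count = 0
    names: set[str] = set()
    births = 0
    for row in rows:
        if row["preusuel"] == RARE_PLACEHOLDER:
            continue  # drop the aggregate bucket of rare names
        row_count += 1
        names.add(row["preusuel"])
        births += int(row["nombre"])
    return row_count, len(names), births


# Invented toy rows in the assumed layout.
sample = """sexe;preusuel;annais;nombre
2;MARIE;1900;100
1;JEAN;1900;50
2;MARIE;1901;120
1;_PRENOMS_RARES;1900;15
"""
reader = csv.DictReader(io.StringIO(sample), delimiter=";")
print(summarize(reader))  # (3, 2, 270)
```

A name appearing for both genders or in many years counts once among unique names, which is how a file of several hundred thousand rows reduces to a few tens of thousands of names.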

20 Etymological Categories

Each name is assigned to its earliest known etymological origin, not its contemporary cultural association. "Marie" is classified as Hebrew (from Miriam) despite being quintessentially French. "Louis" is Germanic (from Frankish Chlodwig) rather than French. Our categories describe the history of words, not the history of the cultures that use them today.

| Category | Description | Examples |
| --- | --- | --- |
| Hebrew / Biblical | Biblical/Hebrew etymology | Marie, Jean, David, Michel, Anne |
| Latin | Latin/Roman etymology | Pierre, Paul, Maxime, Victor, Dominique |
| Germanic | Frankish/Germanic roots | Louis, Henri, Richard, Charles, Françoise |
| Greek | Greek etymology | Philippe, Alexandre, Sophie, Catherine, Nicolas |
| Arabic | Arabic language etymology | Mohamed, Fatima, Karim, Yasmine, Rayan |
| Celtic | Breton/Celtic/Gaelic roots | Alain, Brigitte, Arthur, Nolwenn, Tristan |
| Anglo-Saxon | English language names | Kevin, Dylan, Brandon, Jennifer, Audrey |
| French | Native French formations | Manon, Colette, Gaston, Garance |
| African | Sub-Saharan African etymology | Mamadou, Aminata, Ousmane, Fatoumata |
| Nordic | Scandinavian/Norse roots | Eric, Ingrid, Astrid, Oscar, Nils |
| Slavic | Slavic language etymology | Nadia, Ivan, Katia, Sacha, Mila |
| Italian | Distinctly Italian forms | Giovanni, Salvatore, Concetta, Enzo, Giulia |
| Spanish | Distinctly Spanish forms | Carmen, Dolores, Pilar, Jade, Lola |
| Berber | Amazigh/Berber roots | Kenza, Massinissa, Idir, Jugurtha |
| Turkish | Turkish etymology | Elif, Emre, Ayşe, Ayla |
| Persian | Iranian etymology | Cyrus, Darius, Soraya, Roxane, Gaspard |
| Asian | East/South/Southeast Asian | Linh, Mei, Ravi, Kenzo, Tao |
| Basque | Basque etymology | Iker, Eneko, Amaia, Xavier |
| Portuguese | Portuguese forms | Joaquim, Conceição, Rui, Nuno |
| Other | Unclassifiable, invented, or mixed | (various) |
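One way to enforce the taxonomy is to treat it as a closed set of labels and reject any classifier answer that names a category outside the table or gives a confidence outside 0.0-1.0. This is a hypothetical sketch: the record shape (`name`, `origin`, `confidence`) and the `validate` helper are illustrative, not the authors' pipeline.

```python
from dataclasses import dataclass

# The 20 labels from the table above; exact label spelling is an assumption.
CATEGORIES = frozenset({
    "Hebrew / Biblical", "Latin", "Germanic", "Greek", "Arabic", "Celtic",
    "Anglo-Saxon", "French", "African", "Nordic", "Slavic", "Italian",
    "Spanish", "Berber", "Turkish", "Persian", "Asian", "Basque",
    "Portuguese", "Other",
})


@dataclass(frozen=True)
class Classification:
    name: str
    origin: str
    confidence: float


def validate(record: dict) -> Classification:
    """Convert one raw classifier record, rejecting unknown labels or bad scores."""
    origin = record["origin"]
    confidence = float(record["confidence"])
    if origin not in CATEGORIES:
        raise ValueError(f"unknown category: {origin!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Classification(record["name"], origin, confidence)


print(validate({"name": "Marie", "origin": "Hebrew / Biblical", "confidence": 0.95}))
```

Rejected records could then be re-queued rather than silently mapped to "Other", keeping that category for genuinely unclassifiable names.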

Classification

Classification was performed using Anthropic's Claude Haiku 4.5 as an automated onomastic classifier. The model was prompted with the complete taxonomy, classification rules, and examples, then asked to classify names in batches of 200, returning a confidence score (0.0-1.0) for each name. Processing parameters: 243 batches, 10 parallel requests, temperature 0. Total processing time: 8 minutes. Cost: under $3.
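The batch count follows from the name count: ⌈48,516 / 200⌉ = 243 batches, the last one holding 116 names. The sketch below splits the list and runs batches through a pool of 10 worker threads. `classify_batch` is a hypothetical offline stub standing in for the actual model call, whose prompt and response format are not published here.

```python
import math
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 200   # names per request, as stated in the text
MAX_PARALLEL = 10  # concurrent requests, as stated in the text


def make_batches(names: list[str], size: int = BATCH_SIZE) -> list[list[str]]:
    """Split the name list into consecutive chunks of at most `size` names."""
    return [names[i:i + size] for i in range(0, len(names), size)]


def classify_batch(batch: list[str]) -> list[dict]:
    """Hypothetical stand-in for one LLM request (taxonomy prompt, temperature 0).

    A real version would send the batch to the model and parse its answer;
    this stub returns a placeholder label so the pipeline runs offline.
    """
    return [{"name": n, "origin": "Other", "confidence": 0.0} for n in batch]


def classify_all(names: list[str]) -> list[dict]:
    batches = make_batches(names)
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # map() returns results in batch order even though calls run concurrently.
        results = list(pool.map(classify_batch, batches))
    return [record for batch in results for record in batch]


print(math.ceil(48_516 / BATCH_SIZE))  # 243
```

Threads suit this workload because each request spends nearly all its time waiting on the network; the cap of 10 workers mirrors the stated parallelism.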

Validation. 500 names were reclassified independently by Claude Opus for cross-validation. Agreement was 76% overall (Cohen's kappa: 0.74), rising above 90% for Arabic (94%), African (92%), Berber and Basque (100% each). Disagreements concentrated on boundaries between linguistically adjacent European categories (Latin/Greek, Germanic/French), which do not affect the European/extra-European divide. External validation against reference onomastic sources (Dauzat, 1951; Tanet & Horde, 2000) yielded 87% agreement (95% CI: 82-91%).
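Cohen's kappa corrects raw agreement p_o for the agreement p_e expected by chance from each rater's label frequencies: κ = (p_o - p_e) / (1 - p_e). A stdlib-only sketch of kappa and of per-category agreement follows. For the per-category rates it takes the first rater's labels as the reference, a convention the text does not specify; the example labels are invented.

```python
from collections import Counter


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n               # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2  # chance agreement
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


def per_category_agreement(a: list[str], b: list[str]) -> dict[str, float]:
    """Fraction of items with reference label k (taken from `a`) on which `b` agrees."""
    totals = Counter(a)
    hits = Counter(x for x, y in zip(a, b) if x == y)
    return {k: hits[k] / totals[k] for k in totals}


reference = ["Arabic", "Arabic", "Latin", "Greek"]
second = ["Arabic", "Arabic", "Greek", "Greek"]
print(cohen_kappa(reference, second))  # 0.6
print(per_category_agreement(reference, second))
```

With 20 categories and a few dominant ones, chance agreement is small, so kappa sits only slightly below raw agreement.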

Projections

Forward projections to 2050 use Monte Carlo simulation (10,000 trajectories, random seed 42). For each origin, the most recent 5-year rolling slope is evolved as a random walk: $s_{t+1} = s_t + \varepsilon_t$, where $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$ and $\sigma$ is calibrated on 1990-2024 slope volatility. Values are clamped to [0, 100] at each step.
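A minimal sketch of this projection for a single origin. Several details are assumptions rather than documented choices: the 5-year rolling slope is taken as an ordinary least-squares slope over the last five annual shares, the share advances by the current slope each year ($y_{t+1} = \min(100, \max(0, y_t + s_{t+1}))$), and $\sigma$ is the standard deviation of year-to-year changes in that rolling slope. With 26 annual steps, the paths cover 2025 through 2050.

```python
import random
import statistics


def rolling_slope(shares: list[float], window: int = 5) -> float:
    """Least-squares slope (points per year) of the last `window` annual shares."""
    ys = shares[-window:]
    xs = list(range(len(ys)))
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def calibrate_sigma(shares: list[float], window: int = 5) -> float:
    """Std. dev. of year-to-year changes in the rolling slope over `shares`."""
    slopes = [rolling_slope(shares[: i + 1], window) for i in range(window - 1, len(shares))]
    changes = [b - a for a, b in zip(slopes, slopes[1:])]
    return statistics.stdev(changes)


def simulate(shares: list[float], sigma: float, years: int = 26,
             n_paths: int = 10_000, seed: int = 42) -> list[list[float]]:
    """Simulate `n_paths` share trajectories for one origin, one value per year."""
    rng = random.Random(seed)
    s0, y0 = rolling_slope(shares), float(shares[-1])
    paths = []
    for _ in range(n_paths):
        s, y, path = s0, y0, []
        for _ in range(years):
            s += rng.gauss(0.0, sigma)       # slope random walk: s_{t+1} = s_t + eps
            y = min(100.0, max(0.0, y + s))  # assumed share update, clamped to [0, 100]
            path.append(y)
        paths.append(path)
    return paths
```

If `history` holds annual shares ending in 2024, `simulate(history, calibrate_sigma(history[-35:]))` restricts calibration to 1990-2024 (the first rolling slope then spans 1990-1994). Because each origin is simulated on its own, nothing ties the paths of different origins together.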

The bands show the 50% interval (25th-75th percentiles, darker shading) and 90% interval (5th-95th percentiles, lighter shading). Categories are projected independently: no constraint ensures shares sum to 100%. These projections assume future volatility comparable to 1990-2024. They do not capture potential structural breaks (migration policy, fertility changes, economic shocks). The intervals measure statistical uncertainty, not scenario uncertainty.
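The bands can be read off the simulated values one year at a time. This sketch uses `statistics.quantiles` with the inclusive method, which interpolates linearly between order statistics; the page does not say which percentile definition was used, though with 10,000 draws the choice barely matters.

```python
import statistics


def bands(values: list[float]) -> dict[str, float]:
    """5th, 25th, 75th and 95th percentiles of one year's simulated values."""
    # n=20 yields the 19 cut points 5%, 10%, ..., 95%.
    q = statistics.quantiles(values, n=20, method="inclusive")
    return {"p5": q[0], "p25": q[4], "p75": q[14], "p95": q[18]}


b = bands([float(i) for i in range(101)])
print(b)
```

The darker band spans `p25` to `p75` (50% interval) and the lighter band `p5` to `p95` (90% interval).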

Four Historical Phases

Phase 1: Traditional Dominance (1900-1945). Hebrew, Latin, and Germanic origins collectively account for 80-85% of births. Hebrew names peak at 40% in 1946 (driven by Marie, Jean, Joseph), while Germanic names begin their secular decline from 28%.

Phase 2: Post-War Recomposition (1945-1975). Hebrew names collapse from 40% to 18% as traditional Catholic naming practices weaken. Latin names rise to 37% in 1966 (Pierre, Paul). Greek names surge from 10% to 29% by 1975 (Philippe, Catherine, Sophie, Alexandre), the most rapid single-origin expansion in the dataset.

Phase 3: Diversification (1975-2000). All four traditional origins decline simultaneously. Arabic names grow from 3% to 5%. Anglo-Saxon names peak at 3.7% in 1991 (Kevin, Dylan, Brandon). Celtic names rise to 7% by 2007. The Shannon diversity index increases sharply.
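The Shannon index of a year's distribution is $H = -\sum_i p_i \ln p_i$, where $p_i$ is the share of births in origin $i$. It is 0 when every birth falls in one origin and reaches $\ln 20 \approx 3.0$ when all 20 origins are equally common. The page does not state the log base or whether the index is computed over origins or over individual names; this sketch assumes origins and natural logs.

```python
import math


def shannon_index(counts: dict[str, int]) -> float:
    """Shannon entropy H = -sum(p * ln p) over origin shares; empty origins are skipped."""
    total = sum(counts.values())
    shares = (c / total for c in counts.values() if c > 0)
    return -sum(p * math.log(p) for p in shares)


print(shannon_index({"Latin": 5, "Greek": 5, "Arabic": 5, "Celtic": 5}))  # equals ln 4
```

A rising H means births are spread more evenly across origins, which is the sense in which this phase is a "diversification".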

Phase 4: New Equilibrium? (2000-2024). Arabic names accelerate from 6% to 15.7%, becoming the fifth-largest origin group. Traditional European origins continue declining but at reduced rates. The diversity index continues to increase. Aggregating the four traditional European categories (Hebrew + Latin + Germanic + Greek), their combined share falls from 85% in 1945 to 51% in 2024.
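Shares such as "51% in 2024" are shares of births, not of distinct names, so each name's classification is weighted by how many babies received it that year. A sketch under that reading follows; the toy counts are invented, and mapping names missing from the classification to `Other` is an assumption.

```python
from collections import defaultdict

TRADITIONAL = {"Hebrew / Biblical", "Latin", "Germanic", "Greek"}


def origin_shares(rows, origin_of):
    """Percentage of births per origin for each year.

    rows: iterable of (name, year, births); origin_of: dict mapping name -> category.
    """
    births = defaultdict(lambda: defaultdict(int))
    for name, year, count in rows:
        births[year][origin_of.get(name, "Other")] += count  # unknown names -> Other (assumed)
    return {
        year: {o: 100 * c / sum(by_origin.values()) for o, c in by_origin.items()}
        for year, by_origin in births.items()
    }


def combined_share(year_shares, group=TRADITIONAL):
    """Sum of the shares of the origins in `group` for one year."""
    return sum(v for o, v in year_shares.items() if o in group)


rows = [("MARIE", 2024, 30), ("LOUIS", 2024, 20), ("RAYAN", 2024, 50)]
origin_of = {"MARIE": "Hebrew / Biblical", "LOUIS": "Germanic", "RAYAN": "Arabic"}
shares = origin_shares(rows, origin_of)
print(combined_share(shares[2024]))  # 50.0
```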

What the Data Cannot Tell Us

The etymological origin of a name is not the ethnic, religious, or national origin of the individual bearing it. Three distinct mechanisms connect names to demographics: cultural continuity (families choosing names from their ancestral tradition), cultural fashion (families adopting names perceived as attractive regardless of background), and assimilation (immigrant families adopting majority-culture names).

Critically, assimilation means the measured 15.7% share of Arabic-origin names likely underestimates the share of births in families with North African heritage. This naming data does not measure immigration rates, population composition, religious affiliation, or demographic "replacement." It documents how the etymological spectrum of names given to newborns in France has shifted over 125 years.

Simon R. (2026). Etymological Origins of First Names in France (1900-2024). DOI: 10.5281/zenodo.19334944
Licensed under CC BY 4.0. Free to share and adapt with attribution.

© 2026 Yuki Capital