saturn
/home/coolhand/servers/diachronica/etymology_atlas/processed/word_forms.csv 25,731 rows sample n=25,731 seed 42 2026-05-01T18:05:52+00:00
Overview
| Source | /home/coolhand/servers/diachronica/etymology_atlas/processed/word_forms.csv |
| Total rows | 25,731 |
| Profiled sample | 25,731 |
| Columns | 8 |
| Generated | 2026-05-01T18:05:52+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
Errors during insight pass (9)
- dataset:__global__:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvne8AfKi53LSoY3ge'}
- column:form:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvoAdT8zkyHcA8YYwT'}
- column:language_id:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvoirvgjDyGG63tmcH'}
- column:language_name:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvpHM5iYyzY2qursTC'}
- column:glottocode:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvpksWTP5W1AfXVaP4'}
- column:iso_639_3:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvqE9cZuXUCkeRoXa4'}
- column:concept:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvrfDudrSeAyYS6hxZ'}
- column:cognate_id:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvsCDmeJd8mhiHyTDg'}
- column:source_dataset:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvshUh6Gf8EhaoLRkr'}
Numeric correlation
form text
94.6% of rows are a single word
95th-percentile length is 9 chars (under 20)
24.9% of rows are duplicate strings
| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 19,334 |
| len_min | 1 |
| len_max | 63 |
| len_mean | 5.373 |
| len_median | 5.000 |
| len_p95 | 9.000 |
| word_mean | 1.067 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 6,397 |
| duplicate_rate | 0.249 |
| vocab_size | 16,219 |
| readability_flesch_mean | 86.619 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 0.946 |
| allcaps_rate | 0.000 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- luftoj
- sö
- spaga
- era
- čanč
- mř′e-
- enc
- berg
- qaḷb
- aigua
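The duplicate figures reported for form are consistent with counting every row beyond the first occurrence of each distinct string: 25,731 rows minus 19,334 unique values gives 6,397 duplicates, a rate of about 0.249. The Python sketch below reproduces that arithmetic. The function name `duplicate_stats` is hypothetical, and it is an assumption that saturn defines the metric this way.

```python
from collections import Counter


def duplicate_stats(values):
    """Return (n_unique, n_duplicates, duplicate_rate) for a list of strings.

    Assumption: a "duplicate" is any row beyond the first occurrence of a
    distinct value, so n_duplicates = rows - unique.
    """
    counts = Counter(values)
    n_rows = len(values)
    n_unique = len(counts)
    n_duplicates = n_rows - n_unique
    rate = n_duplicates / n_rows if n_rows else 0.0
    return n_unique, n_duplicates, rate


# Check against the reported totals without needing the CSV itself.
rows, unique = 25_731, 19_334
print(rows - unique)                     # 6397
print(round((rows - unique) / rows, 3))  # 0.249
```

Run on the actual form column, the same function should return the reported counts if the assumed definition matches the tool's.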
language_id numeric
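| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 160 |
| min | 3.000 |
| max | 317.000 |
| mean | 166.006 |
| median | 174.000 |
| std | 101.394 |
| q1 | 65.000 |
| q3 | 266.000 |
| iqr | 201.000 |
| skew | -0.049 |
| kurtosis | -1.471 |
| n_outliers | 0 |
| outlier_rate | 0.000 |
| zero_rate | 0.000 |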
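The iqr above is q3 minus q1 (266 - 65 = 201). A count of zero outliers is what the common Tukey rule, with fences at 1.5 times the IQR beyond each quartile, would give for these values. That saturn uses this rule is an assumption, and `tukey_fences` is a hypothetical helper.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, minimum, and maximum as reported for language_id.
q1, q3 = 65.0, 266.0
col_min, col_max = 3.0, 317.0

lower, upper = tukey_fences(q1, q3)
print(q3 - q1)       # 201.0
print(lower, upper)  # -236.5 567.5
# No value can fall outside the fences if the extremes are inside them.
print(lower <= col_min and col_max <= upper)  # True
```

The same check with the cognate_id quartiles (411 and 5,640) gives fences of -7,432.5 and 13,483.5, which also contain that column's reported range of 3 to 9,982.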
language_name categorical
| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 160 |
| top_value | Bakhtiari |
| top_rate | 6.92e-03 |
| cardinality | 160 |
| entropy | 7.294 |
| entropy_ratio | 0.996 |
Top values (rank 1–20)
- Bakhtiari — 178
- Nepali — 177
- Greek: Italiot — 177
- Old Spanish — 177
- Greek: Pontic — 177
- Breton: Treger — 177
- Hindi — 176
- Romanian — 176
- Greek: Cappadocian — 176
- Breton: Gwened — 176
- Middle Welsh — 176
- Ladin — 175
- Old Church Slavonic — 175
- Elfdalian — 175
- Old Swedish — 175
- Welsh: North — 175
- Old Polish — 175
- Lithuanian — 174
- Sinhalese — 174
- Urdu — 174
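The entropy_ratio values in this report are consistent with Shannon entropy in bits divided by its maximum, log2(cardinality). For language_name, 7.294 / log2(160) is about 0.996. The sketch below shows that calculation. The function names are hypothetical, and the base-2 logarithm and this normalization are assumptions inferred from the reported numbers.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy of the empirical distribution, in bits."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_ratio(entropy, cardinality):
    """Entropy divided by its maximum, log2(cardinality)."""
    if cardinality <= 1:
        return 0.0  # a constant column carries no information
    return entropy / math.log2(cardinality)


# Reported values for language_name: entropy 7.294 over 160 categories.
print(round(entropy_ratio(7.294, 160), 3))  # 0.996
```

A ratio close to 1 means the categories are almost evenly represented, which matches the nearly flat top-value counts listed above.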
glottocode categorical
| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 152 |
| top_value | mace1250 |
| top_rate | 0.019 |
| cardinality | 152 |
| entropy | 7.184 |
| entropy_ratio | 0.991 |
Top values (rank 1–20)
- mace1250 — 497
- swed1254 — 347
- czec1258 — 346
- poli1260 — 345
- sout2640 — 345
- slov1268 — 342
- oldc1252 — 317
- bakh1245 — 178
- east1436 — 177
- apul1236 — 177
- olds1249 — 177
- pont1253 — 177
- treg1244 — 177
- hind1269 — 176
- roma1327 — 176
- capp1239 — 176
- vann1244 — 176
- midd1363 — 176
- ladi1250 — 175
- chur1257 — 175
iso_639_3 categorical
| rows | 25,731 |
| null | 173 (0.7%) |
| unique | 142 |
| top_value | ell |
| top_rate | 0.020 |
| cardinality | 142 |
| entropy | 7.044 |
| entropy_ratio | 0.985 |
Top values (rank 1–20)
- ell — 522
- slv — 509
- mkd — 497
- bre — 353
- swe — 347
- ces — 346
- pol — 345
- sdh — 345
- src — 343
- por — 341
- oss — 341
- cat — 340
- grc — 332
- bsh — 289
- bqi — 178
- nep — 177
- osp — 177
- pnt — 177
- hin — 176
- ron — 176
concept categorical
| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 170 |
| top_value | say |
| top_rate | 6.61e-03 |
| cardinality | 170 |
| entropy | 7.408 |
| entropy_ratio | 1.000 |
Top values (rank 1–20)
- say — 170
- man — 166
- big — 163
- stone — 163
- house — 163
- foot — 161
- hand — 161
- head — 161
- see — 161
- woman — 161
- year — 161
- day — 160
- good — 160
- name — 160
- water — 160
- do — 160
- come — 159
- give — 159
- know — 159
- red — 159
cognate_id numeric
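| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 4,979 |
| min | 3.000 |
| max | 9,982 |
| mean | 3,086 |
| median | 1,610 |
| std | 3,024 |
| q1 | 411.000 |
| q3 | 5,640 |
| iqr | 5,229 |
| skew | 0.731 |
| kurtosis | -0.905 |
| n_outliers | 0 |
| outlier_rate | 0.000 |
| zero_rate | 0.000 |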
source_dataset categorical
top value is 100.0% of rows
| rows | 25,731 |
| null | 0 (0.0%) |
| unique | 1 |
| top_value | iecor |
| top_rate | 1.000 |
| cardinality | 1 |
| entropy | 0.000 |
| entropy_ratio | 0.000 |
Top values (rank 1–20)
- iecor — 25,731
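The top_rate figures appear to be the count of the most frequent value divided by the total row count, and a column flagged at 100% has exactly one distinct value. The sketch below illustrates this under those assumptions; `top_value_stats` is a hypothetical name, not part of saturn.

```python
from collections import Counter


def top_value_stats(values):
    """Return (top_value, top_rate, is_constant) for a non-empty list."""
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    top_rate = top_count / len(values)
    return top_value, top_rate, len(counts) == 1


# A tiny stand-in for source_dataset, where every row is "iecor".
print(top_value_stats(["iecor"] * 5))  # ('iecor', 1.0, True)

# Reported count for the top language_name against the row total.
print(f"{178 / 25_731:.2e}")  # 6.92e-03
```

A constant column such as source_dataset adds no information for analysis within this file, though it may still be useful as provenance when files are merged.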