saturn

/home/coolhand/servers/diachronica/etymology_atlas/processed/word_forms.csv · 25,731 rows · sample n=25,731 · seed 42 · 2026-05-01T18:05:52+00:00

Overview

Source: /home/coolhand/servers/diachronica/etymology_atlas/processed/word_forms.csv
Total rows: 25,731
Profiled sample: 25,731
Columns: 8
Generated: 2026-05-01T18:05:52+00:00

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Errors during insight pass (9)
  • dataset:__global__:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvne8AfKi53LSoY3ge'}
  • column:form:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvoAdT8zkyHcA8YYwT'}
  • column:language_id:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvoirvgjDyGG63tmcH'}
  • column:language_name:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvpHM5iYyzY2qursTC'}
  • column:glottocode:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvpksWTP5W1AfXVaP4'}
  • column:iso_639_3:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvqE9cZuXUCkeRoXa4'}
  • column:concept:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvrfDudrSeAyYS6hxZ'}
  • column:cognate_id:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvsCDmeJd8mhiHyTDg'}
  • column:source_dataset:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacHvshUh6Gf8EhaoLRkr'}

Numeric correlation

form (text)

94.6% of rows are a single word · 95th-percentile length under 20 chars · 24.9% duplicate strings
rows: 25,731
null: 0 (0.0%)
unique: 19,334
len_min: 1
len_max: 63
len_mean: 5.373
len_median: 5.000
len_p95: 9.000
word_mean: 1.067
word_median: 1.000
n_empty: 0
n_duplicates: 6,397
duplicate_rate: 0.249
vocab_size: 16,219
readability_flesch_mean: 86.619
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.946
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. luftoj
  2. spaga
  3. era
  4. čanč
  5. mř′e-
  6. enc
  7. berg
  8. qaḷb
  9. aigua
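Several of the form statistics can be reproduced with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: `profile_text` is a hypothetical helper, not saturn's API; a "word" is taken to be a whitespace-separated token; and a duplicate is any row beyond the first occurrence of a string. The report's own figures fit that duplicate definition: 19,334 unique + 6,397 duplicates = 25,731 rows, and 6,397 / 25,731 ≈ 0.249.

```python
from collections import Counter
from statistics import mean, median


def profile_text(values):
    """Hypothetical helper: a few text-column stats in the style of the report."""
    n = len(values)
    lengths = [len(v) for v in values]
    counts = Counter(values)
    # Every row after the first occurrence of a distinct string is a duplicate.
    n_duplicates = n - len(counts)
    # Assumption: words are whitespace-separated tokens; saturn's tokenizer may differ.
    word_counts = [len(v.split()) for v in values]
    return {
        "rows": n,
        "unique": len(counts),
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n,
        "one_word_rate": sum(1 for w in word_counts if w == 1) / n,
    }


stats = profile_text(["luftoj", "spaga", "era", "era", "aigua"])
print(stats["duplicate_rate"])  # 0.2
```

Applied to the real file, the input would be the `form` field of every row, for example as read with `csv.DictReader`.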

language_id (numeric)

rows: 25,731
null: 0 (0.0%)
unique: 160
min: 3.000
max: 317.000
mean: 166.006
median: 174.000
std: 101.394
q1: 65.000
q3: 266.000
iqr: 201.000
skew: -0.049
kurtosis: -1.471
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
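The report does not say how n_outliers is computed. If it uses Tukey's fences at 1.5 × IQR (a common default, assumed here), the zero count follows directly from the quartiles: 65 − 1.5 × 201 = −236.5 and 266 + 1.5 × 201 = 567.5, and the observed range of 3 to 317 lies inside both fences. A minimal sketch; the helper names are hypothetical:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences; k = 1.5 is Tukey's conventional multiplier."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values, q1: float, q3: float, k: float = 1.5) -> int:
    """Count values strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < lower or v > upper)


# Quartiles and extremes for language_id, taken from the report.
lower, upper = tukey_fences(65.0, 266.0)
print(lower, upper)                               # -236.5 567.5
print(count_outliers([3.0, 317.0], 65.0, 266.0))  # 0
```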

language_name (categorical)

rows: 25,731
null: 0 (0.0%)
unique: 160
top_value: Bakhtiari
top_rate: 6.92e-03
cardinality: 160
entropy: 7.294
entropy_ratio: 0.996
Top values (rank 1–20)
  1. Bakhtiari — 178
  2. Nepali — 177
  3. Greek: Italiot — 177
  4. Old Spanish — 177
  5. Greek: Pontic — 177
  6. Breton: Treger — 177
  7. Hindi — 176
  8. Romanian — 176
  9. Greek: Cappadocian — 176
  10. Breton: Gwened — 176
  11. Middle Welsh — 176
  12. Ladin — 175
  13. Old Church Slavonic — 175
  14. Elfdalian — 175
  15. Old Swedish — 175
  16. Welsh: North — 175
  17. Old Polish — 175
  18. Lithuanian — 174
  19. Sinhalese — 174
  20. Urdu — 174
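entropy_ratio appears to be Shannon entropy in bits divided by its maximum possible value, log2 of the cardinality: 7.294 / log2(160) ≈ 7.294 / 7.322 ≈ 0.996, which matches the reported value. The same relation holds for the other categorical columns (for example 7.184 / log2(152) ≈ 0.991 for glottocode). A sketch under that assumption; the function names are hypothetical:

```python
import math
from collections import Counter


def entropy_bits(values) -> float:
    """Shannon entropy of the value distribution, in bits."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_ratio(values) -> float:
    """Entropy normalized by log2(cardinality); 1.0 means perfectly uniform."""
    k = len(set(values))
    if k <= 1:
        return 0.0  # a single category has no uncertainty to normalize
    return entropy_bits(values) / math.log2(k)


# Check the reported ratio for language_name from its reported entropy.
print(round(7.294 / math.log2(160), 3))  # 0.996
```

A ratio this close to 1.0 means the rows are spread almost evenly across the 160 language names.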

glottocode (categorical)

rows: 25,731
null: 0 (0.0%)
unique: 152
top_value: mace1250
top_rate: 0.019
cardinality: 152
entropy: 7.184
entropy_ratio: 0.991
Top values (rank 1–20)
  1. mace1250 — 497
  2. swed1254 — 347
  3. czec1258 — 346
  4. poli1260 — 345
  5. sout2640 — 345
  6. slov1268 — 342
  7. oldc1252 — 317
  8. bakh1245 — 178
  9. east1436 — 177
  10. apul1236 — 177
  11. olds1249 — 177
  12. pont1253 — 177
  13. treg1244 — 177
  14. hind1269 — 176
  15. roma1327 — 176
  16. capp1239 — 176
  17. vann1244 — 176
  18. midd1363 — 176
  19. ladi1250 — 175
  20. chur1257 — 175
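language_name has 160 distinct values but glottocode only 152, and neither column has nulls, so at least one glottocode must be shared by several language names. mace1250's 497 rows, against 175 to 178 rows for the top individual language names, suggest it covers roughly three of them; which names share a code is not visible in the profile. A sketch for listing them, assuming rows are dicts as produced by `csv.DictReader`; the helper names and toy rows are hypothetical:

```python
import csv
from collections import defaultdict


def codes_with_several_names(rows, code_key="glottocode", name_key="language_name"):
    """Map each code shared by more than one distinct name to those names."""
    names_by_code = defaultdict(set)
    for row in rows:
        names_by_code[row[code_key]].add(row[name_key])
    return {code: names for code, names in names_by_code.items() if len(names) > 1}


def load_rows(path):
    """Read a CSV into a list of dicts keyed by header name (UTF-8 assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Toy rows for illustration only; not taken from word_forms.csv.
toy = [
    {"glottocode": "aaaa1234", "language_name": "Lang A"},
    {"glottocode": "aaaa1234", "language_name": "Lang B"},
    {"glottocode": "bbbb1234", "language_name": "Lang C"},
]
print(codes_with_several_names(toy))
```

On the real data the call would be `codes_with_several_names(load_rows(path))`, with `path` set to the Source path given in the Overview.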

iso_639_3 (categorical)

rows: 25,731
null: 173 (0.7%)
unique: 142
top_value: ell
top_rate: 0.020
cardinality: 142
entropy: 7.044
entropy_ratio: 0.985
Top values (rank 1–20)
  1. ell — 522
  2. slv — 509
  3. mkd — 497
  4. bre — 353
  5. swe — 347
  6. ces — 346
  7. pol — 345
  8. sdh — 345
  9. src — 343
  10. por — 341
  11. oss — 341
  12. cat — 340
  13. grc — 332
  14. bsh — 289
  15. bqi — 178
  16. nep — 177
  17. osp — 177
  18. pnt — 177
  19. hin — 176
  20. ron — 176
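iso_639_3 is the only column with nulls: 173 rows, and 173 / 25,731 ≈ 0.67%, shown rounded as 0.7%. Because 173 is close to the per-language row counts (25,731 / 160 ≈ 161 on average; 174 to 178 for the top names), one plausible reading is that a single language without an ISO 639-3 code accounts for all of them; the profile alone cannot confirm this. A sketch for checking, assuming nulls appear as empty fields in the CSV (which `csv.DictReader` returns as empty strings); the helper names are hypothetical:

```python
def null_stats(values):
    """Return (null_count, null_rate), treating None and "" as null."""
    n_null = sum(1 for v in values if v is None or v == "")
    return n_null, n_null / len(values)


def names_without_code(rows, code_key="iso_639_3", name_key="language_name"):
    """Sorted language names that have at least one row with an empty code."""
    return sorted({row[name_key] for row in rows if not row[code_key]})


print(f"{173 / 25731:.1%}")  # 0.7%
```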

concept (categorical)

rows: 25,731
null: 0 (0.0%)
unique: 170
top_value: say
top_rate: 6.61e-03
cardinality: 170
entropy: 7.408
entropy_ratio: 1.000
Top values (rank 1–20)
  1. say — 170
  2. man — 166
  3. big — 163
  4. stone — 163
  5. house — 163
  6. foot — 161
  7. hand — 161
  8. head — 161
  9. see — 161
  10. woman — 161
  11. year — 161
  12. day — 160
  13. good — 160
  14. name — 160
  15. water — 160
  16. do — 160
  17. come — 159
  18. give — 159
  19. know — 159
  20. red — 159
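The concept counts also say something about synonyms. There are 160 distinct languages but 170 rows for say, so by the pigeonhole principle at least one language lists two or more forms for that concept. A sketch that finds every (language, concept) pair with more than one row; the helper name and toy rows are hypothetical:

```python
from collections import Counter


def multi_form_pairs(rows, lang_key="language_id", concept_key="concept"):
    """Return {(language, concept): row_count} for pairs with more than one row."""
    pair_counts = Counter((row[lang_key], row[concept_key]) for row in rows)
    return {pair: n for pair, n in pair_counts.items() if n > 1}


# Toy rows for illustration only; not taken from word_forms.csv.
toy = [
    {"language_id": "1", "concept": "say", "form": "a"},
    {"language_id": "1", "concept": "say", "form": "b"},
    {"language_id": "2", "concept": "say", "form": "c"},
]
print(multi_form_pairs(toy))  # {('1', 'say'): 2}
```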

cognate_id (numeric)

rows: 25,731
null: 0 (0.0%)
unique: 4,979
min: 3.000
max: 9,982
mean: 3,086
median: 1,610
std: 3,024
q1: 411.000
q3: 5,640
iqr: 5,229
skew: 0.731
kurtosis: -0.905
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000
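cognate_id is right-skewed: the mean (3,086) sits well above the median (1,610), consistent with the positive skew of 0.731. The report does not say which estimators it uses. The sketch below computes the plain moment-based versions (g1 skewness and g2 excess kurtosis); libraries such as pandas apply small-sample bias corrections, so their values can differ slightly. The function name is hypothetical:

```python
def skew_kurtosis(values):
    """Moment-based skewness (g1) and excess kurtosis (g2), without bias correction."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("constant input: skewness and kurtosis are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


g1, g2 = skew_kurtosis([1, 1, 1, 10])
print(g1 > 0)  # True: a long right tail gives positive skew
```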

source_dataset (categorical)

top value is 100.0% of rows
rows: 25,731
null: 0 (0.0%)
unique: 1
top_value: iecor
top_rate: 1.000
cardinality: 1
entropy: -0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. iecor — 25,731
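source_dataset holds a single value (iecor) in every row, so it carries no information within this file and is a candidate to drop before analysis. The printed entropy of -0.000 is most likely floating-point negative zero: a direct −Σ p·log2(p) over one category with p = 1 evaluates to -0.0, which formats as -0.000. A sketch of both points; the helper name and toy rows are hypothetical:

```python
import math


def constant_columns(rows):
    """Names of columns that hold exactly one distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) == 1]


rows = [
    {"form": "era", "source_dataset": "iecor"},
    {"form": "aigua", "source_dataset": "iecor"},
]
print(constant_columns(rows))  # ['source_dataset']

# One category with probability 1: log2(1) == 0.0, and negating 0.0 gives -0.0.
h = -sum(p * math.log2(p) for p in [1.0])
print(f"{h:.3f}")  # -0.000
```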