saturn

/home/coolhand/servers/diachronica/etymology_atlas/parquet/cognate_sets.parquet | 4,981 rows | sample n=4,981 | seed 42 | 2026-05-01T17:52:34+00:00

Overview

Source: /home/coolhand/servers/diachronica/etymology_atlas/parquet/cognate_sets.parquet
Total rows: 4,981
Profiled sample: 4,981
Columns: 7
Generated: 2026-05-01T17:52:34+00:00

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Errors during insight pass (8)
  • dataset:__global__:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuur4r7Zdxk1iJjbGX'}
  • column:cognate_id:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuvMaZVBmMBiD3ZfHn'}
  • column:concept:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuvv4UNvr7cmfyAzCy'}
  • column:word_count:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuweiYcAFWhx4qsxh8'}
  • column:language_count:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGux9Uj4jkn47uauwiC'}
  • column:words:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuxgj6Sf37gZZZjoFu'}
  • column:source_dataset:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuyDUXDiq9D5e9qwqL'}
  • column:confidence:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacGuyiVGkWMg3F5hcvci'}

Numeric correlation [figure not captured]

cognate_id (text)

100.0% of rows are unique strings | 100.0% of rows are a single word | 95th-percentile length under 20 chars
rows: 4,981
null: 0 (0.0%)
unique: 4,981
len_min: 7
len_max: 10
len_mean: 9.884
len_median: 10.000
len_p95: 10.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 4,981
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. iecor:12
  2. iecor:8032
  3. iecor:9076
  4. iecor:8599
  5. iecor:5170
  6. iecor:9758
  7. iecor:5291
  8. iecor:9282
  9. iecor:8613
  10. iecor:322
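
The sample identifiers above all look like a dataset prefix, a colon, and a number. A minimal sketch of splitting them, assuming every ID in the column follows that shape (the profile only shows ten samples); the helper name `split_cognate_id` is hypothetical and not part of saturn or the dataset:

```python
def split_cognate_id(cognate_id: str) -> tuple[str, int]:
    """Split an ID such as 'iecor:8032' into ('iecor', 8032)."""
    prefix, sep, number = cognate_id.partition(":")
    if not sep or not number.isdigit():
        # Assumption: anything else is malformed and should be surfaced.
        raise ValueError(f"unexpected cognate_id format: {cognate_id!r}")
    return prefix, int(number)


# Three of the sample values listed in the profile.
samples = ["iecor:12", "iecor:8032", "iecor:322"]
parsed = [split_cognate_id(s) for s in samples]
lengths = [len(s) for s in samples]
```

Under this reading, the reported length range of 7 to 10 characters would correspond to the six-character `iecor:` prefix followed by a one- to four-digit number.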

concept (categorical)

top value is 100.0% of rows
rows: 4,981
null: 0 (0.0%)
unique: 1
top_value: (blank)
top_rate: 1.000
cardinality: 1
entropy: -0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. — 4,981
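
The entropy of a single-valued column is zero, yet the report prints it with a minus sign. One plausible explanation is floating-point negative zero. The sketch below assumes Shannon entropy in bits, computed as the negated sum of p·log2(p); saturn's actual implementation is not shown in this report, so treat the function as illustrative only:

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy in bits: -sum(p * log2(p)) over the distinct values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A column with one distinct value in every row, like `concept` here.
constant_column = [""] * 4981
h = shannon_entropy(constant_column)

# The sum is 0.0 and negating it yields -0.0, which formats with a sign.
formatted = f"{h:.3f}"
```

Numerically the value still compares equal to zero, so the sign carries no information about the column.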

word_count (numeric)

skew=+6.84 | 13.0% of rows beyond 1.5 IQR
rows: 4,981
null: 0 (0.0%)
unique: 93
min: 1.000
max: 157.000
mean: 5.168
median: 2.000
std: 12.135
q1: 1.000
q3: 4.000
iqr: 3.000
skew: 6.837
kurtosis: 59.740
n_outliers: 649
outlier_rate: 0.130
zero_rate: 0.000
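
The "beyond 1.5 IQR" figure can be reproduced from the quartiles reported above. This is a worked sketch that assumes saturn uses the conventional Tukey fences (q1 - 1.5·IQR and q3 + 1.5·IQR); the report itself does not state the rule, and `tukey_fences` is a hypothetical helper name:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Values copied from the word_count statistics in this report.
q1, q3 = 1.0, 4.0
n_rows, n_outliers = 4981, 649

lower_fence, upper_fence = tukey_fences(q1, q3)  # iqr = 3.0
outlier_rate = n_outliers / n_rows
```

With a minimum of 1, no row can fall below the lower fence of -3.5, so under this assumption all 649 flagged rows sit above the upper fence of 8.5.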

language_count (numeric)

skew=+6.84 | 13.0% of rows beyond 1.5 IQR
rows: 4,981
null: 0 (0.0%)
unique: 94
min: 1.000
max: 157.000
mean: 5.166
median: 2.000
std: 12.130
q1: 1.000
q3: 4.000
iqr: 3.000
skew: 6.838
kurtosis: 59.775
n_outliers: 649
outlier_rate: 0.130
zero_rate: 0.000

words (text)

99.6% of rows are unique strings
rows: 4,981
null: 0 (0.0%)
unique: 4,963
len_min: 83
len_max: 14,956
len_mean: 498.881
len_median: 184.000
len_p95: 1,988
word_mean: 44.534
word_median: 16.000
n_empty: 0
n_duplicates: 18
duplicate_rate: 3.61e-03
vocab_size: 12,094
readability_flesch_mean: 48.433
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.000
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. [{"form": "s\u016bxtan", "language": "Persian: Tehran", "iso_639_3": "pes", "glottocode": "west2369"}, {"form": "su\u0292yn", "language": "Ossetic: Iron", "iso_639_3": "oss", "glottocode": "iron1242"}, {"form": "so\u0292un", "language": "Ossetic: Digor", "iso_639_3": "oss", "glot…
  2. [{"form": "r\u00e9st\u00e2", "language": "Franco-Proven\u00e7al", "iso_639_3": "frp", "glottocode": "fran1269"}]
  3. [{"form": "s\u0101pt", "language": "Sogdian", "iso_639_3": "sog", "glottocode": "sogd1245"}]
  4. [{"form": "wa\u0113", "language": "Kurdish S.: Elami", "iso_639_3": "sdh", "glottocode": "sout2640"}, {"form": "a-w(a)-\u0101", "language": "Kurdish S.: Qorveh", "iso_639_3": "sdh", "glottocode": "sout2640"}]
  5. [{"form": "sp\u00edti", "language": "Greek: Modern Std", "iso_639_3": "ell", "glottocode": "mode1248"}, {"form": "sp\u00edtin", "language": "Greek: Cypriot", "iso_639_3": "ell", "glottocode": "cypr1249"}, {"form": "sp\u00edti", "language": "Greek: Italiot", "iso_639_3": "ell", "g…
  6. [{"form": "\u01f0\u0259tu", "language": "Pashai: North-West", "iso_639_3": "glh", "glottocode": "nort2665"}]
  7. [{"form": "c\u014dgit\u0101re", "language": "Latin", "iso_639_3": "lat", "glottocode": "lati1261"}, {"form": "cuider", "language": "Anglo-Norman", "iso_639_3": "xno", "glottocode": "angl1258"}, {"form": "cuidier", "language": "Old French", "iso_639_3": "fro", "glottocode": "oldf1…
  8. [{"form": "peden", "language": "Old Occitan", "iso_639_3": "pro", "glottocode": "oldp1253"}]
  9. [{"form": "t\u0113\u03b3", "language": "Khwarazmian", "iso_639_3": "xco", "glottocode": "khwa1238"}]
  10. [{"form": "denken", "language": "Dutch", "iso_639_3": "nld", "glottocode": "dutc1256"}, {"form": "think", "language": "English", "iso_639_3": "eng", "glottocode": "stan1293"}, {"form": "denken", "language": "Flemish", "iso_639_3": "vls", "glottocode": "vlaa1240"}, {"form": "tinke…
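
The sample values above are JSON arrays stored as strings, with non-ASCII characters written as `\uXXXX` escapes. A minimal sketch of decoding one cell with the standard library, using sample 4 because it is shown in full. The derived counts are only an illustration: whether `word_count` and `language_count` are computed this way is an assumption, not something the report states.

```python
import json

# Sample value 4 from the report, copied verbatim (raw string keeps the \u escapes).
raw = (
    r'[{"form": "wa\u0113", "language": "Kurdish S.: Elami", '
    r'"iso_639_3": "sdh", "glottocode": "sout2640"}, '
    r'{"form": "a-w(a)-\u0101", "language": "Kurdish S.: Qorveh", '
    r'"iso_639_3": "sdh", "glottocode": "sout2640"}]'
)

entries = json.loads(raw)  # list of dicts; escapes become real characters

forms = [e["form"] for e in entries]
language_labels = {e["language"] for e in entries}
iso_codes = {e["iso_639_3"] for e in entries}

n_words = len(entries)
n_language_labels = len(language_labels)
n_iso_codes = len(iso_codes)
```

For this cell the two entries carry two distinct language labels but a single ISO 639-3 code, so a language count depends on which field is treated as the language key.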

source_dataset (categorical)

top value is 100.0% of rows
rows: 4,981
null: 0 (0.0%)
unique: 1
top_value: iecor
top_rate: 1.000
cardinality: 1
entropy: -0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. iecor — 4,981

confidence (numeric)

only one distinct value
rows: 4,981
null: 0 (0.0%)
unique: 1
min: 1.000
max: 1.000
mean: 1.000
median: 1.000
std: 0.000
q1: 1.000
q3: 1.000
iqr: 0.000
skew: 0.000
kurtosis: 0.000
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000