saturn
/home/coolhand/servers/diachronica/etymology_atlas/processed/lexibank_references.json · 11,359 rows · sample n=11,359 · seed 42 · 2026-05-01T18:36:26+00:00
Overview
| Source | /home/coolhand/servers/diachronica/etymology_atlas/processed/lexibank_references.json |
| Total rows | 11,359 |
| Profiled sample | 11,359 |
| Columns | 9 |
| Generated | 2026-05-01T18:36:26+00:00 |
Insights opt-in
Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.
Errors during insight pass (10)
- dataset:__global__:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFxk6x7poKKWL1hg57'}
- column:key:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFyHLyELp1nQfjgFnN'}
- column:author:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFypbPZNcFW7pT5ypg'}
- column:year:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFzFtNGFk3jEoa772h'}
- column:title:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFzggkea8R3VnNnsAC'}
- column:journal:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG1HQDJKa8EfWEd8NF'}
- column:publisher:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG1p9hoeTv9AnBkCs6'}
- column:editor:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG2RcV6mGLVtrNFEXY'}
- column:url:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG2xc9JDVpTRaJ2UmC'}
- column:citation:anthropic:claude-opus-4-7: BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG3dYF8WHvgMw5uGAY'}
Languages detected
Per-string language detection across text columns (sampled).
key text
100.0% of rows are unique strings
100.0% of rows are a single word
99.1% of rows are all-caps
95th-percentile length under 20 chars
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 11,359 |
| len_min | 1 |
| len_max | 5 |
| len_mean | 4.074 |
| len_median | 4.000 |
| len_p95 | 5.000 |
| word_mean | 1.000 |
| word_median | 1.000 |
| n_empty | 0 |
| n_duplicates | 0 |
| duplicate_rate | 0.000 |
| vocab_size | 11,359 |
| readability_flesch_mean | 121.220 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 1.000 |
| allcaps_rate | 0.991 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- 82
- 9048
- 10407
- 9838
- 5031
- 11690
- 5220
- 10815
- 9943
- 59
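The sample values above are all short digit strings, which suggests (as an assumption, since the report types the column as text) that `key` is an integer identifier rather than free text. A minimal sketch of how that could be checked, using only the ten sample values listed; the helper name `looks_like_integer_id` is hypothetical:

```python
# First 10 sample values of `key`, copied from the report above.
sample_keys = ["82", "9048", "10407", "9838", "5031",
               "11690", "5220", "10815", "9943", "59"]


def looks_like_integer_id(values):
    """Return True if every value is a digit-only string and no value repeats."""
    all_digits = all(v.isdigit() for v in values)
    all_unique = len(set(values)) == len(values)
    return all_digits and all_unique


is_id = looks_like_integer_id(sample_keys)
max_len = max(len(v) for v in sample_keys)
print(is_id, max_len)
```

If the same check held on the full column, text-oriented flags such as the single-word and all-caps rates would say little about this field.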
author text
31 languages detected in sample
66.3% duplicate strings
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 3,830 |
| len_min | 0 |
| len_max | 453 |
| len_mean | 20.427 |
| len_median | 17.000 |
| len_p95 | 50.000 |
| word_mean | 3.382 |
| word_median | 3.000 |
| n_empty | 1,277 |
| n_duplicates | 7,529 |
| duplicate_rate | 0.663 |
| vocab_size | 6,656 |
| readability_flesch_mean | 53.214 |
| emoji_rate | 0.000 |
| url_rate | 0.000 |
| one_word_rate | 0.184 |
| allcaps_rate | 0.059 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- Boutkan, Dirk and Siebinga, Sjoerd M.
- Peiros, Ilia
- Chlenova, Svetlana
- Spratt, David and Nancy
- Oshika, Beatrice R. T.
- Stokhof, W. A. L.
- Blench, Roger M.
- Haupers, Ralph and Haupers, Lorraine
- Daud, Bukhari and Durie, Mark
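The author counts are consistent with `n_duplicates = rows - unique`, that is, every occurrence of a string beyond its first. This reading is inferred from the numbers in this report, not from saturn's documentation. A short arithmetic check under that assumption:

```python
# Figures copied from the author statistics above.
rows = 11_359
unique = 3_830
n_empty = 1_277

# Assumed definition: a duplicate is any occurrence after the first
# appearance of a given string.
n_duplicates = rows - unique
duplicate_rate = n_duplicates / rows
empty_rate = n_empty / rows

print(n_duplicates, round(duplicate_rate, 3), round(empty_rate, 3))
```

Under this definition the result matches the reported `n_duplicates` of 7,529 and `duplicate_rate` of 0.663, and it shows that roughly 11% of rows carry an empty author string.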
year categorical
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 271 |
| top_value | (empty string) |
| top_rate | 0.114 |
| cardinality | 271 |
| entropy | 6.100 |
| entropy_ratio | 0.755 |
Top values (rank 1–20)
- — 1,300
- n.d. — 574
- 1992 — 338
- 2007 — 282
- 1971 — 271
- 1979 — 254
- 1980 — 225
- 2005 — 221
- 2015 — 217
- 1986 — 208
- 1997 — 208
- 2006 — 204
- 2009 — 202
- 2011 — 196
- 1975 — 195
- 1963 [1854] — 193
- 1981 — 188
- 2016 — 188
- 2004 — 185
- 2000 — 185
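The top values show that `year` is not purely numeric: it includes empty strings, `n.d.`, and entries with a bracketed second year such as `1963 [1854]`. A minimal sketch of one way to normalise such values to an integer, assuming the first four-digit year is the one wanted and that plausible years fall between 1500 and 2099; both assumptions, and the helper name `parse_year`, are illustrative only:

```python
import re

# Assumption: plausible publication years fall in 1500-2099.
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def parse_year(raw):
    """Return the first four-digit year found in `raw`, or None if there is none."""
    match = YEAR_RE.search(raw)
    return int(match.group(1)) if match else None


# Value shapes taken from the top-values list above.
samples = ["", "n.d.", "1992", "1963 [1854]"]
parsed = [parse_year(s) for s in samples]
print(parsed)
```

Whether the bracketed year or the leading year is the right one to keep depends on how the field is used downstream, so the choice here is a placeholder.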
title text
26 languages detected in sample
50.1% duplicate strings
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 5,663 |
| len_min | 0 |
| len_max | 1,562 |
| len_mean | 120.616 |
| len_median | 104.000 |
| len_p95 | 261.000 |
| word_mean | 14.659 |
| word_median | 12.000 |
| n_empty | 7 |
| n_duplicates | 5,696 |
| duplicate_rate | 0.501 |
| vocab_size | 21,846 |
| readability_flesch_mean | 8.003 |
| emoji_rate | 0.000 |
| url_rate | 0.174 |
| one_word_rate | 0.070 |
| allcaps_rate | 0.062 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- Old Frisian Etymological Dictionary
- Comparative Linguistics in Southeast Asia. Pacific Linguistics C-142. Canberra: Australian National University.
- http://language.psy.auckland.ac.nz/austronesian/about.php (accessed February 2008)
- Manusela, Yazyk Tsentral'nogo Serama: Materialy i Zametki. Moscow: Econ-Inform.
- Kusal. In Kropp Dakubu, M. E. (ed.), West African language data sheets, vol. 1. Legon, Ghana: West African Linguistic Society.
- The relationship of Kam-Sui-Mak to Tai. Ph.D. dissertation, University of Michigan.
- Preliminary notes on the Alor and Pantar languages (East Indonesia). Pacific Linguistics B-43. Canberra: Research School of Pacific, Studies Department of Linguistics, Australian National University.
- The Upper Cross languages: a comparative study. Manuscript in preparation.
- Stieng-English dictionary. Dallas: Summer Institute of Linguistics microfiche publications.
- Kamus Basa Aceh. Kamus Bahasa aceh. Acehnese-Indonesian-English Thesaurus. Canberra: Pacific Linguistics.
journal categorical
top value is 100.0% of rows
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 4 |
| top_value | (empty string) |
| top_rate | 1.000 |
| cardinality | 4 |
| entropy | 5.08e-03 |
| entropy_ratio | 2.54e-03 |
Top values (rank 1–20)
- — 11,355
- Münchener Studien zur Sprachwissenschaft — 2
- Zeitschrift für vergleichende Sprachforschung — 1
- Historische Sprachforschung — 1
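The journal counts are small enough to recompute the two entropy figures by hand. The sketch below assumes `entropy` is Shannon entropy in bits and `entropy_ratio` is that entropy divided by `log2(cardinality)`; this is inferred from the reported numbers rather than taken from saturn's documentation.

```python
import math

# Value counts copied from the journal top-values list above;
# the empty string stands for the blank top value.
counts = {
    "": 11_355,
    "Münchener Studien zur Sprachwissenschaft": 2,
    "Zeitschrift für vergleichende Sprachforschung": 1,
    "Historische Sprachforschung": 1,
}


def shannon_entropy_bits(value_counts):
    """Shannon entropy, in bits, of the distribution given by the counts."""
    total = sum(value_counts.values())
    probs = [c / total for c in value_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)


entropy = shannon_entropy_bits(counts)
# Normalise by the maximum entropy for this many categories.
entropy_ratio = entropy / math.log2(len(counts))
print(f"{entropy:.2e} {entropy_ratio:.2e}")
```

Under these assumptions the result agrees with the reported 5.08e-03 and 2.54e-03 to the displayed precision, which indicates the column is effectively constant.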
publisher categorical
13 singleton categories
top value is 99.9% of rows
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 15 |
| top_value | (empty string) |
| top_rate | 0.999 |
| cardinality | 15 |
| entropy | 0.020 |
| entropy_ratio | 5.00e-03 |
Top values (rank 1–20)
- — 11,344
- Brill — 2
- Winter — 1
- Reichert — 1
- Vostočnaja Literatura — 1
- Rodopi — 1
- Harrassowitz — 1
- K. J. Trübner — 1
- Belaruskaâ navuka — 1
- Fitzroy Dearborn Publishers — 1
- Karl J. Trübner — 1
- Institut für Sprachwissenschaft der Universität Innsbruck — 1
- Vandenhoeck & Ruprecht — 1
- Walter de Gruyter & Co. — 1
- Nova Fronteira — 1
editor categorical
3 singleton categories
top value is 100.0% of rows
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 4 |
| top_value | (empty string) |
| top_rate | 1.000 |
| cardinality | 4 |
| entropy | 3.94e-03 |
| entropy_ratio | 1.97e-03 |
Top values (rank 1–20)
- — 11,356
- Tischler, Johann — 1
- Martynaǔ, V. and G., Cyhun — 1
- Mallory, James P. — 1
url categorical
top value is 100.0% of rows
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 1 |
| top_value | (empty string) |
| top_rate | 1.000 |
| cardinality | 1 |
| entropy | 0.000 |
| entropy_ratio | 0.000 |
Top values (rank 1–20)
- — 11,359
citation text
31 languages detected in sample
54.4% duplicate strings
| rows | 11,359 |
| null | 0 (0.0%) |
| unique | 5,178 |
| len_min | 8 |
| len_max | 142 |
| len_mean | 67.938 |
| len_median | 71.000 |
| len_p95 | 77.000 |
| word_mean | 8.827 |
| word_median | 10.000 |
| n_empty | 0 |
| n_duplicates | 6,181 |
| duplicate_rate | 0.544 |
| vocab_size | 15,535 |
| readability_flesch_mean | 29.261 |
| emoji_rate | 0.000 |
| url_rate | 0.133 |
| one_word_rate | 0.000 |
| allcaps_rate | 0.060 |
| boilerplate_rate | 0.000 |
Sample values (first 10)
- Boutkan 2005. Old Frisian Etymological Dictionary
- Peiros 1998. Comparative Linguistics in Southeast Asia. Pacific Lingui…
- Anon. n.d.. http://language.psy.auckland.ac.nz/austronesian/about.php…
- Chlenova 2012. Manusela, Yazyk Tsentral'nogo Serama: Materialy i Zametki…
- Spratt 1977. Kusal. In Kropp Dakubu, M. E. (ed.), West African languag…
- Oshika 1973. The relationship of Kam-Sui-Mak to Tai. Ph.D. dissertatio…
- Stokhof 1975. Preliminary notes on the Alor and Pantar languages (East …
- Blench 2014. The Upper Cross languages: a comparative study. Manuscrip…
- Haupers 1991. Stieng-English dictionary. Dallas: Summer Institute of Li…
- Daud 1999. Kamus Basa Aceh. Kamus Bahasa aceh. Acehnese-Indonesian-E…
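The citation samples look like a derived field: first author's surname, year, then the title cut to 57 characters with a trailing `…`. The sketch below is one reading consistent with the samples shown, not the generator that produced the file. It assumes the first ten samples of each column come from the same rows, that an empty author becomes `Anon.`, and that 57 is the cut-off; the function name `short_citation` is hypothetical.

```python
# Assumption: cut-off inferred by counting characters in the truncated samples above.
TITLE_LIMIT = 57


def short_citation(author, year, title):
    """Build 'Surname Year. Title', truncating long titles with an ellipsis."""
    # Surname is taken as the text before the first comma; empty author -> "Anon."
    surname = author.split(",")[0].strip() if author.strip() else "Anon."
    shown = title if len(title) <= TITLE_LIMIT else title[:TITLE_LIMIT] + "…"
    return f"{surname} {year}. {shown}"


print(short_citation(
    "Boutkan, Dirk and Siebinga, Sjoerd M.",
    "2005",
    "Old Frisian Etymological Dictionary",
))
print(short_citation(
    "Peiros, Ilia",
    "1998",
    "Comparative Linguistics in Southeast Asia. Pacific Linguistics C-142. "
    "Canberra: Australian National University.",
))
```

If this reading is right, `citation` carries no information beyond `author`, `year`, and `title`, and its 54.4% duplicate rate follows from duplication in those source fields.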