saturn

/home/coolhand/servers/diachronica/etymology_atlas/processed/lexibank_references.json · 11,359 rows · sample n=11,359 · seed 42 · 2026-05-01T18:36:26+00:00

Overview

Source: /home/coolhand/servers/diachronica/etymology_atlas/processed/lexibank_references.json
Total rows: 11,359
Profiled sample: 11,359
Columns: 9
Generated: 2026-05-01T18:36:26+00:00

Insights (opt-in)

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Errors during insight pass (10)
  • dataset:__global__:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFxk6x7poKKWL1hg57'}
  • column:key:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFyHLyELp1nQfjgFnN'}
  • column:author:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFypbPZNcFW7pT5ypg'}
  • column:year:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFzFtNGFk3jEoa772h'}
  • column:title:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLFzggkea8R3VnNnsAC'}
  • column:journal:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG1HQDJKa8EfWEd8NF'}
  • column:publisher:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG1p9hoeTv9AnBkCs6'}
  • column:editor:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG2RcV6mGLVtrNFEXY'}
  • column:url:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG2xc9JDVpTRaJ2UmC'}
  • column:citation:anthropic:claude-opus-4-7: BadRequestError — Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits.'}, 'request_id': 'req_011CacLG3dYF8WHvgMw5uGAY'}

Languages detected

Per-string language detection across text columns (sampled).

key text

100.0% of rows are unique strings; 100.0% of rows are a single word; 99.1% of rows are all-caps; 95th-percentile length under 20 chars
rows: 11,359
null: 0 (0.0%)
unique: 11,359
len_min: 1
len_max: 5
len_mean: 4.074
len_median: 4.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 11,359
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.991
boilerplate_rate: 0.000
Sample values (first 10)
  1. 82
  2. 9048
  3. 10407
  4. 9838
  5. 5031
  6. 11690
  7. 5220
  8. 10815
  9. 9943
  10. 59

author text

31 languages detected in sample; 66.3% duplicate strings
rows: 11,359
null: 0 (0.0%)
unique: 3,830
len_min: 0
len_max: 453
len_mean: 20.427
len_median: 17.000
len_p95: 50.000
word_mean: 3.382
word_median: 3.000
n_empty: 1,277
n_duplicates: 7,529
duplicate_rate: 0.663
vocab_size: 6,656
readability_flesch_mean: 53.214
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.184
allcaps_rate: 0.059
boilerplate_rate: 0.000
Sample values (first 10)
  1. Boutkan, Dirk and Siebinga, Sjoerd M.
  2. Peiros, Ilia
  3. Chlenova, Svetlana
  4. Spratt, David and Nancy
  5. Oshika, Beatrice R. T.
  6. Stokhof, W. A. L.
  7. Blench, Roger M.
  8. Haupers, Ralph and Haupers, Lorraine
  9. Daud, Bukhari and Durie, Mark
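
The duplicate figures reported for the text columns are consistent with a simple definition: n_duplicates = rows - unique, and duplicate_rate = n_duplicates / rows. The sketch below assumes that definition; the report does not show how saturn actually computes these values, and the function name `duplicate_stats` is hypothetical.

```python
def duplicate_stats(values):
    """Return row, unique, and duplicate counts for a list of strings.

    Assumed definition (not confirmed as saturn's own):
    every occurrence beyond the first of a string counts as a duplicate.
    """
    rows = len(values)
    unique = len(set(values))
    n_duplicates = rows - unique
    duplicate_rate = n_duplicates / rows if rows else 0.0
    return {
        "rows": rows,
        "unique": unique,
        "n_duplicates": n_duplicates,
        "duplicate_rate": duplicate_rate,
    }


# Checking the assumed definition against the author column's reported counts.
author_rows, author_unique = 11_359, 3_830
author_duplicates = author_rows - author_unique          # 7529
author_rate = round(author_duplicates / author_rows, 3)  # 0.663
```

The same arithmetic reproduces the reported title (5,696 and 0.501) and citation (6,181 and 0.544) figures from their row and unique counts.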

year categorical

rows: 11,359
null: 0 (0.0%)
unique: 271
top_value: (empty string)
top_rate: 0.114
cardinality: 271
entropy: 6.100
entropy_ratio: 0.755
Top values (rank 1–20)
  1. — 1,300
  2. n.d. — 574
  3. 1992 — 338
  4. 2007 — 282
  5. 1971 — 271
  6. 1979 — 254
  7. 1980 — 225
  8. 2005 — 221
  9. 2015 — 217
  10. 1986 — 208
  11. 1997 — 208
  12. 2006 — 204
  13. 2009 — 202
  14. 2011 — 196
  15. 1975 — 195
  16. 1963 [1854] — 193
  17. 1981 — 188
  18. 2016 — 188
  19. 2004 — 185
  20. 2000 — 185
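
The top values show that year is stored as free text: it includes empty strings, the placeholder "n.d.", and compound forms such as "1963 [1854]". A minimal sketch of one way to normalize such values before numeric analysis follows. The function name `parse_year`, the accepted range 1500 to 2099, and the choice to take the first four-digit year (treating a bracketed second year as secondary) are all assumptions made for illustration, not part of the profiled dataset or of saturn.

```python
import re

# First standalone four-digit year between 1500 and 2099 (assumed range).
YEAR_RE = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def parse_year(raw):
    """Return the first plausible year in a raw string, or None.

    Empty strings and placeholders such as "n.d." yield None.
    """
    match = YEAR_RE.search(raw or "")
    return int(match.group(1)) if match else None


examples = ["1992", "n.d.", "", "1963 [1854]"]
parsed = [parse_year(value) for value in examples]
# parsed == [1992, None, None, 1963]
```

Under this rule the 1,300 empty and 574 "n.d." rows listed above would map to None and remain visible as missing rather than being dropped.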

title text

26 languages detected in sample; 50.1% duplicate strings
rows: 11,359
null: 0 (0.0%)
unique: 5,663
len_min: 0
len_max: 1,562
len_mean: 120.616
len_median: 104.000
len_p95: 261.000
word_mean: 14.659
word_median: 12.000
n_empty: 7
n_duplicates: 5,696
duplicate_rate: 0.501
vocab_size: 21,846
readability_flesch_mean: 8.003
emoji_rate: 0.000
url_rate: 0.174
one_word_rate: 0.070
allcaps_rate: 0.062
boilerplate_rate: 0.000
Sample values (first 10)
  1. Old Frisian Etymological Dictionary
  2. Comparative Linguistics in Southeast Asia. Pacific Linguistics C-142. Canberra: Australian National University.
  3. http://language.psy.auckland.ac.nz/austronesian/about.php (accessed February 2008)
  4. Manusela, Yazyk Tsentral'nogo Serama: Materialy i Zametki. Moscow: Econ-Inform.
  5. Kusal. In Kropp Dakubu, M. E. (ed.), West African language data sheets, vol. 1. Legon, Ghana: West African Linguistic Society.
  6. The relationship of Kam-Sui-Mak to Tai. Ph.D. dissertation, University of Michigan.
  7. Preliminary notes on the Alor and Pantar languages (East Indonesia). Pacific Linguistics B-43. Canberra: Research School of Pacific, Studies Department of Linguistics, Australian National University.
  8. The Upper Cross languages: a comparative study. Manuscript in preparation.
  9. Stieng-English dictionary. Dallas: Summer Institute of Linguistics microfiche publications.
  10. Kamus Basa Aceh. Kamus Bahasa aceh. Acehnese-Indonesian-English Thesaurus. Canberra: Pacific Linguistics.

journal categorical

top value is 100.0% of rows
rows: 11,359
null: 0 (0.0%)
unique: 4
top_value: (empty string)
top_rate: 1.000
cardinality: 4
entropy: 5.08e-03
entropy_ratio: 2.54e-03
Top values (rank 1–20)
  1. — 11,355
  2. Münchener Studien zur Sprachwissenschaft — 2
  3. Zeitschrift für vergleichende Sprachforschung — 1
  4. Historische Sprachforschung — 1
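
The entropy and entropy_ratio values for the categorical columns are consistent with Shannon entropy in bits, divided by log2(cardinality) for the ratio. The sketch below assumes that definition and checks it against the journal counts listed above. The function names are hypothetical, and the report does not confirm that saturn computes the values this way.

```python
import math


def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a category-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of categories)."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0  # a single category carries no uncertainty
    return shannon_entropy_bits(counts) / math.log2(k)


# Journal column: one empty-string category plus three named journals.
journal_counts = [11_355, 2, 1, 1]
journal_entropy = shannon_entropy_bits(journal_counts)  # about 5.08e-03
journal_ratio = entropy_ratio(journal_counts)           # about 2.54e-03
```

With this formula a single-category column evaluates to negative zero in floating point, which may be why a column with cardinality 1 can display an entropy of -0.000.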

publisher categorical

13 singleton categories; top value is 99.9% of rows
rows: 11,359
null: 0 (0.0%)
unique: 15
top_value: (empty string)
top_rate: 0.999
cardinality: 15
entropy: 0.020
entropy_ratio: 5.00e-03
Top values (rank 1–20)
  1. — 11,344
  2. Brill — 2
  3. Winter — 1
  4. Reichert — 1
  5. Vostočnaja Literatura — 1
  6. Rodopi — 1
  7. Harrassowitz — 1
  8. K. J. Trübner — 1
  9. Belaruskaâ navuka — 1
  10. Fitzroy Dearborn Publishers — 1
  11. Karl J. Trübner — 1
  12. Institut für Sprachwissenschaft der Universität Innsbruck — 1
  13. Vandenhoeck & Ruprecht — 1
  14. Walter de Gruyter & Co. — 1
  15. Nova Fronteira — 1

editor categorical

3 singleton categories; top value is 100.0% of rows
rows: 11,359
null: 0 (0.0%)
unique: 4
top_value: (empty string)
top_rate: 1.000
cardinality: 4
entropy: 3.94e-03
entropy_ratio: 1.97e-03
Top values (rank 1–20)
  1. — 11,356
  2. Tischler, Johann — 1
  3. Martynaǔ, V. and G., Cyhun — 1
  4. Mallory, James P. — 1

url categorical

top value is 100.0% of rows
rows: 11,359
null: 0 (0.0%)
unique: 1
top_value: (empty string)
top_rate: 1.000
cardinality: 1
entropy: -0.000
entropy_ratio: 0.000
Top values (rank 1–20)
  1. — 11,359

citation text

31 languages detected in sample; 54.4% duplicate strings
rows: 11,359
null: 0 (0.0%)
unique: 5,178
len_min: 8
len_max: 142
len_mean: 67.938
len_median: 71.000
len_p95: 77.000
word_mean: 8.827
word_median: 10.000
n_empty: 0
n_duplicates: 6,181
duplicate_rate: 0.544
vocab_size: 15,535
readability_flesch_mean: 29.261
emoji_rate: 0.000
url_rate: 0.133
one_word_rate: 0.000
allcaps_rate: 0.060
boilerplate_rate: 0.000
Sample values (first 10)
  1. Boutkan 2005. Old Frisian Etymological Dictionary
  2. Peiros 1998. Comparative Linguistics in Southeast Asia. Pacific Lingui…
  3. Anon. n.d.. http://language.psy.auckland.ac.nz/austronesian/about.php…
  4. Chlenova 2012. Manusela, Yazyk Tsentral'nogo Serama: Materialy i Zametki…
  5. Spratt 1977. Kusal. In Kropp Dakubu, M. E. (ed.), West African languag…
  6. Oshika 1973. The relationship of Kam-Sui-Mak to Tai. Ph.D. dissertatio…
  7. Stokhof 1975. Preliminary notes on the Alor and Pantar languages (East …
  8. Blench 2014. The Upper Cross languages: a comparative study. Manuscrip…
  9. Haupers 1991. Stieng-English dictionary. Dallas: Summer Institute of Li…
  10. Daud 1999. Kamus Basa Aceh. Kamus Bahasa aceh. Acehnese-Indonesian-E…