saturn

/home/coolhand/html/datavis/data_trove/data/quirky/silence_data.json 6,998 rows sample n=6,998 seed 42 2026-06-22T00:13:02+00:00

Overview

Source           /home/coolhand/html/datavis/data_trove/data/quirky/silence_data.json
Total rows       6,998
Profiled sample  6,998
Columns          6
Generated        2026-06-22T00:13:02+00:00
Per-column null rate across the corpus.
column  kind         null %
n       text         0.0%
lat     numeric      0.0%
lng     numeric      0.0%
p       numeric      0.0%
s       numeric      0.0%
ss      categorical  0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset catalogues approximately 7,000 world languages, each with a name, geographic coordinates, speaker population, and endangerment status. The most striking finding is the extreme inequality in speaker populations: the median language has only 11,000 speakers while the maximum reaches nearly 1 billion, with over 16% of languages flagged as outliers — a classic long-tail distribution reflecting how a handful of dominant languages vastly outnumber the rest. Equally notable is the endangerment picture: while 44% of languages are classified as 'safe', a substantial share face real risk — 1,753 are 'definitely endangered', 327 are 'critically endangered', and 219 are already extinct. Top words in language names include 'sign', 'zapotec', 'mixtec', and directional qualifiers like 'southern' and 'northern', hinting at rich dialect clustering worth exploring geographically.

n high anthropic:default

This column appears to be a name field for human languages or dialects — top words include 'language', 'sign', 'Zapotec', 'Mixtec', and directional qualifiers like 'southern', 'northern', 'western', 'eastern', 'central', consistent with a linguistic taxonomy dataset. All 6,998 rows are unique with zero duplicates and zero nulls, confirming it functions as a label or identifier. The majority of values (73%) are single words, but multi-word entries push the mean length to ~8.97 characters and mean word count to ~1.37, reflecting compound names like 'Southern Zapotec'. The vocabulary size (7,003) slightly exceeding unique row count (6,998) is unremarkable given tokenization.

p medium anthropic:default

Column 'p' is a numeric count field with extreme positive skew (skew = 39.45, kurtosis = 1870.01); given the dataset-level reading above, it most plausibly holds speaker population rather than a price or other monetary amount. The median is 11,000 while the mean is 1,157,742, a gap of roughly 105×, driven by a long upper tail that reaches 964,553,200, with 1,159 outliers (16.6% of rows). A further 13.5% of values are exactly zero, suggesting two populations (zero vs. non-zero values) that may require separate treatment.
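The outlier count for p is reported as rows "beyond 1.5 IQR", which suggests Tukey-style fences. A minimal sketch of that rule using the quartiles reported in this profile; the exact rule saturn applies is an assumption, and `tukey_fences` and `is_outlier` are hypothetical helper names.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences placed k * IQR outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles for column p as reported in this profile.
Q1, Q3 = 1_200, 83_000
lower, upper = tukey_fences(Q1, Q3)

def is_outlier(value, lower=lower, upper=upper):
    """True when a value falls outside the fences (assumed rule, not confirmed)."""
    return value < lower or value > upper
```

With these quartiles the upper fence is 205,700 and the lower fence is negative. Since the column minimum is 0, only large values can be flagged under this rule; the zero rows never are.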

lat high anthropic:default

This column contains geographic latitude values, spanning from -55.27° (southern South America) to 73.14° (Arctic latitudes), covering a wide swath of the globe with concentration in tropical and subtropical regions (median 6.37°, Q1 -4.65°, Q3 18.29°). The 4,048 unique values out of 6,998 rows indicates coordinate granularity likely at 2 decimal places, with some location repetition. Mild positive skew (0.697) and near-mesokurtic kurtosis (0.477) confirm most records cluster in equatorial-to-subtropical bands, consistent with datasets heavy in African, South/Southeast Asian, or Latin American records. Only 149 outliers (~2.1%) were flagged, likely corresponding to high-latitude locations in Europe or North America.

lng high anthropic:default

This column contains geographic longitude values, spanning from -178.78 to 179.31, nearly the full valid range of -180 to 180 degrees. The mean (52.46) and median (47.65) are both positive (eastern), suggesting the dataset has a higher concentration of locations in Europe, Asia, or Africa than in the Americas. The IQR of 115.65 and negative excess kurtosis (-0.67) indicate a broadly spread, relatively flat distribution across the globe, with only 12 outliers flagged.

s high anthropic:default

This column is almost certainly an ordinal rating or severity score with exactly 6 discrete integer values ranging from 0 to 5. The distribution is notably left-skewed (skew = -1.04), meaning high scores (4–5) dominate — the median is 4.0 and Q3 is 5.0 — which would surprise an analyst expecting a balanced scale. Only 3.6% of rows are zero, suggesting the lowest value is rare rather than a default or sentinel.

ss high anthropic:default

This column encodes an endangerment status classification with 7 distinct labels: six graded statuses from 'safe' to 'extinct', plus an 'unknown' label. The dominant class is 'safe' at 43.9% of 6,998 rows, while 'extinct' accounts for 219 records (3.1%), a meaningful minority. An entropy ratio of 0.756 indicates a reasonable spread across categories, though the three most common labels ('safe', 'definitely endangered', and 'vulnerable') together cover 85.6% of rows. No nulls are present, and the label set is clean with no obvious noise.
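The entropy figures for ss can be checked directly from the category counts listed in this profile. A minimal sketch, assuming Shannon entropy in bits (base-2 logarithm) normalised by the maximum entropy for 7 categories; the report does not state the base, and `shannon_entropy_bits` is a hypothetical helper name.

```python
import math

# Category counts for column ss as listed in this profile.
COUNTS = {
    "safe": 3074,
    "definitely endangered": 1753,
    "vulnerable": 1160,
    "severely endangered": 374,
    "critically endangered": 327,
    "extinct": 219,
    "unknown": 91,
}

def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a collection of category counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

entropy = shannon_entropy_bits(COUNTS.values())
# Normalise by the maximum possible entropy, log2(number of categories).
entropy_ratio = entropy / math.log2(len(COUNTS))
top_rate = max(COUNTS.values()) / sum(COUNTS.values())
```

Under the base-2 assumption, the results agree with the reported entropy (2.122), entropy_ratio (0.756), and top_rate (0.439) to three decimals.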

Numeric correlation

Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
      lat    lng    p      s
lat   +1.00  -0.35  +0.13  -0.09
lng   -0.35  +1.00  +0.00  +0.16
p     +0.13  +0.00  +1.00  +0.09
s     -0.09  +0.16  +0.09  +1.00
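For reference, a minimal sketch of how a Pearson matrix like the one above can be computed for a set of numeric columns. The toy values are made up for illustration and are not rows from the profiled file. `pearson` and `correlation_matrix` are hypothetical helper names, and rounding to 2 decimals is an assumption about what "clipped" means here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson coefficients, rounded to 2 decimals (assumed display rule)."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 2) for b in names}
            for a in names}

# Illustrative toy values only; these are NOT rows from silence_data.json.
toy = {"lat": [0.0, 10.0, 20.0, 30.0], "lng": [40.0, 30.0, 20.0, 10.0]}
matrix = correlation_matrix(toy)
```

Pearson captures linear association only. Because p is heavily skewed (skew 39.45), coefficients involving p may be dominated by a few extreme values, so a rank-based measure could be a useful cross-check.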

n text

100.0% of rows are unique strings; 73.1% of rows are a single word
rows                     6,998
null                     0 (0.0%)
unique                   6,998
len_min                  1
len_max                  43
len_mean                 8.972
len_median               7.000
len_p95                  21.000
word_mean                1.369
word_median              1.000
n_empty                  0
n_duplicates             0
duplicate_rate           0.000
vocab_size               7,003
readability_flesch_mean  57.130
emoji_rate               0.000
url_rate                 0.000
one_word_rate            0.731
allcaps_rate             0.000
boilerplate_rate         0.000
Character-length distribution for n (mean: 8.971991997713632).
chars    count
1 – 2    25
2 – 3    193
3 – 4    748
4 – 5    1095
5 – 6    1082
6 – 7    833
7 – 8    521
8 – 9    333
9 – 10   271
10 – 12  231
12 – 13  208
13 – 14  213
14 – 15  188
15 – 16  199
16 – 17  144
17 – 18  91
18 – 19  75
19 – 20  68
20 – 21  63
21 – 22  69
22 – 23  150
23 – 24  48
24 – 25  26
25 – 26  30
26 – 27  21
27 – 28  8
28 – 29  7
29 – 30  10
30 – 31  14
31 – 32  6
32 – 34  8
34 – 35  2
35 – 36  10
36 – 37  1
37 – 38  4
38 – 39  0
39 – 40  1
40 – 41  0
41 – 42  1
42 – 43  1
Sample values (first 10)
  1. Abaza
  2. Rajbanshi
  3. Tày
  4. Solos
  5. Kachi Koli
  6. Yangum Gel
  7. Eastern Krahn
  8. San Martín Itunyoso Triqui
  9. Sonia
  10. Angal Enen
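The length and word-count statistics for n can be illustrated on the ten sample values listed above. A minimal sketch, assuming characters are counted with `len` and words are split on whitespace; the report does not confirm how saturn tokenizes, and `text_stats` is a hypothetical helper name.

```python
from statistics import mean, median

# The ten sample values shown in this report (not the full 6,998-row column).
SAMPLE = [
    "Abaza", "Rajbanshi", "Tày", "Solos", "Kachi Koli",
    "Yangum Gel", "Eastern Krahn", "San Martín Itunyoso Triqui",
    "Sonia", "Angal Enen",
]

def text_stats(values):
    """Length and word-count summaries for a list of strings."""
    lengths = [len(v) for v in values]        # characters (Unicode code points)
    words = [len(v.split()) for v in values]  # whitespace-delimited words (assumption)
    return {
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "word_mean": mean(words),
        "one_word_rate": sum(w == 1 for w in words) / len(values),
        "duplicate_rate": 1 - len(set(values)) / len(values),
    }

stats = text_stats(SAMPLE)
```

On this sample half the names are single words, against 73.1% for the full column; ten rows are too few to expect the sample and column figures to agree.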

lat numeric

rows          6,998
null          0 (0.0%)
unique        4,048
min           -55.270
max           73.140
mean          8.437
median        6.370
std           17.999
q1            -4.650
q3            18.290
iqr           22.940
skew          0.697
kurtosis      0.477
n_outliers    149
outlier_rate  0.021
zero_rate     0.000
Histogram bins for lat (median: 6.37).
bin                count
-55.27 – -52.06    1
-52.06 – -48.85    1
-48.85 – -45.64    1
-45.64 – -42.43    0
-42.43 – -39.22    2
-39.22 – -36.01    5
-36.01 – -32.8     9
-32.8 – -29.59     13
-29.59 – -26.38    23
-26.38 – -23.17    48
-23.17 – -19.96    100
-19.96 – -16.75    101
-16.75 – -13.54    217
-13.54 – -10.33    202
-10.33 – -7.116    461
-7.116 – -3.906    749
-3.906 – -0.6958   640
-0.6958 – 2.514    344
2.514 – 5.725      434
5.725 – 8.935      609
8.935 – 12.15      676
12.15 – 15.36      264
15.36 – 18.57      368
18.57 – 21.78      210
21.78 – 24.99      282
24.99 – 28.2       320
28.2 – 31.41       131
31.41 – 34.62      124
34.62 – 37.83      148
37.83 – 41.04      81
41.04 – 44.25      107
44.25 – 47.46      65
47.46 – 50.67      68
50.67 – 53.88      68
53.88 – 57.09      35
57.09 – 60.3       20
60.3 – 63.51       34
63.51 – 66.72      18
66.72 – 69.93      13
69.93 – 73.14      6

lng numeric

rows          6,998
null          0 (0.0%)
unique        5,560
min           -178.780
max           179.310
mean          52.459
median        47.650
std           79.673
q1            8.282
q3            123.935
iqr           115.653
skew          -0.498
kurtosis      -0.673
n_outliers    12
outlier_rate  1.71e-03
zero_rate     0.000
Histogram bins for lng (median: 47.65).
bin                count
-178.8 – -169.8    11
-169.8 – -160.9    4
-160.9 – -151.9    9
-151.9 – -143      11
-143 – -134        10
-134 – -125.1      15
-125.1 – -116.1    94
-116.1 – -107.2    37
-107.2 – -98.21    71
-98.21 – -89.26    257
-89.26 – -80.31    50
-80.31 – -71.35    176
-71.35 – -62.4     150
-62.4 – -53.45     112
-53.45 – -44.5     48
-44.5 – -35.54     21
-35.54 – -26.59    0
-26.59 – -17.64    3
-17.64 – -8.687    95
-8.687 – 0.265     261
0.265 – 9.217      420
9.217 – 18.17      687
18.17 – 27.12      297
27.12 – 36.07      402
36.07 – 45.03      211
45.03 – 53.98      113
53.98 – 62.93      27
62.93 – 71.88      73
71.88 – 80.84      212
80.84 – 89.79      191
89.79 – 98.74      242
98.74 – 107.7      386
107.7 – 116.6      202
116.6 – 125.6      437
125.6 – 134.5      266
134.5 – 143.5      512
143.5 – 152.5      597
152.5 – 161.4      106
161.4 – 170.4      170
170.4 – 179.3      12

p numeric

skew=+39.45; 16.6% of rows beyond 1.5 IQR
rows          6,998
null          0 (0.0%)
unique        1,627
min           0.000
max           964,553,200
mean          1,157,742
median        11,000
std           17,303,554
q1            1,200
q3            83,000
iqr           81,800
skew          39.452
kurtosis      1,870
n_outliers    1,159
outlier_rate  0.166
zero_rate     0.135
Histogram bins for p (median: 11000.0).
bin                      count
0 – 2.411e+07            6943
2.411e+07 – 4.823e+07    28
4.823e+07 – 7.234e+07    5
7.234e+07 – 9.646e+07    12
9.646e+07 – 1.206e+08    0
1.206e+08 – 1.447e+08    3
1.447e+08 – 1.688e+08    1
1.688e+08 – 1.929e+08    0
1.929e+08 – 2.17e+08     0
2.17e+08 – 2.411e+08     2
2.411e+08 – 2.653e+08    0
2.653e+08 – 2.894e+08    0
2.894e+08 – 3.135e+08    0
3.135e+08 – 3.376e+08    0
3.376e+08 – 3.617e+08    0
3.617e+08 – 3.858e+08    1
3.858e+08 – 4.099e+08    0
4.099e+08 – 4.34e+08     0
4.34e+08 – 4.582e+08     1
4.582e+08 – 4.823e+08    0
4.823e+08 – 5.064e+08    0
5.064e+08 – 5.305e+08    0
5.305e+08 – 5.546e+08    0
5.546e+08 – 5.787e+08    0
5.787e+08 – 6.028e+08    0
6.028e+08 – 6.27e+08     0
6.27e+08 – 6.511e+08     0
6.511e+08 – 6.752e+08    0
6.752e+08 – 6.993e+08    0
6.993e+08 – 7.234e+08    1
7.234e+08 – 7.475e+08    0
7.475e+08 – 7.716e+08    0
7.716e+08 – 7.958e+08    0
7.958e+08 – 8.199e+08    0
8.199e+08 – 8.44e+08     0
8.44e+08 – 8.681e+08     0
8.681e+08 – 8.922e+08    0
8.922e+08 – 9.163e+08    0
9.163e+08 – 9.404e+08    0
9.404e+08 – 9.646e+08    1
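The equal-width histogram above places 6,943 of the 6,998 rows in its first bin, so it shows little of the shape below about 24 million. One possible alternative, sketched here as a suggestion and not as something saturn does, is to bucket non-zero values by order of magnitude and keep zeros in their own bucket. `log10_bin` is a hypothetical helper name.

```python
import math
from collections import Counter

def log10_bin(value):
    """Return the order of magnitude of a positive value, or None for zero."""
    if value < 0:
        raise ValueError("expected a non-negative count")
    if value == 0:
        return None  # zeros get their own bucket instead of log10(0)
    return math.floor(math.log10(value))

# Summary values for p reported in this profile (min, q1, median, q3, max),
# used here only as example inputs; they are not a sample of rows.
examples = [0, 1_200, 11_000, 83_000, 964_553_200]
bins = Counter(log10_bin(v) for v in examples)
```

Keeping zeros separate matches the two-population reading of the column (13.5% of rows are exactly zero) and avoids taking the logarithm of zero.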

s numeric

rows          6,998
null          0 (0.0%)
unique        6
min           0.000
max           5.000
mean          3.796
median        4.000
std           1.366
q1            3.000
q3            5.000
iqr           2.000
skew          -1.041
kurtosis      0.415
n_outliers    0
outlier_rate  0.000
zero_rate     0.036
Histogram bins for s (median: 4.0).
bin              count
0 – 0.125        250
0.125 – 0.25     0
0.25 – 0.375     0
0.375 – 0.5      0
0.5 – 0.625      0
0.625 – 0.75     0
0.75 – 0.875     0
0.875 – 1        0
1 – 1.125        328
1.125 – 1.25     0
1.25 – 1.375     0
1.375 – 1.5      0
1.5 – 1.625      0
1.625 – 1.75     0
1.75 – 1.875     0
1.875 – 2        0
2 – 2.125        383
2.125 – 2.25     0
2.25 – 2.375     0
2.375 – 2.5      0
2.5 – 2.625      0
2.625 – 2.75     0
2.75 – 2.875     0
2.875 – 3        0
3 – 3.125        1768
3.125 – 3.25     0
3.25 – 3.375     0
3.375 – 3.5      0
3.5 – 3.625      0
3.625 – 3.75     0
3.75 – 3.875     0
3.875 – 4        0
4 – 4.125        1176
4.125 – 4.25     0
4.25 – 4.375     0
4.375 – 4.5      0
4.5 – 4.625      0
4.625 – 4.75     0
4.75 – 4.875     0
4.875 – 5        3093

ss categorical

rows           6,998
null           0 (0.0%)
unique         7
top_value      safe
top_rate       0.439
cardinality    7
entropy        2.122
entropy_ratio  0.756
Top values for ss (7 unique shown, of 7 total).
value                  count  share
safe                   3074   43.9%
definitely endangered  1753   25.1%
vulnerable             1160   16.6%
severely endangered    374    5.3%
critically endangered  327    4.7%
extinct                219    3.1%
unknown                91     1.3%
Top values (rank 1–20)
  1. safe — 3,074
  2. definitely endangered — 1,753
  3. vulnerable — 1,160
  4. severely endangered — 374
  5. critically endangered — 327
  6. extinct — 219
  7. unknown — 91