saturn

/home/coolhand/html/datavis/data_trove/data/quirky/silence_data.json 6,998 rows sample n=6,998 seed 42 2026-06-22T00:13:02+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/data/quirky/silence_data.json
Total rows	6,998
Profiled sample	6,998
Columns	6
Generated	2026-06-22T00:13:02+00:00

Per-column null rate across the corpus.
column	kind	null %
n	text	0.0%
lat	numeric	0.0%
lng	numeric	0.0%
p	numeric	0.0%
s	numeric	0.0%
ss	categorical	0.0%
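
The null rates above could be recomputed with a sketch like the following. It assumes the JSON file is a flat list of objects keyed by the six column names, and that a missing key or a JSON null counts as null. Neither assumption is confirmed by this report, and `null_rates` is an illustrative helper name, not part of saturn.

```python
from typing import Any

COLUMNS = ["n", "lat", "lng", "p", "s", "ss"]


def null_rates(records: list[dict[str, Any]], columns: list[str]) -> dict[str, float]:
    """Return the fraction of records in which each column is missing or None."""
    total = len(records)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for rec in records if rec.get(col) is None) / total
        for col in columns
    }


# Toy records with hypothetical values (not taken from the file).
sample = [
    {"n": "A", "lat": 1.0, "lng": 2.0, "p": 100, "s": 5, "ss": "safe"},
    {"n": "B", "lat": None, "lng": 3.0, "p": 0, "s": 0, "ss": "extinct"},
]
rates = null_rates(sample, COLUMNS)
```

Note that a zero in `p` or `s` is a value, not a null, which is why the zero rates reported later are tracked separately from the null rate.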

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset catalogues approximately 7,000 world languages, each with a name, geographic coordinates, speaker population, and endangerment status. The most striking finding is the extreme inequality in speaker populations: the median language has only 11,000 speakers while the maximum reaches nearly 1 billion, with over 16% of languages flagged as outliers — a classic long-tail distribution reflecting how a handful of dominant languages vastly outnumber the rest. Equally notable is the endangerment picture: while 44% of languages are classified as 'safe', a substantial share face real risk — 1,753 are 'definitely endangered', 327 are 'critically endangered', and 219 are already extinct. Top words in language names include 'sign', 'zapotec', 'mixtec', and directional qualifiers like 'southern' and 'northern', hinting at rich dialect clustering worth exploring geographically.

n high anthropic:default

This column appears to be a name field for human languages or dialects — top words include 'language', 'sign', 'Zapotec', 'Mixtec', and directional qualifiers like 'southern', 'northern', 'western', 'eastern', 'central', consistent with a linguistic taxonomy dataset. All 6,998 rows are unique with zero duplicates and zero nulls, confirming it functions as a label or identifier. The majority of values (73%) are single words, but multi-word entries push the mean length to ~8.97 characters and mean word count to ~1.37, reflecting compound names like 'Southern Zapotec'. The vocabulary size (7,003) slightly exceeding unique row count (6,998) is unremarkable given tokenization.

p medium anthropic:default

Column 'p' is a numeric field with extreme positive skew (skew = 39.45, kurtosis = 1870.01). Given the dataset-level reading above, it most likely holds speaker population rather than a price or other monetary quantity. The median is 11,000 while the mean is 1,157,742, a roughly 105× gap driven by a long upper tail that reaches 964,553,200, with 1,159 outliers (16.6% of rows). A further 13.5% of values are exactly zero, suggesting a two-population distribution (zero vs. non-zero values) that may require separate treatment.

lat high anthropic:default

This column contains geographic latitude values, spanning from -55.27° (southern South America) to 73.14° (Arctic latitudes), covering a wide swath of the globe with concentration in tropical and subtropical regions (median 6.37°, Q1 -4.65°, Q3 18.29°). The 4,048 unique values out of 6,998 rows indicate coordinate granularity likely at 2 decimal places, with some location repetition. Mild positive skew (0.697) and near-mesokurtic kurtosis (0.477) are consistent with most records clustering in equatorial-to-subtropical bands, as expected for datasets heavy in African, South/Southeast Asian, or Latin American records. Only 149 outliers (~2.1%) were flagged, most of them likely high-latitude locations in the Northern Hemisphere.

lng high anthropic:default

This column contains geographic longitude values, spanning from -178.78 to 179.31, nearly the full valid range of -180 to 180 degrees. The mean (52.46) and median (47.65) are both positive (eastern) longitudes, suggesting the dataset has a higher concentration of locations in Europe, Asia, or Africa than in the Americas. The IQR of 115.65 and low kurtosis (-0.67) indicate a broadly spread, relatively flat distribution across the globe with only 12 outliers flagged.

s high anthropic:default

This column is almost certainly an ordinal rating or severity score with exactly 6 discrete integer values ranging from 0 to 5. The distribution is notably left-skewed (skew = -1.04), meaning high scores (4–5) dominate — the median is 4.0 and Q3 is 5.0 — which would surprise an analyst expecting a balanced scale. Only 3.6% of rows are zero, suggesting the lowest value is rare rather than a default or sentinel.

ss high anthropic:default

This column encodes a conservation or endangerment status classification with 7 distinct categories: six ordered levels from 'safe' through 'extinct', plus 'unknown' (91 rows). The dominant class is 'safe' at 43.9% of 6,998 rows, while 'extinct' accounts for 219 records (3.1%), a meaningful but minority signal. An entropy ratio of 0.756 indicates a reasonable spread across categories, though the mass is concentrated in the safer statuses. No nulls are present, and the label set is clean with no obvious noise.

Numeric correlation

Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
	lat	lng	p	s
lat	+1.00	-0.35	+0.13	-0.09
lng	-0.35	+1.00	+0.00	+0.16
p	+0.13	+0.00	+1.00	+0.09
s	-0.09	+0.16	+0.09	+1.00
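
For reference, a matrix like the one above can be produced with a plain Pearson implementation. The sketch below is illustrative only: `pearson` and `correlation_matrix` are hypothetical helper names, the toy columns are made up, and it rounds to 2 decimals, which may differ from whatever "clipped" means in saturn.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when a column is constant
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise Pearson correlations, rounded to 2 decimals."""
    names = list(columns)
    return {
        a: {b: round(pearson(columns[a], columns[b]), 2) for b in names}
        for a in names
    }


# Toy columns (hypothetical values, not the real data).
toy = {
    "lat": [1.0, 2.0, 3.0, 4.0],
    "lng": [8.0, 6.0, 4.0, 2.0],
    "p": [1.0, 3.0, 2.0, 5.0],
}
matrix = correlation_matrix(toy)
```

Because `p` is so heavily skewed, its Pearson coefficients are likely dominated by a handful of very large values. A rank-based measure may be worth comparing against before reading much into the `p` row.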

n text

100.0% of rows are unique strings; 73.1% of rows are a single word

rows	6,998
null	0 (0.0%)
unique	6,998
len_min	1
len_max	43
len_mean	8.972
len_median	7.000
len_p95	21.000
word_mean	1.369
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	7,003
readability_flesch_mean	57.130
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.731
allcaps_rate	0.000
boilerplate_rate	0.000

Character-length distribution for n (mean: 8.972).
chars	count
1 – 2	25
2 – 3	193
3 – 4	748
4 – 5	1095
5 – 6	1082
6 – 7	833
7 – 8	521
8 – 9	333
9 – 10	271
10 – 12	231
12 – 13	208
13 – 14	213
14 – 15	188
15 – 16	199
16 – 17	144
17 – 18	91
18 – 19	75
19 – 20	68
20 – 21	63
21 – 22	69
22 – 23	150
23 – 24	48
24 – 25	26
25 – 26	30
26 – 27	21
27 – 28	8
28 – 29	7
29 – 30	10
30 – 31	14
31 – 32	6
32 – 34	8
34 – 35	2
35 – 36	10
36 – 37	1
37 – 38	4
38 – 39	0
39 – 40	1
40 – 41	0
41 – 42	1
42 – 43	1

Sample values (first 10)

Abaza
Rajbanshi
Tày
Solos
Kachi Koli
Yangum Gel
Eastern Krahn
San Martín Itunyoso Triqui
Sonia
Angal Enen
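
As a rough illustration of how the text statistics for `n` are defined, the sketch below computes a few of them over the ten sample values listed above. It assumes length is counted in Unicode characters after NFC normalisation, that words are split on whitespace, and that the duplicate rate is one minus the share of distinct strings. These definitions are guesses at what saturn measures, and `text_profile` is a hypothetical name.

```python
import unicodedata
from statistics import mean, median

names = [
    "Abaza", "Rajbanshi", "Tày", "Solos", "Kachi Koli",
    "Yangum Gel", "Eastern Krahn", "San Martín Itunyoso Triqui",
    "Sonia", "Angal Enen",
]


def text_profile(values: list[str]) -> dict[str, float]:
    """Length, word-count, and duplicate statistics for a list of strings."""
    # Normalise to NFC so each accented letter counts as one character.
    normalised = [unicodedata.normalize("NFC", v) for v in values]
    lengths = [len(v) for v in normalised]
    word_counts = [len(v.split()) for v in normalised]
    return {
        "len_min": min(lengths),
        "len_max": max(lengths),
        "len_mean": mean(lengths),
        "len_median": median(lengths),
        "word_mean": mean(word_counts),
        "one_word_rate": sum(1 for w in word_counts if w == 1) / len(values),
        "duplicate_rate": 1 - len(set(normalised)) / len(values),
    }


profile = text_profile(names)
```

On these ten names the one-word rate is 0.5 and the mean length is 9.6, so the first ten rows contain more multi-word names than the full column (one-word rate 0.731, mean length 8.972).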

lat numeric

rows	6,998
null	0 (0.0%)
unique	4,048
min	-55.270
max	73.140
mean	8.437
median	6.370
std	17.999
q1	-4.650
q3	18.290
iqr	22.940
skew	0.697
kurtosis	0.477
n_outliers	149
outlier_rate	0.021
zero_rate	0.000

Histogram bins for lat (median: 6.37).
bin	count
-55.27 – -52.06	1
-52.06 – -48.85	1
-48.85 – -45.64	1
-45.64 – -42.43	0
-42.43 – -39.22	2
-39.22 – -36.01	5
-36.01 – -32.8	9
-32.8 – -29.59	13
-29.59 – -26.38	23
-26.38 – -23.17	48
-23.17 – -19.96	100
-19.96 – -16.75	101
-16.75 – -13.54	217
-13.54 – -10.33	202
-10.33 – -7.116	461
-7.116 – -3.906	749
-3.906 – -0.6958	640
-0.6958 – 2.514	344
2.514 – 5.725	434
5.725 – 8.935	609
8.935 – 12.15	676
12.15 – 15.36	264
15.36 – 18.57	368
18.57 – 21.78	210
21.78 – 24.99	282
24.99 – 28.2	320
28.2 – 31.41	131
31.41 – 34.62	124
34.62 – 37.83	148
37.83 – 41.04	81
41.04 – 44.25	107
44.25 – 47.46	65
47.46 – 50.67	68
50.67 – 53.88	68
53.88 – 57.09	35
57.09 – 60.3	20
60.3 – 63.51	34
63.51 – 66.72	18
66.72 – 69.93	13
69.93 – 73.14	6

lng numeric

rows	6,998
null	0 (0.0%)
unique	5,560
min	-178.780
max	179.310
mean	52.459
median	47.650
std	79.673
q1	8.282
q3	123.935
iqr	115.653
skew	-0.498
kurtosis	-0.673
n_outliers	12
outlier_rate	1.71e-03
zero_rate	0.000

Histogram bins for lng (median: 47.65).
bin	count
-178.8 – -169.8	11
-169.8 – -160.9	4
-160.9 – -151.9	9
-151.9 – -143	11
-143 – -134	10
-134 – -125.1	15
-125.1 – -116.1	94
-116.1 – -107.2	37
-107.2 – -98.21	71
-98.21 – -89.26	257
-89.26 – -80.31	50
-80.31 – -71.35	176
-71.35 – -62.4	150
-62.4 – -53.45	112
-53.45 – -44.5	48
-44.5 – -35.54	21
-35.54 – -26.59	0
-26.59 – -17.64	3
-17.64 – -8.687	95
-8.687 – 0.265	261
0.265 – 9.217	420
9.217 – 18.17	687
18.17 – 27.12	297
27.12 – 36.07	402
36.07 – 45.03	211
45.03 – 53.98	113
53.98 – 62.93	27
62.93 – 71.88	73
71.88 – 80.84	212
80.84 – 89.79	191
89.79 – 98.74	242
98.74 – 107.7	386
107.7 – 116.6	202
116.6 – 125.6	437
125.6 – 134.5	266
134.5 – 143.5	512
143.5 – 152.5	597
152.5 – 161.4	106
161.4 – 170.4	170
170.4 – 179.3	12

p numeric

skew=+39.45; 16.6% of rows beyond 1.5 × IQR

rows	6,998
null	0 (0.0%)
unique	1,627
min	0.000
max	964,553,200
mean	1,157,742
median	11,000
std	17,303,554
q1	1,200
q3	83,000
iqr	81,800
skew	39.452
kurtosis	1,870
n_outliers	1,159
outlier_rate	0.166
zero_rate	0.135
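
The "beyond 1.5 IQR" flag suggests the outlier counts come from Tukey-style fences. A minimal sketch under that assumption follows, using the quartiles reported for `p` and `lat`. The function names are hypothetical, and saturn's exact rule (for example, strict versus inclusive comparison) is not stated in this report.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_rate(values: list[float], q1: float, q3: float) -> float:
    """Share of values falling strictly outside the fences."""
    lower, upper = tukey_fences(q1, q3)
    return sum(1 for v in values if v < lower or v > upper) / len(values)


# Quartiles as reported in this profile.
p_lower, p_upper = tukey_fences(1_200, 83_000)
lat_lower, lat_upper = tukey_fences(-4.65, 18.29)
```

Under these assumptions the fences for `p` are -121,500 and 205,700. Since the minimum of `p` is 0, every flagged row would sit in the upper tail. For `lat` the fences work out to about -39.06 and 52.70.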

Histogram bins for p (median: 11000.0).
bin	count
0 – 2.411e+07	6943
2.411e+07 – 4.823e+07	28
4.823e+07 – 7.234e+07	5
7.234e+07 – 9.646e+07	12
9.646e+07 – 1.206e+08	0
1.206e+08 – 1.447e+08	3
1.447e+08 – 1.688e+08	1
1.688e+08 – 1.929e+08	0
1.929e+08 – 2.17e+08	0
2.17e+08 – 2.411e+08	2
2.411e+08 – 2.653e+08	0
2.653e+08 – 2.894e+08	0
2.894e+08 – 3.135e+08	0
3.135e+08 – 3.376e+08	0
3.376e+08 – 3.617e+08	0
3.617e+08 – 3.858e+08	1
3.858e+08 – 4.099e+08	0
4.099e+08 – 4.34e+08	0
4.34e+08 – 4.582e+08	1
4.582e+08 – 4.823e+08	0
4.823e+08 – 5.064e+08	0
5.064e+08 – 5.305e+08	0
5.305e+08 – 5.546e+08	0
5.546e+08 – 5.787e+08	0
5.787e+08 – 6.028e+08	0
6.028e+08 – 6.27e+08	0
6.27e+08 – 6.511e+08	0
6.511e+08 – 6.752e+08	0
6.752e+08 – 6.993e+08	0
6.993e+08 – 7.234e+08	1
7.234e+08 – 7.475e+08	0
7.475e+08 – 7.716e+08	0
7.716e+08 – 7.958e+08	0
7.958e+08 – 8.199e+08	0
8.199e+08 – 8.44e+08	0
8.44e+08 – 8.681e+08	0
8.681e+08 – 8.922e+08	0
8.922e+08 – 9.163e+08	0
9.163e+08 – 9.404e+08	0
9.404e+08 – 9.646e+08	1
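
With equal-width bins, 6,943 of 6,998 rows land in the first bin, so the histogram above says little about the bulk of the distribution. One option, sketched here as a suggestion rather than anything saturn does, is to count zeros separately and bin the positive values by order of magnitude. `log10_bin_counts` is a hypothetical helper and the toy values are illustrative.

```python
import math
from collections import Counter


def log10_bin_counts(values: list[float]) -> dict[str, int]:
    """Count zeros separately and bin positive values by power of ten."""
    counts = Counter()
    for v in values:
        if v < 0:
            raise ValueError("expected non-negative values")
        if v == 0:
            counts["0"] += 1
        else:
            exponent = math.floor(math.log10(v))
            counts[f"[1e{exponent}, 1e{exponent + 1})"] += 1
    return dict(counts)


# Toy values: two zeros, two small made-up counts, then the reported
# q1, median, q3, and max of p.
toy_p = [0, 0, 8, 950, 1_200, 11_000, 83_000, 964_553_200]
bins = log10_bin_counts(toy_p)
```

Keeping the zero bucket apart matches the zero rate of 0.135 reported for `p`, which a log scale cannot represent directly.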

s numeric

rows	6,998
null	0 (0.0%)
unique	6
min	0.000
max	5.000
mean	3.796
median	4.000
std	1.366
q1	3.000
q3	5.000
iqr	2.000
skew	-1.041
kurtosis	0.415
n_outliers	0
outlier_rate	0.000
zero_rate	0.036

Histogram bins for s (median: 4.0).
bin	count
0 – 0.125	250
0.125 – 0.25	0
0.25 – 0.375	0
0.375 – 0.5	0
0.5 – 0.625	0
0.625 – 0.75	0
0.75 – 0.875	0
0.875 – 1	0
1 – 1.125	328
1.125 – 1.25	0
1.25 – 1.375	0
1.375 – 1.5	0
1.5 – 1.625	0
1.625 – 1.75	0
1.75 – 1.875	0
1.875 – 2	0
2 – 2.125	383
2.125 – 2.25	0
2.25 – 2.375	0
2.375 – 2.5	0
2.5 – 2.625	0
2.625 – 2.75	0
2.75 – 2.875	0
2.875 – 3	0
3 – 3.125	1768
3.125 – 3.25	0
3.25 – 3.375	0
3.375 – 3.5	0
3.5 – 3.625	0
3.625 – 3.75	0
3.75 – 3.875	0
3.875 – 4	0
4 – 4.125	1176
4.125 – 4.25	0
4.25 – 4.375	0
4.375 – 4.5	0
4.5 – 4.625	0
4.625 – 4.75	0
4.75 – 4.875	0
4.875 – 5	3093
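
Because `s` takes only six integer values, the 40-bin histogram above collapses to six non-empty bins, and the summary statistics can be checked directly from those counts. The sketch below assumes each non-empty bin holds exactly one integer value (0 through 5) and uses population moments. saturn may apply a sample correction, which would change the results only marginally at this row count.

```python
import math

# Non-empty bins from the histogram above, keyed by integer score.
s_counts = {0: 250, 1: 328, 2: 383, 3: 1768, 4: 1176, 5: 3093}


def moments_from_counts(counts: dict[int, int]) -> dict[str, float]:
    """Mean, standard deviation, skewness, and zero rate from value counts."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Population central moments of order 2 and 3.
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    return {
        "n": n,
        "mean": mean,
        "std": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,
        "zero_rate": counts.get(0, 0) / n,
    }


s_stats = moments_from_counts(s_counts)
```

The counts sum to 6,998, and the recomputed mean, standard deviation, and skew agree with the reported 3.796, 1.366, and -1.041 to three decimals.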

ss categorical

rows	6,998
null	0 (0.0%)
unique	7
top_value	safe
top_rate	0.439
cardinality	7
entropy	2.122
entropy_ratio	0.756

Top values for ss (7 unique shown, of 7 total).
value	count	share
safe	3074	43.9%
definitely endangered	1753	25.1%
vulnerable	1160	16.6%
severely endangered	374	5.3%
critically endangered	327	4.7%
extinct	219	3.1%
unknown	91	1.3%
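
The entropy figures for `ss` can be reproduced from the counts in the table above. This sketch assumes Shannon entropy in bits and an entropy ratio defined as entropy divided by log2 of the cardinality. Those definitions are inferred from the reported numbers, not documented here.

```python
import math

# Counts copied from the table above.
ss_counts = {
    "safe": 3074,
    "definitely endangered": 1753,
    "vulnerable": 1160,
    "severely endangered": 374,
    "critically endangered": 327,
    "extinct": 219,
    "unknown": 91,
}


def entropy_bits(counts: dict[str, int]) -> float:
    """Shannon entropy in bits of the distribution implied by the counts."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


entropy = entropy_bits(ss_counts)
# Normalise by the maximum possible entropy for this many categories.
entropy_ratio = entropy / math.log2(len(ss_counts))
```

Under these definitions the values come out at roughly 2.122 bits and 0.756, matching the reported `entropy` and `entropy_ratio`.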
