exoplanets exoplanets

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/data/celestial/exoplanets/exoplanets.csv

Saturn profiled 6,150 rows across 11 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/data/celestial/exoplanets/exoplanets.csv",
    "--findings", "exoplanets-exoplanets.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset catalogs 6,150 exoplanets across 11 columns, mixing identifiers (pl_name, hostname), discovery metadata (discoverymethod, disc_year), sky coordinates (ra, dec), and physical measurements (pl_bmassj, pl_orbsmax, pl_rade, pl_orbper, sy_dist). Discovery is dominated by the Transit method at 73.4% of records, with Radial Velocity a distant second at 19.2%; this matters because the detection method shapes which kinds of planets are represented. The physical measurement columns are all strongly right-skewed with heavy outliers: pl_orbper has a skew of ~43.8 and a maximum of 8,040,000 days, and pl_orbsmax stretches to 19,000 AU, so any analysis should use log scales or trimming. Missing values are also substantial: pl_bmassj is null for 50.3% of rows, pl_orbsmax for 37.4%, and pl_rade for 25.7%, which limits joint mass/orbit analyses. Discovery years range from 1992 to 2026 with a median of 2016, and the busiest histogram bin (2016 – 2017, 1,496 records) sits at that median, giving a clear timeline of the field's growth.

citing: row_count · column_count · discoverymethod · pl_orbper · pl_orbsmax · pl_bmassj · pl_rade · disc_year · sy_dist

Out[4]:

saturn.schema() · 11 columns

column kind n null% unique alerts
pl_name text 6,150 0.0% 6,150 near_unique short_text
hostname text 6,150 0.0% 4,582 one_word allcaps short_text duplicates
ra numeric 6,150 0.0% 4,579
dec numeric 6,150 0.0% 4,579
sy_dist numeric 6,150 2.1% 4,397 high_skew outliers
pl_orbper numeric 6,150 5.6% 5,791 high_skew outliers
pl_orbsmax numeric 6,150 37.4% 2,292 null_rate high_skew outliers
pl_bmassj numeric 6,150 50.3% 1,989 null_rate high_skew outliers
pl_rade numeric 6,150 25.7% 2,004 null_rate high_skew outliers
disc_year numeric 6,150 0.0% 34
discoverymethod categorical 6,150 0.0% 11
Fig 1.
discoverymethod · Shows how dominant the Transit method is versus all other detection techniques.
Top values for discoverymethod (11 unique shown, of 11 total).
value                          count  share
Transit                        4517   73.4%
Radial Velocity                1182   19.2%
Microlensing                   275    4.5%
Imaging                        94     1.5%
Transit Timing Variations      39     0.6%
Eclipse Timing Variations      17     0.3%
Orbital Brightness Modulation  9      0.1%
Pulsar Timing                  8      0.1%
Astrometry                     6      0.1%
Pulsation Timing Variations    2      0.0%
Disk Kinematics                1      0.0%
Fig 2.
disc_year · Reveals the timeline of exoplanet discoveries, clustered heavily in the 2010s.
Histogram bins for disc_year (median: 2016.0).
bin          count
1992 – 1993  2
1993 – 1994  0
1994 – 1995  1
1995 – 1995  1
1995 – 1996  6
1996 – 1997  1
1997 – 1998  0
1998 – 1999  6
1999 – 2000  13
2000 – 2000  16
2000 – 2001  12
2001 – 2002  29
2002 – 2003  22
2003 – 2004  0
2004 – 2005  27
2005 – 2006  36
2006 – 2006  32
2006 – 2007  52
2007 – 2008  63
2008 – 2009  0
2009 – 2010  91
2010 – 2011  98
2011 – 2012  135
2012 – 2012  139
2012 – 2013  128
2013 – 2014  869
2014 – 2015  0
2015 – 2016  155
2016 – 2017  1496
2017 – 2018  152
2018 – 2018  315
2018 – 2019  196
2019 – 2020  234
2020 – 2021  0
2021 – 2022  564
2022 – 2023  369
2023 – 2023  324
2023 – 2024  259
2024 – 2025  243
2025 – 2026  63
Fig 3.
pl_rade · Highlights the skewed distribution of planet radii (Earth radii) and the long tail of giants.
Histogram bins for pl_rade (median: 2.43).
bin             count
0.3098 – 2.482  2355
2.482 – 4.655   1149
4.655 – 6.827   155
6.827 – 8.999   93
8.999 – 11.17   158
11.17 – 13.34   266
13.34 – 15.52   201
15.52 – 17.69   107
17.69 – 19.86   54
19.86 – 22.03   15
22.03 – 24.21   8
24.21 – 26.38   2
26.38 – 28.55   0
28.55 – 30.72   1
30.72 – 32.9    1
32.9 – 35.07    1
35.07 – 37.24   0
37.24 – 39.41   0
39.41 – 41.59   0
41.59 – 43.76   0
43.76 – 45.93   0
45.93 – 48.1    0
48.1 – 50.28    0
50.28 – 52.45   0
52.45 – 54.62   0
54.62 – 56.79   0
56.79 – 58.96   0
58.96 – 61.14   0
61.14 – 63.31   0
63.31 – 65.48   0
65.48 – 67.65   0
67.65 – 69.83   0
69.83 – 72      0
72 – 74.17      0
74.17 – 76.34   0
76.34 – 78.52   1
78.52 – 80.69   0
80.69 – 82.86   0
82.86 – 85.03   0
85.03 – 87.21   1
Fig 4.
pl_orbper · Use a log scale — orbital periods span from under a day to millions of days.
Histogram bins for pl_orbper (median: 11.13).
bin                    count
0.09071 – 2.01e+05     5800
2.01e+05 – 4.02e+05    0
4.02e+05 – 6.03e+05    0
6.03e+05 – 8.04e+05    0
8.04e+05 – 1.005e+06   0
1.005e+06 – 1.206e+06  0
1.206e+06 – 1.407e+06  0
1.407e+06 – 1.608e+06  0
1.608e+06 – 1.809e+06  1
1.809e+06 – 2.01e+06   0
2.01e+06 – 2.211e+06   0
2.211e+06 – 2.412e+06  0
2.412e+06 – 2.613e+06  0
2.613e+06 – 2.814e+06  0
2.814e+06 – 3.015e+06  0
3.015e+06 – 3.216e+06  0
3.216e+06 – 3.417e+06  0
3.417e+06 – 3.618e+06  0
3.618e+06 – 3.819e+06  0
3.819e+06 – 4.02e+06   0
4.02e+06 – 4.221e+06   0
4.221e+06 – 4.422e+06  0
4.422e+06 – 4.623e+06  0
4.623e+06 – 4.824e+06  0
4.824e+06 – 5.025e+06  0
5.025e+06 – 5.226e+06  0
5.226e+06 – 5.427e+06  0
5.427e+06 – 5.628e+06  0
5.628e+06 – 5.829e+06  1
5.829e+06 – 6.03e+06   0
6.03e+06 – 6.231e+06   0
6.231e+06 – 6.432e+06  0
6.432e+06 – 6.633e+06  0
6.633e+06 – 6.834e+06  0
6.834e+06 – 7.035e+06  0
7.035e+06 – 7.236e+06  0
7.236e+06 – 7.437e+06  1
7.437e+06 – 7.638e+06  0
7.638e+06 – 7.839e+06  0
7.839e+06 – 8.04e+06   1
Fig 5.
sy_dist · Shows how far the host systems are, with a long tail extending to ~8,340 parsecs.
Histogram bins for sy_dist (median: 377.06).
bin            count
1.301 – 209.8  2232
209.8 – 418.2  969
418.2 – 626.7  714
626.7 – 835.2  594
835.2 – 1044   539
1044 – 1252    287
1252 – 1461    187
1461 – 1669    102
1669 – 1878    67
1878 – 2086    39
2086 – 2294    16
2294 – 2503    18
2503 – 2711    7
2711 – 2920    6
2920 – 3128    7
3128 – 3337    8
3337 – 3545    8
3545 – 3754    4
3754 – 3962    8
3962 – 4171    6
4171 – 4379    5
4379 – 4588    7
4588 – 4796    4
4796 – 5005    3
5005 – 5213    8
5213 – 5421    6
5421 – 5630    7
5630 – 5838    12
5838 – 6047    11
6047 – 6255    10
6255 – 6464    14
6464 – 6672    24
6672 – 6881    14
6881 – 7089    24
7089 – 7298    23
7298 – 7506    9
7506 – 7715    11
7715 – 7923    6
7923 – 8132    3
8132 – 8340    4
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column           kind         null %
pl_name          text         0.0%
hostname         text         0.0%
ra               numeric      0.0%
dec              numeric      0.0%
sy_dist          numeric      2.1%
pl_orbper        numeric      5.6%
pl_orbsmax       numeric      37.4%
pl_bmassj        numeric      50.3%
pl_rade          numeric      25.7%
disc_year        numeric      0.0%
discoverymethod  categorical  0.0%
Fig 7.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 8 numeric columns (values clipped to 2 decimals).
            ra     dec    sy_dist  pl_orbper  pl_orbsmax  pl_bmassj  pl_rade  disc_year
ra          +1.00  +0.36  +0.08    +0.05      +0.04       +0.11      -0.10    -0.06
dec         +0.36  +1.00  +0.09    +0.00      +0.06       +0.09      -0.14    -0.09
sy_dist     +0.08  +0.09  +1.00    -0.06      +0.06       +0.03      -0.06    -0.09
pl_orbper   +0.05  +0.00  -0.06    +1.00      -0.01       -0.02      -0.01    -0.02
pl_orbsmax  +0.04  +0.06  +0.06    -0.01      +1.00       +0.09      -0.04    +0.04
pl_bmassj   +0.11  +0.09  +0.03    -0.02      +0.09       +1.00      +0.01    +0.05
pl_rade     -0.10  -0.14  -0.06    -0.01      -0.04       +0.01      +1.00    +0.07
disc_year   -0.06  -0.09  -0.09    -0.02      +0.04       +0.05      +0.07    +1.00

pl_name text identifier

This is the planet name identifier (pl_name), a fully unique short text field across all 6150 rows with zero nulls or duplicates. Values are short (mean 11.4 chars, ~2.24 words) and dominated by astronomical catalog conventions: companion letters like 'b' (4535), 'c' (1052), 'd' (338) paired with host-star prefixes such as 'hd' (815), 'gj' (147), 'hip' (81), 'epic' (43). Uniqueness equals row count, so it functions as a primary key rather than a feature.

Treatment: Use as primary key for joins; drop from modelling features.
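A minimal sketch of the primary-key check that this treatment relies on, assuming the CSV has already been loaded into a list of dicts (for example with `csv.DictReader`). The helper name `find_duplicate_keys` and the sample rows are hypothetical and are not part of saturn.

```python
from collections import Counter

def find_duplicate_keys(rows, key="pl_name"):
    """Return key values that occur more than once; an empty list means the column can serve as a primary key."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical sample rows, not taken from the profiled file.
sample = [
    {"pl_name": "Host A b", "hostname": "Host A"},
    {"pl_name": "Host A c", "hostname": "Host A"},
    {"pl_name": "Host B b", "hostname": "Host B"},
]
duplicates = find_duplicate_keys(sample)
is_primary_key = not duplicates
```

Running the same helper with `key="hostname"` on the sample returns the repeated host, which is why hostname cannot play the same role.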

anthropic:claude-opus-4-7 · confidence high
Out[13]:

saturn.columns["pl_name"].stats

stat value
n 6,150
nulls 0 (0.0%)
unique 6,150
len_min 5
len_max 29
len_mean 11.42
len_median 11
len_p95 16.55
word_mean 2.242
word_median 2
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 4,713
readability_flesch_mean 97.69
emoji_rate 0
url_rate 0
one_word_rate 0.008293
allcaps_rate 0.008618
boilerplate_rate 0
alert: near_unique · 100.0% of rows are unique strings
alert: short_text · 95th-percentile length under 20 chars
Fig 8.
Character-length distribution for pl_name.
Character-length distribution for pl_name (mean: 11.42).
chars    count
5 – 6    2
6 – 6    18
6 – 7    0
7 – 7    119
7 – 8    0
8 – 9    584
9 – 9    533
9 – 10   0
10 – 10  1076
10 – 11  0
11 – 12  752
12 – 12  1560
12 – 13  0
13 – 13  1110
13 – 14  0
14 – 15  19
15 – 15  17
15 – 16  0
16 – 16  52
16 – 17  0
17 – 18  6
18 – 18  9
18 – 19  0
19 – 19  36
19 – 20  0
20 – 21  137
21 – 21  96
21 – 22  0
22 – 22  1
22 – 23  0
23 – 24  4
24 – 24  3
24 – 25  0
25 – 25  12
25 – 26  0
26 – 27  2
27 – 27  0
27 – 28  0
28 – 28  1
28 – 29  1

hostname text foreign_key

This column holds astronomical host-star identifiers (KOI-351, TRAPPIST-1, HD 110067, HIP 41378), with the 'hd' prefix dominating at 815 occurrences and catalog prefixes like GJ, HIP, EPIC, 2MASS, TIC following. Values are short (mean length 9.4, median 1 word) and 51.5% are all-caps, consistent with catalog naming conventions. Duplication is high: 4582 unique values across 6150 rows (25.5% duplicate rate), suggesting multiple records per host star — likely one row per planet.

Treatment: left-join on this id to a star-level table; do not use as a model feature directly.
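The left join described in this treatment can be sketched as follows, assuming planet rows and a separate star-level table are both lists of dicts keyed by hostname. The star-level table, its `star_class` field, and the function name `left_join` are hypothetical; the profiled CSV contains no star attributes.

```python
def left_join(planets, stars, key="hostname"):
    """Attach star-level fields to each planet row; planets without a matching star get None for those fields."""
    star_by_host = {star[key]: star for star in stars}
    star_fields = sorted({f for star in stars for f in star if f != key})
    joined = []
    for planet in planets:
        star = star_by_host.get(planet[key], {})
        row = dict(planet)  # copy so the input rows are not mutated
        for field in star_fields:
            row[field] = star.get(field)
        joined.append(row)
    return joined

# Hypothetical data: two planets share one host, a third host has no star record.
planets = [
    {"pl_name": "Host A b", "hostname": "Host A"},
    {"pl_name": "Host A c", "hostname": "Host A"},
    {"pl_name": "Host B b", "hostname": "Host B"},
]
stars = [{"hostname": "Host A", "star_class": "G"}]
joined = left_join(planets, stars)
```

Because the join is driven by the planet rows, the output keeps one row per planet, so the 25.5% hostname duplication carries through rather than collapsing.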

anthropic:claude-opus-4-7 · confidence high
Out[16]:

saturn.columns["hostname"].stats

stat value
n 6,150
nulls 0 (0.0%)
unique 4,582
len_min 3
len_max 27
len_mean 9.424
len_median 9
len_p95 15.55
word_mean 1.254
word_median 1
n_empty 0
n_duplicates 1,568
duplicate_rate 0.255
vocab_size 4,671
readability_flesch_mean 77.19
emoji_rate 0
url_rate 0
one_word_rate 0.7579
allcaps_rate 0.5154
boilerplate_rate 0
alert: one_word · 75.8% rows are a single word
alert: allcaps · 51.5% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
alert: duplicates · 25.5% duplicate strings
Fig 9.
Character-length distribution for hostname.
Character-length distribution for hostname (mean: 9.42).
chars    count
3 – 4    2
4 – 4    18
4 – 5    0
5 – 5    121
5 – 6    0
6 – 7    588
7 – 7    544
7 – 8    0
8 – 8    1095
8 – 9    0
9 – 10   708
10 – 10  1562
10 – 11  0
11 – 11  1112
11 – 12  0
12 – 13  19
13 – 13  17
13 – 14  0
14 – 14  52
14 – 15  0
15 – 16  4
16 – 16  10
16 – 17  0
17 – 17  36
17 – 18  0
18 – 19  138
19 – 19  96
19 – 20  0
20 – 20  2
20 – 21  0
21 – 22  4
22 – 22  2
22 – 23  0
23 – 23  16
23 – 24  0
24 – 25  2
25 – 25  1
25 – 26  0
26 – 26  0
26 – 27  1

ra numeric feature

This column is almost certainly Right Ascension (ra), a celestial longitude coordinate, with values spanning 0.186 to 359.97 — the full 0–360° range expected for RA. The distribution is left-skewed (skew -1.08) with median 284.91 well above the mean 232.89, suggesting non-uniform sky coverage concentrated toward higher RA values. With 4579 unique values across 6150 rows and no nulls or outliers, the column is clean but not a key.

Treatment: Treat as a circular/angular feature; consider sin/cos encoding before modelling rather than using raw degrees.
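A short sketch of the sin/cos encoding mentioned in this treatment, assuming right ascension is given in degrees. The function names `encode_ra` and `chord` are illustrative only. The point is that raw degrees place 0.1856 and 359.97 at opposite ends of the scale, while the encoded pair places them next to each other.

```python
import math

def encode_ra(ra_deg):
    """Map right ascension in degrees to a (sin, cos) pair so that 0 and 360 degrees coincide."""
    theta = math.radians(ra_deg)
    return math.sin(theta), math.cos(theta)

def chord(a_deg, b_deg):
    """Euclidean distance between two encoded angles; small when the angles are close on the circle."""
    sa, ca = encode_ra(a_deg)
    sb, cb = encode_ra(b_deg)
    return math.hypot(sa - sb, ca - cb)

# The column's reported extremes differ by about 359.8 in raw degrees
# but are only about 0.22 degrees apart going around the circle.
near = chord(0.1856, 359.97)
far = chord(0.1856, 180.0)
```

A model fed both encoded components sees the wrap-around correctly; feeding only one of them would make pairs of different angles indistinguishable.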

anthropic:claude-opus-4-7 · confidence high
Out[19]:

saturn.columns["ra"].stats

stat value
n 6,150
nulls 0 (0.0%)
unique 4,579
min 0.1856
max 360
mean 232.9
median 284.9
std 91.68
q1 173.3
q3 293.2
iqr 119.9
skew -1.078
kurtosis -0.144
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 10.
Distribution of ra. Vertical dash marks the median.
Histogram bins for ra (median: 284.91).
bin            count
0.1856 – 9.18  61
9.18 – 18.18   94
18.18 – 27.17  85
27.17 – 36.16  57
36.16 – 45.16  58
45.16 – 54.15  79
54.15 – 63.15  92
63.15 – 72.14  100
72.14 – 81.14  74
81.14 – 90.13  71
90.13 – 99.13  77
99.13 – 108.1  72
108.1 – 117.1  59
117.1 – 126.1  75
126.1 – 135.1  153
135.1 – 144.1  46
144.1 – 153.1  65
153.1 – 162.1  77
162.1 – 171.1  123
171.1 – 180.1  100
180.1 – 189.1  108
189.1 – 198.1  80
198.1 – 207.1  99
207.1 – 216.1  56
216.1 – 225.1  51
225.1 – 234    78
234 – 243      59
243 – 252      61
252 – 261      83
261 – 270      257
270 – 279      143
279 – 288      908
288 – 297      1716
297 – 306      414
306 – 315      52
315 – 324      46
324 – 333      65
333 – 342      93
342 – 351      88
351 – 360      75

dec numeric feature

This is almost certainly declination (dec) in degrees, an astronomical sky-coordinate: values span -88.12 to 86.86, well within the ±90° valid range. The distribution is left-skewed (skew -0.83) with a median of 39.13 sitting well above the mean of 18.05, suggesting a sample weighted toward the northern celestial hemisphere despite reaching deep southern declinations. With 4579 unique values across 6150 rows and no nulls or outliers, coverage is clean.

Treatment: Use as-is for spatial joins or convert to radians/sin(dec) before modelling sky density.
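To illustrate why sin(dec) is the natural coordinate for sky density: the share of the celestial sphere between two declinations is (sin(hi) - sin(lo)) / 2. The sketch below applies that to the three densest dec histogram bins reported for this column (38.74 to 51.86 degrees). The function name `sky_fraction` is illustrative only.

```python
import math

def sky_fraction(dec_lo_deg, dec_hi_deg):
    """Fraction of the whole celestial sphere lying between two declinations (degrees)."""
    lo, hi = math.radians(dec_lo_deg), math.radians(dec_hi_deg)
    return (math.sin(hi) - math.sin(lo)) / 2.0

# The three densest dec histogram bins span 38.74 to 51.86 degrees
# and hold 1029 + 1113 + 671 = 2813 of the 6150 rows.
band_area = sky_fraction(38.74, 51.86)   # roughly 8% of the sky
band_rows = (1029 + 1113 + 671) / 6150   # roughly 46% of the rows
density_ratio = band_rows / band_area    # rows per unit area relative to a uniform sky
```

Under these numbers the band holds several times more rows than a uniform sky would give it, which is the concentration the median/mean gap hints at.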

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["dec"].stats

stat value
n 6,150
nulls 0 (0.0%)
unique 4,579
min -88.12
max 86.86
mean 18.05
median 39.13
std 37.07
q1 -11.17
q3 45.38
iqr 56.55
skew -0.8327
kurtosis -0.4469
n_outliers 0
outlier_rate 0
zero_rate 0
Fig 11.
Distribution of dec. Vertical dash marks the median.
Histogram bins for dec (median: 39.13).
bin               count
-88.12 – -83.75   4
-83.75 – -79.37   26
-79.37 – -75      31
-75 – -70.62      26
-70.62 – -66.25   58
-66.25 – -61.87   61
-61.87 – -57.5    76
-57.5 – -53.12    52
-53.12 – -48.75   74
-48.75 – -44.38   75
-44.38 – -40      79
-40 – -35.63      69
-35.63 – -31.25   142
-31.25 – -26.88   283
-26.88 – -22.5    152
-22.5 – -18.13    126
-18.13 – -13.75   130
-13.75 – -9.379   154
-9.379 – -5.005   133
-5.005 – -0.6304  127
-0.6304 – 3.744   148
3.744 – 8.119     112
8.119 – 12.49     104
12.49 – 16.87     108
16.87 – 21.24     138
21.24 – 25.62     110
25.62 – 29.99     83
29.99 – 34.37     75
34.37 – 38.74     243
38.74 – 43.11     1029
43.11 – 47.49     1113
47.49 – 51.86     671
51.86 – 56.24     53
56.24 – 60.61     51
60.61 – 64.99     58
64.99 – 69.36     61
69.36 – 73.74     49
73.74 – 78.11     36
78.11 – 82.49     15
82.49 – 86.86     15

sy_dist numeric feature

Likely the distance to the planetary system (sy_dist) in parsecs, with 4,397 unique values across 6,150 rows. The distribution is heavily right-skewed (skew 3.97, kurtosis 17.0): the median of 377.06 sits well below the mean of 713.31, and values span 1.30 to 8,340 with 321 outliers (5.3% of non-null values). The null rate is low at 2.07% and there are no zeros.

Treatment: log-transform before modelling to tame the right skew and outliers.
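A small sketch of what the log transform does to this column's reported summary statistics. The helper `log10_or_none` is hypothetical; passing missing values through unchanged is a design assumption here, so that the 2.1% nulls are handled by a separate, explicit step.

```python
import math

def log10_or_none(value):
    """log10 of a strictly positive value; None passes through so missing distances are not silently filled."""
    if value is None:
        return None
    if value <= 0:
        raise ValueError("log transform needs strictly positive values")
    return math.log10(value)

# Summary statistics reported for sy_dist (parsecs).
reported = {"min": 1.301, "median": 377.1, "q3": 836.7, "max": 8340.0}
logged = {name: log10_or_none(v) for name, v in reported.items()}

span_raw = reported["max"] / reported["min"]   # max is thousands of times the min
span_log = logged["max"] - logged["min"]       # under four decades on the log scale
```

The guard against non-positive input never triggers for this column (min 1.301, zero_rate 0), but it keeps the helper safe to reuse elsewhere.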

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["sy_dist"].stats

stat value
n 6,150
nulls 127 (2.1%)
unique 4,397
min 1.301
max 8,340
mean 713.3
median 377.1
std 1212
q1 100.3
q3 836.7
iqr 736.4
skew 3.967
kurtosis 17.02
n_outliers 321
outlier_rate 0.0533
zero_rate 0
alert: high_skew · skew=+3.97
alert: outliers · 5.3% rows beyond 1.5 IQR
Fig 12.
Distribution of sy_dist. Vertical dash marks the median.

pl_orbper numeric feature

This is almost certainly planetary orbital period (likely in days), with 5791 unique values across 6150 rows and a 5.63% null rate. The distribution is wildly right-skewed: median is 11.13 while mean is 4469.34 and max reaches 8,040,000, producing a skew of 43.8 and kurtosis near 1970. About 17.4% of values (1012) flag as outliers, consistent with a mix of short-period close-in planets and extreme long-period objects.

Treatment: log-transform before modelling and impute the ~5.6% nulls.
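One way to combine the two steps in this treatment is to impute in log space, using the median of the logged values as the fill. This is only a sketch of one reasonable choice; the function name `log_impute` and the sample periods are hypothetical, and median imputation is an assumption rather than something the report prescribes.

```python
import math
from statistics import median

def log_impute(values):
    """log10-transform positive values and fill None with the median of the logged values."""
    logged = [math.log10(v) for v in values if v is not None]
    fill = median(logged)  # raises StatisticsError if every value is missing
    return [math.log10(v) if v is not None else fill for v in values]

# Hypothetical periods in days; None marks a missing value.
periods = [0.5, 4.0, None, 11.0, 40.0, 8.0e6]
filled = log_impute(periods)
```

Using the median rather than the mean matters for a column like this one, where a single value near 8 million days would otherwise drag the fill value upward.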

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["pl_orbper"].stats

stat value
n 6,150
nulls 346 (5.6%)
unique 5,791
min 0.09071
max 8.04e+06
mean 4469
median 11.13
std 1.633e+05
q1 4.352
q3 39.69
iqr 35.34
skew 43.82
kurtosis 1970
n_outliers 1,012
outlier_rate 0.1744
zero_rate 0
alert: high_skew · skew=+43.82
alert: outliers · 17.4% rows beyond 1.5 IQR
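The "beyond 1.5 IQR" wording matches the usual Tukey fence rule. Assuming saturn measures the fences from q1 and q3 (the report does not spell this out), the thresholds for pl_orbper can be worked out from the stats above. The function name `tukey_fences` is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# q1 and q3 as reported for pl_orbper (days).
low, high = tukey_fences(4.352, 39.69)
# low is about -48.7 and high is about 92.7
```

Since the lower fence is negative and the column minimum is 0.09071, every flagged value under this rule would sit on the long-period side, above roughly 92.7 days.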
Fig 13.
Distribution of pl_orbper. Vertical dash marks the median.

pl_orbsmax numeric feature

This is the planet's orbital semi-major axis (pl_orbsmax), a numeric astrophysical feature spanning 0.0044 to 19000.0 with median 0.1159 — typical AU-scale values dominated by close-in planets but with extreme wide-orbit outliers. Skew of 34.66 and kurtosis of 1394.96 are extraordinary, and 604 outliers (15.7%) plus a 37.4% null rate make raw use risky. Mean (21.65) sits far above the q3 of 0.812, confirming a handful of values dominate the scale.

Treatment: log-transform and impute missing before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["pl_orbsmax"].stats

stat value
n 6,150
nulls 2,301 (37.4%)
unique 2,292
min 0.0044
max 19,000
mean 21.65
median 0.1159
std 412.2
q1 0.0538
q3 0.812
iqr 0.7582
skew 34.66
kurtosis 1395
n_outliers 604
outlier_rate 0.1569
zero_rate 0
alert: null_rate · 37.4% null
alert: high_skew · skew=+34.66
alert: outliers · 15.7% rows beyond 1.5 IQR
Fig 14.
Distribution of pl_orbsmax. Vertical dash marks the median.
Histogram bins for pl_orbsmax (median: 0.1159).
bin                    count
0.0044 – 475           3826
475 – 950              11
950 – 1425             1
1425 – 1900            1
1900 – 2375            2
2375 – 2850            1
2850 – 3325            2
3325 – 3800            1
3800 – 4275            0
4275 – 4750            0
4750 – 5225            0
5225 – 5700            0
5700 – 6175            1
6175 – 6650            0
6650 – 7125            0
7125 – 7600            1
7600 – 8075            0
8075 – 8550            0
8550 – 9025            0
9025 – 9500            0
9500 – 9975            0
9975 – 1.045e+04       0
1.045e+04 – 1.093e+04  0
1.093e+04 – 1.14e+04   0
1.14e+04 – 1.188e+04   0
1.188e+04 – 1.235e+04  1
1.235e+04 – 1.283e+04  0
1.283e+04 – 1.33e+04   0
1.33e+04 – 1.378e+04   0
1.378e+04 – 1.425e+04  0
1.425e+04 – 1.473e+04  0
1.473e+04 – 1.52e+04   0
1.52e+04 – 1.568e+04   0
1.568e+04 – 1.615e+04  0
1.615e+04 – 1.663e+04  0
1.663e+04 – 1.71e+04   0
1.71e+04 – 1.758e+04   0
1.758e+04 – 1.805e+04  0
1.805e+04 – 1.853e+04  0
1.853e+04 – 1.9e+04    1

pl_bmassj numeric feature

This is the planet mass measured in Jupiter masses (pl_bmassj), a numeric astrophysical feature. Half the rows are null (50.3%) and the distribution is heavily right-skewed (skew 3.07, kurtosis 10.26): the median is 0.538 MJ but the mean is 2.42 MJ, and values stretch from 6.293e-05 to 30.0 MJ, with 410 outliers (13.4%). That range covers nearly six orders of magnitude, which is the main reason to work on a log scale.

Treatment: Log-transform before modelling and decide on an imputation/missing-indicator strategy for the 50% nulls.
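A minimal sketch of the missing-indicator option, assuming rows are dicts with None for a missing mass. The function name `add_missing_indicator`, the flag column name, and the sample rows are hypothetical.

```python
def add_missing_indicator(rows, column="pl_bmassj"):
    """Return copies of rows with an extra 0/1 flag recording whether `column` was missing."""
    flag = f"{column}_missing"
    return [{**row, flag: int(row.get(column) is None)} for row in rows]

# Hypothetical sample rows, not taken from the profiled file.
sample = [
    {"pl_name": "Host A b", "pl_bmassj": 0.54},
    {"pl_name": "Host A c", "pl_bmassj": None},
]
flagged = add_missing_indicator(sample)
missing_share = sum(r["pl_bmassj_missing"] for r in flagged) / len(flagged)
```

With half the column null, keeping the flag lets a model tell measured masses from imputed ones instead of treating both as equally reliable.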

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["pl_bmassj"].stats

stat value
n 6,150
nulls 3,091 (50.3%)
unique 1,989
min 6.293e-05
max 30
mean 2.417
median 0.538
std 4.706
q1 0.03744
q3 2.197
iqr 2.16
skew 3.075
kurtosis 10.26
n_outliers 410
outlier_rate 0.134
zero_rate 0
alert: null_rate · 50.3% null
alert: high_skew · skew=+3.07
alert: outliers · 13.4% rows beyond 1.5 IQR
Fig 15.
Distribution of pl_bmassj. Vertical dash marks the median.
Histogram bins for pl_bmassj (median: 0.538).
bin                 count
6.293e-05 – 0.7501  1721
0.7501 – 1.5        390
1.5 – 2.25          195
2.25 – 3            133
3 – 3.75            86
3.75 – 4.5          63
4.5 – 5.25          51
5.25 – 6            49
6 – 6.75            37
6.75 – 7.5          21
7.5 – 8.25          35
8.25 – 9            24
9 – 9.75            26
9.75 – 10.5         25
10.5 – 11.25        19
11.25 – 12          17
12 – 12.75          13
12.75 – 13.5        18
13.5 – 14.25        16
14.25 – 15          8
15 – 15.75          7
15.75 – 16.5        11
16.5 – 17.25        4
17.25 – 18          6
18 – 18.75          6
18.75 – 19.5        6
19.5 – 20.25        10
20.25 – 21          14
21 – 21.75          4
21.75 – 22.5        7
22.5 – 23.25        5
23.25 – 24          5
24 – 24.75          2
24.75 – 25.5        2
25.5 – 26.25        1
26.25 – 27          6
27 – 27.75          5
27.75 – 28.5        6
28.5 – 29.25        1
29.25 – 30          4

pl_rade numeric feature

This is `pl_rade`, the planetary radius (in Earth radii) for confirmed exoplanets. Values span 0.31 to 87.21 with a median of 2.43, but heavy right skew (3.22) and extreme kurtosis (28.66) push the mean to 4.46 and flag 872 outliers (19.1%). About 25.7% of rows are null, so a quarter of planets lack a measured radius.

Treatment: Log-transform and impute or flag the 25.7% missing before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["pl_rade"].stats

stat value
n 6,150
nulls 1,582 (25.7%)
unique 2,004
min 0.3098
max 87.21
mean 4.456
median 2.43
std 4.952
q1 1.62
q3 4.06
iqr 2.44
skew 3.218
kurtosis 28.66
n_outliers 872
outlier_rate 0.1909
zero_rate 0
alert: null_rate · 25.7% null
alert: high_skew · skew=+3.22
alert: outliers · 19.1% rows beyond 1.5 IQR
Fig 16.
Distribution of pl_rade. Vertical dash marks the median.

disc_year numeric timestamp

Discovery year for each record, ranging from 1992 to 2026 with a median of 2016 and IQR of 2014-2021. The distribution is left-skewed (skew -0.69), reflecting that most discoveries cluster in recent years while a long tail of earlier years produces 109 outliers (1.8%). Only 34 distinct years appear across 6150 rows, and nulls are negligible (0.02%).

Treatment: Treat as a discrete year; bin or use directly as an ordinal feature.

anthropic:claude-opus-4-7 · confidence high
Out[40]:

saturn.columns["disc_year"].stats

stat value
n 6,150
nulls 1 (0.0%)
unique 34
min 1992
max 2026
mean 2017
median 2016
std 4.965
q1 2014
q3 2021
iqr 7
skew -0.6885
kurtosis 1.262
n_outliers 109
outlier_rate 0.01773
zero_rate 0
Fig 17.
Distribution of disc_year. Vertical dash marks the median.

discoverymethod categorical feature

Categorical label recording the technique used to detect each exoplanet, with 11 distinct methods across 6,150 rows and no nulls. The distribution is heavily concentrated: 'Transit' accounts for 73.4% of records (4,517) and 'Radial Velocity' for another 19.2% (1,182), leaving the remaining 9 methods as long-tail rarities (down to 2 'Pulsation Timing Variations' and a single 'Disk Kinematics' record). An entropy ratio of 0.34 confirms the imbalance.
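The reported entropy (1.189) and entropy ratio (0.3436) are consistent with Shannon entropy in bits divided by log2 of the cardinality. That reading is an inference from the numbers, not something the report states; the sketch below recomputes both from the published counts.

```python
import math

# Counts from the discoverymethod value table in this report.
counts = {
    "Transit": 4517, "Radial Velocity": 1182, "Microlensing": 275, "Imaging": 94,
    "Transit Timing Variations": 39, "Eclipse Timing Variations": 17,
    "Orbital Brightness Modulation": 9, "Pulsar Timing": 8, "Astrometry": 6,
    "Pulsation Timing Variations": 2, "Disk Kinematics": 1,
}

def shannon_entropy_bits(counts):
    """Shannon entropy of a count table, in bits."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)

entropy = shannon_entropy_bits(counts)
# Normalise by the maximum possible entropy for this many categories.
entropy_ratio = entropy / math.log2(len(counts))
```

A ratio of 1 would mean all 11 methods were equally common, so a value near one third quantifies how far Transit dominates.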

Treatment: One-hot encode, optionally collapsing the rare methods into an 'Other' bucket given the severe imbalance.
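The encoding suggested here can be sketched in plain Python. The function name `one_hot_with_other`, the `min_count` threshold, and the miniature sample are all hypothetical; the report does not say where the rare/common cut-off should fall.

```python
def one_hot_with_other(values, min_count=50, other="Other"):
    """One-hot encode `values`, folding categories seen fewer than `min_count` times into `other`."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    kept = sorted(c for c, n in counts.items() if n >= min_count)
    has_rare = len(kept) < len(counts)
    columns = kept + ([other] if has_rare else [])
    encoded = []
    for v in values:
        label = v if v in kept else other
        encoded.append([int(label == c) for c in columns])
    return columns, encoded

# Hypothetical miniature sample; the threshold of 2 is for illustration only.
methods = ["Transit", "Transit", "Radial Velocity", "Radial Velocity", "Imaging", "Astrometry"]
columns, encoded = one_hot_with_other(methods, min_count=2)
```

Applied to the full column with the default threshold of 50, the published counts imply that Transit, Radial Velocity, Microlensing, and Imaging would keep their own columns, while the other seven methods (82 records in total) would share the 'Other' bucket.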

anthropic:claude-opus-4-7 · confidence high
Out[43]:

saturn.columns["discoverymethod"].stats

stat value
n 6,150
nulls 0 (0.0%)
unique 11
top_value Transit
top_rate 0.7345
cardinality 11
entropy 1.189
entropy_ratio 0.3436
Fig 18.
Top values for discoverymethod.

How to cite

BibTeX
@misc{saturn-exoplanets-exoplanets-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: exoplanets exoplanets},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/exoplanets-exoplanets}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: exoplanets exoplanets. Source: /home/coolhand/data/celestial/exoplanets/exoplanets.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/exoplanets-exoplanets