deepsky ngc

saturn notebook · generated 2026-05-01

Overview

Source: /home/coolhand/data/celestial/deepsky/NGC.csv

Saturn profiled 13,969 rows across 32 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
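# Install the saturn CLI (IPython shell escape, so run this inside a notebook).
!pip install saturn-dissect
import subprocess

# Profile the CSV, write the findings to JSON, and request the opt-in LLM prose.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/data/celestial/deepsky/NGC.csv",
    "--findings", "deepsky-ngc.json",
    "--llm", "anthropic:claude-opus-4-7",
])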

Summary confidence: high

This is an astronomical catalog of 13,969 deep-sky objects (NGC.csv) with 32 columns covering identifiers, sky coordinates, magnitudes across multiple bands, morphological classifications, and kinematic measurements like radial velocity and redshift. The catalog is dominated by galaxies — 75% of entries are type 'G' — with smaller populations of open clusters, duplicates, stars, and planetary nebulae. Object morphology (Hubble type) and constellation distribution are the most informative descriptive fields, while RadVel and Redshift give a clean view of the cosmological distance distribution skewing toward nearby objects (median z ≈ 0.016). Be aware that many columns are very sparsely populated: parallax (Pax), proper motions, and central-star magnitudes are >92% null, so any analysis on those fields will be limited to a small subset. Size measurements (MajAx, MinAx) are extremely skewed with heavy outliers, suggesting a few very large objects dominate the tails.

citing: Type · Hubble · Const · Redshift · RadVel · V-Mag · Pax · MajAx · B-Mag

Out[4]:

saturn.schema() · 32 columns

column kind n null% unique alerts
Name text 13,969 0.0% 13,969 near_unique one_word allcaps short_text
Type categorical 13,969 0.0% 20
RA unknown 13,969 0.0% skipped
Dec text 13,969 0.1% 13,282 near_unique one_word allcaps short_text
Const categorical 13,969 0.1% 89
MajAx numeric 13,969 14.1% 734 high_skew outliers
MinAx numeric 13,969 20.9% 465 null_rate high_skew outliers
PosAng numeric 13,969 23.2% 181 null_rate
B-Mag numeric 13,969 18.9% 1,056 outliers
V-Mag numeric 13,969 69.8% 774 null_rate outliers
J-Mag numeric 13,969 30.9% 804 null_rate
H-Mag numeric 13,969 30.6% 831 null_rate
K-Mag numeric 13,969 30.9% 823 null_rate
SurfBr numeric 13,969 26.8% 438 null_rate
Hubble categorical 13,969 27.3% 30 null_rate
Pax numeric 13,969 94.8% 676 null_rate high_skew outliers
Pm-RA numeric 13,969 92.5% 954 null_rate high_skew outliers
Pm-Dec numeric 13,969 92.5% 961 null_rate outliers
RadVel numeric 13,969 24.2% 6,691 null_rate
Redshift numeric 13,969 24.2% 7,717 null_rate
Cstar U-Mag numeric 13,969 99.9% 16 null_rate
Cstar B-Mag numeric 13,969 99.2% 97 null_rate
Cstar V-Mag numeric 13,969 99.3% 82 null_rate
M categorical 13,969 99.2% 107 long_tail null_rate
NGC categorical 13,969 93.5% 891 long_tail null_rate
IC categorical 13,969 96.7% 452 long_tail null_rate
Cstar Names categorical 13,969 99.4% 87 long_tail null_rate
Identifiers text 13,969 12.8% 12,179 near_unique allcaps
Common names categorical 13,969 99.1% 127 long_tail null_rate
NED notes text 13,969 83.6% 1,198 multilingual null_rate duplicates
OpenNGC notes categorical 13,969 98.5% 159 long_tail null_rate
Sources categorical 13,969 0.0% 344 long_tail
Fig 1.
Type · Shows the catalog is overwhelmingly galaxies (75%), with a long tail of clusters, stars, and nebulae.
Top values for Type (20 unique shown, of 20 total).
value count share
G 10481 75.0%
OCl 652 4.7%
Dup 651 4.7%
* 546 3.9%
Other 419 3.0%
** 243 1.7%
GPair 231 1.7%
GCl 204 1.5%
PN 130 0.9%
Neb 94 0.7%
HII 82 0.6%
Cl+N 67 0.5%
*Ass 62 0.4%
RfN 38 0.3%
GTrpl 26 0.2%
SNR 11 0.1%
GGroup 11 0.1%
NonEx 10 0.1%
EmN 8 0.1%
Nova 3 0.0%
Fig 2.
Hubble · Distribution of Hubble morphological classes — ellipticals (E) and lenticulars (S0/S0-a) lead, followed by spiral subtypes.
Top values for Hubble (20 unique shown, of 30 total).
value count share
E 1546 11.1%
S0 1073 7.7%
S0-a 996 7.1%
Sc 948 6.8%
Sb 758 5.4%
Sbc 727 5.2%
E-S0 623 4.5%
Sa 478 3.4%
Sab 461 3.3%
SBb 323 2.3%
SABc 281 2.0%
SBc 272 1.9%
SBbc 269 1.9%
SABb 255 1.8%
SBa 186 1.3%
SBab 160 1.1%
SABa 122 0.9%
I 104 0.7%
Scd 103 0.7%
Sd 74 0.5%
Fig 3.
Const · Top constellations by object count — Virgo, Coma, and Leo dominate, reflecting known galaxy clusters.
Top values for Const (20 unique shown, of 89 total).
value count share
Vir 1236 8.8%
Com 1044 7.5%
Leo 876 6.3%
Cet 686 4.9%
UMa 543 3.9%
Boo 528 3.8%
CVn 518 3.7%
Psc 490 3.5%
Peg 462 3.3%
Eri 457 3.3%
Dra 380 2.7%
Hya 376 2.7%
Cnc 354 2.5%
Dor 341 2.4%
Her 340 2.4%
Pav 289 2.1%
Aqr 277 2.0%
Cen 231 1.7%
And 209 1.5%
Per 163 1.2%
Fig 4.
Redshift · Redshift distribution is right-skewed with most objects at z < 0.03; useful for gauging the survey's distance reach.
Histogram bins for Redshift (median: 0.01643).
bin  count
-0.00161 – 0.003221  1111
0.003221 – 0.008051  1535
0.008051 – 0.01288  1344
0.01288 – 0.01771  1763
0.01771 – 0.02254  1330
0.02254 – 0.02737  1167
0.02737 – 0.0322  844
0.0322 – 0.03704  526
0.03704 – 0.04187  328
0.04187 – 0.0467  175
0.0467 – 0.05153  96
0.05153 – 0.05636  92
0.05636 – 0.06119  51
0.06119 – 0.06602  77
0.06602 – 0.07085  46
0.07085 – 0.07568  33
0.07568 – 0.08051  18
0.08051 – 0.08534  26
0.08534 – 0.09017  6
0.09017 – 0.095  8
0.095 – 0.09983  1
0.09983 – 0.1047  1
0.1047 – 0.1095  0
0.1095 – 0.1143  0
0.1143 – 0.1192  2
0.1192 – 0.124  1
0.124 – 0.1288  0
0.1288 – 0.1336  0
0.1336 – 0.1385  0
0.1385 – 0.1433  0
0.1433 – 0.1481  0
0.1481 – 0.153  0
0.153 – 0.1578  0
0.1578 – 0.1626  0
0.1626 – 0.1675  0
0.1675 – 0.1723  0
0.1723 – 0.1771  0
0.1771 – 0.182  1
0.182 – 0.1868  0
0.1868 – 0.1916  1
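To make the "distance reach" reading concrete, the median redshift can be turned into a rough recession velocity and distance. This is a minimal sketch under stated assumptions: the low-redshift Hubble law d ≈ cz / H0 with an assumed H0 of 70 km/s/Mpc, and no correction for peculiar velocities. The function name is illustrative and not part of saturn. The approximation does not apply to the blueshifted objects in the first bin, which starts below zero.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def approx_distance_mpc(z, h0=70.0):
    """Low-redshift Hubble-law distance d = c*z / H0 in Mpc (h0 is an assumed value)."""
    return C_KM_S * z / h0

median_z = 0.01643  # median reported for Redshift
velocity_km_s = C_KM_S * median_z
print(round(velocity_km_s), "km/s,", round(approx_distance_mpc(median_z), 1), "Mpc")
```

Under these assumptions the bulk of the catalog (z < 0.03) lies within a few hundred megaparsecs.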
Fig 5.
B-Mag · B-band magnitude distribution centers near 14.4 with a long bright-end tail (skew -1.69); highlights the catalog's brightness completeness.
Histogram bins for B-Mag (median: 14.42).
bin  count
1.51 – 1.998  1
1.998 – 2.485  0
2.485 – 2.973  3
2.973 – 3.46  4
3.46 – 3.947  3
3.947 – 4.435  7
4.435 – 4.923  13
4.923 – 5.41  12
5.41 – 5.897  9
5.897 – 6.385  28
6.385 – 6.872  19
6.872 – 7.36  45
7.36 – 7.847  37
7.847 – 8.335  40
8.335 – 8.822  42
8.822 – 9.31  49
9.31 – 9.797  52
9.797 – 10.29  73
10.29 – 10.77  92
10.77 – 11.26  133
11.26 – 11.75  219
11.75 – 12.23  322
12.23 – 12.72  475
12.72 – 13.21  762
13.21 – 13.7  1038
13.7 – 14.18  1374
14.18 – 14.67  1735
14.67 – 15.16  1765
15.16 – 15.65  1468
15.65 – 16.14  678
16.14 – 16.62  392
16.62 – 17.11  236
17.11 – 17.6  126
17.6 – 18.09  48
18.09 – 18.57  19
18.57 – 19.06  8
19.06 – 19.55  6
19.55 – 20.04  0
20.04 – 20.52  0
20.52 – 21.01  2
Fig 6.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column kind null %
Name text 0.0%
Type categorical 0.0%
RA unknown 0.0%
Dec text 0.1%
Const categorical 0.1%
MajAx numeric 14.1%
MinAx numeric 20.9%
PosAng numeric 23.2%
B-Mag numeric 18.9%
V-Mag numeric 69.8%
J-Mag numeric 30.9%
H-Mag numeric 30.6%
K-Mag numeric 30.9%
SurfBr numeric 26.8%
Hubble categorical 27.3%
Pax numeric 94.8%
Pm-RA numeric 92.5%
Pm-Dec numeric 92.5%
RadVel numeric 24.2%
Redshift numeric 24.2%
Cstar U-Mag numeric 99.9%
Cstar B-Mag numeric 99.2%
Cstar V-Mag numeric 99.3%
M categorical 99.2%
NGC categorical 93.5%
IC categorical 96.7%
Cstar Names categorical 99.4%
Identifiers text 12.8%
Common names categorical 99.1%
NED notes text 83.6%
OpenNGC notes categorical 98.5%
Sources categorical 0.0%
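The null rates above can be cross-checked independently of saturn. The sketch below is one way to do it with the standard library, assuming that a missing value appears as an empty field. The four sample rows are hypothetical, and the real file's delimiter and null convention may differ.

```python
import csv
import io

def null_rates(rows, columns):
    """Share of rows per column whose value is missing or an empty string."""
    counts = {col: 0 for col in columns}
    total = 0
    for row in rows:
        total += 1
        for col in columns:
            value = row.get(col)
            if value is None or value.strip() == "":
                counts[col] += 1
    return {col: (counts[col] / total if total else 0.0) for col in columns}

# Hypothetical four-row extract, not taken from NGC.csv.
sample = io.StringIO(
    "Name,Type,V-Mag,Pax\n"
    "NGC0001,G,,\n"
    "NGC0002,G,14.2,\n"
    "IC0001,**,,\n"
    "NGC0003,G,13.3,0.48\n"
)
reader = csv.DictReader(sample)
rates = null_rates(reader, reader.fieldnames)
print(rates)
```

For the real file, the same function can be fed a `csv.DictReader` opened on the CSV path.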
Fig 7.
Language mix across all text columns (per-string detection, sampled).
Per-language counts (total 783 detected strings).
lang count share
en 780 99.6%
pt 1 0.1%
it 1 0.1%
bs 1 0.1%
Fig 8.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 12 numeric columns (values clipped to 2 decimals).
MajAx MinAx PosAng B-Mag V-Mag J-Mag H-Mag K-Mag SurfBr Pax Pm-RA Pm-Dec
MajAx +1.00 +0.13 -0.47 -0.38 -0.45 -0.05 -0.41 +0.04 +0.39 -0.15 -0.02 +0.13
MinAx +0.13 +1.00 -0.04 -0.50 -0.78 -0.29 -0.15 -0.23 -0.07 +0.27 +0.13 +0.13
PosAng -0.47 -0.04 +1.00 +0.05 +0.26 -0.07 -0.31 -0.16 +0.07 +0.22 +0.13 -0.06
B-Mag -0.38 -0.50 +0.05 +1.00 +0.67 +0.23 +0.22 +0.16 -0.10 +0.26 +0.07 +0.07
V-Mag -0.45 -0.78 +0.26 +0.67 +1.00 +0.33 +0.26 +0.22 -0.20 -0.19 -0.13 -0.14
J-Mag -0.05 -0.29 -0.07 +0.23 +0.33 +1.00 -0.06 +0.99 -0.29 -0.22 +0.32 +0.36
H-Mag -0.41 -0.15 -0.31 +0.22 +0.26 -0.06 +1.00 -0.11 +0.01 -0.16 -0.28 -0.22
K-Mag +0.04 -0.23 -0.16 +0.16 +0.22 +0.99 -0.11 +1.00 -0.28 -0.26 +0.30 +0.37
SurfBr +0.39 -0.07 +0.07 -0.10 -0.20 -0.29 +0.01 -0.28 +1.00 -0.13 -0.10 -0.09
Pax -0.15 +0.27 +0.22 +0.26 -0.19 -0.22 -0.16 -0.26 -0.13 +1.00 +0.34 +0.10
Pm-RA -0.02 +0.13 +0.13 +0.07 -0.13 +0.32 -0.28 +0.30 -0.10 +0.34 +1.00 +0.93
Pm-Dec +0.13 +0.13 -0.06 +0.07 -0.14 +0.36 -0.22 +0.37 -0.09 +0.10 +0.93 +1.00
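Any entry in the matrix can be spot-checked with a plain Pearson coefficient. The sketch below uses pairwise deletion (only rows where both values are present), which is an assumption: the report does not say how saturn handles nulls or sampling. The sample magnitudes are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson r over rows where both values are present (None = missing)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical magnitudes for five objects; None marks a missing measurement.
j_mag = [11.2, 10.5, None, 12.8, 9.9]
k_mag = [10.3, 9.6, 11.0, None, 9.0]
print(round(pearson(j_mag, k_mag), 2))
```

With columns that are more than 92% null (Pax, Pm-RA, Pm-Dec), the number of complete pairs can be small, so coefficients involving them deserve extra caution.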

Name text identifier

This appears to be an astronomical object designation field — short, all-caps, single-token codes like 'NED01', 'NED02', and 'IC' prefixed catalog numbers. Every one of the 13,969 rows is unique with zero nulls, lengths tightly bounded between 6 and 13 characters, and 97.4% are single-word tokens. The 'NED01'/'NED02' values recurring 175 and 168 times sit oddly against the n_unique=13969 claim, suggesting these are prefixes/substrings counted at the word level rather than full duplicates.

Treatment: Treat as a primary key; drop from modelling features and use only for joins or lookups.

anthropic:claude-opus-4-7 · confidence high
Out[14]:

saturn.columns["Name"].stats

stat value
n 13,969
nulls 0 (0.0%)
unique 13,969
len_min 6
len_max 13
len_mean 6.784
len_median 7
len_p95 8
word_mean 1.026
word_median 1
n_empty 0
n_duplicates 0
duplicate_rate 0
vocab_size 13,607
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 0.9738
allcaps_rate 1
boilerplate_rate 0
alert: near_unique · 100.0% of rows are unique strings
alert: one_word · 97.4% rows are a single word
alert: allcaps · 100.0% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
Fig 9.
Character-length distribution for Name.
Character-length distribution for Name (mean: 6.783735414131291).
chars count
6 5386
7 7868
8 349
9 0
10 0
11 0
12 180
13 186
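The apparent tension between n_unique=13969 and the recurring 'NED01'/'NED02' tokens is easy to reproduce: full strings can be unique while their whitespace-separated tokens repeat. A small sketch with hypothetical names written in the catalog's style (not taken from the file):

```python
from collections import Counter

# Hypothetical names in the catalog's style; not taken from the file.
names = ["NGC0001", "NGC0002", "IC0011 NED01", "NGC0070 NED01", "IC0011 NED02"]

# Full strings are unique, so the column can serve as a key...
assert len(set(names)) == len(names)

# ...yet individual whitespace-separated tokens can still repeat.
token_counts = Counter(token for name in names for token in name.split())
print(token_counts.most_common(3))
```

The two-token names here are 12 and 13 characters long, matching the small secondary peaks in the length distribution.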

Type categorical label

A categorical type code with 20 distinct values across 13,969 rows and no nulls. The distribution is highly imbalanced: 'G' alone accounts for 75.0% of records (10,481), with the next categories ('OCl', 'Dup', '*', 'Other') each below 5%, yielding an entropy ratio of just 0.38. The codes ('G', 'OCl', 'GPair', 'GCl', 'PN', 'Neb') suggest astronomical object classifications (galaxies, open clusters, galaxy pairs, globular clusters, planetary nebulae, nebulae).

Treatment: Group rare categories, or stratify to account for the dominance of 'G', before any classification task.

anthropic:claude-opus-4-7 · confidence high
Out[17]:

saturn.columns["Type"].stats

stat value
n 13,969
nulls 0 (0.0%)
unique 20
top_value G
top_rate 0.7503
cardinality 20
entropy 1.646
entropy_ratio 0.3808
Fig 10.
Top values for Type.
Top values for Type (20 unique shown, of 20 total).
value count share
G 10481 75.0%
OCl 652 4.7%
Dup 651 4.7%
* 546 3.9%
Other 419 3.0%
** 243 1.7%
GPair 231 1.7%
GCl 204 1.5%
PN 130 0.9%
Neb 94 0.7%
HII 82 0.6%
Cl+N 67 0.5%
*Ass 62 0.4%
RfN 38 0.3%
GTrpl 26 0.2%
SNR 11 0.1%
GGroup 11 0.1%
NonEx 10 0.1%
EmN 8 0.1%
Nova 3 0.0%
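The entropy figures reported for Type can be reproduced from the value counts. This is a minimal sketch, assuming saturn reports Shannon entropy in bits and obtains entropy_ratio by dividing by log2 of the cardinality. That is an inference from the numbers above, not a documented formula.

```python
import math

# Value counts copied from the Type table above.
type_counts = {
    "G": 10481, "OCl": 652, "Dup": 651, "*": 546, "Other": 419,
    "**": 243, "GPair": 231, "GCl": 204, "PN": 130, "Neb": 94,
    "HII": 82, "Cl+N": 67, "*Ass": 62, "RfN": 38, "GTrpl": 26,
    "SNR": 11, "GGroup": 11, "NonEx": 10, "EmN": 8, "Nova": 3,
}

def shannon_entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given as counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)

entropy = shannon_entropy_bits(type_counts)
# Assumed normalisation: divide by the maximum entropy for this many categories.
entropy_ratio = entropy / math.log2(len(type_counts))
print(round(entropy, 3), round(entropy_ratio, 4))
```

A ratio near 1 would mean the 20 categories are evenly used; the low value here reflects the dominance of 'G'.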

RA unknown other

The column is named RA, which in astronomical datasets typically denotes Right Ascension, but saturn skipped profiling so kind and uniqueness are unresolved. The only confirmed signals are 13969 rows with a 0.0 null rate; no distributional statistics are available.

Treatment: Re-profile with an explicit type cast before deciding on use.

anthropic:claude-opus-4-7 · confidence low
Out[20]:

saturn.columns["RA"].stats

stat value
n 13,969
nulls 0 (0.0%)
unique
alert: skipped · no profiler for kind=unknown

Dec text feature

This column holds astronomical Declination coordinates formatted as signed sexagesimal strings (e.g. '-19:28:17.6'), with every value 9-11 characters long and exactly one token. It is near-unique (13,282 distinct of 13,969) yet shows 680 duplicates (4.87%). A shared declination alone does not establish a repeated sky position, because RA was not profiled; the repeats may come from duplicate catalog entries (Type 'Dup' has 651 rows) or from distinct objects that happen to share a declination. The 'allcaps' and Flesch=121 signals are artefacts of the numeric format, not real prose.

Treatment: Parse the sexagesimal string into signed decimal degrees before any numeric use.

anthropic:claude-opus-4-7 · confidence high
Out[22]:

saturn.columns["Dec"].stats

stat value
n 13,969
nulls 7 (0.1%)
unique 13,282
len_min 9
len_max 11
len_mean 11
len_median 11
len_p95 11
word_mean 1
word_median 1
n_empty 0
n_duplicates 680
duplicate_rate 0.0487
vocab_size 13,282
readability_flesch_mean 121.2
emoji_rate 0
url_rate 0
one_word_rate 1
allcaps_rate 1
boilerplate_rate 0
alert: near_unique · 95.1% of rows are unique strings
alert: one_word · 100.0% rows are a single word
alert: allcaps · 100.0% rows are all-caps
alert: short_text · 95th-percentile length under 20 chars
Fig 11.
Character-length distribution for Dec.
Character-length distribution for Dec (mean: 10.995129637587738).
chars count
9 34
10 0
11 13928
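The parsing step named in the treatment can be sketched as follows, assuming every non-null value follows the signed `DD:MM:SS.s` pattern of the example above. The function name is illustrative. The sign is read from the string rather than from the integer degrees, so that values between -1 and 0 degrees keep their sign.

```python
def dec_to_degrees(text):
    """Convert a signed sexagesimal string 'sDD:MM:SS.s' to decimal degrees."""
    text = text.strip()
    # Take the sign from the string itself so that '-00:30:00.0' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = text.lstrip("+-").split(":")
    return sign * (int(degrees) + int(minutes) / 60 + float(seconds) / 3600)

print(dec_to_degrees("-19:28:17.6"))
```

Null handling is left to the caller; the 7 null rows would need to be filtered or passed through before conversion.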

Const categorical feature

This column holds three-letter constellation abbreviations (Vir, Com, Leo, Cet, UMa…), with 89 distinct values across 13,969 rows — consistent with the 88 IAU constellations plus possibly one stray code. Distribution is fairly even (entropy ratio 0.85), though Virgo leads at 8.85% and the top three constellations account for roughly a fifth of records. Nulls are negligible (0.05%).

Treatment: One-hot or target-encode as a categorical feature.

anthropic:claude-opus-4-7 · confidence high
Out[25]:

saturn.columns["Const"].stats

stat value
n 13,969
nulls 7 (0.1%)
unique 89
top_value Vir
top_rate 0.08853
cardinality 89
entropy 5.489
entropy_ratio 0.8476
Fig 12.
Top values for Const.
Top values for Const (20 unique shown, of 89 total).
value count share
Vir 1236 8.8%
Com 1044 7.5%
Leo 876 6.3%
Cet 686 4.9%
UMa 543 3.9%
Boo 528 3.8%
CVn 518 3.7%
Psc 490 3.5%
Peg 462 3.3%
Eri 457 3.3%
Dra 380 2.7%
Hya 376 2.7%
Cnc 354 2.5%
Dor 341 2.4%
Her 340 2.4%
Pav 289 2.1%
Aqr 277 2.0%
Cen 231 1.7%
And 209 1.5%
Per 163 1.2%

MajAx numeric feature

MajAx is a numeric measurement (likely a major-axis length, e.g. of a galaxy or ellipse) with a tight central distribution—median 1.2, IQR 0.82–1.87—but an extreme right tail reaching 299.92. Skew of 21.89 and kurtosis of 641.85 are both severe, and 10.36% of values flag as outliers. Null rate is also non-trivial at 14.06%.

Treatment: Log-transform and impute missing values before modelling; consider winsorising the upper tail.

anthropic:claude-opus-4-7 · confidence high
Out[28]:

saturn.columns["MajAx"].stats

stat value
n 13,969
nulls 1,964 (14.1%)
unique 734
min 0.02
max 299.9
mean 2.145
median 1.2
std 6.789
q1 0.82
q3 1.87
iqr 1.05
skew 21.89
kurtosis 641.8
n_outliers 1,244
outlier_rate 0.1036
zero_rate 0
alert: high_skew · skew=+21.89
alert: outliers · 10.4% rows beyond 1.5 IQR
Fig 13.
Distribution of MajAx. Vertical dash marks the median.
Histogram bins for MajAx (median: 1.2).
bin  count
0.02 – 7.518  11666
7.518 – 15.02  220
15.02 – 22.51  48
22.51 – 30.01  28
30.01 – 37.51  4
37.51 – 45.01  7
45.01 – 52.5  3
52.5 – 60  7
60 – 67.5  1
67.5 – 75  0
75 – 82.49  3
82.49 – 89.99  1
89.99 – 97.49  2
97.49 – 105  0
105 – 112.5  1
112.5 – 120  0
120 – 127.5  5
127.5 – 135  2
135 – 142.5  0
142.5 – 150  0
150 – 157.5  0
157.5 – 165  1
165 – 172.5  0
172.5 – 180  1
180 – 187.5  3
187.5 – 195  0
195 – 202.5  0
202.5 – 210  0
210 – 217.4  1
217.4 – 224.9  0
224.9 – 232.4  0
232.4 – 239.9  0
239.9 – 247.4  0
247.4 – 254.9  0
254.9 – 262.4  0
262.4 – 269.9  0
269.9 – 277.4  0
277.4 – 284.9  0
284.9 – 292.4  0
292.4 – 299.9  1
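The outlier alert and the suggested treatment can be illustrated together. The sketch assumes that "beyond 1.5 IQR" means the usual Tukey fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR), and it caps values at the upper fence before taking log10. The cap is only one possible winsorising choice, and the helper name and sample values are hypothetical.

```python
import math

# Quartiles reported for MajAx in the stats table above.
q1, q3 = 0.82, 1.87
iqr = q3 - q1

# Tukey fences: values outside [lower_fence, upper_fence] count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def log_and_winsorise(values, cap):
    """Cap each value at `cap`, then take log10; None stays None (missing)."""
    return [None if v is None else math.log10(min(v, cap)) for v in values]

# Hypothetical sample spanning the reported min, median, and max.
sample = [0.02, 1.2, None, 299.9]
transformed = log_and_winsorise(sample, cap=upper_fence)
print(lower_fence, upper_fence, transformed)
```

Because the lower fence is negative and the minimum is 0.02, all flagged outliers sit in the upper tail.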

MinAx numeric feature

MinAx is a numeric measurement (likely a minor-axis dimension) with a tight central mass — median 0.69, IQR 0.45–1.07 — but an extreme right tail stretching to 179.89. Skew of 26.82 and kurtosis near 981 are extraordinary, and 760 outliers (6.88%) sit alongside a 20.91% null rate. Only 465 distinct values across 13,969 rows suggests rounding or a discretised scale.

Treatment: Impute or flag the 20.91% nulls and apply a log transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high
Out[31]:

saturn.columns["MinAx"].stats

stat value
n 13,969
nulls 2,921 (20.9%)
unique 465
min 0.02
max 179.9
mean 1.113
median 0.69
std 3.738
q1 0.45
q3 1.07
iqr 0.62
skew 26.82
kurtosis 981.5
n_outliers 760
outlier_rate 0.06879
zero_rate 0
alert: null_rate · 20.9% null
alert: high_skew · skew=+26.82
alert: outliers · 6.9% rows beyond 1.5 IQR
Fig 14.
Distribution of MinAx. Vertical dash marks the median.
Histogram bins for MinAx (median: 0.69).
bin  count
0.02 – 4.517  10868
4.517 – 9.013  91
9.013 – 13.51  35
13.51 – 18.01  10
18.01 – 22.5  8
22.5 – 27  3
27 – 31.5  12
31.5 – 35.99  0
35.99 – 40.49  7
40.49 – 44.99  0
44.99 – 49.48  0
49.48 – 53.98  2
53.98 – 58.48  0
58.48 – 62.97  6
62.97 – 67.47  0
67.47 – 71.97  1
71.97 – 76.46  0
76.46 – 80.96  1
80.96 – 85.46  0
85.46 – 89.95  0
89.95 – 94.45  0
94.45 – 98.95  0
98.95 – 103.4  1
103.4 – 107.9  0
107.9 – 112.4  0
112.4 – 116.9  0
116.9 – 121.4  1
121.4 – 125.9  0
125.9 – 130.4  0
130.4 – 134.9  0
134.9 – 139.4  0
139.4 – 143.9  0
143.9 – 148.4  0
148.4 – 152.9  0
152.9 – 157.4  0
157.4 – 161.9  1
161.9 – 166.4  0
166.4 – 170.9  0
170.9 – 175.4  0
175.4 – 179.9  1
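To see why a log transform is suggested for a column like this, skewness can be compared before and after the transform. The sketch uses the population (biased) moment estimators, which may differ from saturn's estimator, and a hypothetical sample with a heavy right tail.

```python
import math

def skew_and_excess_kurtosis(values):
    """Population (biased) skewness and excess kurtosis, ignoring None."""
    xs = [v for v in values if v is not None]
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical axis lengths: mostly small, with a few much larger objects.
raw = [0.3, 0.5, 0.7, 1.0, 1.5, 3.0, 10.0, 60.0]
logged = [math.log10(v) for v in raw]

raw_skew, raw_kurt = skew_and_excess_kurtosis(raw)
log_skew, log_kurt = skew_and_excess_kurtosis(logged)
print(raw_skew, log_skew)
```

The transform shrinks the skew without removing it, which is why the treatment for MajAx also mentions winsorising.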

PosAng numeric feature

PosAng is a numeric column bounded between 0 and 180 with mean 87.27 and median 87, consistent with a position angle in degrees (a common astronomical measurement). The distribution is nearly symmetric (skew 0.047) and platykurtic (kurtosis -1.21), spread broadly across the range with IQR 40-133 and no outliers. Notable: 23.17% of rows are null, and despite n=13969 there are only 181 unique values, suggesting integer-degree quantisation.

Treatment: Treat as a circular/angular feature (e.g. encode as sin/cos) and impute or flag the 23% missingness before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[34]:

saturn.columns["PosAng"].stats

stat value
n 13,969
nulls 3,236 (23.2%)
unique 181
min 0
max 180
mean 87.27
median 87
std 52.68
q1 40
q3 133
iqr 93
skew 0.04737
kurtosis -1.212
n_outliers 0
outlier_rate 0
zero_rate 0.008572
alert: null_rate · 23.2% null
Fig 15.
Distribution of PosAng. Vertical dash marks the median.
Histogram bins for PosAng (median: 87.0).
bin  count
0 – 4.5  342
4.5 – 9  281
9 – 13.5  311
13.5 – 18  275
18 – 22.5  339
22.5 – 27  262
27 – 31.5  330
31.5 – 36  251
36 – 40.5  309
40.5 – 45  194
45 – 49.5  277
49.5 – 54  252
54 – 58.5  275
58.5 – 63  234
63 – 67.5  311
67.5 – 72  256
72 – 76.5  255
76.5 – 81  257
81 – 85.5  284
85.5 – 90  208
90 – 94.5  338
94.5 – 99  254
99 – 103.5  308
103.5 – 108  216
108 – 112.5  315
112.5 – 117  206
117 – 121.5  280
121.5 – 126  221
126 – 130.5  278
130.5 – 135  204
135 – 139.5  280
139.5 – 144  235
144 – 148.5  279
148.5 – 153  243
153 – 157.5  298
157.5 – 162  236
162 – 166.5  285
166.5 – 171  229
171 – 175.5  279
175.5 – 180  246
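One detail matters when encoding this column as sin/cos. If PosAng is an orientation that repeats every 180 degrees, as the 0 to 180 range suggests (an assumption, not something the stats confirm), the angle should be doubled before taking sine and cosine so that 0 and 180 map to the same point. A minimal sketch with an illustrative function name:

```python
import math

def encode_position_angle(deg):
    """Map a position angle in degrees (period 180) to a point on the unit circle."""
    theta = math.radians(2.0 * deg)  # doubling makes 0 and 180 coincide
    return math.sin(theta), math.cos(theta)

for angle in (0, 45, 90, 180):
    s, c = encode_position_angle(angle)
    print(angle, round(s, 3), round(c, 3))
```

Without the doubling, 0 and 180 would land on opposite sides of the circle even though they describe the same orientation.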

B-Mag numeric feature

B-Mag is a numeric photometric measurement, almost certainly the B-band apparent magnitude of astronomical sources, with values concentrated between 13.42 and 15.2 (median 14.42). The distribution is strongly left-skewed (skew -1.69, kurtosis 5.58) with a long bright-end tail down to 1.51 (smaller magnitudes are brighter) and 572 outliers (5.05%). Roughly 18.86% of rows are null, so a sizeable share of objects lack a B measurement.

Treatment: Impute or flag the 18.86% missing values and consider winsorising the bright-end tail before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[37]:

saturn.columns["B-Mag"].stats

stat value
n 13,969
nulls 2,634 (18.9%)
unique 1,056
min 1.51
max 21.01
mean 14.12
median 14.42
std 1.833
q1 13.42
q3 15.2
iqr 1.78
skew -1.692
kurtosis 5.576
n_outliers 572
outlier_rate 0.05046
zero_rate 0
alert: outliers · 5.0% rows beyond 1.5 IQR
Fig 16.
Distribution of B-Mag. Vertical dash marks the median.
Histogram bins for B-Mag (median: 14.42).
bin  count
1.51 – 1.998  1
1.998 – 2.485  0
2.485 – 2.973  3
2.973 – 3.46  4
3.46 – 3.947  3
3.947 – 4.435  7
4.435 – 4.923  13
4.923 – 5.41  12
5.41 – 5.897  9
5.897 – 6.385  28
6.385 – 6.872  19
6.872 – 7.36  45
7.36 – 7.847  37
7.847 – 8.335  40
8.335 – 8.822  42
8.822 – 9.31  49
9.31 – 9.797  52
9.797 – 10.29  73
10.29 – 10.77  92
10.77 – 11.26  133
11.26 – 11.75  219
11.75 – 12.23  322
12.23 – 12.72  475
12.72 – 13.21  762
13.21 – 13.7  1038
13.7 – 14.18  1374
14.18 – 14.67  1735
14.67 – 15.16  1765
15.16 – 15.65  1468
15.65 – 16.14  678
16.14 – 16.62  392
16.62 – 17.11  236
17.11 – 17.6  126
17.6 – 18.09  48
18.09 – 18.57  19
18.57 – 19.06  8
19.06 – 19.55  6
19.55 – 20.04  0
20.04 – 20.52  0
20.52 – 21.01  2

V-Mag numeric feature

V-Mag is a numeric column almost certainly recording visual (apparent) magnitude of astronomical objects, with values spanning 1.69 to 20.41 and a median of 12.38 consistent with that scale. The distribution is left-skewed (skew -1.22) with 323 outliers (7.66%) on the bright end, and 69.83% of rows are null, so usable coverage is limited to roughly 30% of the catalogue. The interquartile band is tight (11.31-13.28, IQR 1.97) around faint magnitudes.

Treatment: Impute or flag the 69.83% missing values before modelling; consider keeping raw scale since magnitudes are already logarithmic.

anthropic:claude-opus-4-7 · confidence high
Out[40]:

saturn.columns["V-Mag"].stats

stat value
n 13,969
nulls 9,755 (69.8%)
unique 774
min 1.69
max 20.41
mean 12.04
median 12.38
std 2.092
q1 11.31
q3 13.28
iqr 1.97
skew -1.223
kurtosis 2.67
n_outliers 323
outlier_rate 0.07665
zero_rate 0
alert: null_rate · 69.8% null
alert: outliers · 7.7% rows beyond 1.5 IQR
Fig 17.
Distribution of V-Mag. Vertical dash marks the median.
Histogram bins for V-Mag (median: 12.38).
bin  count
1.69 – 2.158  1
2.158 – 2.626  5
2.626 – 3.094  2
3.094 – 3.562  3
3.562 – 4.03  7
4.03 – 4.498  8
4.498 – 4.966  13
4.966 – 5.434  14
5.434 – 5.902  23
5.902 – 6.37  26
6.37 – 6.838  41
6.838 – 7.306  47
7.306 – 7.774  40
7.774 – 8.242  46
8.242 – 8.71  57
8.71 – 9.178  50
9.178 – 9.646  75
9.646 – 10.11  97
10.11 – 10.58  141
10.58 – 11.05  212
11.05 – 11.52  310
11.52 – 11.99  458
11.99 – 12.45  518
12.45 – 12.92  578
12.92 – 13.39  473
13.39 – 13.86  376
13.86 – 14.33  249
14.33 – 14.79  156
14.79 – 15.26  95
15.26 – 15.73  39
15.73 – 16.2  19
16.2 – 16.67  11
16.67 – 17.13  12
17.13 – 17.6  5
17.6 – 18.07  3
18.07 – 18.54  3
18.54 – 19.01  0
19.01 – 19.47  0
19.47 – 19.94  0
19.94 – 20.41  1
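The remark that magnitudes are already logarithmic refers to the standard magnitude-flux relation: a difference of 5 magnitudes corresponds to a factor of 100 in flux, and smaller magnitudes are brighter. A short sketch (the function name is illustrative):

```python
def flux_ratio(mag_a, mag_b):
    """Flux of object A relative to object B, given their magnitudes."""
    return 10 ** (-0.4 * (mag_a - mag_b))

# An object 5 magnitudes brighter (smaller magnitude) has 100 times the flux.
print(flux_ratio(10.0, 15.0))
# Flux ratio between the reported Q1 (11.31) and Q3 (13.28) of V-Mag.
print(flux_ratio(11.31, 13.28))
```

This is why a further log transform is usually unnecessary for magnitude columns.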

J-Mag numeric feature

This is the J-band magnitude (near-infrared photometry, ~1.25 μm) for catalogued sources, with values centered near 11.42 and spanning 1.11 to 17.02. The distribution is mildly left-skewed (-0.55) with modest excess kurtosis (2.51) and 333 outliers (3.45%), consistent with a mix of bright and faint sources. A notable concern is the 30.9% null rate, meaning nearly a third of rows lack a J-Mag measurement.

Treatment: Impute or flag missing values before modelling; consider robust scaling given the left skew and outliers.

anthropic:claude-opus-4-7 · confidence high
Out[43]:

saturn.columns["J-Mag"].stats

stat value
n 13,969
nulls 4,317 (30.9%)
unique 804
min 1.11
max 17.02
mean 11.37
median 11.42
std 1.355
q1 10.63
q3 12.17
iqr 1.54
skew -0.5524
kurtosis 2.512
n_outliers 333
outlier_rate 0.0345
zero_rate 0
alert: null_rate · 30.9% null
Fig 18.
Distribution of J-Mag. Vertical dash marks the median.
Histogram bins for J-Mag (median: 11.42).
bin  count
1.11 – 1.508  1
1.508 – 1.905  0
1.905 – 2.303  3
2.303 – 2.701  0
2.701 – 3.099  0
3.099 – 3.497  1
3.497 – 3.894  1
3.894 – 4.292  0
4.292 – 4.69  2
4.69 – 5.088  5
5.088 – 5.485  1
5.485 – 5.883  8
5.883 – 6.281  12
6.281 – 6.679  10
6.679 – 7.076  18
7.076 – 7.474  33
7.474 – 7.872  40
7.872 – 8.269  76
8.269 – 8.667  106
8.667 – 9.065  141
9.065 – 9.463  255
9.463 – 9.861  378
9.861 – 10.26  529
10.26 – 10.66  868
10.66 – 11.05  1118
11.05 – 11.45  1328
11.45 – 11.85  1277
11.85 – 12.25  1190
12.25 – 12.64  821
12.64 – 13.04  591
13.04 – 13.44  353
13.44 – 13.84  197
13.84 – 14.24  132
14.24 – 14.63  80
14.63 – 15.03  35
15.03 – 15.43  26
15.43 – 15.83  9
15.83 – 16.22  3
16.22 – 16.62  1
16.62 – 17.02  3
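The robust scaling mentioned in the treatment can be sketched with the reported median and IQR. This is one common definition (centre on the median, divide by the IQR) and not necessarily what a given modelling library does by default; the function name is illustrative.

```python
# Median and IQR reported for J-Mag in the stats table above.
J_MEDIAN = 11.42
J_IQR = 1.54

def robust_scale(value, median=J_MEDIAN, iqr=J_IQR):
    """Centre on the median and divide by the IQR; None stays None (missing)."""
    if value is None:
        return None
    return (value - median) / iqr

# Reported min, Q1, median, Q3, max, plus a missing value.
for mag in (1.11, 10.63, 11.42, 12.17, 17.02, None):
    print(mag, robust_scale(mag))
```

Unlike mean and standard deviation, the median and IQR are barely affected by the bright-end outliers, so the bulk of the distribution keeps a comparable scale.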

H-Mag numeric feature

H-Mag is a numeric astronomical magnitude, most likely the near-infrared H-band (~1.65 μm) apparent magnitude that sits between the J and K bands in this catalog, rather than the absolute H magnitude used in minor-planet catalogs. It ranges from 0.83 to 16.67 with a mean of 10.70 and median of 10.74. The distribution is mildly left-skewed (-0.45) with heavier-than-normal tails (excess kurtosis 2.18) and a tight IQR of 1.55, indicating most objects cluster around magnitude 10-11.5. The notable concern is a 30.6% null rate, plus 341 outliers (3.5%) spread across both the bright and faint ends of the histogram.

Treatment: Impute or flag the 30.6% missing values before modelling; consider keeping raw scale since skew is mild.

anthropic:claude-opus-4-7 · confidence high
Out[46]:

saturn.columns["H-Mag"].stats

stat value
n 13,969
nulls 4,276 (30.6%)
unique 831
min 0.83
max 16.67
mean 10.7
median 10.74
std 1.368
q1 9.95
q3 11.5
iqr 1.55
skew -0.4515
kurtosis 2.179
n_outliers 341
outlier_rate 0.03518
zero_rate 0
alert: null_rate · 30.6% null
Fig 19.
Distribution of H-Mag. Vertical dash marks the median.
Histogram bins for H-Mag (median: 10.74).
bin  count
0.83 – 1.226  1
1.226 – 1.622  1
1.622 – 2.018  1
2.018 – 2.414  1
2.414 – 2.81  1
2.81 – 3.206  0
3.206 – 3.602  0
3.602 – 3.998  3
3.998 – 4.394  4
4.394 – 4.79  0
4.79 – 5.186  6
5.186 – 5.582  6
5.582 – 5.978  14
5.978 – 6.374  23
6.374 – 6.77  27
6.77 – 7.166  48
7.166 – 7.562  73
7.562 – 7.958  102
7.958 – 8.354  140
8.354 – 8.75  235
8.75 – 9.146  388
9.146 – 9.542  511
9.542 – 9.938  810
9.938 – 10.33  1133
10.33 – 10.73  1259
10.73 – 11.13  1334
11.13 – 11.52  1199
11.52 – 11.92  795
11.92 – 12.31  625
12.31 – 12.71  386
12.71 – 13.11  224
13.11 – 13.5  148
13.5 – 13.9  95
13.9 – 14.29  44
14.29 – 14.69  32
14.69 – 15.09  15
15.09 – 15.48  3
15.48 – 15.88  4
15.88 – 16.27  0
16.27 – 16.67  2

K-Mag numeric feature

Numeric K-band magnitude readings (likely 2MASS K-mag photometry), centered around a median of 10.46 and ranging from 0.72 to 15.76. The distribution is mildly left-skewed (-0.45) with 315 outliers (3.3%) and a notable 30.9% null rate, suggesting many sources lack K-band coverage. Spread is tight (std 1.36, IQR 1.57) across 823 unique values.

Treatment: Impute or flag the 31% missing values before modelling; no transform needed given near-symmetric spread.

anthropic:claude-opus-4-7 · confidence high
Out[49]:

saturn.columns["K-Mag"].stats

stat value
n 13,969
nulls 4,317 (30.9%)
unique 823
min 0.72
max 15.76
mean 10.41
median 10.46
std 1.361
q1 9.66
q3 11.23
iqr 1.57
skew -0.4513
kurtosis 2.166
n_outliers 315
outlier_rate 0.03264
zero_rate 0
alert: null_rate · 30.9% null
Fig 20.
Distribution of K-Mag. Vertical dash marks the median.
Histogram bins for K-Mag (median: 10.46).
bin  count
0.72 – 1.096  2
1.096 – 1.472  0
1.472 – 1.848  1
1.848 – 2.224  1
2.224 – 2.6  1
2.6 – 2.976  0
2.976 – 3.352  1
3.352 – 3.728  1
3.728 – 4.104  6
4.104 – 4.48  0
4.48 – 4.856  5
4.856 – 5.232  5
5.232 – 5.608  10
5.608 – 5.984  14
5.984 – 6.36  32
6.36 – 6.736  29
6.736 – 7.112  62
7.112 – 7.488  82
7.488 – 7.864  113
7.864 – 8.24  152
8.24 – 8.616  290
8.616 – 8.992  394
8.992 – 9.368  572
9.368 – 9.744  862
9.744 – 10.12  1108
10.12 – 10.5  1196
10.5 – 10.87  1296
10.87 – 11.25  1050
11.25 – 11.62  800
11.62 – 12  595
12 – 12.38  372
12.38 – 12.75  227
12.75 – 13.13  166
13.13 – 13.5  81
13.5 – 13.88  51
13.88 – 14.26  41
14.26 – 14.63  21
14.63 – 15.01  6
15.01 – 15.38  4
15.38 – 15.76  3

SurfBr numeric feature

SurfBr is a numeric measurement tightly clustered around 23.31 (median 23.33) with a narrow IQR of 0.71 and standard deviation 0.61, consistent with a surface brightness magnitude. The distribution is mildly left-skewed (-0.30) with elevated kurtosis (2.65) and 288 outliers (2.8%). Roughly 27% of rows are null, a notable coverage gap.

Treatment: Impute or flag the 27% missing values before modelling; no transform needed given the tight, near-symmetric spread.

anthropic:claude-opus-4-7 · confidence high
Out[52]:

saturn.columns["SurfBr"].stats

stat value
n 13,969
nulls 3,742 (26.8%)
unique 438
min 18.36
max 28.48
mean 23.31
median 23.33
std 0.61
q1 22.97
q3 23.68
iqr 0.71
skew -0.3014
kurtosis 2.65
n_outliers 288
outlier_rate 0.02816
zero_rate 0
alert: null_rate · 26.8% null
Fig 21.
Distribution of SurfBr. Vertical dash marks the median.
Histogram bins for SurfBr (median: 23.33).
bin  count
18.36 – 18.61  1
18.61 – 18.87  0
18.87 – 19.12  0
19.12 – 19.37  0
19.37 – 19.62  0
19.62 – 19.88  0
19.88 – 20.13  3
20.13 – 20.38  5
20.38 – 20.64  4
20.64 – 20.89  6
20.89 – 21.14  17
21.14 – 21.4  32
21.4 – 21.65  51
21.65 – 21.9  71
21.9 – 22.16  133
22.16 – 22.41  301
22.41 – 22.66  640
22.66 – 22.91  996
22.91 – 23.17  1616
23.17 – 23.42  1901
23.42 – 23.67  1836
23.67 – 23.93  1275
23.93 – 24.18  693
24.18 – 24.43  354
24.43 – 24.68  163
24.68 – 24.94  78
24.94 – 25.19  26
25.19 – 25.44  10
25.44 – 25.7  4
25.7 – 25.95  6
25.95 – 26.2  1
26.2 – 26.46  2
26.46 – 26.71  0
26.71 – 26.96  1
26.96 – 27.21  0
27.21 – 27.47  0
27.47 – 27.72  0
27.72 – 27.97  0
27.97 – 28.23  0
28.23 – 28.48  1

Hubble categorical label

Hubble appears to be the Hubble morphological classification of galaxies, with familiar types like E (elliptical), S0 (lenticular), and the spiral sequence Sa/Sb/Sc dominating. Across 13,969 rows it has 30 distinct codes and a high entropy ratio (0.83), so the type distribution is fairly spread rather than concentrated: the top value E accounts for only 15.2% of labelled rows (11.1% of all rows). Notably, 27.26% of rows are null. Because about 25% of rows are not Type 'G', most of these nulls plausibly belong to non-galaxy objects, for which a Hubble type does not apply, rather than to unlabelled galaxies.

Treatment: Treat as categorical label; restrict to galaxies first, then impute or filter any remaining nulls before any class-conditional analysis.

anthropic:claude-opus-4-7 · confidence high
Out[55]:

saturn.columns["Hubble"].stats

stat value
n 13,969
nulls 3,808 (27.3%)
unique 30
top_value E
top_rate 0.1522
cardinality 30
entropy 4.096
entropy_ratio 0.8348
alert: null_rate · 27.3% null
Fig 22.
Top values for Hubble.
Top values for Hubble (20 unique shown, of 30 total).
value count share
E 1546 11.1%
S0 1073 7.7%
S0-a 996 7.1%
Sc 948 6.8%
Sb 758 5.4%
Sbc 727 5.2%
E-S0 623 4.5%
Sa 478 3.4%
Sab 461 3.3%
SBb 323 2.3%
SABc 281 2.0%
SBc 272 1.9%
SBbc 269 1.9%
SABb 255 1.8%
SBa 186 1.3%
SBab 160 1.1%
SABa 122 0.9%
I 104 0.7%
Scd 103 0.7%
Sd 74 0.5%
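The stats table reports top_rate 0.1522 for E, while the value table lists E at 11.1%. The two figures appear to use different denominators, which a short calculation from the reported counts makes explicit (the variable names are illustrative):

```python
n_rows = 13_969
n_null = 3_808
count_e = 1_546  # rows with Hubble == "E"

share_of_all_rows = count_e / n_rows             # the "share" column in the value table
share_of_labelled = count_e / (n_rows - n_null)  # matches the reported top_rate

print(f"{share_of_all_rows:.1%} of all rows, {share_of_labelled:.1%} of labelled rows")
```

When quoting a share for this column, it helps to state which denominator is meant.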

Pax numeric feature

Pax is a sparse numeric measurement, populated for only ~5% of rows (null_rate 0.9485), with values ranging from 0.003 to 22.8 and a median of 0.4829. The distribution is severely right-skewed (skew 7.21, kurtosis 68.87) with 65 outliers among the non-null values, and the mean (0.919) sits well above the median, indicating a long upper tail. With 676 unique values among the 720 non-null rows and no zeros, this is a continuous quantity observed only on a small subpopulation; the overview summary reads it as parallax.

Treatment: Log-transform and add a missingness indicator before modelling, given the 94.85% null rate and heavy right skew.

anthropic:claude-opus-4-7 · confidence medium
Out[58]:

saturn.columns["Pax"].stats

stat value
n 13,969
nulls 13,249 (94.8%)
unique 676
min 0.003
max 22.8
mean 0.9192
median 0.4829
std 1.76
q1 0.2517
q3 0.9338
iqr 0.6821
skew 7.206
kurtosis 68.87
n_outliers 65
outlier_rate 0.09028
zero_rate 0
alert: null_rate · 94.8% null
alert: high_skew · skew=+7.21
alert: outliers · 9.0% rows beyond 1.5 IQR
Fig 23.
Distribution of Pax. Vertical dash marks the median.
Histogram bins for Pax (median: 0.4829).
bin  count
0.003 – 0.8798  525
0.8798 – 1.757  119
1.757 – 2.633  38
2.633 – 3.51  15
3.51 – 4.387  2
4.387 – 5.264  5
5.264 – 6.141  3
6.141 – 7.017  4
7.017 – 7.894  2
7.894 – 8.771  1
8.771 – 9.648  1
9.648 – 10.52  0
10.52 – 11.4  0
11.4 – 12.28  1
12.28 – 13.16  0
13.16 – 14.03  1
14.03 – 14.91  0
14.91 – 15.79  1
15.79 – 16.66  0
16.66 – 17.54  0
17.54 – 18.42  0
18.42 – 19.29  0
19.29 – 20.17  0
20.17 – 21.05  1
21.05 – 21.92  0
21.92 – 22.8  1
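The treatment for Pax combines a log transform with a missingness indicator. A minimal sketch of that feature pair follows; the placeholder value 0.0 for missing rows and the helper name are illustrative choices, and a model would rely on the flag to tell placeholders from real values.

```python
import math

def log_with_missing_flag(values):
    """Return (log10 value or 0.0 placeholder, is_missing flag) per input value."""
    features = []
    for v in values:
        if v is None:
            features.append((0.0, 1))  # placeholder value plus missingness flag
        else:
            # log10 is defined here because the reported minimum (0.003) is positive
            features.append((math.log10(v), 0))
    return features

# Hypothetical rows: mostly missing, as in the real column.
pax = [None, None, 0.4829, None, 22.8, None]
print(log_with_missing_flag(pax))
```

With about 95% of rows missing, the indicator itself may carry more signal than the transformed value.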

Pm-RA numeric feature

Pm-RA is almost certainly a proper-motion-in-right-ascension measurement (mas/yr) for catalog objects, with values centered slightly negative (median -0.9015, mean -1.374) and a tight IQR of 3.54. Two things stand out: 92.53% of rows are null, suggesting this is only populated for a subset of sources, and the distribution has heavy tails (skew 2.66, kurtosis 50.19) with extremes from -43.57 to 76.0 and a 6.8% outlier rate.

Treatment: Impute or flag the 92.53% missingness explicitly, and winsorise or robust-scale before modelling.

anthropic:claude-opus-4-7 · confidence high
Out[61]:

saturn.columns["Pm-RA"].stats

stat value
n 13,969
nulls 12,925 (92.5%)
unique 954
min -43.57
max 76
mean -1.374
median -0.9015
std 5.826
q1 -3.103
q3 0.4365
iqr 3.54
skew 2.664
kurtosis 50.19
n_outliers 71
outlier_rate 0.06801
zero_rate 0.0009579
alert: null_rate 92.5% null
alert: high_skew skew=+2.66
alert: outliers 6.8% rows beyond 1.5 IQR
Fig 24.
Distribution of Pm-RA. Vertical dash marks the median.
Histogram bins for Pm-RA (median: -0.9015).
bin count
-43.57 – -39.83 1
-39.83 – -36.1 1
-36.1 – -32.36 1
-32.36 – -28.62 1
-28.62 – -24.89 1
-24.89 – -21.15 4
-21.15 – -17.41 5
-17.41 – -13.68 5
-13.68 – -9.941 10
-9.941 – -6.204 68
-6.204 – -2.468 228
-2.468 – 1.269 558
1.269 – 5.005 127
5.005 – 8.742 14
8.742 – 12.48 8
12.48 – 16.21 4
16.21 – 19.95 3
19.95 – 23.69 1
23.69 – 27.42 0
27.42 – 31.16 1
31.16 – 34.9 0
34.9 – 38.63 0
38.63 – 42.37 1
42.37 – 46.11 0
46.11 – 49.84 0
49.84 – 53.58 0
53.58 – 57.32 0
57.32 – 61.05 1
61.05 – 64.79 0
64.79 – 68.53 0
68.53 – 72.26 0
72.26 – 76 1
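
The winsorizing step suggested for Pm-RA can be sketched with the same 1.5 × IQR rule that the outlier alert uses. This is an illustrative sketch only: the helper names `tukey_fences` and `winsorize` are assumptions, and clamping at the fences is one choice among several (percentile clipping would be another).

```python
from typing import List, Optional, Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Lower and upper fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorize(
    values: List[Optional[float]], lo: float, hi: float
) -> List[Optional[float]]:
    """Clamp present values into [lo, hi]; leave None (null) untouched."""
    return [None if v is None else min(max(v, lo), hi) for v in values]


# Quartiles as reported in the Pm-RA stats table.
lo, hi = tukey_fences(q1=-3.103, q3=0.4365)

# Hypothetical sample: the reported min, median and max plus one null.
clipped = winsorize([-43.57, -0.9015, 76.0, None], lo, hi)
print(round(lo, 3), round(hi, 3), clipped)
```

With the reported quartiles the fences sit near -8.41 and 5.75, so the extremes at -43.57 and 76 are pulled in while the median is unchanged.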

Pm-Dec numeric feature

Pm-Dec is almost certainly proper motion in declination (mas/yr), a sparse astrometric feature with 92.53% nulls and only 1,044 populated rows out of 13,969. Values center near zero (median -1.088, mean -1.402) but span -58.96 to 64.0 with std 5.9, kurtosis 43.8, and 74 outliers (7.09% of non-null), indicating heavy tails typical of high-proper-motion sources.

Treatment: Impute or mask the 92.53% missing before modelling, and consider a robust scaler given the heavy tails.

anthropic:claude-opus-4-7 · confidence high
Out[64]:

saturn.columns["Pm-Dec"].stats

stat value
n 13,969
nulls 12,925 (92.5%)
unique 961
min -58.96
max 64
mean -1.402
median -1.088
std 5.9
q1 -3.256
q3 0.544
iqr 3.8
skew 0.7277
kurtosis 43.81
n_outliers 74
outlier_rate 0.07088
zero_rate 0
alert: null_rate 92.5% null
alert: outliers 7.1% rows beyond 1.5 IQR
Fig 25.
Distribution of Pm-Dec. Vertical dash marks the median.
Histogram bins for Pm-Dec (median: -1.088).
bin count
-58.96 – -55.12 1
-55.12 – -51.28 1
-51.28 – -47.43 0
-47.43 – -43.59 0
-43.59 – -39.75 0
-39.75 – -35.91 0
-35.91 – -32.06 0
-32.06 – -28.22 1
-28.22 – -24.38 2
-24.38 – -20.54 1
-20.54 – -16.69 7
-16.69 – -12.85 11
-12.85 – -9.008 24
-9.008 – -5.166 85
-5.166 – -1.323 345
-1.323 – 2.52 452
2.52 – 6.362 89
6.362 – 10.2 8
10.2 – 14.05 7
14.05 – 17.89 4
17.89 – 21.73 0
21.73 – 25.57 4
25.57 – 29.42 0
29.42 – 33.26 0
33.26 – 37.1 0
37.1 – 40.94 0
40.94 – 44.79 0
44.79 – 48.63 0
48.63 – 52.47 0
52.47 – 56.31 0
56.31 – 60.16 0
60.16 – 64 2
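
The robust scaler mentioned in the Pm-Dec treatment centres on the median and divides by the IQR, so the heavy tails do not set the scale. A minimal sketch follows, assuming the median and IQR from the stats table are reused as fixed constants; the function name `robust_scale` is an assumption.

```python
from typing import List, Optional


def robust_scale(
    values: List[Optional[float]], median: float, iqr: float
) -> List[Optional[float]]:
    """Centre on the median and divide by the IQR; nulls stay None."""
    if iqr <= 0:
        raise ValueError("IQR must be positive")
    return [None if v is None else (v - median) / iqr for v in values]


# Median and IQR as reported in the Pm-Dec stats table.
scaled = robust_scale([-58.96, -1.088, 64.0, None], median=-1.088, iqr=3.8)
print(scaled)
```

Nulls are passed through rather than imputed, which leaves the masking decision to a later step.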

RadVel numeric feature

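RadVel is a radial velocity measurement in km/s, populated for 10,583 of the 13,969 rows, with values ranging from -483 to 52,025 and a median of 4,885. The unit follows from the Redshift column: the relativistic Doppler velocities for the Redshift min, median, and max (-0.00161, 0.01643, 0.191616) come to about -483, 4,885, and 52,025 km/s, matching the RadVel min, median, and max. The distribution is right-skewed (skew 1.53, kurtosis 5.16) with 341 outliers (3.2% of non-null values) and a notable 24.2% null rate flagged as an alert. Only 6,691 unique values among the 10,583 non-null rows suggests some repeated or quantized measurements.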

Treatment: Impute or flag the 24% missing values and consider a robust scaler given the right skew and outliers.

anthropic:claude-opus-4-7 · confidence high
Out[67]:

saturn.columns["RadVel"].stats

stat value
n 13,969
nulls 3,386 (24.2%)
unique 6,691
min -483
max 52,025
mean 5,541
median 4,885
std 4,264
q1 2,406
q3 7,632
iqr 5,226
skew 1.532
kurtosis 5.16
n_outliers 341
outlier_rate 0.03222
zero_rate 0.0006614
alert: null_rate 24.2% null
Fig 26.
Distribution of RadVel. Vertical dash marks the median.
Histogram bins for RadVel (median: 4885.0).
bin count
-483 – 829.7 989
829.7 – 2142 1432
2142 – 3455 1235
3455 – 4768 1480
4768 – 6080 1420
6080 – 7393 1204
7393 – 8706 858
8706 – 1.002e+04 697
1.002e+04 – 1.133e+04 414
1.133e+04 – 1.264e+04 275
1.264e+04 – 1.396e+04 139
1.396e+04 – 1.527e+04 89
1.527e+04 – 1.658e+04 88
1.658e+04 – 1.789e+04 46
1.789e+04 – 1.921e+04 75
1.921e+04 – 2.052e+04 44
2.052e+04 – 2.183e+04 33
2.183e+04 – 2.315e+04 18
2.315e+04 – 2.446e+04 25
2.446e+04 – 2.577e+04 7
2.577e+04 – 2.708e+04 7
2.708e+04 – 2.84e+04 2
2.84e+04 – 2.971e+04 1
2.971e+04 – 3.102e+04 0
3.102e+04 – 3.233e+04 0
3.233e+04 – 3.365e+04 2
3.365e+04 – 3.496e+04 1
3.496e+04 – 3.627e+04 0
3.627e+04 – 3.759e+04 0
3.759e+04 – 3.89e+04 0
3.89e+04 – 4.021e+04 0
4.021e+04 – 4.152e+04 0
4.152e+04 – 4.284e+04 0
4.284e+04 – 4.415e+04 0
4.415e+04 – 4.546e+04 0
4.546e+04 – 4.677e+04 0
4.677e+04 – 4.809e+04 0
4.809e+04 – 4.94e+04 1
4.94e+04 – 5.071e+04 0
5.071e+04 – 5.202e+04 1
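
A quick consistency check between RadVel and Redshift can be sketched with the relativistic Doppler relation v = c·((1+z)² − 1)/((1+z)² + 1). The sketch only compares the summary statistics reported in this notebook; it assumes, without access to the raw rows, that the two columns are related this way, and the function name `velocity_from_redshift` is illustrative.

```python
C_KM_S = 299_792.458  # speed of light in km/s


def velocity_from_redshift(z: float) -> float:
    """Relativistic Doppler velocity (km/s) for redshift z."""
    ratio = (1.0 + z) ** 2
    return C_KM_S * (ratio - 1.0) / (ratio + 1.0)


# Redshift min, median and max as reported in this notebook,
# paired with the RadVel min, median and max.
reported = [(-0.00161, -483), (0.01643, 4885), (0.191616, 52025)]
for z, radvel in reported:
    print(z, round(velocity_from_redshift(z), 1), radvel)
```

Because the relation is monotonic, it maps the Redshift min, median, and max onto the corresponding RadVel statistics, which makes the three-point comparison meaningful even without row-level data.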

Redshift numeric feature

Redshift values cluster tightly between q1 0.008056 and q3 0.02579 with a median of 0.01643, consistent with cosmological redshift measurements for relatively nearby objects. The distribution is right-skewed (skew 1.65, kurtosis 6.29) with a max of 0.191616 and a small negative min of -0.00161, and 24.24% of rows are null.

Treatment: Impute or filter the 24% nulls and consider a log1p transform before modelling to tame the right skew.

anthropic:claude-opus-4-7 · confidence high
Out[70]:

saturn.columns["Redshift"].stats

stat value
n 13,969
nulls 3,386 (24.2%)
unique 7,717
min -0.00161
max 0.1916
mean 0.01877
median 0.01643
std 0.01467
q1 0.008056
q3 0.02579
iqr 0.01773
skew 1.652
kurtosis 6.287
n_outliers 350
outlier_rate 0.03307
zero_rate 0.000378
alert: null_rate 24.2% null
Fig 27.
Distribution of Redshift. Vertical dash marks the median.
Histogram bins for Redshift (median: 0.01643).
bin count
-0.00161 – 0.003221 1111
0.003221 – 0.008051 1535
0.008051 – 0.01288 1344
0.01288 – 0.01771 1763
0.01771 – 0.02254 1330
0.02254 – 0.02737 1167
0.02737 – 0.0322 844
0.0322 – 0.03704 526
0.03704 – 0.04187 328
0.04187 – 0.0467 175
0.0467 – 0.05153 96
0.05153 – 0.05636 92
0.05636 – 0.06119 51
0.06119 – 0.06602 77
0.06602 – 0.07085 46
0.07085 – 0.07568 33
0.07568 – 0.08051 18
0.08051 – 0.08534 26
0.08534 – 0.09017 6
0.09017 – 0.095 8
0.095 – 0.09983 1
0.09983 – 0.1047 1
0.1047 – 0.1095 0
0.1095 – 0.1143 0
0.1143 – 0.1192 2
0.1192 – 0.124 1
0.124 – 0.1288 0
0.1288 – 0.1336 0
0.1336 – 0.1385 0
0.1385 – 0.1433 0
0.1433 – 0.1481 0
0.1481 – 0.153 0
0.153 – 0.1578 0
0.1578 – 0.1626 0
0.1626 – 0.1675 0
0.1675 – 0.1723 0
0.1723 – 0.1771 0
0.1771 – 0.182 1
0.182 – 0.1868 0
0.1868 – 0.1916 1
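
The log1p transform suggested for Redshift is defined for the small negative values seen here, since 1 + z stays positive. A minimal sketch, with the function name `log1p_redshift` as an assumption and nulls passed through unchanged:

```python
import math
from typing import List, Optional


def log1p_redshift(values: List[Optional[float]]) -> List[Optional[float]]:
    """Apply log(1 + z) to present values; nulls stay None."""
    out: List[Optional[float]] = []
    for z in values:
        if z is None:
            out.append(None)
        elif z <= -1.0:
            raise ValueError(f"log1p undefined for z={z}")
        else:
            out.append(math.log1p(z))
    return out


# Reported min, median and max, plus one null.
transformed = log1p_redshift([-0.00161, 0.01643, 0.191616, None])
print(transformed)
```

With every value below 0.2, log1p stays close to the identity, so it will compress the upper tail only mildly; whether that is enough is worth checking on the transformed skew.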

Cstar U-Mag numeric feature

Cstar U-Mag is the U-band magnitude of an object's central star, populated for only ~0.11% of rows (null_rate 0.9989): just 16 non-null values, all distinct, across 13,969 records. Where present, values span 9.3 to 14.75 with mean 11.93 and median 12.09, roughly symmetric (skew 0.09) and platykurtic (kurtosis -1.12), with no flagged outliers. The extreme sparsity is the dominant signal and limits any aggregate use.

Treatment: Drop or treat as a presence indicator; too sparse (99.89% null) for direct modelling.

anthropic:claude-opus-4-7 · confidence high
Out[73]:

saturn.columns["Cstar U-Mag"].stats

stat value
n 13,969
nulls 13,953 (99.9%)
unique 16
min 9.3
max 14.75
mean 11.93
median 12.09
std 1.741
q1 10.38
q3 13.06
iqr 2.683
skew 0.09311
kurtosis -1.12
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 99.9% null
Fig 28.
Distribution of Cstar U-Mag. Vertical dash marks the median.
Histogram bins for Cstar U-Mag (median: 12.09).
bin count
9.3 – 10.39 4
10.39 – 11.48 2
11.48 – 12.57 5
12.57 – 13.66 2
13.66 – 14.75 3

Cstar B-Mag numeric feature

Apparent B-band magnitude of an object's central star (Cstar B-Mag), populated for only 0.79% of rows (null_rate 0.9921). The 111 non-null values contain 97 unique magnitudes spanning 9.93 to 21.1, with mean 15.23 and median 15.5, roughly symmetric (skew -0.17) and no flagged outliers.

Treatment: Treat as sparse optional feature; impute or model missingness explicitly rather than relying on the value.

anthropic:claude-opus-4-7 · confidence high
Out[76]:

saturn.columns["Cstar B-Mag"].stats

stat value
n 13,969
nulls 13,858 (99.2%)
unique 97
min 9.93
max 21.1
mean 15.23
median 15.5
std 2.476
q1 13.59
q3 16.77
iqr 3.185
skew -0.1719
kurtosis -0.37
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 99.2% null
Fig 29.
Distribution of Cstar B-Mag. Vertical dash marks the median.
Histogram bins for Cstar B-Mag (median: 15.5).
bin count
9.93 – 11.05 6
11.05 – 12.16 9
12.16 – 13.28 9
13.28 – 14.4 10
14.4 – 15.52 22
15.52 – 16.63 25
16.63 – 17.75 15
17.75 – 18.87 7
18.87 – 19.98 5
19.98 – 21.1 3

Cstar V-Mag numeric feature

Visual magnitude of an object's central star ('Cstar V-Mag'), populated for only ~0.7% of rows (null_rate 0.9931). The 96 non-null values span 9.42 to 19.6 with median 15.145 and a roughly symmetric distribution (skew -0.15, kurtosis -0.36), consistent with stellar photometry on the magnitude scale. Severe sparsity is the dominant signal: this field is only meaningful for objects whose central star was characterized.

Treatment: Treat as optional astrophysical feature; impute with a missing-indicator or drop given >99% nulls.

anthropic:claude-opus-4-7 · confidence high
Out[79]:

saturn.columns["Cstar V-Mag"].stats

stat value
n 13,969
nulls 13,873 (99.3%)
unique 82
min 9.42
max 19.6
mean 14.86
median 15.14
std 2.308
q1 13.39
q3 16.06
iqr 2.663
skew -0.1467
kurtosis -0.3632
n_outliers 0
outlier_rate 0
zero_rate 0
alert: null_rate 99.3% null
Fig 30.
Distribution of Cstar V-Mag. Vertical dash marks the median.
Histogram bins for Cstar V-Mag (median: 15.145).
bin count
9.42 – 10.55 5
10.55 – 11.68 7
11.68 – 12.81 7
12.81 – 13.94 11
13.94 – 15.08 17
15.08 – 16.21 27
16.21 – 17.34 6
17.34 – 18.47 8
18.47 – 19.6 8

M categorical identifier

Column M is almost certainly the Messier number: a sparsely populated code present in only ~0.77% of the 13,969 rows (null_rate 0.9923), holding zero-padded 3-digit strings like '024', '025', '110'. Each of the 107 non-null values appears exactly once (top_rate ≈ 0.0093, entropy_ratio ≈ 1.0), consistent with a one-to-one cross-reference into a catalogue of 110 objects. With 99.23% nulls and perfect uniqueness on the remainder, it is an identifier rather than a usable feature.

Treatment: Drop; near-unique values with >99% nulls offer no modelling signal.

anthropic:claude-opus-4-7 · confidence high
Out[82]:

saturn.columns["M"].stats

stat value
n 13,969
nulls 13,862 (99.2%)
unique 107
top_value 024
top_rate 0.009346
cardinality 107
entropy 6.741
entropy_ratio 1
alert: long_tail 107 singleton categories
alert: null_rate 99.2% null
Fig 31.
Top values for M.
Top values for M (20 unique shown, of 107 total).
value count share
024 1 0.0%
025 1 0.0%
110 1 0.0%
032 1 0.0%
031 1 0.0%
103 1 0.0%
033 1 0.0%
074 1 0.0%
076 1 0.0%
034 1 0.0%
077 1 0.0%
079 1 0.0%
038 1 0.0%
001 1 0.0%
036 1 0.0%
042 1 0.0%
043 1 0.0%
078 1 0.0%
037 1 0.0%
035 1 0.0%
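
If M is kept as a lookup key rather than dropped, the zero-padded codes can be normalised into readable designations. The sketch below assumes the codes are Messier numbers, which the profile does not confirm; the function name `messier_label` is illustrative.

```python
from typing import Optional


def messier_label(code: Optional[str]) -> Optional[str]:
    """Turn a zero-padded code such as '024' into 'M 24'; None stays None."""
    if code is None:
        return None
    return f"M {int(code)}"


for raw in ["024", "110", "001", None]:
    print(raw, "->", messier_label(raw))
```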

NGC categorical identifier

This is an NGC (New General Catalogue) astronomical object identifier, populated for only 6.5% of rows (null_rate 0.935). Among the 908 non-null entries it spans 891 unique codes with near-maximal entropy_ratio 0.999 and a top value '3497' appearing just 3 times (top_rate 0.0033), so it behaves almost like a sparse identifier. Some codes carry letter suffixes (e.g. '5619B'), confirming it's a catalog string rather than a clean integer.

Treatment: Treat as a sparse cross-reference key; drop or left-join to an NGC catalog rather than using as a model feature.

anthropic:claude-opus-4-7 · confidence high
Out[85]:

saturn.columns["NGC"].stats

stat value
n 13,969
nulls 13,061 (93.5%)
unique 891
top_value 3497
top_rate 0.003304
cardinality 891
entropy 9.788
entropy_ratio 0.9989
alert: long_tail 875 singleton categories
alert: null_rate 93.5% null
Fig 32.
Top values for NGC.
Top values for NGC (20 unique shown, of 891 total).
value count share
3497 3 0.0%
0961 2 0.0%
2947 2 0.0%
4913 2 0.0%
5607 2 0.0%
5619B 2 0.0%
1593 2 0.0%
3508 2 0.0%
4119 2 0.0%
6590 2 0.0%
0614 2 0.0%
0863 2 0.0%
3110 2 0.0%
4664 2 0.0%
6125 2 0.0%
6667 2 0.0%
0281 1 0.0%
0135 1 0.0%
0178 1 0.0%
0223 1 0.0%
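
Because some NGC codes carry letter suffixes, joining against another catalog is easier once each code is split into a number and a suffix. A minimal sketch, assuming every code is a run of digits optionally followed by letters (the only shapes visible in the table above); anything else returns None so it can be inspected by hand. The name `split_catalog_code` is illustrative.

```python
import re
from typing import Optional, Tuple

_CODE = re.compile(r"^(\d+)([A-Za-z]*)$")


def split_catalog_code(code: str) -> Optional[Tuple[int, str]]:
    """Split '5619B' into (5619, 'B'); return None if the shape is unexpected."""
    match = _CODE.match(code.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


for raw in ["3497", "0961", "5619B"]:
    print(raw, "->", split_catalog_code(raw))
```

The same helper would apply to the IC column if its codes follow the same shape.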

IC categorical foreign_key

IC is almost certainly an Index Catalogue (IC) cross-reference, the companion catalogue to the NGC, populated for only ~3.3% of rows (null_rate 0.9671). Among the 460 non-null entries it spreads across 452 distinct values with near-maximal entropy (entropy_ratio 0.9984) and a top frequency of just 3 (top_rate 0.0065), so essentially every present value is unique. This combination of overwhelming nulls and near-unique codes makes it unusable as a grouping feature without enrichment.

Treatment: Drop or treat as a sparse lookup key; do not use as a categorical feature given 96.7% nulls and near-unique values.

anthropic:claude-opus-4-7 · confidence high
Out[88]:

saturn.columns["IC"].stats

stat value
n 13,969
nulls 13,509 (96.7%)
unique 452
top_value 5003
top_rate 0.006522
cardinality 452
entropy 8.806
entropy_ratio 0.9984
alert: long_tail 447 singleton categories
alert: null_rate 96.7% null
Fig 33.
Top values for IC.
Top values for IC (20 unique shown, of 452 total).
value count share
5003 3 0.0%
5007 3 0.0%
1459 3 0.0%
5179 2 0.0%
1005 2 0.0%
1577 1 0.0%
1671 1 0.0%
1700 1 0.0%
1778 1 0.0%
1787 1 0.0%
1884 1 0.0%
1887 1 0.0%
1888 1 0.0%
1889 1 0.0%
1963 1 0.0%
1985 1 0.0%
2121 1 0.0%
2123 1 0.0%
2124 1 0.0%
2131 1 0.0%

Cstar Names categorical identifier

Catalogue identifiers for the central star (mostly HD and BD designations), occasionally listing multiple names comma-separated. Effectively empty: 99.38% null, leaving 87 populated cells across 13,969 rows, each holding a distinct value that appears once (top_rate 0.0115, entropy_ratio ~1.0).

Treatment: Drop or retain only as a cross-reference key; too sparse and unique to model.

anthropic:claude-opus-4-7 · confidence high
Out[91]:

saturn.columns["Cstar Names"].stats

stat value
n 13,969
nulls 13,882 (99.4%)
unique 87
top_value BD -12 1172,HD 35914
top_rate 0.01149
cardinality 87
entropy 6.443
entropy_ratio 1
alert: long_tail 87 singleton categories
alert: null_rate 99.4% null
Fig 34.
Top values for Cstar Names.
Top values for Cstar Names (20 unique shown, of 87 total).
value count share
BD -12 1172,HD 35914 1 0.0%
HD 161044 1 0.0%
HD 180206 1 0.0%
HD 11758 1 0.0%
BD +46 1067,HD 39659 1 0.0%
HD 78991 1 0.0%
HD 83832 1 0.0%
HD 88367 1 0.0%
HD 95541 1 0.0%
HD 113981 1 0.0%
HD 125720 1 0.0%
BD +12 2966,HD 145649 1 0.0%
BD -21 4483,HD 153655 1 0.0%
HD 154072 1 0.0%
HD 154952 1 0.0%
HD 161028 1 0.0%
AS Sgr 1 0.0%
HD 167672 1 0.0%
BD -22 18309,HD 171131 1 0.0%
HD 173283 1 0.0%

Identifiers text identifier

This column holds astronomical object identifiers, with 12,179 unique values among the 12,180 non-null rows and almost everything in uppercase (allcaps_rate 0.998). The token distribution reveals catalog prefixes like 2MASX (9,805), MWSC, ESO, MCG, SDSS, and PGC, suggesting each cell concatenates cross-catalog designations (word_mean 5.26). Note the 12.81% null rate and that the column is near-unique, so it carries little aggregate signal on its own.

Treatment: Treat as a join key to external catalogs; do not use as a model feature.

anthropic:claude-opus-4-7 · confidence high
Out[94]:

saturn.columns["Identifiers"].stats

stat value
n 13,969
nulls 1,789 (12.8%)
unique 12,179
len_min 5
len_max 211
len_mean 71.84
len_median 74
len_p95 134
word_mean 5.255
word_median 5
n_empty 0
n_duplicates 1
duplicate_rate 8.21e-05
vocab_size 50,977
readability_flesch_mean 87.54
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0.9975
boilerplate_rate 0
alert: near_unique 100.0% of rows are unique strings
alert: allcaps 99.8% rows are all-caps
Fig 35.
Character-length distribution for Identifiers.
Character-length distribution for Identifiers (mean: 71.84).
chars count
5 – 10 581
10 – 15 359
15 – 20 34
20 – 26 523
26 – 31 78
31 – 36 569
36 – 41 188
41 – 46 90
46 – 51 1018
51 – 56 112
56 – 62 1292
62 – 67 938
67 – 72 75
72 – 77 1608
77 – 82 305
82 – 87 999
87 – 93 110
93 – 98 431
98 – 103 875
103 – 108 27
108 – 113 618
113 – 118 95
118 – 123 23
123 – 129 501
129 – 134 12
134 – 139 251
139 – 144 52
144 – 149 58
149 – 154 164
154 – 160 79
160 – 165 7
165 – 170 11
170 – 175 14
175 – 180 52
180 – 185 10
185 – 190 3
190 – 196 0
196 – 201 13
201 – 206 0
206 – 211 5

Common names categorical metadata

Vernacular labels for astronomical objects (e.g., 'Antennae Galaxies', 'Flame Nebula'), occasionally comma-concatenating multiple aliases in one cell. The column is 99.06% null across 13,969 rows, leaving 131 populated cells with 127 unique values and a near-uniform distribution (entropy_ratio 0.998, top_rate 1.5%). Multi-name cells like 'Butterfly Galaxies,Siamese Twins' indicate the field is a delimited list rather than a clean category.

Treatment: Treat as sparse free-form labels: split on comma into a name list and use only for display or lookup, not modelling.

anthropic:claude-opus-4-7 · confidence high
Out[97]:

saturn.columns["Common names"].stats

stat value
n 13,969
nulls 13,838 (99.1%)
unique 127
top_value Antennae Galaxies
top_rate 0.01527
cardinality 127
entropy 6.972
entropy_ratio 0.9977
alert: long_tail 123 singleton categories
alert: null_rate 99.1% null
Fig 36.
Top values for Common names.
Top values for Common names (20 unique shown, of 127 total).
value count share
Antennae Galaxies 2 0.0%
Eyes 2 0.0%
Butterfly Galaxies,Siamese Twins 2 0.0%
Eastern Veil,Network Nebula 2 0.0%
omi Per Cloud 1 0.0%
Barnard's Merope Nebula 1 0.0%
Flaming Star Nebula 1 0.0%
Flame Nebula,Orion B 1 0.0%
Gem A 1 0.0%
gam Cyg 1 0.0%
Toby Jug Nebula 1 0.0%
omi Vel Cluster 1 0.0%
Browning 1 0.0%
Coddington's Nebula 1 0.0%
tet Car Cluster 1 0.0%
lam Cen Nebula 1 0.0%
rho Oph Nebula 1 0.0%
Eagle Nebula,Star Queen 1 0.0%
Small Sgr Star Cloud 1 0.0%
Pelican Nebula 1 0.0%
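
Splitting the comma-delimited cells into name lists, as the Common names treatment suggests, can be sketched as below. It assumes that a comma never appears inside a single name, which holds for the values shown but is not verified for the full column; `split_names` is an illustrative name.

```python
from typing import List, Optional


def split_names(cell: Optional[str]) -> List[str]:
    """Split a comma-delimited cell into trimmed names; null gives an empty list."""
    if cell is None:
        return []
    return [part.strip() for part in cell.split(",") if part.strip()]


for cell in ["Butterfly Galaxies,Siamese Twins", "Flame Nebula,Orion B", "Eyes", None]:
    print(split_names(cell))
```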

NED notes text metadata

Short astronomical annotations from the NGC/IC NED catalog, averaging 6.7 words and capped at 80 characters, describing object positions and identification caveats (e.g. 'In the Large Magellanic Cloud.', 'Confused HIPASS source'). 83.6% of rows are null, and the 2,286 populated rows contain only 1,198 distinct strings, a 47.6% duplicate rate driven by a small set of canned remarks. Language detection flags the field as multilingual, but 780 of 783 detected samples are English; the bs/it/pt hits are almost certainly false positives on terse astronomy jargon.

Treatment: Treat as sparse categorical-ish notes: bucket the top recurring phrases as flags and ignore the long tail rather than embedding free text.

anthropic:claude-opus-4-7 · confidence high
Out[100]:

saturn.columns["NED notes"].stats

stat value
n 13,969
nulls 11,683 (83.6%)
unique 1,198
len_min 14
len_max 80
len_mean 39.64
len_median 35
len_p95 70
word_mean 6.658
word_median 6
n_empty 0
n_duplicates 1,088
duplicate_rate 0.4759
vocab_size 1,836
readability_flesch_mean 60.05
emoji_rate 0
url_rate 0
one_word_rate 0
allcaps_rate 0
boilerplate_rate 0.01575
alert: multilingual 5 languages detected in sample
alert: null_rate 83.6% null
alert: duplicates 47.6% duplicate strings
Fig 37.
Character-length distribution for NED notes.
Character-length distribution for NED notes (mean: 39.64).
chars count
14 – 16 5
16 – 17 15
17 – 19 4
19 – 21 132
21 – 22 196
22 – 24 7
24 – 26 108
26 – 27 57
27 – 29 55
29 – 30 292
30 – 32 171
32 – 34 23
34 – 35 108
35 – 37 74
37 – 39 14
39 – 40 86
40 – 42 84
42 – 44 80
44 – 45 73
45 – 47 24
47 – 49 56
49 – 50 56
50 – 52 18
52 – 54 44
54 – 55 59
55 – 57 21
57 – 59 50
59 – 60 42
60 – 62 15
62 – 64 58
64 – 65 60
65 – 67 21
67 – 68 30
68 – 70 47
70 – 72 15
72 – 73 38
73 – 75 27
75 – 77 8
77 – 78 11
78 – 80 2
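
Bucketing the most frequent NED remarks into flags, as the treatment suggests, can be sketched with a counter over the non-null notes. The sample rows below are hypothetical, built from the two example remarks quoted in the summary; the function name `top_phrase_flags` and the exact-match rule are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional


def top_phrase_flags(notes: List[Optional[str]], top_n: int) -> List[Dict[str, int]]:
    """One dict per row with a 0/1 flag for each of the top_n most frequent notes."""
    counts = Counter(n for n in notes if n is not None)
    top = [phrase for phrase, _ in counts.most_common(top_n)]
    return [{phrase: int(n == phrase) for phrase in top} for n in notes]


# Hypothetical rows: two repeats of one remark, one other remark, one null.
notes = [
    "In the Large Magellanic Cloud.",
    None,
    "Confused HIPASS source",
    "In the Large Magellanic Cloud.",
]
flags = top_phrase_flags(notes, top_n=1)
print(flags)
```

Rows whose note falls in the long tail, and rows with no note at all, end up with every flag at 0.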

OpenNGC notes categorical free_text

Free-text curator notes on OpenNGC catalog entries, present for only 1.52% of rows (null_rate 0.9848). Among the 213 populated cells there are 159 distinct strings with high entropy (6.72, ratio 0.92), and the most common note, 'Identification taken from Corwin's catalog.', covers just 13.6% of non-nulls. Content is heterogeneous provenance commentary referencing LEDA, NED, SIMBAD, and Corwin sources rather than a controlled vocabulary.

Treatment: Treat as sparse free-text metadata; drop for modelling or keep as a provenance flag (note present vs absent).

anthropic:claude-opus-4-7 · confidence high
Out[103]:

saturn.columns["OpenNGC notes"].stats

stat value
n 13,969
nulls 13,756 (98.5%)
unique 159
top_value Identification taken from Corwin’s catalog.
top_rate 0.1362
cardinality 159
entropy 6.719
entropy_ratio 0.9187
alert: long_tail 147 singleton categories
alert: null_rate 98.5% null
Fig 38.
Top values for OpenNGC notes.
Top values for OpenNGC notes (20 unique shown, of 159 total).
value count share
Identification taken from Corwin’s catalog. 29 0.2%
No data available in LEDA. 11 0.1%
NED lists this object as type G, but there's nothing at these coords in LEDA. 4 0.0%
All sources doesn’t agree on the identification. Taken NED as source. 4 0.0%
No diameter data available in LEDA. 3 0.0%
This is a superassociation in the galaxy NGC 4395. 3 0.0%
NED lists nothing. We use the identification from Corwin’s catalog. 2 0.0%
NED lists this object as type G, but listed as Unknown in LEDA. 2 0.0%
Radial velocity and redshift value are taken from NED, since SIMBAD values seem wrong. 2 0.0%
LEDA data includes both components. 2 0.0%
Caldwell 14 refers to both NGC869 and NGC884 2 0.0%
Corwin swaps NGC3911 with NGC3920 in his catalog. 2 0.0%
NED claims this to be TYC 4682-651-1 but by further investigation by Corwin this seems the same as NGC0560 1 0.0%
Corwin thinks this can be NGC0677. 1 0.0%
Corwin thinks this can be UGC1260. 1 0.0%
Corwin thinks this can be UGC1274. 1 0.0%
LEDA data includes IC0186 NED03. 1 0.0%
Object not present in NED. 1 0.0%
Corwin thinks this the same of IC2121. 1 0.0%
Coordinates taken from Corwin’s positions. NED lists this object as duplicate of NGC1893. 1 0.0%

Sources categorical metadata

This column encodes a per-row provenance manifest: a pipe-delimited list of astronomical measurement fields (Type, RA, Dec, Const, magnitudes, redshift, etc.), each tagged with a numeric source/catalog code. Despite 344 distinct combinations across 13,969 rows, the distribution is highly concentrated: the top template covers 41.2% of rows, the top three together cover 73.2%, and the entropy ratio is just 0.41, while 175 combinations occur only once (the long_tail alert). There are no nulls, but the structured-string format means it is not directly usable as a category without parsing.

Treatment: Parse into per-field source codes (one column per measurement) rather than treating the raw string as a category.

anthropic:claude-opus-4-7 · confidence high
Out[106]:

saturn.columns["Sources"].stats

stat value
n 13,969
nulls 0 (0.0%)
unique 344
top_value Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|RadVel:2|Redshift:2
top_rate 0.4116
cardinality 344
entropy 3.486
entropy_ratio 0.4138
alert: long_tail 175 singleton categories
Fig 39.
Top values for Sources.
Top values for Sources (20 unique shown, of 344 total).
value count share
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|RadVel:2|Redshift:2 5750 41.2%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|V-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|RadVel:2|Redshift:2 2678 19.2%
Type:1|RA:1|Dec:1|Const:99 1802 12.9%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|SurfBr:3|Hubble:3|RadVel:2|Redshift:2 378 2.7%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|SurfBr:3|Hubble:3 273 2.0%
Type:1|RA:1|Dec:1|Const:99|MajAx:5|B-Mag:2|V-Mag:2|Pax:2|Pm-RA:2|Pm-Dec:2|RadVel:2|Redshift:2 261 1.9%
Type:1|RA:1|Dec:1|Const:99|MajAx:99 221 1.6%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|V-Mag:2|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|RadVel:2|Redshift:2 221 1.6%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|RadVel:2|Redshift:2 140 1.0%
Type:1|RA:1|Dec:1|Const:99|MajAx:7|MinAx:7|PosAng:7|B-Mag:2|V-Mag:2 126 0.9%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|V-Mag:3|SurfBr:3|Hubble:3|RadVel:2|Redshift:2 99 0.7%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3 98 0.7%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|V-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|Pm-RA:2|Pm-Dec:2|RadVel:2|Redshift:2 92 0.7%
Type:1|RA:1|Dec:1|Const:99|MajAx:7|MinAx:7|B-Mag:2|V-Mag:2 83 0.6%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|V-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|Pax:2|Pm-RA:2|Pm-Dec:2|RadVel:2|Redshift:2 73 0.5%
Type:1|RA:1|Dec:1|Const:99|MajAx:5|V-Mag:2|Pax:2|Pm-RA:2|Pm-Dec:2|RadVel:2|Redshift:2 70 0.5%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|Pm-RA:2|Pm-Dec:2|RadVel:2|Redshift:2 61 0.4%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|SurfBr:3|Hubble:3|Pax:2|Pm-RA:2|Pm-Dec:2|RadVel:2|Redshift:2 57 0.4%
Type:1|RA:1|Dec:1|Const:99|MajAx:5|Pm-RA:2|Pm-Dec:2 57 0.4%
Type:1|RA:1|Dec:1|Const:99|MajAx:3|MinAx:3|PosAng:3|B-Mag:3|J-Mag:2|H-Mag:2|K-Mag:2|Hubble:3|RadVel:2|Redshift:2 48 0.3%
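
Parsing the Sources manifest into per-field source codes, as the treatment recommends, can be sketched as below. It assumes every item has the `Field:code` shape with an integer code, as in all templates listed above; the function name `parse_sources` is illustrative, and the meaning of each code is not given in this profile.

```python
from typing import Dict


def parse_sources(manifest: str) -> Dict[str, int]:
    """Parse 'Type:1|RA:1|...' into {'Type': 1, 'RA': 1, ...}."""
    codes: Dict[str, int] = {}
    for item in manifest.split("|"):
        # Split on the last colon so hyphenated field names stay intact.
        field, _, code = item.rpartition(":")
        codes[field] = int(code)
    return codes


# Third most common template from the table above.
short = parse_sources("Type:1|RA:1|Dec:1|Const:99")
print(short)

# A longer template: fields absent from the manifest are simply missing keys.
parsed = parse_sources("Type:1|RA:1|Dec:1|Const:99|MajAx:5|Pm-RA:2|Pm-Dec:2")
print(parsed.get("Pm-RA"), parsed.get("Redshift"))
```

Expanding each parsed dict into one column per measurement gives a sparse table where a missing key doubles as a "field not sourced" indicator.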

How to cite

BibTeX
@misc{saturn-deepsky-ngc-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: deepsky ngc},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/deepsky-ngc}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:claude-opus-4-7},
}
APA
Steuber, L. (2026). Saturn reading: deepsky ngc. Source: /home/coolhand/data/celestial/deepsky/NGC.csv. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:claude-opus-4-7). Retrieved from https://dr.eamer.dev/saturn/view/deepsky-ngc