saturn

/home/coolhand/data/celestial/hyg/hygdata_v41.csv 119,626 rows sample n=119,626 seed 42 2026-05-01T23:28:28+00:00

Overview

Source: /home/coolhand/data/celestial/hyg/hygdata_v41.csv
Total rows: 119,626
Profiled sample: 119,626
Columns: 37
Generated: 2026-05-01T23:28:28+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This is the HYG star catalog (hygdata_v41.csv) with 119,626 stars and 37 columns covering positions (ra/dec, x/y/z), motion (pmra, pmdec, vx/vy/vz, rv), brightness (mag, absmag, lum, ci), and identifiers/classifications (hd, hip, spect, con, proper). The most informative single field is the spectral type 'spect': it has 4,310 distinct values but is dominated by a handful of classes (K0 ~8.6k, G5 ~6.0k, A0 ~4.9k), giving a clean view of stellar populations. Distance and luminosity are extremely right-skewed (lum skew ≈49, dist max 100,000 pc) with 10–15% outliers, so any analysis on those should use log scales. Radial velocity 'rv' is 81% zeros — effectively a 'measured vs not' flag rather than a continuous variable. Constellation 'con' is the most evenly distributed categorical (89 values, entropy ratio 0.95) led by Cen, UMa, and Her, making it a good grouping key.

id high anthropic:claude-opus-4-7

This is a row identifier: 119,626 values, all unique, no nulls, ranging from 0 to 119,630 with a near-perfectly uniform distribution (mean 59,813.16, median 59,813.5, skew ~1.5e-05, kurtosis -1.20). The min of 0 and max of 119,630 against n=119,626 suggests a 0-based sequential id with a handful of gaps. No analytical signal lives here.
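
The size of that "handful of gaps" follows directly from the reported min, max, and unique count. A minimal sketch, assuming the ids are integers and the range is inclusive (the helper name `missing_id_count` is illustrative and not part of saturn):

```python
def missing_id_count(min_id: int, max_id: int, n_unique: int) -> int:
    """Count the integer ids in [min_id, max_id] that never appear."""
    slots = max_id - min_id + 1  # size of the inclusive range
    return slots - n_unique

# Figures reported for the id column: min 0, max 119,630, 119,626 unique values.
gaps = missing_id_count(0, 119_630, 119_626)
print(gaps)
```

Under those assumptions the range holds 119,631 slots against 119,626 ids, so five ids are absent.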

hip high anthropic:claude-opus-4-7

This is almost certainly the Hipparcos catalog identifier (HIP number) for stars: integer values running from 1 to 120404 with 117951 unique values across 119626 rows, near-perfectly uniform (skew ≈ 0.0002, kurtosis ≈ -1.2). The 1.4% null rate suggests some rows lack a Hipparcos cross-match. No outliers and no zeros, consistent with a catalog index rather than a measurement.

hd medium anthropic:claude-opus-4-7

The 'hd' column is a numeric field with 119,626 rows and 98,825 unique values. Given the name, the HYG context, and the integer range of 1 to 358,431, it is most likely the Henry Draper catalogue number, an identifier rather than a measurement. The mean of 114,357 and median of 110,358 describe a roughly symmetric spread of catalogue numbers (skew 0.28, kurtosis -0.73) with no flagged outliers, but they carry no physical meaning. Notably, 17.34% of rows are null, so a substantial share of stars lack an HD cross-match; this needs handling before the column is used as a join key.

hr medium anthropic:claude-opus-4-7

The 'hr' column is a numeric field populated for only 7.6% of the 119,626 rows (null_rate 0.9244), with values ranging from 1 to 9110 and 9,029 distinct values. Its near-zero skew (-0.003), flat kurtosis (-1.20), and mean (4563.9) almost equal to median (4566.0) indicate a near-uniform spread across 1–9110. That pattern fits a catalogue index, most likely the Harvard Revised (Bright Star Catalogue) number, rather than an hour-of-day or heart-rate measure. The extreme null rate is the dominant concern.

gl high anthropic:claude-opus-4-7

This column 'gl' holds Gliese/GJ catalogue identifiers for stars (e.g., 'GJ 1293', 'Gl 914B'), with 'gj' and 'gl' being the dominant tokens (344 and 331 occurrences). It is overwhelmingly empty: 115,825 of 119,626 rows are blank, driving a 96.8% duplicate rate and leaving only ~3,800 unique designations. When populated, values are short single tokens (len_max 9, word_mean 1.03).

bf high anthropic:claude-opus-4-7

This column appears to hold Bayer/Flamsteed star designations (e.g. "41The1Ori", "66Alp Gem"), with the trailing tokens being three-letter constellation abbreviations like leo, her, cyg, vir. It is overwhelmingly empty: 116,527 of 119,626 rows are blank strings, driving a 97.4% duplicate rate and a mean length of 0.22 characters. Among the 3,099 non-empty entries the values are nearly unique, suggesting this label only applies to a small subset of named/cataloged stars.

proper high anthropic:claude-opus-4-7

This is the proper name of a star or named celestial object, populated for only a tiny fraction of rows. Empty strings dominate at 119,161 of 119,626 (top_rate 0.9961), leaving 465 named rows spread over 464 distinct names such as 'Sol', 'Alpheratz', and 'Caph'; all but one ('p Eridani', which appears twice) are singletons. Entropy ratio of 0.008 confirms the column carries almost no information in aggregate.

ra high anthropic:claude-opus-4-7

Values span 0.0 to 23.998594 with a near-symmetric distribution (skew -0.012, kurtosis -1.20) and mean 12.09 close to median 12.13, consistent with right ascension expressed in hours. With 119263 unique values across 119626 rows, near-zero zero_rate, and no nulls or outliers, the column behaves like a continuous astronomical coordinate rather than a categorical feature.

dec high anthropic:claude-opus-4-7

This column is almost certainly declination (dec), the celestial latitude coordinate, with values bounded in [-89.78, 89.57] degrees matching the full sky range. The distribution is nearly symmetric (skew 0.04) and platykurtic (kurtosis -1.02) with median near -1.64 and IQR spanning ~68 degrees, suggesting broad sky coverage rather than a concentrated survey footprint. With 119,534 unique values across 119,626 rows and no nulls or outliers, it behaves as a continuous astrometric coordinate.

dist high anthropic:claude-opus-4-7

Numeric 'dist' column (likely a distance measurement) with 119,626 rows, no nulls, and 5,397 distinct values. The distribution is severely right-skewed (skew 2.97, kurtosis 6.79): the median is 213.68 with Q1–Q3 of 115.07–392.16 (IQR 277.08), yet the mean is 8,772.29 and the max reaches exactly 100,000, suggesting a capped or sentinel ceiling. Over 10% of rows (12,350) flag as outliers, and the std (27,890.67) is roughly 100 times the IQR.
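
One way to act on the suspected ceiling is to mask it before taking logs. The sketch below assumes that values at or above 100,000 are placeholders rather than measurements and that a distance of 0 is unusable on a log scale; both are assumptions, and `DIST_SENTINEL` and `log10_dist` are hypothetical names.

```python
import math
from typing import Optional

# Assumption: distances at or above this value are placeholders, not measurements.
DIST_SENTINEL = 100_000.0

def log10_dist(dist: float) -> Optional[float]:
    """Return log10(dist), or None when the distance is zero, negative, or a sentinel."""
    if dist <= 0.0 or dist >= DIST_SENTINEL:
        return None
    return math.log10(dist)

# Min, Q1, median, Q3, and max reported for the dist column.
sample = [0.0, 115.075, 213.675, 392.157, 100_000.0]
cleaned = [log10_dist(d) for d in sample]
print(cleaned)
```

Returning `None` keeps masked rows distinguishable from genuine small distances, so later summaries can report how many rows were dropped.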

pmra high anthropic:claude-opus-4-7

This is `pmra`, almost certainly proper motion in right ascension (mas/yr) for ~119k astronomical sources, centered near zero (median -1.68, mean -1.31) with an interquartile range of 27.64. The distribution is extremely heavy-tailed: skew 4.61, kurtosis 433.55, std 118.18, and extremes spanning -4432.65 to 6767.26, with 16.4% of rows (19,615) flagged as outliers. No nulls and only 0.04% exact zeros, so the field is densely populated but dominated by tail behaviour.
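
The outlier counts in this report are described as rows "beyond 1.5 IQR". A minimal sketch of that rule, assuming saturn uses the standard Tukey fences (the function name `tukey_fences` is illustrative):

```python
from typing import Tuple

def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for pmra: q1 = -15.460, q3 = 12.180.
low, high = tukey_fences(-15.460, 12.180)
print(round(low, 2), round(high, 2))
```

With those quartiles the fences sit near -56.92 and 53.64, so any motion beyond roughly ±55 counts as an outlier, which explains why a heavy-tailed column flags so many rows.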

pmdec high anthropic:claude-opus-4-7

This is `pmdec`, almost certainly proper motion in declination (mas/yr) from an astrometric catalog. The bulk sits in a tight Q1–Q3 band of -22.4 to 3.77 (IQR 26.17) around a median of -5.76, but the distribution is extremely heavy-tailed: kurtosis of 934.5, skew -2.60, and a min of -5813 against a max of 9999.99. The latter looks like a sentinel or clipped value rather than a real motion. About 14.4% of rows (17,188) fall outside the standard outlier fence.

rv high anthropic:claude-opus-4-7

`rv` is a numeric feature dominated by zeros: the median, Q1, and Q3 are all 0.0, and 81.07% of values are exactly zero (zero_rate 0.8107). The non-zero tail is wide and heavy, spanning -386.9 to 471.0 with std 13.90 and kurtosis 116.06, producing 22,643 outliers (18.93% outlier rate). Despite the extremes, mean (-0.276) and skew (0.371) are modest, suggesting roughly balanced positive/negative excursions around a sparse zero baseline.
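
Because Q1 and Q3 are both 0, the 1.5 IQR fences collapse to 0 and every non-zero value is flagged, so the outlier count and the zero rate describe the same split. The arithmetic can be checked as below; `has_rv` is a hypothetical flag that treats an exact zero as "no measurement recorded", which is an assumption, since a true radial velocity of exactly 0.0 is possible.

```python
ROWS = 119_626       # rows reported for rv
N_OUTLIERS = 22_643  # rows flagged beyond 1.5 IQR

# With an IQR of 0, the flagged rows are exactly the non-zero rows.
n_zero = ROWS - N_OUTLIERS
zero_rate = n_zero / ROWS

def has_rv(rv: float) -> bool:
    """Hypothetical flag: an exact zero is treated as an unrecorded value."""
    return rv != 0.0

print(n_zero, round(zero_rate, 4))
```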

mag high anthropic:claude-opus-4-7

Numeric column 'mag' looks like an astronomical magnitude reading: values are tightly clustered around a median of 8.46 with an IQR of 1.52, but the range stretches from -26.7 to 21.0. That extreme negative tail (consistent with very bright objects like the Sun at -26.7) drives the high kurtosis of 6.35 and flags 5,241 outliers (4.4%) despite near-symmetric skew of 0.16. No nulls or zeros, and 1,422 distinct values across 119,626 rows suggest measurements rounded to ~0.01.

absmag high anthropic:claude-opus-4-7

This is a numeric `absmag` field, almost certainly absolute magnitude on an astronomical scale, ranging from -16.68 to 19.629 with a median of 1.495 and IQR of 3.021. The distribution is left-skewed (skew -1.37) with heavier tails than normal (kurtosis 3.17), and 11.29% of rows (13,508) flag as outliers — consistent with a long bright-end tail rather than data errors. No nulls and only 0.015% zeros across 119,626 rows, with 13,452 unique values suggesting quantised reporting.
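
If `absmag` is an absolute magnitude, it relates to apparent magnitude and distance through the standard distance modulus, M = m - 5·log10(d / 10 pc). The sketch below assumes `mag` is the apparent magnitude and `dist` is in parsecs; saturn does not confirm either, and the function name is illustrative.

```python
import math

def absolute_magnitude(mag: float, dist_pc: float) -> float:
    """Distance modulus: M = m - 5 * log10(d / 10), with d in parsecs."""
    if dist_pc <= 0.0:
        raise ValueError("distance must be positive")
    return mag - 5.0 * math.log10(dist_pc / 10.0)

# At 10 pc the apparent and absolute magnitudes coincide by definition.
print(absolute_magnitude(8.46, 10.0))
# Ten times farther away, the same apparent magnitude implies a star 5 mag brighter.
print(absolute_magnitude(8.46, 100.0))
```

Rows with a zero or placeholder distance would need to be excluded before applying this.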

spect high anthropic:claude-opus-4-7

This column holds stellar spectral type codes (e.g. K0, G5, A0, F8): short one-word tokens averaging 3.4 characters with a 1,532-word vocabulary across 119,626 rows. Values are highly repetitive (96.4% duplicate rate, only 4,310 unique), which is expected for a categorical taxonomy, and 3,048 rows are empty. The 45.8% allcaps rate shows mixed casing, but this is not necessarily an inconsistency: spectral notation conventionally uses lowercase suffixes for peculiarities (for example 'e', 'p', 'm', 'var'), so blanket case normalization could discard information. Grouping by the leading class letter is a safer first step.
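
A minimal sketch of reducing the 4,310 raw codes to a coarse class, assuming the codes start with one of the main-sequence letters O, B, A, F, G, K, M followed by an optional numeric subclass. The pattern and the name `parse_spect` are illustrative; codes that start with any other letter are returned as `None` rather than guessed.

```python
import re
from typing import Optional, Tuple

# Leading class letter plus an optional subclass such as "7" or "9.5".
_SPECT_RE = re.compile(r"^\s*([OBAFGKM])(\d(?:\.\d)?)?")

def parse_spect(code: str) -> Optional[Tuple[str, Optional[float]]]:
    """Return (class_letter, subclass) or None if the code has no recognised prefix."""
    match = _SPECT_RE.match(code)
    if match is None:
        return None
    letter, sub = match.group(1), match.group(2)
    return (letter, float(sub) if sub else None)

# Codes taken from the sample values listed for this column.
for code in ["B7III-IV", "K2", "A0...", "G6/G8III", ""]:
    print(repr(code), parse_spect(code))
```

Only the first class in a composite code such as 'G6/G8III' is kept, which is a simplification.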

ci high anthropic:claude-opus-4-7

Numeric feature 'ci' spans -0.4 to 5.46 with mean 0.71 and median 0.616, suggesting a bounded continuous measurement (possibly a colour index or similar physical quantity given the name). Distribution is mildly right-skewed (0.37) with light tails (kurtosis -0.26) and only 208 outliers (0.18%). Negative values exist but zeros are rare (0.15%), and 1.58% of rows are null.

x high anthropic:claude-opus-4-7

Numeric feature 'x' is effectively continuous (119,593 unique values across 119,626 rows, no nulls) and centered near zero (median -1.05, Q1/Q3 of -89.04/86.27). The distribution has extreme tails: min -99,950 and max 99,982 push the standard deviation to 15,182 against an IQR of just 175, with kurtosis 19.16 and 13.07% of rows flagged as outliers. Slight negative skew (-0.22) suggests the tails are roughly symmetric in direction but heavy in magnitude.

y high anthropic:claude-opus-4-7

A continuous numeric feature centered near zero (median -1.24, mean -39.3) but with an extraordinarily wide spread (std ~17249, min -99979, max 99996). The distribution is roughly symmetric (skew 0.12) yet heavy-tailed (kurtosis 18.0), and 13.9% of values flag as outliers — the bulk sits within an IQR of ~183 while extremes reach ±100k. Near-unique values (119585 of 119626) and effectively no zeros or nulls suggest a measured signal rather than a category or sentinel-coded field.

z high anthropic:claude-opus-4-7

Column z is a high-cardinality numeric feature (119,588 unique values across 119,626 rows, no nulls) centered near zero with median -3.42 and Q1/Q3 of roughly -107.6 and 95.0 (IQR 202.5). The distribution has extreme tails: min -99,964.98, max 99,862.51, std 18,074.56, and kurtosis 15.49, with 13.7% of rows (16,441) flagged as outliers despite skew of only -0.27. The IQR is nearly two orders of magnitude smaller than the standard deviation, indicating a tight core swamped by heavy symmetric tails.
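
The x, y, and z columns all reach just under ±100,000, the same ceiling seen in `dist`, which is what one would expect if they are Cartesian positions derived from `ra`, `dec`, and `dist`. A sketch of that conversion, assuming the usual equatorial convention (+X toward RA 0h, +Y toward RA 6h, +Z toward the north celestial pole) with `ra` in hours and `dec` in degrees; the function name is illustrative and the convention is not confirmed by saturn.

```python
import math
from typing import Tuple

def equatorial_to_cartesian(ra_hours: float, dec_deg: float,
                            dist: float) -> Tuple[float, float, float]:
    """Convert (ra, dec, dist) to (x, y, z) in the same length unit as dist."""
    ra = ra_hours * math.pi / 12.0  # 24 hours = 2*pi radians
    dec = math.radians(dec_deg)
    x = dist * math.cos(dec) * math.cos(ra)
    y = dist * math.cos(dec) * math.sin(ra)
    z = dist * math.sin(dec)
    return x, y, z

# A point at RA 6h on the celestial equator lies along +Y.
x, y, z = equatorial_to_cartesian(6.0, 0.0, 100.0)
print(round(x, 6), round(y, 6), round(z, 6))
```

If this relationship holds, rows with a placeholder distance would produce the ±100k extremes in all three columns, and those tails would reflect the distance ceiling rather than real positions.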

vx high anthropic:claude-opus-4-7

`vx` is a numeric feature centered tightly around zero (median 1.3e-07, IQR 1.94e-05) but with a symmetric extreme range of ±0.10227249 — almost certainly a velocity-like component (x-axis). The distribution is pathologically heavy-tailed: kurtosis 1307.6 and skew -11.5, with 13.2% of values flagged as outliers despite std being only 0.00178. The exact symmetry of min and max suggests a hard clipping bound at ±0.10227249.

vy high anthropic:claude-opus-4-7

Likely a velocity component (vy) for ~119k objects, with values clustered tightly around zero (median 1.18e-05, IQR 3.4e-05) but spanning -0.102 to 0.102. The distribution is extraordinarily heavy-tailed (skew 15.6, kurtosis 678) and 12.0% of rows fall outside the Tukey fences, so a small minority of fast movers dominate the variance (std 0.0022 vs IQR ~3e-05).

vz high anthropic:claude-opus-4-7

A signed numeric quantity centred almost exactly on zero (median -6.23e-06, mean -1.57e-04) with an extremely tight IQR of 2.31e-05 — consistent with a vertical velocity or rate-of-change feature (the name 'vz' suggests a z-axis velocity). The distribution is pathologically heavy-tailed: kurtosis 1029.85, skew -20.30, and symmetric extremes at ±0.10227249 produce 15,774 outliers (13.2%). Despite 119,626 rows there are only 23,037 unique values, hinting at quantisation or repeated stationary readings.

rarad high anthropic:claude-opus-4-7

Values span 0 to 6.2828 with mean 3.166 and median 3.175, consistent with a right ascension expressed in radians (0 to 2π). The distribution is essentially symmetric (skew -0.012) and platykurtic (kurtosis -1.198), close to uniform across the circle, with no outliers and only one zero out of 119,626 rows. With 119,585 unique values, this is a high-resolution continuous coordinate.

decrad high anthropic:claude-opus-4-7

This column appears to be a declination angle expressed in radians, ranging symmetrically from -1.567 to 1.563 (close to ±π/2). The distribution is near-symmetric (skew 0.037) with negative kurtosis (-1.02), indicating a flatter-than-normal spread, and 119,585 of 119,626 values are unique with no nulls or outliers. The mean (-0.035) and median (-0.029) sit near zero, consistent with an angular coordinate covering most of the celestial sphere.
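
The reported extremes of `rarad` and `decrad` line up with those of `ra` and `dec` under a plain unit conversion. A short check, assuming `ra` is in hours and `dec` in degrees (the helper names are illustrative):

```python
import math

def ra_hours_to_rad(ra_hours: float) -> float:
    """Right ascension in hours to radians (24 h = 2*pi)."""
    return ra_hours * math.pi / 12.0

def dec_deg_to_rad(dec_deg: float) -> float:
    """Declination in degrees to radians."""
    return math.radians(dec_deg)

# Extremes reported for ra (max) and dec (min, max).
print(round(ra_hours_to_rad(23.998594), 4))
print(round(dec_deg_to_rad(-89.782), 3), round(dec_deg_to_rad(89.569), 3))
```

The results are close to the 6.2828, -1.567, and 1.563 reported for the radian columns, so the two pairs appear to carry the same information.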

pmrarad high anthropic:claude-opus-4-7

`pmrarad` is a tiny-magnitude signed numeric feature centered near zero (mean -6.4e-09, median -8.1e-09) with values on the order of 1e-7 to 1e-5, consistent with a proper-motion-in-RA quantity expressed in radians. The distribution is highly non-Gaussian: skew 4.6, kurtosis 433.6, and 16.4% of rows flagged as outliers, with extremes reaching 3.28e-05 against an IQR of just 1.34e-07. Nulls are absent and only 0.04% are exact zeros, so the heavy tails are real rather than artefacts of missingness.

pmdecrad high anthropic:claude-opus-4-7

This is `pmdecrad`, proper motion in declination expressed in radians (per time unit), centred near zero with median -2.79e-08 and IQR of about 1.27e-07. The distribution is extremely heavy-tailed: kurtosis of 997.6, skew of -1.99, and 17,187 outliers (14.4% of rows) stretching from -2.82e-05 to 5.01e-05 against a std of 5.47e-07. Values are dense and continuous (23,588 unique across 119,626 rows) with no nulls and essentially no zeros.
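
The radian-valued proper motions appear to be the mas/yr columns in different units. A sketch of the conversion, assuming the time unit is years and that `pmra` and `pmdec` are in milliarcseconds per year (neither is stated by saturn; the names below are illustrative):

```python
import math

# One milliarcsecond in radians: pi / (180 * 3600 * 1000).
MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)

def mas_to_rad(value_mas: float) -> float:
    """Milliarcseconds (per year) to radians (per year)."""
    return value_mas * MAS_TO_RAD

def rad_to_mas(value_rad: float) -> float:
    """Radians (per year) back to milliarcseconds (per year)."""
    return value_rad / MAS_TO_RAD

# pmra max (6767.26 mas/yr) against the pmrarad max (3.28e-05).
print(mas_to_rad(6767.26))
# pmdecrad max (5.01e-05) converted back to mas/yr.
print(rad_to_mas(5.01e-05))
```

Under these assumptions the `pmra` maximum reproduces the `pmrarad` maximum, while the `pmdecrad` maximum maps back to a value somewhat above the 9999.99 reported for `pmdec`. That would be consistent with `pmdec` being clipped at a field-width limit rather than holding a missing-value flag, but this is an inference from summary statistics and has not been checked against the rows.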

bayer high anthropic:claude-opus-4-7

This is the Bayer designation for stars (Greek-letter prefix like 'Alp', 'Bet', 'Gam', 'Del'), with 104 distinct values across 119,626 rows. It is overwhelmingly empty: 118,089 of 119,626 rows (top_rate 0.987) carry no Bayer letter, leaving entropy at just 0.171 (entropy_ratio 0.026). The non-empty tail is roughly evenly distributed across the Greek alphabet, with Alpha (80) and Beta (77) most common.

flam high anthropic:claude-opus-4-7

The column 'flam' is a categorical field that is overwhelmingly empty: 116,889 of 119,626 rows (97.7%) hold the blank string. The remaining 2.3% (2,737 rows) spreads across 138 distinct values that look like small integers ('2', '4', '5', '7' … '16'), consistent with Flamsteed numbers; the most common of these each appear 44–53 times. Entropy ratio of 0.043 confirms almost no information content, and the imbalance alert is warranted.

con high anthropic:claude-opus-4-7

Three-letter IAU constellation abbreviations (Cen, UMa, Her, Cyg...), with 89 distinct values across 119,626 rows and zero nulls. Since the IAU defines 88 constellations, one of the 89 values is presumably an empty or non-standard code and is worth checking. The distribution is remarkably flat: entropy ratio is 0.949 and the most common value, Cen, accounts for only 3.57% of records, so no single constellation dominates. Useful as a sky-region grouping key rather than a predictive feature on its own.
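
The entropy ratio quoted here can be reproduced from the reported entropy and cardinality. The sketch assumes saturn computes Shannon entropy in bits and normalises by log2 of the number of distinct values; the figures in this report are consistent with that, but the definition is not stated. Function names are illustrative.

```python
import math
from typing import Sequence

def entropy_bits(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of a distribution given as category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts: Sequence[int]) -> float:
    """Entropy divided by its maximum, log2(k), for k non-empty categories."""
    k = sum(1 for c in counts if c > 0)
    if k <= 1:
        return 0.0
    return entropy_bits(counts) / math.log2(k)

# Reported for con: entropy 6.147 over 89 distinct values.
print(round(6.147 / math.log2(89), 3))
# A perfectly even split has ratio 1; one dominant value pushes it toward 0.
print(entropy_ratio([5, 5, 5, 5]), round(entropy_ratio([997, 1, 1, 1]), 3))
```

A ratio near 1 means the categories are close to evenly used, which is why `con` works well as a grouping key while `proper`, `bayer`, and `base` do not.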

comp high anthropic:claude-opus-4-7

The column 'comp' is a numeric field with only 3 distinct values bounded between 1.0 and 3.0, with a median, Q1, and Q3 all equal to 1.0, indicating it is effectively a low-cardinality code or flag rather than a continuous measure. The mean of 1.0048 shows the value 1 dominates almost entirely, with only 536 outliers (0.45%) deviating, producing extreme skew (16.7) and kurtosis (311.1).

comp_primary high anthropic:claude-opus-4-7

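The column 'comp_primary' contains 119,626 nearly unique numeric values (119,190 distinct) ranging from 0 to 119,630 with mean 59,641.96 and median 59,634.5. The near-perfect symmetry (skew 0.0004), negative kurtosis (-1.20), and quartiles (Q1 29,815.25, Q3 89,462.75) closely match a uniform distribution over [0, n]. The range is identical to that of 'id', and the 436 fewer distinct values than rows mean some values repeat, so this looks like a reference to the 'id' of a primary star in a multiple system (equal to the row's own id for most stars) rather than an independent row index. Either way it is a key, not a substantive measurement.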

base high anthropic:claude-opus-4-7

The 'base' column is a categorical field that is effectively empty: 118,540 of 119,626 rows (top_rate 0.991) hold the empty string, leaving only ~1,086 rows distributed across 650 other values. The non-empty entries look like Gliese star catalog identifiers (e.g. 'Gl 57.1', 'Gl 60'), each appearing at most 3 times. Entropy ratio of 0.017 confirms almost no information content.

lum high anthropic:claude-opus-4-7

This is a luminosity-like numeric feature spanning roughly 1.2e-06 to 4.09e+08, with a median of about 21.98 but a mean of 356526 — clear evidence of an extremely heavy right tail. Skew of 49.27 and kurtosis of 3885.57 confirm a pathological distribution, and 17485 values (14.6%) fall outside the IQR fence. Of 119626 rows, 13465 are unique with no nulls or zeros, so the spread is genuine rather than padded.
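
The `lum` statistics track the `absmag` statistics closely if luminosity is derived from absolute magnitude as L = 10^(0.4·(M_sun - M)). The sketch assumes a solar absolute magnitude of 4.85, a value chosen because it reproduces the figures in this report; saturn does not state how `lum` was computed, and the names below are illustrative.

```python
# Assumption: solar absolute magnitude used as the zero point.
SUN_ABSMAG = 4.85

def luminosity_from_absmag(absmag: float, sun_absmag: float = SUN_ABSMAG) -> float:
    """Luminosity in solar units: 10 ** (0.4 * (M_sun - M))."""
    return 10.0 ** (0.4 * (sun_absmag - absmag))

# absmag median, min, and max as reported; lum is decreasing in absmag,
# so these should map to the lum median, max, and min respectively.
print(luminosity_from_absmag(1.495))
print(luminosity_from_absmag(-16.68))
print(luminosity_from_absmag(19.629))
```

The three results land close to the reported `lum` median (21.979), max (409,260,660), and min (1.23e-06). If the relationship holds, `lum` adds no information beyond `absmag`, and its extreme skew is a direct consequence of the exponential transform.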

var high anthropic:claude-opus-4-7

Column 'var' is a sparse short-code field: 113,634 of 119,626 rows (n_empty) are blank and the remaining values are 1–5 character tokens like 'R', 'S', 'T', 'RS'. Duplicate_rate is 0.987 with only 1,523 uniques, and one_word_rate is 1.0 with len_max of 5, suggesting a categorical abbreviation or flag rather than free text. The overwhelming emptiness (null_rate is reported as 0.0 but empties dominate) is the headline surprise.

var_min high anthropic:claude-opus-4-7

Numeric feature 'var_min' is populated for only ~14% of rows (null_rate 0.858), making it a sparse signal. Among the 16,991 observed values it ranges from -1.333 to 14.902 with mean 9.50 and median 9.849, and is left-skewed (skew -0.93) with mild kurtosis (1.25). About 2.6% of present values fall outside the IQR fence (449 outliers), and only 6,248 distinct values appear.

var_max high anthropic:claude-opus-4-7

Numeric feature 'var_max' (likely a per-record maximum of some variable) is missing for 85.8% of the 119,626 rows, leaving roughly 17k populated values spread over 6,090 distinct numbers. Among observed values it centers near a median of 9.646 with mean 9.259, ranges from -1.523 to 13.702, and is left-skewed (-0.97) with 325 outliers (1.9%). The dominant concern is the null rate, not the distribution shape.

Numeric correlation

id numeric

rows: 119,626
null: 0 (0.0%)
unique: 119,626
min: 0.000
max: 119,630
mean: 59,813
median: 59,814
std: 34,534
q1: 29,906
q3: 89,720
iqr: 59,814
skew: 1.47e-05
kurtosis: -1.200
n_outliers: 0
outlier_rate: 0.000
zero_rate: 8.36e-06

hip numeric

rows: 119,626
null: 1,675 (1.4%)
unique: 117,951
min: 1.000
max: 120,404
mean: 59,170
median: 59,172
std: 34,172
q1: 29,564
q3: 88,762
iqr: 59,198
skew: 1.94e-04
kurtosis: -1.200
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

hd numeric

rows: 119,626
null: 20,741 (17.3%)
unique: 98,825
min: 1.000
max: 358,431
mean: 114,357
median: 110,358
std: 74,176
q1: 46,723
q3: 175,823
iqr: 129,100
skew: 0.282
kurtosis: -0.727
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

hr numeric

92.4% null
rows: 119,626
null: 110,585 (92.4%)
unique: 9,029
min: 1.000
max: 9,110
mean: 4,564
median: 4,566
std: 2,632
q1: 2,283
q3: 6,848
iqr: 4,565
skew: -3.43e-03
kurtosis: -1.202
n_outliers: 0
outlier_rate: 0.000
zero_rate: 0.000

gl text

96.8% rows are a single word; 95th-percentile length under 20 chars; 96.8% duplicate strings
rows: 119,626
null: 0 (0.0%)
unique: 3,802
len_min: 0
len_max: 9
len_mean: 0.226
len_median: 0.000
len_p95: 0.000
word_mean: 1.032
word_median: 1.000
n_empty: 115,825
n_duplicates: 115,824
duplicate_rate: 0.968
vocab_size: 677
readability_flesch_mean: 4.808
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.968
allcaps_rate: 0.017
boilerplate_rate: 0.000
Sample values (first 10)

bf text

97.6% rows are a single word; 95th-percentile length under 20 chars; 97.4% duplicate strings
rows: 119,626
null: 0 (0.0%)
unique: 3,066
len_min: 0
len_max: 10
len_mean: 0.221
len_median: 0.000
len_p95: 0.000
word_mean: 1.064
word_median: 1.000
n_empty: 116,527
n_duplicates: 116,560
duplicate_rate: 0.974
vocab_size: 369
readability_flesch_mean: 1.986
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.976
allcaps_rate: 0.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 29 Psc
  2. 64 Aql

proper categorical

463 singleton categories; top value is 99.6% of rows
rows: 119,626
null: 0 (0.0%)
unique: 465
top_value: (empty string)
top_rate: 0.996
cardinality: 465
entropy: 0.071
entropy_ratio: 8.03e-03
Top values (rank 1–20)
  1. — 119,161
  2. p Eridani — 2
  3. Sol — 1
  4. Alpheratz — 1
  5. Caph — 1
  6. Algenib — 1
  7. Groombridge 34 — 1
  8. Citadelle — 1
  9. Ankaa — 1
  10. Felixvarela — 1
  11. Fulu — 1
  12. Schedar — 1
  13. Diphda — 1
  14. Cocibolca — 1
  15. 96 G. Psc — 1
  16. Achird — 1
  17. Van Maanen's Star — 1
  18. Castula — 1
  19. Cih — 1
  20. Nenque — 1

ra numeric

rows: 119,626
null: 0 (0.0%)
unique: 119,263
min: 0.000
max: 23.999
mean: 12.095
median: 12.127
std: 6.887
q1: 6.217
q3: 18.115
iqr: 11.898
skew: -0.012
kurtosis: -1.198
n_outliers: 0
outlier_rate: 0.000
zero_rate: 8.36e-06

dec numeric

rows: 119,626
null: 0 (0.0%)
unique: 119,534
min: -89.782
max: 89.569
mean: -1.986
median: -1.640
std: 40.965
q1: -36.422
q3: 31.515
iqr: 67.937
skew: 0.037
kurtosis: -1.019
n_outliers: 0
outlier_rate: 0.000
zero_rate: 8.36e-06

dist numeric

skew=+2.97; 10.3% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 5,397
min: 0.000
max: 100,000
mean: 8,772
median: 213.675
std: 27,891
q1: 115.075
q3: 392.157
iqr: 277.082
skew: 2.965
kurtosis: 6.792
n_outliers: 12,350
outlier_rate: 0.103
zero_rate: 8.36e-06

pmra numeric

skew=+4.61; 16.4% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 25,644
min: -4,433
max: 6,767
mean: -1.307
median: -1.680
std: 118.175
q1: -15.460
q3: 12.180
iqr: 27.640
skew: 4.608
kurtosis: 433.552
n_outliers: 19,615
outlier_rate: 0.164
zero_rate: 4.01e-04

pmdec numeric

skew=-2.60; 14.4% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 23,226
min: -5,813
max: 10,000
mean: -19.330
median: -5.760
std: 112.502
q1: -22.398
q3: 3.770
iqr: 26.168
skew: -2.605
kurtosis: 934.505
n_outliers: 17,188
outlier_rate: 0.144
zero_rate: 3.43e-04

rv numeric

18.9% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 1,714
min: -386.900
max: 471.000
mean: -0.276
median: 0.000
std: 13.904
q1: 0.000
q3: 0.000
iqr: 0.000
skew: 0.371
kurtosis: 116.062
n_outliers: 22,643
outlier_rate: 0.189
zero_rate: 0.811

mag numeric

rows: 119,626
null: 0 (0.0%)
unique: 1,422
min: -26.700
max: 21.000
mean: 8.429
median: 8.460
std: 1.428
q1: 7.650
q3: 9.170
iqr: 1.520
skew: 0.161
kurtosis: 6.353
n_outliers: 5,241
outlier_rate: 0.044
zero_rate: 0.000

absmag numeric

11.3% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 13,452
min: -16.680
max: 19.629
mean: 0.991
median: 1.495
std: 4.353
q1: 0.138
q3: 3.159
iqr: 3.021
skew: -1.370
kurtosis: 3.168
n_outliers: 13,508
outlier_rate: 0.113
zero_rate: 1.50e-04

spect text

99.4% rows are a single word; 45.8% rows are all-caps; 95th-percentile length under 20 chars; 96.4% duplicate strings
rows: 119,626
null: 0 (0.0%)
unique: 4,310
len_min: 0
len_max: 12
len_mean: 3.376
len_median: 3.000
len_p95: 8.000
word_mean: 1.009
word_median: 1.000
n_empty: 3,048
n_duplicates: 115,316
duplicate_rate: 0.964
vocab_size: 1,532
readability_flesch_mean: 98.195
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.994
allcaps_rate: 0.458
boilerplate_rate: 0.000
Sample values (first 10)
  1. B7III-IV
  2. K2
  3. A0...
  4. K1IV
  5. G0V
  6. F5
  7. G6/G8III
  8. F8
  9. K1III
  10. G0

ci numeric

rows: 119,626
null: 1,891 (1.6%)
unique: 2,439
min: -0.400
max: 5.460
mean: 0.712
median: 0.616
std: 0.493
q1: 0.348
q3: 1.083
iqr: 0.734
skew: 0.373
kurtosis: -0.255
n_outliers: 208
outlier_rate: 1.77e-03
zero_rate: 1.54e-03

x numeric

13.1% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 119,593
min: -99,950
max: 99,982
mean: -235.251
median: -1.050
std: 15,183
q1: -89.043
q3: 86.268
iqr: 175.311
skew: -0.223
kurtosis: 19.161
n_outliers: 15,635
outlier_rate: 0.131
zero_rate: 0.000

y numeric

13.9% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 119,585
min: -99,979
max: 99,996
mean: -39.324
median: -1.239
std: 17,249
q1: -91.176
q3: 91.873
iqr: 183.049
skew: 0.117
kurtosis: 18.028
n_outliers: 16,582
outlier_rate: 0.139
zero_rate: 8.36e-06

z numeric

13.7% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 119,588
min: -99,965
max: 99,863
mean: -235.030
median: -3.416
std: 18,075
q1: -107.567
q3: 94.974
iqr: 202.541
skew: -0.272
kurtosis: 15.494
n_outliers: 16,441
outlier_rate: 0.137
zero_rate: 8.36e-06

vx numeric

skew=-11.51; 13.2% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 21,555
min: -0.102
max: 0.102
mean: -2.89e-05
median: 1.30e-07
std: 1.78e-03
q1: -1.03e-05
q3: 9.09e-06
iqr: 1.94e-05
skew: -11.509
kurtosis: 1,308
n_outliers: 15,752
outlier_rate: 0.132
zero_rate: 4.85e-04

vy numeric

skew=+15.59; 12.0% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 25,826
min: -0.102
max: 0.102
mean: 2.16e-04
median: 1.18e-05
std: 2.23e-03
q1: -1.86e-06
q3: 3.21e-05
iqr: 3.39e-05
skew: 15.591
kurtosis: 678.404
n_outliers: 14,368
outlier_rate: 0.120
zero_rate: 3.43e-04

vz numeric

skew=-20.30; 13.2% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 23,037
min: -0.102
max: 0.102
mean: -1.57e-04
median: -6.23e-06
std: 1.95e-03
q1: -2.00e-05
q3: 3.15e-06
iqr: 2.31e-05
skew: -20.298
kurtosis: 1,030
n_outliers: 15,774
outlier_rate: 0.132
zero_rate: 5.35e-04

rarad numeric

rows: 119,626
null: 0 (0.0%)
unique: 119,585
min: 0.000
max: 6.283
mean: 3.166
median: 3.175
std: 1.803
q1: 1.628
q3: 4.743
iqr: 3.115
skew: -0.012
kurtosis: -1.198
n_outliers: 0
outlier_rate: 0.000
zero_rate: 8.36e-06

decrad numeric

rows: 119,626
null: 0 (0.0%)
unique: 119,585
min: -1.567
max: 1.563
mean: -0.035
median: -0.029
std: 0.715
q1: -0.636
q3: 0.550
iqr: 1.186
skew: 0.037
kurtosis: -1.019
n_outliers: 0
outlier_rate: 0.000
zero_rate: 8.36e-06

pmrarad numeric

skew=+4.61; 16.4% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 25,647
min: -2.15e-05
max: 3.28e-05
mean: -6.40e-09
median: -8.14e-09
std: 5.73e-07
q1: -7.50e-08
q3: 5.91e-08
iqr: 1.34e-07
skew: 4.607
kurtosis: 433.617
n_outliers: 19,615
outlier_rate: 0.164
zero_rate: 4.01e-04

pmdecrad numeric

14.4% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 23,588
min: -2.82e-05
max: 5.01e-05
mean: -9.36e-08
median: -2.79e-08
std: 5.47e-07
q1: -1.09e-07
q3: 1.83e-08
iqr: 1.27e-07
skew: -1.992
kurtosis: 997.579
n_outliers: 17,187
outlier_rate: 0.144
zero_rate: 3.34e-04

bayer categorical

top value is 98.7% of rows
rows: 119,626
null: 0 (0.0%)
unique: 104
top_value: (empty string)
top_rate: 0.987
cardinality: 104
entropy: 0.171
entropy_ratio: 0.026
Top values (rank 1–20)
  1. — 118,089
  2. Alp — 80
  3. Bet — 77
  4. Eps — 74
  5. Del — 71
  6. Eta — 69
  7. Zet — 68
  8. Gam — 68
  9. The — 61
  10. Iot — 60
  11. Lam — 56
  12. Kap — 54
  13. Mu — 49
  14. Nu — 45
  15. Sig — 39
  16. Xi — 38
  17. Pi — 37
  18. Rho — 36
  19. Omi — 36
  20. Tau — 35

flam categorical

top value is 97.7% of rows
rows: 119,626
null: 0 (0.0%)
unique: 139
top_value: (empty string)
top_rate: 0.977
cardinality: 139
entropy: 0.307
entropy_ratio: 0.043
Top values (rank 1–20)
  1. — 116,889
  2. 7 — 53
  3. 8 — 51
  4. 5 — 51
  5. 9 — 50
  6. 4 — 50
  7. 10 — 49
  8. 12 — 49
  9. 16 — 49
  10. 2 — 48
  11. 11 — 48
  12. 13 — 48
  13. 14 — 48
  14. 15 — 48
  15. 1 — 47
  16. 3 — 46
  17. 6 — 46
  18. 17 — 46
  19. 21 — 45
  20. 20 — 44

con categorical

rows: 119,626
null: 0 (0.0%)
unique: 89
top_value: Cen
top_rate: 0.036
cardinality: 89
entropy: 6.147
entropy_ratio: 0.949
Top values (rank 1–20)
  1. Cen — 4,270
  2. UMa — 3,616
  3. Her — 3,434
  4. Cyg — 3,116
  5. Hya — 3,061
  6. Cet — 3,030
  7. Vir — 2,921
  8. Eri — 2,789
  9. Peg — 2,744
  10. Dra — 2,722
  11. Sgr — 2,504
  12. Boo — 2,477
  13. Pup — 2,427
  14. Cas — 2,352
  15. Tau — 2,281
  16. Oph — 2,270
  17. Vel — 2,238
  18. Aqr — 2,188
  19. Leo — 2,165
  20. Car — 2,162

comp numeric

skew=+16.71
rows: 119,626
null: 0 (0.0%)
unique: 3
min: 1.000
max: 3.000
mean: 1.005
median: 1.000
std: 0.074
q1: 1.000
q3: 1.000
iqr: 0.000
skew: 16.709
kurtosis: 311.095
n_outliers: 536
outlier_rate: 4.48e-03
zero_rate: 0.000

comp_primary numeric

rows: 119,626
null: 0 (0.0%)
unique: 119,190
min: 0.000
max: 119,630
mean: 59,642
median: 59,634
std: 34,442
q1: 29,815
q3: 89,463
iqr: 59,648
skew: 3.99e-04
kurtosis: -1.200
n_outliers: 0
outlier_rate: 0.000
zero_rate: 8.36e-06

base categorical

top value is 99.1% of rows
rows: 119,626
null: 0 (0.0%)
unique: 651
top_value: (empty string)
top_rate: 0.991
cardinality: 651
entropy: 0.159
entropy_ratio: 0.017
Top values (rank 1–20)
  1. — 118,540
  2. Gl 57.1 — 3
  3. Gl 60 — 3
  4. Gl 100 — 3
  5. Gl 106.1 — 3
  6. Gl 118.2 — 3
  7. Gl 120.1 — 3
  8. Gl 140 — 3
  9. Gl 153 — 3
  10. Gl 166 — 3
  11. Gl 225.2 — 3
  12. Gl 278 — 3
  13. Gl 294 — 3
  14. Gl 319 — 3
  15. Gl 321.3 — 3
  16. Gl 331 — 3
  17. Gl 421 — 3
  18. Gl 520 — 3
  19. GJ 9490 — 3
  20. Gl 586 — 3

lum numeric

skew=+49.27; 14.6% rows beyond 1.5 IQR
rows: 119,626
null: 0 (0.0%)
unique: 13,465
min: 1.23e-06
max: 409,260,660
mean: 356,526
median: 21.979
std: 3,341,375
q1: 4.747
q3: 76.701
iqr: 71.954
skew: 49.268
kurtosis: 3,886
n_outliers: 17,485
outlier_rate: 0.146
zero_rate: 0.000

var text

100.0% rows are a single word; 95th-percentile length under 20 chars; 98.7% duplicate strings
rows: 119,626
null: 0 (0.0%)
unique: 1,523
len_min: 0
len_max: 5
len_mean: 0.140
len_median: 0.000
len_p95: 1.000
word_mean: 1.000
word_median: 1.000
n_empty: 113,634
n_duplicates: 118,103
duplicate_rate: 0.987
vocab_size: 597
readability_flesch_mean: 4.609
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.018
boilerplate_rate: 0.000
Sample values (first 10)

var_min numeric

85.8% null
rows: 119,626
null: 102,635 (85.8%)
unique: 6,248
min: -1.333
max: 14.902
mean: 9.502
median: 9.849
std: 1.781
q1: 8.526
q3: 10.707
iqr: 2.181
skew: -0.934
kurtosis: 1.251
n_outliers: 449
outlier_rate: 0.026
zero_rate: 0.000

var_max numeric

85.8% null
rows: 119,626
null: 102,635 (85.8%)
unique: 6,090
min: -1.523
max: 13.702
mean: 9.259
median: 9.646
std: 1.742
q1: 8.243
q3: 10.492
iqr: 2.249
skew: -0.970
kurtosis: 1.128
n_outliers: 325
outlier_rate: 0.019
zero_rate: 0.000