Summary confidence: high
This is the HYG star catalog (hygdata_v41.csv) with 119,626 stars and 37 columns covering positions (ra/dec, x/y/z), motion (pmra, pmdec, vx/vy/vz, rv), brightness (mag, absmag, lum, ci), and identifiers/classifications (hd, hip, spect, con, proper). The most informative single field is the spectral type 'spect': it has 4,310 distinct values but is dominated by a handful of classes (K0 ~8.6k, G5 ~6.0k, A0 ~4.9k), so the top classes give a clean view of stellar populations despite the long tail. Distance and luminosity are extremely right-skewed (lum skew ≈49, dist max 100,000 pc) with 10–15% outliers, so any analysis of 'dist' or 'lum' should use log scales. Radial velocity 'rv' is 81% zeros, so it is effectively a 'measured vs. not' flag rather than a continuous variable. Constellation 'con' is the most evenly distributed categorical (89 values, entropy ratio 0.95), led by Cen, UMa, and Her, which makes it a good grouping key.
citing: row_count · column_count · spect.top_values · spect.n_unique · dist.skew · dist.max · lum.skew · lum.max · rv.zero_rate · con.entropy_ratio · con.top_values · mag.median · absmag.skew
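
The handling suggested in the summary (log scales for 'dist' and 'lum', treating 'rv' as a flag, grouping on 'spect' and 'con') can be sketched as below. This is a minimal illustration using only the Python standard library, not a definitive pipeline. The column names come from the summary; the sample rows, the helper names (safe_log10, spectral_class, rv_is_measured, profile), and the rule that a zero 'rv' means "not measured" are assumptions made for this example.

```python
import csv
import io
import math
from collections import Counter

# Hypothetical sample rows: column names follow the summary, values are made up.
SAMPLE_CSV = """spect,dist,lum,rv,con
K0,50.0,12.5,0,Cen
G5,10.0,1.0,-21.4,UMa
A0,100000.0,0.0,0,Her
,250.0,300.0,0,Cen
K0III,120.0,45.0,12.0,UMa
"""


def safe_log10(text):
    """Return log10 of a positive finite number, or None if it cannot be taken."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    if not math.isfinite(value) or value <= 0:
        return None
    return math.log10(value)


def spectral_class(spect):
    """Collapse a detailed type such as 'K0III' to its leading letter."""
    spect = (spect or "").strip()
    if spect and spect[0].isalpha():
        return spect[0].upper()
    return "unknown"


def rv_is_measured(text):
    """Assumption: a zero or blank rv means no measurement was recorded."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return False
    return math.isfinite(value) and value != 0.0


def profile(rows):
    """Summarise rows (dicts keyed by column name) along the lines of the summary."""
    rows = list(rows)
    log_dist = [safe_log10(r.get("dist")) for r in rows]
    log_lum = [safe_log10(r.get("lum")) for r in rows]
    measured = sum(rv_is_measured(r.get("rv")) for r in rows)
    return {
        "n_rows": len(rows),
        "class_counts": Counter(spectral_class(r.get("spect")) for r in rows),
        "con_counts": Counter((r.get("con") or "").strip() for r in rows),
        "rv_measured_fraction": measured / len(rows) if rows else 0.0,
        # Rows where the log is undefined are dropped, not replaced with a default.
        "log_dist": [v for v in log_dist if v is not None],
        "log_lum": [v for v in log_lum if v is not None],
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = profile(rows)
print(summary["class_counts"].most_common(3))
print(summary["rv_measured_fraction"])
```

To run the same sketch on the real file, replace the in-memory sample with something like csv.DictReader(open("hygdata_v41.csv", newline="")). Collapsing 'spect' to its leading letter is one possible way to tame the 4,310 distinct values; keeping the full subtype (K0, G5, A0) is equally valid if the top classes are the focus.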