
quirky aurora

source: /home/coolhand/html/datavis/data_trove/data/quirky/aurora.json · 300 rows · 8 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset captures 300 minute-by-minute aurora and solar wind observations starting 2026-01-20. Its 8 columns cover two timestamps (time_tag, solar_wind_time), geomagnetic indices (kp_index, estimated_kp, intensity), solar wind conditions (speed, density) and a categorical activity label. The activity field is heavily skewed toward 'Moderate Storm' (172 of 300, ~57%), with only 8 'Quiet' readings. Because this imbalance dominates the storyline, it deserves a closer look. The kp_index, estimated_kp and intensity columns are left-skewed and pile up at their maximum values (median equals max). About 15% of kp_index and intensity readings are flagged as low-side outliers (6% for estimated_kp), which suggests the sample covers a sustained storm period rather than a balanced range of conditions. Solar wind speed is also unusually elevated (min 881.6, max 1051.3 km/s), reinforcing that this is a storm-window snapshot rather than typical conditions.

citing: activity.top_values · activity.top_rate · kp_index.stats · estimated_kp.stats · intensity.stats · solar_wind_speed.stats · solar_wind_density.stats · row_count

Schema

8 columns
Per-column summary.
column              type         nulls  unique  alerts
time_tag            categorical  0.0%   300     long_tail
kp_index            numeric      0.0%   7       outliers
estimated_kp        numeric      0.0%   18      outliers
activity            categorical  0.0%   5       none
intensity           numeric      0.0%   7       outliers
solar_wind_time     categorical  0.0%   299     long_tail
solar_wind_speed    numeric      0.0%   266     none
solar_wind_density  numeric      0.0%   215     none

time_tag

categorical timestamp long_tail
This is an ISO-8601 timestamp column at minute resolution, with all 300 values unique (entropy_ratio 1.0) and zero nulls. The visible top values form a contiguous one-minute sequence starting 2026-01-20T00:06:00, consistent with a regular time index rather than event timestamps. Cardinality equals row count, so it acts as a row key over time. Treatment: Parse to datetime and use as the time index; do not one-hot encode.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 300
top_value: 2026-01-20T00:06:00
top_rate: 0.003333
cardinality: 300
entropy: 8.229
entropy_ratio: 1
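A minimal sketch of the suggested treatment, using only the Python standard library: parse the ISO-8601 strings and confirm that the values are unique and exactly one minute apart. The helper name `check_minute_index` and the three sample values are illustrative; they are not part of the profiler or the dataset.

```python
from datetime import datetime, timedelta

def check_minute_index(values):
    """Parse ISO-8601 strings and report whether they form a unique,
    strictly one-minute-spaced sequence (hypothetical helper)."""
    stamps = [datetime.fromisoformat(v) for v in values]
    unique = len(set(stamps)) == len(stamps)
    step = timedelta(minutes=1)
    # Every consecutive pair must differ by exactly one minute.
    regular = all(b - a == step for a, b in zip(stamps, stamps[1:]))
    return stamps, unique, regular

# Illustrative values in the time_tag format; the real column has 300 rows.
sample = ["2026-01-20T00:06:00", "2026-01-20T00:07:00", "2026-01-20T00:08:00"]
stamps, unique, regular = check_minute_index(sample)
print(stamps[0], unique, regular)  # 2026-01-20 00:06:00 True True
```

On the full column, a `False` for `regular` would point to gaps in the minute sequence even though the values remain unique.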

kp_index

numeric feature outliers
Numeric column on a small 0-6 integer scale with only 7 distinct values, consistent with a Kp geomagnetic index reading. The distribution is heavily concentrated at the high end (median and Q3 both 6.0, mean 5.09) and strongly left-skewed (skew -1.68), with 46 low-side outliers flagged (15.3%). There are no nulls, and zeros are rare (1.3%). Treatment: Treat as an ordinal categorical with 7 levels rather than as continuous; the outliers are genuine low-Kp readings, so do not clip them.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 7
min: 0
max: 6
mean: 5.09
median: 6
std: 1.396
q1: 5
q3: 6
iqr: 1
skew: -1.683
kurtosis: 2.149
n_outliers: 46
outlier_rate: 0.1533
zero_rate: 0.01333
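The profile does not say how outliers are flagged. Assuming the common Tukey rule (a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3), the reported quartiles give fences that agree with the alerts. For kp_index the lower fence is 5 - 1.5 × 1 = 3.5, so every reading of 3 or below would be flagged, and no reading can exceed the upper fence of 7.5. The sketch below recomputes the fences from the quartiles and extremes listed in this report; the 1.5 multiplier is the assumption.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences for Tukey's IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# (q1, q3, min, max) as reported in this profile.
profile = {
    "kp_index": (5, 6, 0, 6),
    "estimated_kp": (4.67, 6.33, 0, 6.33),
    "solar_wind_speed": (908.5, 1008, 881.6, 1051),
    "solar_wind_density": (2.09, 4.11, 0.23, 8.32),
}

for name, (q1, q3, lo, hi) in profile.items():
    lower, upper = tukey_fences(q1, q3)
    # Outliers on a side are only possible if the observed extreme crosses that fence.
    print(f"{name}: fences ({lower:.2f}, {upper:.2f}), "
          f"low possible={lo < lower}, high possible={hi > upper}")
```

This only shows which side of the distribution can hold outliers. Reproducing the counts themselves (46 for kp_index, 18 for estimated_kp) requires the raw column.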

estimated_kp

numeric feature outliers
This appears to be an estimated Kp geomagnetic index, bounded between 0.0 and 6.33 across 300 rows with only 18 unique values, consistent with the Kp scale's discrete steps of one third. The distribution is heavily left-skewed (skew -1.30), with both the median and Q3 pinned at the maximum of 6.33, so at least half the rows sit at the ceiling. The 18 low-side outliers (6% rate) and the rarity of zeros (0.67%, i.e. 2 rows) point to a long, thin tail toward quiet conditions. Treatment: Treat as ordinal/discrete and consider binning or a rank transform before modelling, given the ceiling-heavy left skew.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 18
min: 0
max: 6.33
mean: 5.183
median: 6.33
std: 1.538
q1: 4.67
q3: 6.33
iqr: 1.66
skew: -1.295
kurtosis: 0.9485
n_outliers: 18
outlier_rate: 0.06
zero_rate: 0.006667
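The third-step values (4.67, 6.33) match the conventional Kp notation, in which each unit is split into thirds, often written `5-` (4.67), `5o` (5.00) and `5+` (5.33). The following is a minimal sketch that assumes estimated_kp follows that convention. It snaps each value to the nearest third and returns an integer ordinal code, which preserves order and can stand in for a rank transform, together with the Kp label. The function name `kp_to_ordinal` is hypothetical.

```python
def kp_to_ordinal(value):
    """Snap a Kp value to the nearest third; return (ordinal code, Kp label)."""
    code = round(value * 3)  # one integer step per third of a Kp unit
    base, rem = divmod(code, 3)
    if rem == 0:
        label = f"{base}o"      # exactly on an integer, e.g. 5.00 -> "5o"
    elif rem == 1:
        label = f"{base}+"      # one third above, e.g. 6.33 -> "6+"
    else:
        label = f"{base + 1}-"  # one third below the next integer, e.g. 4.67 -> "5-"
    return code, label

for v in (0.0, 4.67, 5.0, 6.33):
    print(v, kp_to_ordinal(v))
```

Snapping with `round(value * 3)` absorbs the two-decimal rounding in the stored values (4.67 × 3 = 14.01, 6.33 × 3 = 18.99).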

activity

categorical label
Categorical descriptor of geomagnetic or space-weather activity level, with 5 ordered classes ranging from 'Quiet' to 'Moderate Storm'. The distribution is heavily skewed toward storm conditions: 'Moderate Storm' alone accounts for 57.3% (172/300) of rows, while 'Quiet' appears just 8 times, suggesting the dataset is filtered to disturbed periods rather than representing typical activity. The entropy ratio of 0.73 (1.69 bits against a maximum of log2 5 ≈ 2.32 bits for 5 classes) confirms moderate concentration on the top class. Treatment: Treat as an ordinal target; consider class weighting or stratified sampling given the imbalance toward 'Moderate Storm'.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 5
top_value: Moderate Storm
top_rate: 0.5733
cardinality: 5
entropy: 1.69
entropy_ratio: 0.7278
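The entropy figures in this report are consistent with Shannon entropy in bits divided by log2 of the column's cardinality. log2(300) ≈ 8.229 matches time_tag exactly, and 1.69 / log2(5) ≈ 0.728 matches activity. That reading is inferred from the numbers and is not documented by the profiler. The sketch below computes both quantities. It also computes "balanced" class weights, n / (k × count), which is the formula scikit-learn uses for `class_weight="balanced"` and one common way to apply the class weighting suggested above.

```python
import math
from collections import Counter

def entropy_stats(values):
    """Shannon entropy in bits and its ratio to the maximum, log2(cardinality)."""
    counts = Counter(values)
    n = len(values)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    # A single-class column has zero entropy; avoid dividing by log2(1) = 0.
    h_max = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return h, h / h_max

def balanced_weights(counts):
    """Inverse-frequency weights n / (k * count) for each class."""
    n = sum(counts.values())
    k = len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# time_tag: 300 distinct values -> entropy log2(300), ratio 1.
h, ratio = entropy_stats(list(range(300)))
print(round(h, 3), round(ratio, 4))  # 8.229 1.0

# solar_wind_time: 298 singletons plus one value seen twice.
h, ratio = entropy_stats(list(range(298)) + [-1, -1])
print(round(h, 3), round(ratio, 4))  # 8.222 0.9998
```

For activity (n = 300, k = 5), 'Moderate Storm' would receive 300 / (5 × 172) ≈ 0.35 and 'Quiet' 300 / (5 × 8) = 7.5. The counts of the other three classes are not listed in this report.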

intensity

numeric feature outliers
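Numeric 'intensity' column bounded between 0.0 and 0.667, with only 7 unique values across 300 rows, suggesting a discretised or quantised measurement rather than a continuous reading. The distribution is heavily left-skewed (skew -1.68), with the median equal to the max (0.667), and 46 rows (15.3%) are flagged as low-side outliers. The handful of zeros (1.3%) and the ceiling at 0.667 hint at a capped or normalised score. The summary statistics also match kp_index divided by 9 to within rounding (6/9 ≈ 0.667, 5/9 ≈ 0.556, 5.09/9 ≈ 0.566), with identical skew, outlier count and zero rate, so intensity is likely a rescaled copy of kp_index. Treatment: Treat as ordinal/categorical given only 7 unique values, or bin explicitly before modelling; check for redundancy with kp_index before using both.
high · anthropic:claude-opus-4-7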
n: 300
nulls: 0 (0.0%)
unique: 7
min: 0
max: 0.667
mean: 0.5658
median: 0.667
std: 0.1553
q1: 0.556
q3: 0.667
iqr: 0.111
skew: -1.682
kurtosis: 2.139
n_outliers: 46
outlier_rate: 0.1533
zero_rate: 0.01333
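The intensity summary lines up with kp_index divided by 9, the top of the Kp scale. The max and median match 6/9 ≈ 0.667, Q1 matches 5/9 ≈ 0.556, and the mean matches 5.09/9 ≈ 0.566. Both columns also share the same skew, outlier count and zero rate. That relationship is inferred from summary statistics only. A row-by-row check like the sketch below would confirm it before one column is dropped as redundant. The helper name and the paired sample rows are hypothetical.

```python
def is_kp_over_nine(kp_values, intensity_values, decimals=3):
    """True if every intensity equals kp / 9 rounded to `decimals` places."""
    if len(kp_values) != len(intensity_values):
        raise ValueError("columns must be the same length")
    return all(round(kp / 9, decimals) == round(x, decimals)
               for kp, x in zip(kp_values, intensity_values))

# The seven levels this mapping would produce for kp_index 0..6.
print([round(k / 9, 3) for k in range(7)])
# [0.0, 0.111, 0.222, 0.333, 0.444, 0.556, 0.667]

# Hypothetical paired rows in the same format as the profiled columns.
print(is_kp_over_nine([6, 5, 0], [0.667, 0.556, 0.0]))  # True
```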

solar_wind_time

categorical timestamp long_tail
This column is a minute-resolution timestamp of solar wind observations, all falling on 2026-01-20, with values like '2026-01-20 03:58:00.000'. With 299 unique values across 300 rows and an entropy ratio of 0.9998, it is effectively a per-row time index; only '2026-01-20 03:58:00.000' repeats (twice). There are no nulls. The long_tail alert reflects this near-unique structure rather than meaningful categories. Note that the format (space separator, millisecond precision) differs from time_tag's. Treatment: Parse to datetime and use as the time axis after resolving the duplicated 03:58 row; do not treat as a categorical feature.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 299
top_value: 2026-01-20 03:58:00.000
top_rate: 0.006667
cardinality: 299
entropy: 8.222
entropy_ratio: 0.9998
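Because one timestamp ('2026-01-20 03:58:00.000') occurs twice, this column is not a strict unique index. The sketch below parses the space-separated, millisecond format with `datetime.strptime` and surfaces any repeats before the column is used as a time axis. The helper name is hypothetical, and apart from the repeated timestamp the sample values are illustrative.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # e.g. "2026-01-20 03:58:00.000"

def parse_and_find_duplicates(values):
    """Parse solar_wind_time strings; return datetimes and any repeated stamps."""
    stamps = [datetime.strptime(v, FMT) for v in values]
    repeats = {ts: c for ts, c in Counter(stamps).items() if c > 1}
    return stamps, repeats

sample = [
    "2026-01-20 03:57:00.000",
    "2026-01-20 03:58:00.000",
    "2026-01-20 03:58:00.000",
]
stamps, repeats = parse_and_find_duplicates(sample)
print(repeats)  # {datetime.datetime(2026, 1, 20, 3, 58): 2}
```

The profile does not explain why the repeat exists. Keeping the first row, averaging the pair, or dropping both are all defensible choices depending on the cause.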

solar_wind_speed

numeric feature
Numeric measurements of solar wind speed across 300 records, all populated and confined to a narrow band between 881.6 and 1051.3, with a mean of 955.79 and a median of 940.15. The distribution is mildly right-skewed (0.45) and platykurtic (kurtosis -1.32), i.e. flatter than a normal distribution, and no outliers are flagged. With 266 unique values out of 300, the column behaves as a continuous physical feature rather than a categorical one. Treatment: Use as-is for modelling; standardisation is optional given the narrow range and mild skew.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 266
min: 881.6
max: 1051
mean: 955.8
median: 940.1
std: 51.98
q1: 908.5
q3: 1008
iqr: 99.8
skew: 0.4506
kurtosis: -1.323
n_outliers: 0
outlier_rate: 0
zero_rate: 0
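The following is a minimal sketch of the optional standardisation using Python's `statistics` module. The profile does not state whether its std (51.98) is a sample or population standard deviation, so this sketch assumes the sample version (`statistics.stdev`). Using the reported mean and std, the observed extremes sit about 1.4 standard deviations below and 1.8 above the mean. The sample speeds are hypothetical values in km/s, chosen within the profiled range.

```python
import statistics

def standardise(values):
    """Return z-scores (x - mean) / sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample std; use statistics.pstdev for population
    return [(x - mu) / sigma for x in values]

# Hypothetical speeds (km/s) within the profiled 881.6-1051 range.
speeds = [881.6, 908.5, 940.1, 1008.0, 1051.0]
z = standardise(speeds)
print([round(v, 2) for v in z])

# z-scores of the reported min and max, using the profile's mean and std.
print(round((881.6 - 955.8) / 51.98, 2), round((1051 - 955.8) / 51.98, 2))  # -1.43 1.83
```

On the real column, the mean and standard deviation should be fitted on training rows only and then reused for any held-out rows.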

solar_wind_density

numeric feature
This is a numeric feature capturing solar wind density, fully populated across 300 rows with 215 distinct values. The distribution is fairly symmetric (skew 0.26, kurtosis 0.12), with mean 3.10, median 3.14, a range of 0.23 to 8.32 and an IQR of 2.02. Only 2 outliers (0.67%) appear. If the usual 1.5 × IQR rule is used, both must lie on the high side: the lower fence (about -0.94) sits below the minimum, while the upper fence (about 7.14) sits below the maximum of 8.32. The column looks clean and well-behaved. Treatment: Use as-is in modelling; standard scaling is sufficient given the near-symmetric distribution.
high · anthropic:claude-opus-4-7
n: 300
nulls: 0 (0.0%)
unique: 215
min: 0.23
max: 8.32
mean: 3.103
median: 3.14
std: 1.344
q1: 2.09
q3: 4.11
iqr: 2.02
skew: 0.2555
kurtosis: 0.1168
n_outliers: 2
outlier_rate: 0.006667
zero_rate: 0