
data trove: NOAA lightning strikes 2018

source: /home/coolhand/html/datavis/data_trove/data/wild/weather/monthly_heatmap.json
59,070 rows · 4 columns · profiled 2026-06-21

Reading

dataset summary · high confidence anthropic:default

This dataset contains 59,070 records of lightning strike activity. Each record has geographic coordinates (latitude and longitude), a calendar month (1 through 12), and a strike count. The coordinates (25.35–35.46°N, 96.74–79.03°W) fall within a bounding box over the southeastern United States and the adjacent Gulf of Mexico. The strikes column is highly right-skewed (skew ~2.0, max 531 vs. median 34), so a small number of location-months record far more strikes than is typical. These 2,917 outlier records (4.9%) are worth investigating. Latitude also has a ~9.5% outlier rate and a right (northward) skew. Records concentrate in a core band between about 27.8°N and 29.9°N, with a thinner tail extending to 35.5°N.

citing: strikes.skew · strikes.median · strikes.max · strikes.n_outliers · strikes.outlier_rate · lat.n_outliers · lat.outlier_rate · lat.skew · row_count · month.min · month.max
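The headline figures can in principle be recomputed from the source file. What follows is a minimal sketch, not the profiler's own code, and it rests on two assumptions. First, that monthly_heatmap.json is a JSON array of objects keyed lat, lon, month and strikes (the report does not show the file's layout). Second, that outliers are flagged with 1.5×IQR Tukey fences. The helper names load_records, summarize and profile are hypothetical. The profiler's quantile interpolation is not stated, so quartiles and outlier counts may differ slightly from the reported values.

```python
import json
import statistics
from pathlib import Path

COLUMNS = ("lat", "lon", "month", "strikes")

def load_records(path):
    """Read the file, ASSUMING a JSON array of {"lat", "lon", "month", "strikes"} objects."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)

def summarize(values, k=1.5):
    """Profile one numeric column; outliers fall outside Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" interpolates linearly between order statistics,
    # like NumPy's default percentile; the profiler's method is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_outliers = sum(1 for v in values if v < lower or v > upper)
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        "n_outliers": n_outliers,
        "outlier_rate": n_outliers / len(values),
    }

def profile(records):
    """Summarize every column across a list of record dicts."""
    return {col: summarize([r[col] for r in records]) for col in COLUMNS}
```

Calling `profile(load_records(path))`, with `path` set to the source file listed above, returns one summary dict per column. If the layout assumption holds, medians, minima and maxima should match the report exactly.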

Schema

4 columns
Per-column summary.

column   type     nulls  unique  alerts
lat      numeric  0.0%   868     outliers
lon      numeric  0.0%   1,231   -
month    numeric  0.0%   12      -
strikes  numeric  0.0%   381     high_skew

lat

numeric feature outliers
This column is a geographic latitude, with values ranging from 25.35 to 35.46 degrees north. Combined with the longitude range (96.74°W to 79.03°W), this places the records within a bounding box over the southeastern United States and parts of the Gulf of Mexico. The box runs from southern Florida (25.35°N) north to roughly southern Tennessee and the Carolinas (35.46°N). With only 868 unique values across 59,070 rows, latitudes are heavily discretised (likely snapped to a grid or centroid) rather than continuous GPS readings. The distribution is right-skewed (skew 1.15), with the middle half of rows between 27.84° and 29.94° and a long upper tail toward 35.46°. In all, 9.5% of rows (5,626 records) are flagged as outliers. If the profiler uses the 1.5×IQR rule, the lower fence (about 24.69°) lies below the minimum, so every flagged outlier sits above the upper fence of about 33.09°.
Treatment: Pair with longitude for spatial joins or clustering; investigate the 5,626 outlier records above ~33.1° for data-quality or sub-population issues before modelling.
high · anthropic:default
n             59,070
nulls         0 (0.0%)
unique        868
min           25.35
max           35.46
mean          29.23
median        28.84
std           1.899
q1            27.84
q3            29.94
iqr           2.1
skew          1.147
kurtosis      0.7033
n_outliers    5,626
outlier_rate  0.09524
zero_rate     0
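The outlier counts can be tied back to the reported quartiles. Assume the profiler uses the common 1.5×IQR (Tukey) rule; this is consistent with the zero outlier counts reported for lon and month. Under that assumption, both fences follow directly from Q1 and Q3. The sketch below uses a hypothetical PROFILE dict holding values copied from this report. It computes both fences for each column and checks whether the observed minimum or maximum crosses them.

```python
# Quartiles and ranges copied from this profile (hypothetical container name).
PROFILE = {
    "lat": {"q1": 27.84, "q3": 29.94, "min": 25.35, "max": 35.46},
    "lon": {"q1": -89.94, "q3": -81.22, "min": -96.74, "max": -79.03},
    "month": {"q1": 5, "q3": 8, "min": 1, "max": 12},
    "strikes": {"q1": 14, "q3": 68, "min": 1, "max": 531},
}

def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def fence_report(profile, k=1.5):
    """Per column: rounded fences and whether min/max lie beyond them."""
    report = {}
    for col, s in profile.items():
        lower, upper = tukey_fences(s["q1"], s["q3"], k)
        report[col] = {
            "lower": round(lower, 2),
            "upper": round(upper, 2),
            "low_side_possible": s["min"] < lower,   # any value below the lower fence?
            "high_side_possible": s["max"] > upper,  # any value above the upper fence?
        }
    return report

for col, r in fence_report(PROFILE).items():
    print(col, r)
```

Under that assumption, latitude can only have outliers above about 33.09° and strikes only above 149. The lon and month ranges sit entirely inside their fences.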

lon

numeric feature
This column contains longitude coordinates, with all values between -96.74 and -79.03 (96.74°W to 79.03°W). That span runs roughly from eastern Texas and Oklahoma east to the Atlantic coast of Florida and the Carolinas. Because latitude stays between 25.35°N and 35.46°N, the records lie over the southeastern United States and the adjacent Gulf of Mexico, not the Great Lakes or the Midwest. The distribution is moderately left-skewed (skew = -0.85), with mass concentrated toward the eastern end of the range (median -82.06, Q3 -81.22). Together with the median latitude of 28.84°N, this suggests many records lie over or near the Florida peninsula. The negative kurtosis (-0.85) and wide IQR (8.72°, from -89.94 to -81.22) point to a broad, flat spread westward rather than a single tight cluster. Only 1,231 unique values across 59,070 rows indicate that coordinates are discretized or snapped to a coarse grid rather than recorded at true GPS precision, which limits spatial resolution. No rows are flagged as outliers.
Treatment: Use as-is for spatial joins or clustering; note the limited precision (1,231 unique values for 59,070 rows) before any fine-grained geospatial analysis.
high · anthropic:default
n             59,070
nulls         0 (0.0%)
unique        1,231
min           -96.74
max           -79.03
mean          -85.27
median        -82.06
std           5.252
q1            -89.94
q3            -81.22
iqr           8.72
skew          -0.8513
kurtosis      -0.8532
n_outliers    0
outlier_rate  0
zero_rate     0
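Both coordinates look discretised (868 distinct latitudes and 1,231 distinct longitudes over 59,070 rows), which suggests a regular grid. One rough way to estimate its spacing is the most common gap between consecutive distinct values. This is a heuristic sketch, not something the profile reports. The function names, the 6-digit rounding of gaps and the 2-decimal cell key are illustrative assumptions, and the example longitudes are made up.

```python
from collections import Counter

def estimate_grid_step(values, ndigits=6):
    """Most common gap between consecutive distinct values: a guess at grid spacing."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct values")
    # Round gaps so float noise (e.g. 0.0999999...) groups with the true step.
    gaps = Counter(round(b - a, ndigits) for a, b in zip(distinct, distinct[1:]))
    step, _count = gaps.most_common(1)[0]
    return step

def cell_key(lat, lon, ndigits=2):
    """Hashable (lat, lon) key for grouping or joining rows on the same grid cell."""
    return (round(lat, ndigits), round(lon, ndigits))

# Made-up longitudes on a 0.1-degree grid, for illustration only.
print(estimate_grid_step([-82.0, -81.9, -81.9, -81.8, -81.6]))
print(cell_key(28.8400001, -82.0600002))
```

On an irregular or sparsely covered grid, the most common gap can mislead. It is worth inspecting the full gap counts before relying on the estimate.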

month

numeric feature
This column encodes calendar month as an integer from 1 to 12, with all 12 values present and no nulls across 59,070 rows. The spread is not uniform. An even spread over 1-12 would give a standard deviation of about 3.45 and an IQR of about 6. The observed std is 2.335 and the IQR is only 3 (Q1 = 5, Q3 = 8), so roughly half or more of the records fall between May and August. The distribution is centred on July (median 7, mean 6.90, skew -0.15). If each row is one grid location in one month with at least one strike, this summer concentration matches the warm-season peak expected for lightning. No outliers or zero values are present.
Treatment: Treat as a cyclic ordinal feature; apply sine/cosine encoding (sin(2π·month/12), cos(2π·month/12)) before modelling to capture cyclical continuity.
high · anthropic:default
n             59,070
nulls         0 (0.0%)
unique        12
min           1
max           12
mean          6.899
median        7
std           2.335
q1            5
q3            8
iqr           3
skew          -0.155
kurtosis      -0.2056
n_outliers    0
outlier_rate  0
zero_rate     0
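The sine/cosine encoding suggested in the month treatment note takes only a few lines of standard-library Python. The function names here are illustrative. The point is that months 12 and 1 land close together on the unit circle, which a plain integer encoding (where they are 11 apart) does not capture.

```python
import math

def encode_month(month):
    """Map month 1..12 onto the unit circle as (sin, cos) so December sits next to January."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)

def circular_distance(m1, m2):
    """Euclidean distance between two months after sine/cosine encoding."""
    (s1, c1), (s2, c2) = encode_month(m1), encode_month(m2)
    return math.hypot(s1 - s2, c1 - c2)

dec_jan = circular_distance(12, 1)  # adjacent months across the year boundary
jun_jan = circular_distance(6, 1)   # months five steps apart
print(f"Dec-Jan: {dec_jan:.3f}, Jun-Jan: {jun_jan:.3f}")
```

In a model, the two encoded values replace the raw integer as a pair of features.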

strikes

numeric feature high_skew
This column is the count of lightning strikes in each record (apparently one grid location in one month). Values range from 1 to 531 across 381 unique integers. The minimum of 1 and a zero_rate of 0 suggest that location-months with no strikes are omitted from the file rather than stored as 0. The distribution is heavily right-skewed (skew = 2.02, kurtosis = 6.09). The median is only 34, while the mean is 49.43 and the std 49.83, indicating a long upper tail. The IQR of 54 spans Q1 = 14 to Q3 = 68, so most records cluster low. A minority reach counts as high as 531, and 4.9% of rows (2,917) are flagged as outliers. Under a 1.5×IQR rule, these would be the rows with more than 149 strikes.
Treatment: Log-transform (log1p) before regression or distance-based modelling to reduce skew; investigate the outlier group (n=2,917) for data quality or domain-specific segmentation.
medium · anthropic:default
n             59,070
nulls         0 (0.0%)
unique        381
min           1
max           531
mean          49.43
median        34
std           49.83
q1            14
q3            68
iqr           54
skew          2.023
kurtosis      6.094
n_outliers    2,917
outlier_rate  0.04938
zero_rate     0
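The effect of the recommended log1p transform on skew can be illustrated with a plain moment-based skewness estimator. The profiler's exact estimator is not stated (libraries such as pandas apply a bias correction), so values computed on the full column may differ somewhat from the reported 2.023. For illustration only, the sample below uses the five summary values reported here (min, Q1, median, Q3, max) rather than actual rows.

```python
import math

def skewness(values):
    """Moment-based (biased Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0  # constant input: no spread, so report zero skew
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log1p_all(values):
    """Apply log(1 + x) element-wise; defined for any count >= 0."""
    return [math.log1p(v) for v in values]

# Stand-in sample: reported min, Q1, median, Q3 and max of strikes (not real rows).
sample = [1, 14, 34, 68, 531]
raw_skew = skewness(sample)
log_skew = skewness(log1p_all(sample))
print(f"raw skew {raw_skew:.2f}, log1p skew {log_skew:.2f}")
```

On this stand-in sample the skew drops from strongly positive to near zero. On the real column, the size of the reduction should be checked rather than assumed.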