data trove noaa lightning strikes 2018

saturn notebook · generated 2026-06-21

Overview

Source: /home/coolhand/html/datavis/data_trove/data/wild/weather/monthly_heatmap.json

Saturn profiled 59,070 rows across 4 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:
!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/data/wild/weather/monthly_heatmap.json",
    "--findings", "data-trove-noaa-lightning-strikes-2018.json",
    "--llm", "anthropic:default",
])

Summary confidence: high

This dataset contains 59,070 records of lightning strike activity, each described by geographic coordinates (latitude and longitude), a month, and a strike count. The strikes column is highly right-skewed (skew ~2.0, max 531 vs. median 34), meaning a small number of locations experience dramatically more lightning than typical — these ~2,900 outlier records are worth investigating. Latitude also shows ~9.5% outlier rate with a northward skew, suggesting strike activity is concentrated in a core geographic band but with notable events at higher latitudes.

citing: strikes.skew · strikes.median · strikes.max · strikes.n_outliers · strikes.outlier_rate · lat.n_outliers · lat.outlier_rate · lat.skew · row_count · month.min · month.max

Out[4]:

saturn.schema() · 4 columns

column kind n null% unique alerts
lat numeric 59,070 0.0% 868 outliers
lon numeric 59,070 0.0% 1,231
month numeric 59,070 0.0% 12
strikes numeric 59,070 0.0% 381 high_skew
Fig 1.
strikes · Look for the heavy right tail — most locations have modest strike counts but a long tail extends to 531, flagging extreme hotspots.
Histogram bins for strikes (median: 34.0).
bin              count
1 – 14.25        15261
14.25 – 27.5     9914
27.5 – 40.75     7780
40.75 – 54       6090
54 – 67.25       4959
67.25 – 80.5     3431
80.5 – 93.75     2719
93.75 – 107      2084
107 – 120.2      1684
120.2 – 133.5    1183
133.5 – 146.8    882
146.8 – 160      711
160 – 173.2      578
173.2 – 186.5    414
186.5 – 199.8    320
199.8 – 213      208
213 – 226.2      205
226.2 – 239.5    151
239.5 – 252.8    109
252.8 – 266      84
266 – 279.2      76
279.2 – 292.5    46
292.5 – 305.8    35
305.8 – 319      34
319 – 332.2      34
332.2 – 345.5    23
345.5 – 358.8    11
358.8 – 372      6
372 – 385.2      7
385.2 – 398.5    9
398.5 – 411.8    6
411.8 – 425      3
425 – 438.2      5
438.2 – 451.5    1
451.5 – 464.8    3
464.8 – 478      2
478 – 491.2      0
491.2 – 504.5    0
504.5 – 517.8    0
517.8 – 531      2
Fig 2.
month · Check whether lightning activity clusters in summer months (median month is 7), suggesting a strong seasonal pattern.
Histogram bins for month (median: 7.0).
bin              count
1 – 1.275        990
1.275 – 1.55     0
1.55 – 1.825     0
1.825 – 2.1      990
2.1 – 2.375      0
2.375 – 2.65     0
2.65 – 2.925     0
2.925 – 3.2      3300
3.2 – 3.475      0
3.475 – 3.75     0
3.75 – 4.025     3300
4.025 – 4.3      0
4.3 – 4.575      0
4.575 – 4.85     0
4.85 – 5.125     6600
5.125 – 5.4      0
5.4 – 5.675      0
5.675 – 5.95     0
5.95 – 6.225     9900
6.225 – 6.5      0
6.5 – 6.775      0
6.775 – 7.05     9900
7.05 – 7.325     0
7.325 – 7.6      0
7.6 – 7.875      0
7.875 – 8.15     9900
8.15 – 8.425     0
8.425 – 8.7      0
8.7 – 8.975      0
8.975 – 9.25     6600
9.25 – 9.525     0
9.525 – 9.8      0
9.8 – 10.08      3300
10.08 – 10.35    0
10.35 – 10.62    0
10.62 – 10.9     0
10.9 – 11.18     3300
11.18 – 11.45    0
11.45 – 11.73    0
11.73 – 12       990
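
Because month is an integer column, only 12 of the 40 bins above are occupied, and each occupied bin holds the row count for one month. A minimal sketch, assuming the counts are read off those bins as listed (the variable names are illustrative and not part of Saturn), recomputes the reported mean and the share of rows that fall in June through August:

```python
# Per-month row counts, read off the non-empty bins of the month histogram.
month_counts = {
    1: 990, 2: 990, 3: 3300, 4: 3300, 5: 6600, 6: 9900,
    7: 9900, 8: 9900, 9: 6600, 10: 3300, 11: 3300, 12: 990,
}

total_rows = sum(month_counts.values())

# Count-weighted mean of the month index.
mean_month = sum(month * count for month, count in month_counts.items()) / total_rows

# Fraction of all rows that belong to June, July, or August.
summer_share = sum(month_counts[m] for m in (6, 7, 8)) / total_rows

print(total_rows)               # 59070
print(round(mean_month, 3))     # 6.899
print(round(summer_share, 3))   # 0.503
```

These are row counts per month, not strike totals, so they describe how many records each month contributes and not how much lightning occurred.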
Fig 3.
lat · Most rows fall between ~27 and 30.5°N, with a smaller, separate cluster around 32.5–34.5°N; note the ~9.5% outlier rate at higher latitudes.
Histogram bins for lat (median: 28.84).
bin              count
25.35 – 25.6     2
25.6 – 25.86     5
25.86 – 26.11    35
26.11 – 26.36    143
26.36 – 26.61    475
26.61 – 26.87    1135
26.87 – 27.12    2153
27.12 – 27.37    3296
27.37 – 27.62    4020
27.62 – 27.88    4062
27.88 – 28.13    4007
28.13 – 28.38    3871
28.38 – 28.64    3643
28.64 – 28.89    3282
28.89 – 29.14    3236
29.14 – 29.39    3215
29.39 – 29.65    3527
29.65 – 29.9     3549
29.9 – 30.15     3239
30.15 – 30.41    2427
30.41 – 30.66    1482
30.66 – 30.91    729
30.91 – 31.16    272
31.16 – 31.42    84
31.42 – 31.67    15
31.67 – 31.92    10
31.92 – 32.17    20
32.17 – 32.43    72
32.43 – 32.68    227
32.68 – 32.93    580
32.93 – 33.19    1045
33.19 – 33.44    1305
33.44 – 33.69    1424
33.69 – 33.94    1167
33.94 – 34.2     762
34.2 – 34.45     343
34.45 – 34.7     166
34.7 – 34.95     34
34.95 – 35.21    5
35.21 – 35.46    6
Fig 4.
lon · Longitude spans roughly –97 to –79 (Gulf/Southeast US region) in three separate clusters with near-empty gaps between them; the spread is left-skewed overall, with no outliers.
Histogram bins for lon (median: -82.06).
bin                count
-96.74 – -96.3     49
-96.3 – -95.85     343
-95.85 – -95.41    1457
-95.41 – -94.97    2950
-94.97 – -94.53    2653
-94.53 – -94.08    1196
-94.08 – -93.64    275
-93.64 – -93.2     26
-93.2 – -92.76     1
-92.76 – -92.31    0
-92.31 – -91.87    0
-91.87 – -91.43    30
-91.43 – -90.98    226
-90.98 – -90.54    1161
-90.54 – -90.1     3141
-90.1 – -89.66     3520
-89.66 – -89.21    2000
-89.21 – -88.77    592
-88.77 – -88.33    68
-88.33 – -87.88    2
-87.88 – -87.44    0
-87.44 – -87       0
-87 – -86.56       0
-86.56 – -86.11    0
-86.11 – -85.67    1
-85.67 – -85.23    59
-85.23 – -84.79    335
-84.79 – -84.34    1370
-84.34 – -83.9     2386
-83.9 – -83.46     2055
-83.46 – -83.01    799
-83.01 – -82.57    458
-82.57 – -82.13    1836
-82.13 – -81.69    5610
-81.69 – -81.24    9229
-81.24 – -80.8     8753
-80.8 – -80.36     4831
-80.36 – -79.92    1419
-79.92 – -79.47    223
-79.47 – -79.03    16
Fig 5.
Per-column null rate across the corpus. Columns are ordered by input position.
Per-column null rate across the corpus.
column   kind     null %
lat      numeric  0.0%
lon      numeric  0.0%
month    numeric  0.0%
strikes  numeric  0.0%
Fig 6.
Pearson correlation across numeric columns (sampled, bounded).
Pearson correlation across 4 numeric columns (values clipped to 2 decimals).
         lat    lon    month  strikes
lat      +1.00  -0.37  -0.04  +0.11
lon      -0.37  +1.00  -0.05  -0.07
month    -0.04  -0.05  +1.00  +0.00
strikes  +0.11  -0.07  +0.00  +1.00
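
For reference, each coefficient in a Pearson matrix is the covariance of two columns divided by the product of their standard deviations. The sketch below is a plain-Python illustration on hypothetical toy values. It is not Saturn's implementation (which the caption describes as sampled and bounded), and `pearson`, `lat`, and `lon` are illustrative names only.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel between numerator and denominator, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy rows (not taken from the dataset): lat rises as lon falls.
lat = [25.5, 27.0, 28.5, 30.0, 33.5]
lon = [-80.0, -81.5, -82.0, -90.0, -95.0]

r = pearson(lat, lon)
print(f"{r:+.2f}")  # negative, like the lat/lon entry in the matrix
```

The sign convention matches the matrix: a negative value such as the lat/lon entry (-0.37) means that higher latitudes tend to go with more westerly longitudes in the profiled rows.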

lat numeric feature

This column is a geographic latitude, with values ranging from 25.35 to 35.46 degrees north. Read together with the longitude range (-96.74 to -79.03) and the NOAA source, this places the records in the US Gulf Coast and Southeast. With only 868 unique values across 59,070 rows, latitudes are heavily discretised (likely snapped to a grid or centroid), not continuous GPS readings. The distribution is right-skewed (skew 1.15), and the histogram is bimodal: a dominant cluster around 27–30.5° and a smaller second cluster around 32.5–34.5°, separated by a near-empty band near 31.5–32°. The 9.5% of rows flagged as outliers (5,626 records) sit in the upper part of that second cluster, which looks more like a distinct sub-region than a long tail and warrants geographic investigation.

Treatment: Pair with longitude for spatial joins or clustering; investigate the 5,626 outlier records (above roughly 33.1°, i.e. Q3 + 1.5 × IQR = 29.94 + 3.15) for data-quality or sub-population issues before modelling.
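
The outlier alert for this column is stated as "rows beyond 1.5 IQR". Assuming this means the usual Tukey fences (Q1 − 1.5·IQR and Q3 + 1.5·IQR), which the report does not spell out, the cut-offs can be recomputed from the reported quartiles (q1 27.84, q3 29.94). `tukey_fences` is a hypothetical helper written for this sketch, not a Saturn function.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles copied from the lat stats in this report.
lower, upper = tukey_fences(q1=27.84, q3=29.94)
print(round(lower, 2), round(upper, 2))  # 24.69 33.09
```

Under that assumption the upper fence is about 33.09°. Since the column minimum (25.35) is above the lower fence, every flagged row lies on the high-latitude side.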

anthropic:default · confidence high
Out[12]:

saturn.columns["lat"].stats

stat          value
n             59,070
nulls         0 (0.0%)
unique        868
min           25.35
max           35.46
mean          29.23
median        28.84
std           1.899
q1            27.84
q3            29.94
iqr           2.1
skew          1.147
kurtosis      0.7033
n_outliers    5,626
outlier_rate  0.09524
zero_rate     0
alert: outliers (9.5% rows beyond 1.5 IQR)
Fig 7.
Distribution of lat. Vertical dash marks the median.

lon numeric feature

This column contains longitude coordinates, with all values falling between -96.74 and -79.03. Combined with the latitude range (25.35 to 35.46°N), this is consistent with the US Gulf Coast and Southeast, roughly from the Texas coast eastward to the Atlantic coast of Florida. The distribution is moderately left-skewed (skew = -0.85) with mass concentrated toward the eastern end of the range (median -82.06, Q3 -81.22), suggesting that most records originate from states such as Florida or Georgia. The histogram is not continuous: it shows three separate clusters (about -96.7 to -93.2, -91.9 to -87.9, and -86.1 to -79.0) with near-empty bins between them. Only 1,231 unique values across 59,070 rows indicates that coordinates are discretized or snapped to a coarse grid and are not recorded at GPS precision, which limits spatial resolution.

Treatment: Use as-is for spatial joins or clustering; note limited precision (1,231 unique values for 59,070 rows) before any fine-grained geospatial analysis.

anthropic:default · confidence high
Out[15]:

saturn.columns["lon"].stats

stat          value
n             59,070
nulls         0 (0.0%)
unique        1,231
min           -96.74
max           -79.03
mean          -85.27
median        -82.06
std           5.252
q1            -89.94
q3            -81.22
iqr           8.72
skew          -0.8513
kurtosis      -0.8532
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 8.
Distribution of lon. Vertical dash marks the median.

month numeric feature

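This column encodes calendar month as an integer from 1 to 12, with exactly 12 unique values and no nulls across 59,070 rows. Coverage is not uniform: the histogram shows 9,900 rows for each of June, July, and August, 6,600 for May and September, 3,300 for March, April, October, and November, and only 990 for January, February, and December. The summary moments (mean 6.90, median 7.0, skew −0.15, kurtosis −0.21) therefore describe a roughly symmetric, single-peaked distribution centred on July, not a flat one. No outliers or zero values are present. These are row counts per month, not strike totals, and the identical round counts may indicate that rows were sampled or capped per month upstream, so any seasonal reading should be confirmed against the strikes values themselves.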

Treatment: Treat as a cyclic ordinal feature; apply sine/cosine encoding (sin(2π·month/12), cos(2π·month/12)) before modelling to capture cyclical continuity.
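
The sine/cosine encoding in the treatment above can be sketched as follows. This is a minimal illustration using only the standard library; `encode_month` and `circle_distance` are hypothetical helper names, not part of Saturn.

```python
import math


def encode_month(month: int) -> tuple[float, float]:
    """Map a calendar month (1-12) to a point on the unit circle."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)


def circle_distance(a: int, b: int) -> float:
    """Euclidean distance between two months after encoding."""
    return math.dist(encode_month(a), encode_month(b))


# On the raw 1-12 scale December and January are 11 apart; after encoding
# they are as close as any other pair of neighbouring months.
print(math.isclose(circle_distance(12, 1), circle_distance(6, 7)))  # True

# Months half a year apart land on opposite sides of the circle.
print(round(circle_distance(1, 7), 6))  # 2.0
```

Both encoded columns are needed together: either one alone maps two different months to the same value.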

anthropic:default · confidence high
Out[18]:

saturn.columns["month"].stats

stat          value
n             59,070
nulls         0 (0.0%)
unique        12
min           1
max           12
mean          6.899
median        7
std           2.335
q1            5
q3            8
iqr           3
skew          -0.155
kurtosis      -0.2056
n_outliers    0
outlier_rate  0
zero_rate     0
Fig 9.
Distribution of month. Vertical dash marks the median.

strikes numeric feature

This column is a count of lightning strikes per record (one latitude/longitude location in one month), with values ranging from 1 to 531 across 381 unique integers. The distribution is heavily right-skewed (skew = 2.02, kurtosis = 6.09): the median is only 34 while the mean is 49.43 and the std is 49.83, indicating a long upper tail. Notably, 4.9% of rows (2,917) are flagged as outliers, and the IQR of 54 spans Q1=14 to Q3=68, confirming that most records cluster low while a minority have very high counts up to 531.

Treatment: Log-transform (log1p) before regression or distance-based modelling to reduce skew; investigate outlier group (n=2,917) for data quality or domain-specific segmentation.
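
As a rough illustration of why log1p helps here, the sketch below applies it to the five summary values reported for this column (min 1, q1 14, median 34, q3 68, max 531), not to the raw rows. It then compares how far the maximum sits above the median relative to the gap between Q1 and the median, before and after the transform. The dictionary and ratio names are illustrative only.

```python
import math

# Summary values copied from the strikes stats in this report.
summary = {"min": 1, "q1": 14, "median": 34, "q3": 68, "max": 531}

# log1p(x) = log(1 + x); it is monotonic, so quantiles map to quantiles.
logged = {name: math.log1p(value) for name, value in summary.items()}


def upper_tail_ratio(values: dict[str, float]) -> float:
    """(max - median) / (median - q1): large values mean a long upper tail."""
    return (values["max"] - values["median"]) / (values["median"] - values["q1"])


raw_ratio = upper_tail_ratio(summary)
log_ratio = upper_tail_ratio(logged)

# The upper tail shrinks from roughly 25x the lower gap to roughly 3x.
print(round(raw_ratio, 2), round(log_ratio, 2))
```

The transform compresses the upper tail considerably but does not remove the asymmetry entirely, so the outlier group is still worth inspecting separately.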

anthropic:default · confidence medium
Out[21]:

saturn.columns["strikes"].stats

stat          value
n             59,070
nulls         0 (0.0%)
unique        381
min           1
max           531
mean          49.43
median        34
std           49.83
q1            14
q3            68
iqr           54
skew          2.023
kurtosis      6.094
n_outliers    2,917
outlier_rate  0.04938
zero_rate     0
alert: high_skew (skew=+2.02)
Fig 10.
Distribution of strikes. Vertical dash marks the median.

How to cite

BibTeX
@misc{saturn-data-trove-noaa-lightning-strikes-2018-2026,
  author       = {Steuber, Luke},
  title        = {Saturn reading: data trove noaa lightning strikes 2018},
  year         = {2026},
  howpublished = {\url{https://dr.eamer.dev/saturn/view/data-trove-noaa-lightning-strikes-2018}},
  note         = {Profiled with saturn-dissect v0.2.0, prompt saturn-insight-v2, model anthropic:default},
}
APA
Steuber, L. (2026). Saturn reading: data trove noaa lightning strikes 2018. Source: /home/coolhand/html/datavis/data_trove/data/wild/weather/monthly_heatmap.json. Profiled with saturn-dissect v0.2.0 (saturn-insight-v2, anthropic:default). Retrieved from https://dr.eamer.dev/saturn/view/data-trove-noaa-lightning-strikes-2018