data trove noaa lightning strikes 2018
Reading
This dataset contains 59,070 records of lightning strike activity, each described by geographic coordinates (latitude and longitude), a month, and a strike count. The strikes column is strongly right-skewed (skew ~2.0, max 531 vs. median 34), meaning a small number of records carry far more lightning than is typical; the ~2,900 outlier records are worth investigating. Latitude also shows a ~9.5% outlier rate with a northward skew (skew ~1.15), suggesting strike activity is concentrated in a core geographic band with a notable group of records at higher latitudes.
citing: strikes.skew · strikes.median · strikes.max · strikes.n_outliers · strikes.outlier_rate · lat.n_outliers · lat.outlier_rate · lat.skew · row_count · month.min · month.max
Charts the summary said to look at first
strikes (histogram, 40 bins)
| bin | count |
|---|---|
| 1 – 14.25 | 15261 |
| 14.25 – 27.5 | 9914 |
| 27.5 – 40.75 | 7780 |
| 40.75 – 54 | 6090 |
| 54 – 67.25 | 4959 |
| 67.25 – 80.5 | 3431 |
| 80.5 – 93.75 | 2719 |
| 93.75 – 107 | 2084 |
| 107 – 120.2 | 1684 |
| 120.2 – 133.5 | 1183 |
| 133.5 – 146.8 | 882 |
| 146.8 – 160 | 711 |
| 160 – 173.2 | 578 |
| 173.2 – 186.5 | 414 |
| 186.5 – 199.8 | 320 |
| 199.8 – 213 | 208 |
| 213 – 226.2 | 205 |
| 226.2 – 239.5 | 151 |
| 239.5 – 252.8 | 109 |
| 252.8 – 266 | 84 |
| 266 – 279.2 | 76 |
| 279.2 – 292.5 | 46 |
| 292.5 – 305.8 | 35 |
| 305.8 – 319 | 34 |
| 319 – 332.2 | 34 |
| 332.2 – 345.5 | 23 |
| 345.5 – 358.8 | 11 |
| 358.8 – 372 | 6 |
| 372 – 385.2 | 7 |
| 385.2 – 398.5 | 9 |
| 398.5 – 411.8 | 6 |
| 411.8 – 425 | 3 |
| 425 – 438.2 | 5 |
| 438.2 – 451.5 | 1 |
| 451.5 – 464.8 | 3 |
| 464.8 – 478 | 2 |
| 478 – 491.2 | 0 |
| 491.2 – 504.5 | 0 |
| 504.5 – 517.8 | 0 |
| 517.8 – 531 | 2 |
month (histogram, 40 bins)
| bin | count |
|---|---|
| 1 – 1.275 | 990 |
| 1.275 – 1.55 | 0 |
| 1.55 – 1.825 | 0 |
| 1.825 – 2.1 | 990 |
| 2.1 – 2.375 | 0 |
| 2.375 – 2.65 | 0 |
| 2.65 – 2.925 | 0 |
| 2.925 – 3.2 | 3300 |
| 3.2 – 3.475 | 0 |
| 3.475 – 3.75 | 0 |
| 3.75 – 4.025 | 3300 |
| 4.025 – 4.3 | 0 |
| 4.3 – 4.575 | 0 |
| 4.575 – 4.85 | 0 |
| 4.85 – 5.125 | 6600 |
| 5.125 – 5.4 | 0 |
| 5.4 – 5.675 | 0 |
| 5.675 – 5.95 | 0 |
| 5.95 – 6.225 | 9900 |
| 6.225 – 6.5 | 0 |
| 6.5 – 6.775 | 0 |
| 6.775 – 7.05 | 9900 |
| 7.05 – 7.325 | 0 |
| 7.325 – 7.6 | 0 |
| 7.6 – 7.875 | 0 |
| 7.875 – 8.15 | 9900 |
| 8.15 – 8.425 | 0 |
| 8.425 – 8.7 | 0 |
| 8.7 – 8.975 | 0 |
| 8.975 – 9.25 | 6600 |
| 9.25 – 9.525 | 0 |
| 9.525 – 9.8 | 0 |
| 9.8 – 10.08 | 3300 |
| 10.08 – 10.35 | 0 |
| 10.35 – 10.62 | 0 |
| 10.62 – 10.9 | 0 |
| 10.9 – 11.18 | 3300 |
| 11.18 – 11.45 | 0 |
| 11.45 – 11.73 | 0 |
| 11.73 – 12 | 990 |
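As a quick check on how concentrated the rows are by month, the non-empty bins of the table above can be collapsed into per-month counts. This is a minimal sketch: it assumes the table is the histogram of the month column (its 1 to 12 range matches) and that each non-empty bin corresponds to exactly one integer month. The variable names are illustrative.

```python
# Per-month row counts, read off the non-empty bins of the histogram table.
# Assumption: each non-empty bin holds exactly one integer month (1..12).
month_counts = {
    1: 990, 2: 990, 3: 3300, 4: 3300, 5: 6600, 6: 9900,
    7: 9900, 8: 9900, 9: 6600, 10: 3300, 11: 3300, 12: 990,
}

total_rows = sum(month_counts.values())

# Weighted mean month, comparable to the mean reported in the column profile.
mean_month = sum(m * c for m, c in month_counts.items()) / total_rows

# Share of rows falling in June, July, and August.
summer_share = sum(month_counts[m] for m in (6, 7, 8)) / total_rows

print(f"rows={total_rows}, mean month={mean_month:.3f}, Jun-Aug share={summer_share:.1%}")
```

The total reproduces the 59,070-row count, and roughly half of all rows fall in the three summer months, so the month column is seasonally concentrated even though its skew is close to zero.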
lat (histogram, 40 bins)
| bin | count |
|---|---|
| 25.35 – 25.6 | 2 |
| 25.6 – 25.86 | 5 |
| 25.86 – 26.11 | 35 |
| 26.11 – 26.36 | 143 |
| 26.36 – 26.61 | 475 |
| 26.61 – 26.87 | 1135 |
| 26.87 – 27.12 | 2153 |
| 27.12 – 27.37 | 3296 |
| 27.37 – 27.62 | 4020 |
| 27.62 – 27.88 | 4062 |
| 27.88 – 28.13 | 4007 |
| 28.13 – 28.38 | 3871 |
| 28.38 – 28.64 | 3643 |
| 28.64 – 28.89 | 3282 |
| 28.89 – 29.14 | 3236 |
| 29.14 – 29.39 | 3215 |
| 29.39 – 29.65 | 3527 |
| 29.65 – 29.9 | 3549 |
| 29.9 – 30.15 | 3239 |
| 30.15 – 30.41 | 2427 |
| 30.41 – 30.66 | 1482 |
| 30.66 – 30.91 | 729 |
| 30.91 – 31.16 | 272 |
| 31.16 – 31.42 | 84 |
| 31.42 – 31.67 | 15 |
| 31.67 – 31.92 | 10 |
| 31.92 – 32.17 | 20 |
| 32.17 – 32.43 | 72 |
| 32.43 – 32.68 | 227 |
| 32.68 – 32.93 | 580 |
| 32.93 – 33.19 | 1045 |
| 33.19 – 33.44 | 1305 |
| 33.44 – 33.69 | 1424 |
| 33.69 – 33.94 | 1167 |
| 33.94 – 34.2 | 762 |
| 34.2 – 34.45 | 343 |
| 34.45 – 34.7 | 166 |
| 34.7 – 34.95 | 34 |
| 34.95 – 35.21 | 5 |
| 35.21 – 35.46 | 6 |
lon (histogram, 40 bins)
| bin | count |
|---|---|
| -96.74 – -96.3 | 49 |
| -96.3 – -95.85 | 343 |
| -95.85 – -95.41 | 1457 |
| -95.41 – -94.97 | 2950 |
| -94.97 – -94.53 | 2653 |
| -94.53 – -94.08 | 1196 |
| -94.08 – -93.64 | 275 |
| -93.64 – -93.2 | 26 |
| -93.2 – -92.76 | 1 |
| -92.76 – -92.31 | 0 |
| -92.31 – -91.87 | 0 |
| -91.87 – -91.43 | 30 |
| -91.43 – -90.98 | 226 |
| -90.98 – -90.54 | 1161 |
| -90.54 – -90.1 | 3141 |
| -90.1 – -89.66 | 3520 |
| -89.66 – -89.21 | 2000 |
| -89.21 – -88.77 | 592 |
| -88.77 – -88.33 | 68 |
| -88.33 – -87.88 | 2 |
| -87.88 – -87.44 | 0 |
| -87.44 – -87 | 0 |
| -87 – -86.56 | 0 |
| -86.56 – -86.11 | 0 |
| -86.11 – -85.67 | 1 |
| -85.67 – -85.23 | 59 |
| -85.23 – -84.79 | 335 |
| -84.79 – -84.34 | 1370 |
| -84.34 – -83.9 | 2386 |
| -83.9 – -83.46 | 2055 |
| -83.46 – -83.01 | 799 |
| -83.01 – -82.57 | 458 |
| -82.57 – -82.13 | 1836 |
| -82.13 – -81.69 | 5610 |
| -81.69 – -81.24 | 9229 |
| -81.24 – -80.8 | 8753 |
| -80.8 – -80.36 | 4831 |
| -80.36 – -79.92 | 1419 |
| -79.92 – -79.47 | 223 |
| -79.47 – -79.03 | 16 |
Schema
4 columns

| Column | Type | Nulls | Unique | Alerts |
|---|---|---|---|---|
| lat | numeric | 0.0% | 868 | outliers |
| lon | numeric | 0.0% | 1,231 | |
| month | numeric | 0.0% | 12 | |
| strikes | numeric | 0.0% | 381 | high_skew |
lat
numeric · feature · outliers
This column is a geographic latitude, with values ranging from 25.35 to 35.46 degrees north. Taken together with the longitude range (-96.74 to -79.03), that is consistent with the US Gulf Coast and Southeast. With only 868 unique values across 59,070 rows, latitudes are heavily discretised (likely snapped to a grid or centroid), not continuous GPS readings. The distribution is right-skewed (skew 1.15) with 9.5% of rows flagged as outliers (5,626 records). The histogram shows a dominant cluster around 27–30° and a separate, smaller cluster around 32.5–34.5°, so the flagged rows look more like a second geographic group than a thin upper tail.
Treatment: Pair with longitude for spatial joins or clustering; investigate the 5,626 outlier records, which sit above roughly 33° if the flag uses the usual 1.5×IQR fence, for data-quality or sub-population issues before modelling.
- n: 59,070
- nulls: 0 (0.0%)
- unique: 868
- min: 25.35
- max: 35.46
- mean: 29.23
- median: 28.84
- std: 1.899
- q1: 27.84
- q3: 29.94
- iqr: 2.1
- skew: 1.147
- kurtosis: 0.7033
- n_outliers: 5,626
- outlier_rate: 0.09524
- zero_rate: 0
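The report does not say how the outlier flag is computed. A minimal sketch, assuming the common Tukey rule (values beyond 1.5×IQR from the quartiles), shows where the cut-offs for lat would fall given the quartiles listed above. The function name `tukey_fences` is illustrative, not part of the profiling tool.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the k*IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles and range as reported for the lat column.
# Assumption: the profiler's outlier flag follows the 1.5*IQR rule.
lat_lower, lat_upper = tukey_fences(q1=27.84, q3=29.94)
lat_min, lat_max = 25.35, 35.46

# A fence outside the observed range cannot flag any rows on that side.
flags_low_side = lat_min < lat_lower
flags_high_side = lat_max > lat_upper

print(f"fences: {lat_lower:.2f} to {lat_upper:.2f}")
print(f"low-side outliers possible: {flags_low_side}, high-side: {flags_high_side}")
```

Under this assumption the lower fence lies below the minimum latitude, so every flagged record would be on the northern side of the upper fence.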
lon
numeric · feature
This column contains longitude coordinates, with all values falling between -96.74 and -79.03. Combined with the latitude range (25.35 to 35.46° N), this is consistent with the southern and southeastern United States, roughly from eastern Texas to the Atlantic coast. The distribution is moderately left-skewed (skew = -0.85) with mass concentrated toward the eastern end of the range (median -82.06, Q3 -81.22); the largest histogram bins fall between about -82.1 and -80.4, which at these latitudes points to Florida or nearby. The histogram also shows separated clusters with empty bins between them (around -92.5 and -87), so the records appear to come from a few distinct regions rather than one continuous area. Only 1,231 unique values across 59,070 rows indicate that coordinates are discretized or snapped to a coarse grid rather than recorded at true GPS precision, which limits spatial resolution.
Treatment: Use as-is for spatial joins or clustering; note the limited precision (1,231 unique values for 59,070 rows) before any fine-grained geospatial analysis.
- n: 59,070
- nulls: 0 (0.0%)
- unique: 1,231
- min: -96.74
- max: -79.03
- mean: -85.27
- median: -82.06
- std: 5.252
- q1: -89.94
- q3: -81.22
- iqr: 8.72
- skew: -0.8513
- kurtosis: -0.8532
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
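Because the lon histogram has runs of empty bins (roughly -92.76 to -91.87 and -87.88 to -86.11), one simple way to prepare the column for per-region analysis is to split it at those gaps. The sketch below is illustrative only: the cut points, the region numbering, and the sample longitudes are assumptions and do not come from the dataset.

```python
from bisect import bisect_right

# Hypothetical cut points placed inside the empty histogram ranges;
# adjust them after inspecting the actual coordinates.
REGION_CUTS = [-92.3, -87.0]

def lon_region(lon: float) -> int:
    """Map a longitude to region index 0 (west), 1 (middle), or 2 (east)."""
    return bisect_right(REGION_CUTS, lon)

sample_lons = [-95.2, -90.0, -84.1, -81.4]  # illustrative values only
regions = [lon_region(x) for x in sample_lons]
print(regions)
```

A gap-based split like this only separates groups with empty bins between them; the two eastern clusters visible in the histogram are not divided by an empty bin, so they would need a clustering method or an explicit extra cut point.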
month
numeric · feature
This column encodes calendar month as an integer from 1 to 12, with exactly 12 unique values and no nulls across 59,070 rows. The distribution is roughly symmetric (mean 6.90, median 7.0, skew −0.15, kurtosis −0.21) but far from uniform: the histogram shows 9,900 rows in each of June, July, and August (about 50% of all rows together) against 990 in each of January, February, and December. That is a clear seasonal concentration in summer, which the near-zero skew and kurtosis do not reveal on their own. No outliers or zero values are present. The per-month counts are all round multiples of 330, which may indicate that rows were sampled to fixed monthly quotas rather than reflecting raw record volumes.
Treatment: Treat as a cyclic ordinal feature; apply sine/cosine encoding (sin(2π·month/12), cos(2π·month/12)) before modelling to capture cyclical continuity.
- n: 59,070
- nulls: 0 (0.0%)
- unique: 12
- min: 1
- max: 12
- mean: 6.899
- median: 7
- std: 2.335
- q1: 5
- q3: 8
- iqr: 3
- skew: -0.155
- kurtosis: -0.2056
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
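The sine/cosine treatment suggested for month can be sketched as follows. This is a minimal illustration using only the standard library; the helper names `encode_month` and `distance` are hypothetical and not part of any profiling tool.

```python
import math

def encode_month(month: int) -> tuple[float, float]:
    """Return (sin, cos) of the month's angle on a 12-step circle."""
    angle = 2 * math.pi * month / 12
    return math.sin(angle), math.cos(angle)

def distance(a: int, b: int) -> float:
    """Euclidean distance between two months in the encoded space."""
    return math.dist(encode_month(a), encode_month(b))

# Adjacent months are equally far apart, including across the year boundary.
print(f"Dec -> Jan: {distance(12, 1):.3f}")
print(f"Jan -> Feb: {distance(1, 2):.3f}")
print(f"Jan -> Jul: {distance(1, 7):.3f}")
```

With the raw integer, December and January are 11 units apart; in the encoded space they are as close as any other pair of neighbouring months, which is the continuity the treatment is meant to capture.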
strikes
numeric · feature · high_skew
This column represents a count of lightning strikes per record, with values ranging from 1 to 531 across 381 unique integers. The distribution is heavily right-skewed (skew = 2.02, kurtosis = 6.09): the median is only 34 while the mean is 49.43 and the std is 49.83, indicating a long upper tail. Notably, 4.9% of rows (2,917) are flagged as outliers, and the IQR of 54 spans Q1=14 to Q3=68, confirming that most records cluster low while a minority have very high counts, up to 531.
Treatment: Log-transform (log1p) before regression or distance-based modelling to reduce skew; investigate the outlier group (n=2,917) for data quality or domain-specific segmentation.
- n: 59,070
- nulls: 0 (0.0%)
- unique: 381
- min: 1
- max: 531
- mean: 49.43
- median: 34
- std: 49.83
- q1: 14
- q3: 68
- iqr: 54
- skew: 2.023
- kurtosis: 6.094
- n_outliers: 2,917
- outlier_rate: 0.04938
- zero_rate: 0
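To illustrate what the log1p treatment does to the strikes scale, the sketch below applies it to the summary values listed above rather than to the full column, which is not reproduced here. The dictionary and variable names are illustrative.

```python
import math

# Summary values reported for the strikes column.
strikes_summary = {"min": 1, "q1": 14, "median": 34, "q3": 68, "max": 531}

# log1p(x) = log(1 + x); it is defined at 0 and compresses large counts.
logged = {name: math.log1p(value) for name, value in strikes_summary.items()}

# How far the maximum sits from the median, before and after the transform.
raw_ratio = strikes_summary["max"] / strikes_summary["median"]
log_ratio = logged["max"] / logged["median"]

# expm1 inverts log1p, so model outputs can be mapped back to counts.
round_trip = math.expm1(logged["max"])

print(f"max/median raw: {raw_ratio:.1f}, after log1p: {log_ratio:.2f}")
```

The transform preserves the ordering of the values while pulling the maximum much closer to the median, which is why it tends to help regression and distance-based models on skewed counts.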