data trove noaa atmospheric weather alerts
Reading
This dataset contains 571 weather alert and atmospheric event records, combining operational NWS advisory data with a small number of rare or unusual atmospheric phenomena entries. The bulk of the dataset is well-populated NWS alerts, dominated by Small Craft Advisories (149), Winter Weather Advisories (95), and Winter Storm Warnings (60), with certainty skewed heavily toward 'Likely' (503 records: 88.1% of all rows, 89.3% of non-null values). A key anomaly worth investigating is that the columns country, event_type, magnitude, source, and state each have a 98.6% null rate (563 of 571 rows), meaning they are populated for only 8 rare-event records; this suggests the dataset is a hybrid merge of two very different sources. Severity is spread mainly across Moderate (208), Minor (187), and Severe (144), making it a useful dimension for filtering operational alerts.
citing: event.top_values · event.n_unique · certainty.top_rate · certainty.top_value · severity.top_values · urgency.top_value · urgency.top_rate · country.null_rate · event_type.null_rate · row_count
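One way to test the hybrid-merge hypothesis is to partition the rows by whether any of the sparsely populated columns is filled. The sketch below is illustrative only: it assumes the records are available as a list of dictionaries and that missing values appear as `None` or blank strings, neither of which the profile confirms. The sample rows are hypothetical.

```python
# Columns the profile reports as 98.6% null; assumed here to mark the rare-event source.
RARE_EVENT_COLUMNS = ("country", "event_type", "magnitude", "source", "state")


def is_missing(value):
    """Treat None and blank strings as missing (an assumption about the raw encoding)."""
    return value is None or str(value).strip() == ""


def split_by_source(rows):
    """Partition rows into (alerts, rare_events) by whether any rare-event column is filled."""
    alerts, rare_events = [], []
    for row in rows:
        populated = any(not is_missing(row.get(col)) for col in RARE_EVENT_COLUMNS)
        (rare_events if populated else alerts).append(row)
    return alerts, rare_events


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"event": "Gale Warning", "severity": "Moderate", "event_type": None},
    {"event": None, "event_type": "Ball Lightning", "country": "USA"},
    {"event": "Wind Advisory", "severity": "Minor", "event_type": ""},
]
alerts, rare_events = split_by_source(sample)
print(len(alerts), len(rare_events))
```

If the hypothesis holds, the rare-event partition of the real data should contain 8 rows and the alert partition 563, and each partition can then be profiled on its own.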
Charts the summary said to look at first
event: top 20 values (share of all 571 rows)
| value | count | share |
|---|---|---|
| Small Craft Advisory | 149 | 26.1% |
| Winter Weather Advisory | 95 | 16.6% |
| Winter Storm Warning | 60 | 10.5% |
| Wind Advisory | 55 | 9.6% |
| High Wind Warning | 21 | 3.7% |
| Red Flag Warning | 20 | 3.5% |
| Gale Warning | 20 | 3.5% |
| Brisk Wind Advisory | 19 | 3.3% |
| Dense Fog Advisory | 16 | 2.8% |
| Cold Weather Advisory | 15 | 2.6% |
| Fire Weather Watch | 12 | 2.1% |
| Heavy Freezing Spray Warning | 10 | 1.8% |
| Winter Storm Watch | 10 | 1.8% |
| Avalanche Warning | 8 | 1.4% |
| High Surf Advisory | 6 | 1.1% |
| Air Quality Alert | 6 | 1.1% |
| Special Weather Statement | 5 | 0.9% |
| Blizzard Warning | 5 | 0.9% |
| Rip Current Statement | 5 | 0.9% |
| Extreme Cold Warning | 3 | 0.5% |
severity: all values (share of all 571 rows)
| value | count | share |
|---|---|---|
| Moderate | 208 | 36.4% |
| Minor | 187 | 32.7% |
| Severe | 144 | 25.2% |
| Unknown | 19 | 3.3% |
| Extreme | 5 | 0.9% |
certainty: all values (share of all 571 rows)
| value | count | share |
|---|---|---|
| Likely | 503 | 88.1% |
| Observed | 24 | 4.2% |
| Unknown | 18 | 3.2% |
| Possible | 18 | 3.2% |
urgency: all values (share of all 571 rows)
| value | count | share |
|---|---|---|
| Expected | 510 | 89.3% |
| Unknown | 18 | 3.2% |
| Future | 18 | 3.2% |
| Past | 11 | 1.9% |
| Immediate | 6 | 1.1% |
areaDesc: top 20 values (share of all 571 rows)
| value | count | share |
|---|---|---|
| Glacier Bay | 9 | 1.6% |
| Northern Lynn Canal | 7 | 1.2% |
| Stephens Passage | 5 | 0.9% |
| Frederick Sound | 5 | 0.9% |
| Cape Decision to Cape Edgecumbe from 15 to 80 NM | 4 | 0.7% |
| Dixon Entrance to Cape Decision from 15 to 90 NM | 4 | 0.7% |
| Icy Strait | 4 | 0.7% |
| Northern Chatham Strait | 4 | 0.7% |
| Greater Lake Tahoe Area; Greater Lake Tahoe Area | 4 | 0.7% |
| Sacramento Mountains Above 7500 Feet; East Slopes Sacramento Mountains Below 7500 Feet | 3 | 0.5% |
| Rock Island Passage to Sturgeon Bay WI | 3 | 0.5% |
| Southern Lynn Canal | 3 | 0.5% |
| Cross Sound | 3 | 0.5% |
| Nikolski to Seguam Pacific Side from 15 to 75 NM | 3 | 0.5% |
| Kuskokwim Delta from 15 to 80 NM | 3 | 0.5% |
| Castle Cape to Cape Tolstoi from 15 to 100 NM | 3 | 0.5% |
| Seguam to Adak Pacific Side from 15 to 75 NM | 3 | 0.5% |
| Seguam to Adak Bering Side from 15 to 85 NM | 3 | 0.5% |
| Nikolski to Seguam Bering Side out to 15 NM | 3 | 0.5% |
| Pribilof Islands Nearshore Waters | 3 | 0.5% |
Schema
19 columns
| column | kind | null rate | unique | alerts |
|---|---|---|---|---|
| event | categorical | 1.4% | 35 | |
| headline | categorical | 1.6% | 305 | long_tail |
| description | categorical | 0.0% | 527 | long_tail |
| severity | categorical | 1.4% | 5 | |
| certainty | categorical | 1.4% | 4 | |
| urgency | categorical | 1.4% | 5 | |
| areaDesc | categorical | 1.4% | 441 | long_tail |
| sent | unknown | 0.0% | — | skipped |
| effective | unknown | 0.0% | — | skipped |
| onset | unknown | 0.0% | — | skipped |
| expires | unknown | 0.0% | — | skipped |
| latitude | numeric | 97.2% | 16 | null_rate, outliers |
| longitude | numeric | 97.2% | 16 | null_rate, outliers |
| event_type | categorical | 98.6% | 8 | long_tail, null_rate |
| date | unknown | 0.0% | — | skipped |
| state | categorical | 98.6% | 6 | long_tail, null_rate |
| country | categorical | 98.6% | 4 | long_tail, null_rate |
| magnitude | categorical | 98.6% | 5 | long_tail, null_rate |
| source | categorical | 98.6% | 8 | long_tail, null_rate |
event
categorical label
This column contains National Weather Service alert/advisory event type names, serving as a categorical label for meteorological warning events. With 35 unique values across 571 rows (8 of them null), it covers a meaningful range of weather phenomena. 'Small Craft Advisory' dominates with 149 occurrences: 26.5% of the 563 non-null values, or 26.1% of all rows. The top 4 event types together account for 359 records (about 63% of all rows), so the distribution is skewed toward marine and winter/wind events. An entropy ratio of 0.72 indicates moderate-to-high diversity, but the heavy concentration in a few categories is worth noting for class-imbalance handling in any classification task. Treatment: One-hot encode or target-encode for modelling; be aware of class imbalance, with 'Small Craft Advisory' at 26.5% of non-null values and many tail categories.
- n
- 571
- nulls
- 8 (1.4%)
- unique
- 35
- top_value
- Small Craft Advisory
- top_rate
- 0.2647
- cardinality
- 35
- entropy
- 3.699
- entropy_ratio
- 0.7212
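The reported figures are consistent with top_rate being computed over non-null values (149 / 563 ≈ 0.2647) and with entropy_ratio being the Shannon entropy in bits divided by log2 of the number of unique values (3.699 / log2(35) ≈ 0.7212). The sketch below reproduces those definitions; it is a reconstruction inferred from the numbers, not the profiler's actual code. The sample mimics the shape of the magnitude column (8 non-null values, 5 unique, top value 'N/A' at 0.5), and the labels `m1` to `m4` are hypothetical placeholders.

```python
import math
from collections import Counter


def profile_categorical(values):
    """Return top_value, top_rate, entropy (bits) and entropy_ratio over non-null values."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    counts = Counter(non_null)
    total = len(non_null)
    top_value, top_count = counts.most_common(1)[0]
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # The maximum possible entropy for k distinct values is log2(k).
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return {
        "top_value": top_value,
        "top_rate": top_count / total,
        "unique": len(counts),
        "entropy": entropy,
        "entropy_ratio": entropy / max_entropy,
    }


# 563 nulls plus 8 populated values, mirroring the shape of the magnitude column.
sample = [None] * 563 + ["N/A"] * 4 + ["m1", "m2", "m3", "m4"]
stats = profile_categorical(sample)
print(stats["top_rate"], round(stats["entropy"], 3), round(stats["entropy_ratio"], 4))
```

Under these definitions the sample yields a top_rate of 0.5, an entropy of 2 bits, and an entropy ratio of about 0.8614, matching the magnitude column's reported stats.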
headline
categorical free_text long_tail
This column contains NWS (National Weather Service) alert headlines: structured text strings giving the advisory type, issuance timestamp, expiry, and issuing office. Despite being typed as categorical, the entropy ratio of 0.943 and 305 unique values among 562 non-null records signal near free-text behaviour, and the long_tail alert confirms that most headlines appear only once or a handful of times. The most frequent value ('Small Craft Advisory issued February 17…') appears only 19 times (top_rate ≈ 3.4%), indicating very little repetition across the dataset. Treatment: Parse structured subfields (alert type, issue time, expiry time, NWS office) via regex before modelling; do not use the raw string as a categorical feature.
- n
- 571
- nulls
- 9 (1.6%)
- unique
- 305
- top_value
- Small Craft Advisory issued February 17 at 4:12AM AKST until February 18 at 5:00PM AKST by NWS Anchorage AK
- top_rate
- 0.03381
- cardinality
- 305
- entropy
- 7.782
- entropy_ratio
- 0.943
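The regex treatment suggested for headlines can be sketched against the top value shown above. The pattern assumes the form "<event> issued <time> until <time> by <office>"; only one headline is visible in this profile, so other headlines may deviate from it (for example by lacking an "until" clause), in which case the function returns `None` rather than guessing. The group names are illustrative.

```python
import re

# Assumed headline layout: "<event> issued <time> until <time> by <office>".
HEADLINE_PATTERN = re.compile(
    r"^(?P<event>.+?) issued (?P<issued>.+?) until (?P<expires>.+?) by (?P<office>.+)$"
)


def parse_headline(headline):
    """Split a headline into its subfields, or return None if it does not match."""
    if not headline:
        return None
    match = HEADLINE_PATTERN.match(headline.strip())
    return match.groupdict() if match else None


top_headline = (
    "Small Craft Advisory issued February 17 at 4:12AM AKST "
    "until February 18 at 5:00PM AKST by NWS Anchorage AK"
)
fields = parse_headline(top_headline)
print(fields)
```

Counting how many headlines fail to match is a quick way to find variants the pattern does not cover before relying on the parsed fields.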
description
categorical free_text long_tail
This column contains full-text NWS (National Weather Service) alert and forecast descriptions: multi-line, structured prose covering marine, fire weather, wind, and winter storm warnings across various US regions. With 527 unique values out of 571 rows and an entropy ratio of 0.9945, nearly every entry is distinct, making this essentially free text. The long_tail alert and the presence of duplicate entries (e.g., the same Southeast Alaska marine forecast appearing 4 times) suggest periodic reissue of templated advisories rather than purely unique records, which may indicate time-series duplication worth investigating. Treatment: Tokenize and embed (e.g., TF-IDF or sentence transformer) before modelling; consider deduplicating or grouping by alert template for frequency analysis.
- n
- 571
- nulls
- 0 (0.0%)
- unique
- 527
- top_value
- Southeast Alaska Inside Waters from Dixon Entrance to Skagway Wind forecasts reflect the predominant speed and direction expected. Sea forecasts represent the average of the highest one-third of the combined windwave and swell height. .TONIGHT...N wind 25 kt. Seas 5 ft. Heavy freezing spray. .WED...N wind 15 kt diminishing to 10 kt in the morning. Seas 3 ft in the morning then 2 ft or less. Light freezing spray in the early morning. .WED NIGHT...N wind 10 kt. Seas 2 ft or less. .THU...N wind 10 kt. Seas 2 ft or less. Light freezing spray. .THU NIGHT...N wind 15 kt. Seas 2 ft or less. Snow. .FRI...N gale to 35 kt. Seas 5 ft. .SAT...N gale to 45 kt. Seas 2 ft or less. .SUN...N gale to 35 kt. Seas 2 ft or less.
- top_rate
- 0.007005
- cardinality
- 527
- entropy
- 8.992
- entropy_ratio
- 0.9945
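A first pass at the deduplication step could group descriptions that are identical after whitespace and case normalisation. This is a minimal sketch under that assumption; it will not catch reissues whose wording changed slightly, and the sample strings are hypothetical, shortened stand-ins for real descriptions.

```python
import re
from collections import Counter


def normalize_text(text):
    """Collapse runs of whitespace and lowercase, so trivial formatting differences vanish."""
    return re.sub(r"\s+", " ", text).strip().lower()


def duplicate_groups(descriptions, min_count=2):
    """Map each normalised description seen at least min_count times to its count."""
    counts = Counter(normalize_text(d) for d in descriptions if d)
    return {text: n for text, n in counts.items() if n >= min_count}


# Hypothetical, shortened examples.
sample = [
    ".TONIGHT...N wind 25 kt. Seas 5 ft.",
    ".TONIGHT...N wind 25 kt.\nSeas 5 ft.",
    ".TONIGHT...N  wind 25 kt. Seas 5 ft. ",
    ".WED...N wind 15 kt. Seas 3 ft.",
]
groups = duplicate_groups(sample)
print(groups)
```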
severity
categorical label
This column is an ordinal severity classification with 5 levels (Moderate, Minor, Severe, Unknown, and Extreme) describing the severity of each alert. 'Moderate' is the most common value at 36.9% of non-null records (208/563; 36.4% of all 571 rows), and 'Extreme' is strikingly rare at only 5 occurrences, suggesting that the most severe alerts are uncommon in this sample. The 19 'Unknown' values represent a data-quality concern distinct from the 8 nulls (1.4%), effectively adding a second form of missingness. An entropy ratio of 0.77 indicates a reasonably spread distribution, though the extreme imbalance at the tail warrants attention for any classification task. Treatment: Encode as ordinal (Minor < Moderate < Severe < Extreme); treat 'Unknown' as a separate missing indicator before modelling.
- n
- 571
- nulls
- 8 (1.4%)
- unique
- 5
- top_value
- Moderate
- top_rate
- 0.3694
- cardinality
- 5
- entropy
- 1.788
- entropy_ratio
- 0.7698
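The ordinal treatment for severity can be sketched as a rank lookup plus a missing flag. The numeric ranks 1 to 4 are an arbitrary illustrative choice, and mapping both 'Unknown' and null to the same missing flag is an assumption; an analysis that needs to distinguish the two would keep separate indicators.

```python
# Ordering taken from the stated treatment: Minor < Moderate < Severe < Extreme.
SEVERITY_RANK = {"Minor": 1, "Moderate": 2, "Severe": 3, "Extreme": 4}


def encode_severity(value):
    """Return (rank, is_missing); 'Unknown', None and unrecognised labels count as missing."""
    rank = SEVERITY_RANK.get(value)
    return rank, rank is None


for label in ["Minor", "Extreme", "Unknown", None]:
    print(label, encode_severity(label))
```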
certainty
categorical label
This column encodes the certainty level attached to each alert, with four categories: Likely, Observed, Unknown, and Possible. The distribution is severely skewed: 'Likely' accounts for 89.3% of non-null records (503/563; 88.1% of all 571 rows), while 'Observed', 'Unknown', and 'Possible' each account for only 18–24 records. The low entropy ratio of 0.328 confirms near-constant behaviour, and the 1.4% null rate is minor. The near-total dominance of a single category limits this column's discriminative power as a feature. Treatment: Ordinal-encode with awareness of severe class imbalance; consider collapsing minority categories or using as a stratification variable rather than a predictive feature.
- n
- 571
- nulls
- 8 (1.4%)
- unique
- 4
- top_value
- Likely
- top_rate
- 0.8934
- cardinality
- 4
- entropy
- 0.6569
- entropy_ratio
- 0.3285
urgency
categorical feature
This column is the categorical urgency classification attached to each alert, with 5 distinct levels. It is severely dominated by 'Expected' (510 of 563 non-null rows, 90.6%; 89.3% of all 571 rows), leaving the remaining 4 categories ('Unknown', 'Future', 'Past', and 'Immediate') to account for 53 records, fewer than 10%. The very low entropy ratio of 0.27 confirms extreme class imbalance, which will limit this column's discriminative power in most models. The 'Immediate' category, presumably the most critical, appears only 6 times, making it near-invisible to any classifier trained on this distribution. Treatment: One-hot encode but flag severe class imbalance; consider oversampling minority classes (especially 'Immediate' with n=6) or collapsing into binary 'Expected vs. Other' before modelling.
- n
- 571
- nulls
- 8 (1.4%)
- unique
- 5
- top_value
- Expected
- top_rate
- 0.9059
- cardinality
- 5
- entropy
- 0.6276
- entropy_ratio
- 0.2703
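The binary 'Expected vs. Other' collapse can be illustrated with the counts from the urgency table above. This is a sketch only: whether 'Unknown' belongs in 'Other' or should be treated as missing is a modelling decision the profile does not settle, and here it is folded into 'Other' for simplicity.

```python
from collections import Counter

# Value counts copied from the urgency table in this report.
URGENCY_COUNTS = {"Expected": 510, "Unknown": 18, "Future": 18, "Past": 11, "Immediate": 6}


def collapse_urgency(value):
    """Map an urgency label to 'Expected' or 'Other'; nulls stay None."""
    if value is None:
        return None
    return "Expected" if value == "Expected" else "Other"


collapsed = Counter()
for value, n in URGENCY_COUNTS.items():
    collapsed[collapse_urgency(value)] += n
print(dict(collapsed))  # {'Expected': 510, 'Other': 53}
```

Even after collapsing, the minority class holds only 53 of 563 non-null records, so the imbalance caveat still applies.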
areaDesc
categorical label long_tail
This column contains geographic area descriptions used in weather alerts. The most frequent values cover coastal and inland waterways of Southeast Alaska (e.g., 'Glacier Bay', 'Stephens Passage', 'Northern Lynn Canal'), with some continental US zones also present (e.g., 'Greater Lake Tahoe Area', 'Sacramento Mountains'). With 441 unique values across only 571 rows and an entropy ratio of 0.98, cardinality is extremely high: most rows carry a distinct area string. The long_tail alert confirms that most areas appear only once or twice, with even the top value ('Glacier Bay') appearing just 9 times (1.6% of rows). The multi-zone concatenated entries (e.g., 'Greater Lake Tahoe Area; Greater Lake Tahoe Area') show that some records bundle multiple zones into a single string, sometimes repeating the same zone, which may cause deduplication or parsing issues. Treatment: Parse semicolon-delimited multi-zone entries into separate records, then use as a grouping/filter dimension rather than a model feature; too high-cardinality for direct encoding without aggregation.
- n
- 571
- nulls
- 8 (1.4%)
- unique
- 441
- top_value
- Glacier Bay
- top_rate
- 0.01599
- cardinality
- 441
- entropy
- 8.608
- entropy_ratio
- 0.9799
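Splitting the semicolon-delimited entries might look like the sketch below, which also drops zones repeated within one record. It assumes ";" is used only as a zone separator and never appears inside a zone name, which the profile does not confirm.

```python
def split_zones(area_desc):
    """Split an areaDesc string on ';' into unique zone names, preserving first-seen order."""
    if not area_desc:
        return []
    zones = []
    for part in area_desc.split(";"):
        zone = part.strip()
        if zone and zone not in zones:
            zones.append(zone)
    return zones


print(split_zones("Greater Lake Tahoe Area; Greater Lake Tahoe Area"))
print(split_zones(
    "Sacramento Mountains Above 7500 Feet; East Slopes Sacramento Mountains Below 7500 Feet"
))
```

The first call collapses the duplicated Lake Tahoe entry to a single zone, while the second yields two distinct zones that could each become their own record.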
sent
unknown other skipped
The column 'sent' has 571 non-null rows but was skipped by the profiler, leaving its type and value distribution unknown. No stats, uniqueness counts, or distribution metrics are available. In a dataset of NWS alerts the name most plausibly denotes the time the alert was sent, but a boolean flag cannot be ruled out from this evidence. An analyst should inspect raw values directly before any downstream use. Treatment: Inspect raw values to determine dtype, then re-profile before any modelling or filtering.
- n
- 571
- nulls
- 0 (0.0%)
- unique
- —
effective
unknown other skipped
The column 'effective' contains 571 non-null values but was skipped by the profiler, leaving its type and distribution unknown. No stats, uniqueness count, or value samples are available, so its semantic role cannot be determined from this evidence alone. In a dataset of NWS alerts the name most plausibly denotes the time the alert takes effect, though a boolean flag (e.g., is-effective) cannot be ruled out. Treatment: Manually inspect raw values to determine type, then re-profile before any modelling use.
- n
- 571
- nulls
- 0 (0.0%)
- unique
- —
onset
unknown timestamp skipped
The column 'onset' likely records the expected start time of the weather event described by each alert, but the profiler emitted a 'skipped' alert and returned no stats, leaving its true type and distribution uncharacterised. With 571 non-null rows and zero null rate, data is present, but nothing about format, uniqueness, or value range can be confirmed from this evidence alone. An analyst should inspect raw values to determine whether it is a date string, a numeric value, or something else before any downstream use. Treatment: Inspect raw values to confirm type (date vs. numeric vs. categorical), then parse or encode accordingly before modelling.
- n
- 571
- nulls
- 0 (0.0%)
- unique
- —
expires
unknown timestamp skipped
The column 'expires' likely represents an expiration date or timestamp field, but the profiler skipped analysis entirely, yielding no stats, no uniqueness count, and a kind of 'unknown'. With 571 non-null rows and zero null rate, data is present but its structure or encoding prevented saturn from classifying it. No further distributional signals are available from the evidence. Treatment: Inspect raw values to confirm encoding (ISO string, epoch int, or other format), parse to datetime, then use as a feature or filter boundary.
- n
- 571
- nulls
- 0 (0.0%)
- unique
- —
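Because the encoding of the skipped timestamp columns (sent, effective, onset, expires, date) is unknown, a tolerant parser that tries the two formats named in the treatment, epoch seconds and ISO 8601 strings, is one reasonable starting point. This is a sketch under that assumption; the sample inputs are hypothetical and do not come from the dataset, and values in any other format simply come back as `None` for manual inspection.

```python
import re
from datetime import datetime, timezone

# Digits only (optionally signed, optionally fractional) are treated as epoch seconds.
EPOCH_PATTERN = re.compile(r"-?\d+(\.\d+)?")


def parse_timestamp(raw):
    """Parse epoch seconds or an ISO 8601 string to a datetime; return None on failure."""
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    if EPOCH_PATTERN.fullmatch(text):
        try:
            return datetime.fromtimestamp(float(text), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            return None
    try:
        # Older Python versions reject a trailing 'Z', so rewrite it as an explicit offset.
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        return None


# Hypothetical inputs for illustration.
for raw in ["2025-02-17T04:12:00-09:00", "0", "not a date"]:
    print(repr(raw), "->", parse_timestamp(raw))
```

Counting the `None` results per column gives a quick estimate of how much of each column follows neither format.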
latitude
numeric null_rate outliers
- n
- 571
- nulls
- 555 (97.2%)
- unique
- 16
- min
- 25.76
- max
- 69.65
- mean
- 40.46
- median
- 35.45
- std
- 11.58
- q1
- 34.96
- q3
- 41
- iqr
- 6.043
- skew
- 1.508
- kurtosis
- 1.297
- n_outliers
- 4
- outlier_rate
- 0.25
- zero_rate
- 0
longitude
numeric null_rate outliers
- n
- 571
- nulls
- 555 (97.2%)
- unique
- 16
- min
- -120.6
- max
- 18.96
- mean
- -84.42
- median
- -95.31
- std
- 45.23
- q1
- -120.2
- q3
- -78.65
- iqr
- 41.52
- skew
- 1.283
- kurtosis
- 0.3248
- n_outliers
- 2
- outlier_rate
- 0.125
- zero_rate
- 0
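The outlier counts for latitude and longitude can be sanity-checked with Tukey's 1.5 × IQR rule. Whether the profiler uses this rule is an assumption; the report does not say how n_outliers is computed. The quartiles and extremes below are copied from the stats above.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies outside the Tukey fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return x < lower or x > upper


# Quartiles and min/max as reported for each column.
LAT_Q1, LAT_Q3 = 34.96, 41.0
LON_Q1, LON_Q3 = -120.2, -78.65

print("latitude fences:", iqr_fences(LAT_Q1, LAT_Q3))
print("latitude min/max flagged:", is_outlier(25.76, LAT_Q1, LAT_Q3), is_outlier(69.65, LAT_Q1, LAT_Q3))
print("longitude fences:", iqr_fences(LON_Q1, LON_Q3))
print("longitude min/max flagged:", is_outlier(-120.6, LON_Q1, LON_Q3), is_outlier(18.96, LON_Q1, LON_Q3))
```

Under this rule both latitude extremes fall outside the fences, while for longitude only the maximum (18.96) does. Note that the reported outlier_rate values (0.25 and 0.125) are fractions of the 16 non-null rows, not of all 571.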
event_type
categorical long_tail null_rate
- n
- 571
- nulls
- 563 (98.6%)
- unique
- 8
- top_value
- Ball Lightning
- top_rate
- 0.125
- cardinality
- 8
- entropy
- 3
- entropy_ratio
- 1
date
unknown timestamp skipped
This column is named 'date' and contains 571 non-null values with a 0.0% null rate, suggesting it is a timestamp or date field. However, saturn skipped profiling it (kind: 'unknown', no stats, no uniqueness count), so no distribution, range, or format details are available. The absence of any computed statistics prevents assessment of cardinality, temporal range, or potential drift. Treat with caution until the parsing issue causing the skip is resolved. Treatment: Investigate why saturn skipped this column, parse to a proper datetime type, then extract temporal features or use as an index.
- n
- 571
- nulls
- 0 (0.0%)
- unique
- —
state
categorical long_tail null_rate
- n
- 571
- nulls
- 563 (98.6%)
- unique
- 6
- top_value
- International
- top_rate
- 0.375
- cardinality
- 6
- entropy
- 2.406
- entropy_ratio
- 0.9306
country
categorical long_tail null_rate
- n
- 571
- nulls
- 563 (98.6%)
- unique
- 4
- top_value
- USA
- top_rate
- 0.625
- cardinality
- 4
- entropy
- 1.549
- entropy_ratio
- 0.7744
magnitude
categorical long_tail null_rate
- n
- 571
- nulls
- 563 (98.6%)
- unique
- 5
- top_value
- N/A
- top_rate
- 0.5
- cardinality
- 5
- entropy
- 2
- entropy_ratio
- 0.8614
source
categorical long_tail null_rate
- n
- 571
- nulls
- 563 (98.6%)
- unique
- 8
- top_value
- Journal of Geophysical Research
- top_rate
- 0.125
- cardinality
- 8
- entropy
- 3
- entropy_ratio
- 1