quirky atmospheric real
Reading
This dataset contains 571 weather alert records with 19 columns, mixing NWS-style alert metadata (event, severity, urgency, certainty, areaDesc, headline) with sparse atmospheric event annotations (country, event_type, magnitude, source, state). The alert fields are well populated and dominated by 'Small Craft Advisory' (149 of 571) and 'Winter Weather Advisory' (95). Certainty is overwhelmingly 'Likely' (503 rows, 89.3% of the 563 non-null values) and urgency is overwhelmingly 'Expected' (510 rows, 90.6% of non-null values), so those two risk dimensions vary little. Severity is the most balanced operational field, split across Moderate (208), Minor (187), and Severe (144). The curiosity-style columns (country, event_type, magnitude, source, state) are about 98.6% null and describe only 8 rows, so treat them as a separate mini-dataset rather than primary signal.
citing: row_count · column_count · columns.event · columns.severity · columns.urgency · columns.certainty · columns.country · columns.event_type · columns.magnitude
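The percentages in this report appear to use two denominators: the chart tables divide by all 571 rows, while the per-column top_rate values divide by the non-null rows. This is inferred from the reported numbers, not from the profiler's documentation. A minimal sketch of the arithmetic, using the certainty and urgency counts (the helper name `share` is illustrative):

```python
def share(count: int, denominator: int) -> float:
    """Return count / denominator as a percentage, rounded to one decimal."""
    return round(100 * count / denominator, 1)

N_ROWS = 571                    # row_count from the profile
N_NON_NULL = N_ROWS - 8         # certainty and urgency each report 8 nulls

LIKELY, EXPECTED = 503, 510     # counts from the certainty and urgency tables

likely_all = share(LIKELY, N_ROWS)            # share of all rows
likely_non_null = share(LIKELY, N_NON_NULL)   # share of non-null rows
expected_all = share(EXPECTED, N_ROWS)
expected_non_null = share(EXPECTED, N_NON_NULL)

print(likely_all, likely_non_null, expected_all, expected_non_null)
# prints: 88.1 89.3 89.3 90.6
```

When quoting a percentage from this report, it helps to state which denominator it uses.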
Charts the summary said to look at first
event (20 most frequent values; share of all 571 rows)
| value | count | share |
|---|---|---|
| Small Craft Advisory | 149 | 26.1% |
| Winter Weather Advisory | 95 | 16.6% |
| Winter Storm Warning | 60 | 10.5% |
| Wind Advisory | 55 | 9.6% |
| High Wind Warning | 21 | 3.7% |
| Red Flag Warning | 20 | 3.5% |
| Gale Warning | 20 | 3.5% |
| Brisk Wind Advisory | 19 | 3.3% |
| Dense Fog Advisory | 16 | 2.8% |
| Cold Weather Advisory | 15 | 2.6% |
| Fire Weather Watch | 12 | 2.1% |
| Heavy Freezing Spray Warning | 10 | 1.8% |
| Winter Storm Watch | 10 | 1.8% |
| Avalanche Warning | 8 | 1.4% |
| High Surf Advisory | 6 | 1.1% |
| Air Quality Alert | 6 | 1.1% |
| Special Weather Statement | 5 | 0.9% |
| Blizzard Warning | 5 | 0.9% |
| Rip Current Statement | 5 | 0.9% |
| Extreme Cold Warning | 3 | 0.5% |
severity (share of all 571 rows)
| value | count | share |
|---|---|---|
| Moderate | 208 | 36.4% |
| Minor | 187 | 32.7% |
| Severe | 144 | 25.2% |
| Unknown | 19 | 3.3% |
| Extreme | 5 | 0.9% |
certainty (share of all 571 rows)
| value | count | share |
|---|---|---|
| Likely | 503 | 88.1% |
| Observed | 24 | 4.2% |
| Unknown | 18 | 3.2% |
| Possible | 18 | 3.2% |
urgency (share of all 571 rows)
| value | count | share |
|---|---|---|
| Expected | 510 | 89.3% |
| Unknown | 18 | 3.2% |
| Future | 18 | 3.2% |
| Past | 11 | 1.9% |
| Immediate | 6 | 1.1% |
latitude (histogram of the 16 non-null values)
| bin | count |
|---|---|
| 25.76 – 34.54 | 3 |
| 34.54 – 43.32 | 10 |
| 43.32 – 52.09 | 1 |
| 52.09 – 60.87 | 0 |
| 60.87 – 69.65 | 2 |
Schema
19 columns
| column | kind | null rate | unique | alerts |
|---|---|---|---|---|
| event | categorical | 1.4% | 35 | |
| headline | categorical | 1.6% | 305 | long_tail |
| description | categorical | 0.0% | 527 | long_tail |
| severity | categorical | 1.4% | 5 | |
| certainty | categorical | 1.4% | 4 | |
| urgency | categorical | 1.4% | 5 | |
| areaDesc | categorical | 1.4% | 441 | long_tail |
| sent | unknown | 0.0% | — | skipped |
| effective | unknown | 0.0% | — | skipped |
| onset | unknown | 0.0% | — | skipped |
| expires | unknown | 0.0% | — | skipped |
| latitude | numeric | 97.2% | 16 | null_rate, outliers |
| longitude | numeric | 97.2% | 16 | null_rate, outliers |
| event_type | categorical | 98.6% | 8 | long_tail, null_rate |
| date | unknown | 0.0% | — | skipped |
| state | categorical | 98.6% | 6 | long_tail, null_rate |
| country | categorical | 98.6% | 4 | long_tail, null_rate |
| magnitude | categorical | 98.6% | 5 | long_tail, null_rate |
| source | categorical | 98.6% | 8 | long_tail, null_rate |
event
categorical · label
This column captures NWS-style weather alert types, with 35 distinct events across 571 rows and a 1.4% null rate. 'Small Craft Advisory' dominates at 26.5% of non-null values (149 occurrences), followed by 'Winter Weather Advisory' (95) and 'Winter Storm Warning' (60), suggesting a marine- and winter-weighted sample. An entropy ratio of 0.72 indicates moderate spread with a clear long tail of rarer event types. Treatment: Use as a categorical label; consider grouping rare events into an 'Other' bucket before modelling.
- n: 571
- nulls: 8 (1.4%)
- unique: 35
- top_value: Small Craft Advisory
- top_rate: 0.2647
- cardinality: 35
- entropy: 3.699
- entropy_ratio: 0.7212
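The entropy and entropy_ratio figures are consistent with Shannon entropy in bits divided by log2 of the cardinality (for event, 3.699 / log2(35) ≈ 0.7212). That definition is an inference from the reported numbers, not something the profile states. A minimal sketch under that assumption, checked against the state column, whose 8 non-null values are described later in this report:

```python
import math
from collections import Counter

def entropy_bits(values):
    """Shannon entropy (base 2) of the non-null values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_ratio(values):
    """Entropy divided by its maximum, log2(number of distinct values)."""
    k = len({v for v in values if v is not None})
    if k <= 1:
        return 0.0  # a constant column carries no information
    return entropy_bits(values) / math.log2(k)

# state: 'International' three times plus five US states once each
state_values = ["International"] * 3 + ["FL", "OK", "CA", "IA", "NY"]
print(round(entropy_bits(state_values), 3), round(entropy_ratio(state_values), 4))
```

A ratio near 1 means the populated values are close to uniform; a ratio near 0 means one level dominates.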
headline
categorical · free_text · long_tail
This column holds NWS-style alert headlines that pack advisory type, issuing/expiry timestamps, and originating forecast office into a single string (e.g. 'Small Craft Advisory issued February 17 at 4:12AM AKST until February 18 at 5:00PM AKST by NWS Anchorage AK'). Cardinality is high at 305 unique values across 571 rows with entropy ratio 0.943, yet the top headline still repeats 19 times because identical advisories cover multiple zones. Small Craft Advisories out of NWS Anchorage AK dominate the top of the distribution, and a long_tail alert is flagged. Treatment: Parse into structured fields (event type, issue/expiry timestamps, office) rather than using the raw string.
- n: 571
- nulls: 9 (1.6%)
- unique: 305
- top_value: Small Craft Advisory issued February 17 at 4:12AM AKST until February 18 at 5:00PM AKST by NWS Anchorage AK
- top_rate: 0.03381
- cardinality: 305
- entropy: 7.782
- entropy_ratio: 0.943
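The parsing treatment can be sketched with a single regular expression. The pattern below is an assumption modelled on the one headline shown in this profile ("<event> issued <time> until <time> by <office>"); headlines with a different shape, for example without an "until" clause, return None rather than a guess. The names `HEADLINE_RE` and `parse_headline` are illustrative.

```python
import re

# Assumed shape: "<event> issued <issued> until <until> by <office>"
HEADLINE_RE = re.compile(
    r"^(?P<event>.+?) issued (?P<issued>.+?) until (?P<until>.+?) by (?P<office>.+)$"
)

def parse_headline(headline):
    """Split a headline into its parts, or return None if it does not fit."""
    if not headline:
        return None
    match = HEADLINE_RE.match(headline)
    return match.groupdict() if match else None

top_headline = (
    "Small Craft Advisory issued February 17 at 4:12AM AKST "
    "until February 18 at 5:00PM AKST by NWS Anchorage AK"
)
parsed = parse_headline(top_headline)
print(parsed)
```

The timestamps are left as strings here because the headline carries no year; the separate timestamp columns are likely a better source for exact times.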
description
categorical · free_text · long_tail
This column holds full-text NWS weather advisories (marine forecasts, gale/small craft warnings, fire weather watches, and high wind warnings) with structured sections like .TONIGHT/.WED and WHAT/WHERE/WHEN/IMPACTS embedded in the prose. It is near-unique (527 distinct of 571, entropy ratio 0.9945) with a long_tail alert, and the most common string repeats only 4 times (top_rate 0.0070). Treating these as categorical levels would be useless; they are documents, and several appear to be recurring boilerplate templates with swapped wind/sea numbers. Treatment: Parse the structured fields (WHAT/WHERE/WHEN, wind/seas) with regex, or tokenize and embed; do not one-hot.
- n: 571
- nulls: 0 (0.0%)
- unique: 527
- top_value: Southeast Alaska Inside Waters from Dixon Entrance to Skagway Wind forecasts reflect the predominant speed and direction expected. Sea forecasts represent the average of the highest one-third of the combined windwave and swell height. .TONIGHT...N wind 25 kt. Seas 5 ft. Heavy freezing spray. .WED...N wind 15 kt diminishing to 10 kt in the morning. Seas 3 ft in the morning then 2 ft or less. Light freezing spray in the early morning. .WED NIGHT...N wind 10 kt. Seas 2 ft or less. .THU...N wind 10 kt. Seas 2 ft or less. Light freezing spray. .THU NIGHT...N wind 15 kt. Seas 2 ft or less. Snow. .FRI...N gale to 35 kt. Seas 5 ft. .SAT...N gale to 45 kt. Seas 2 ft or less. .SUN...N gale to 35 kt. Seas 2 ft or less.
- top_rate: 0.007005
- cardinality: 527
- entropy: 8.992
- entropy_ratio: 0.9945
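For the marine forecasts, the regex route might look like the sketch below. It assumes the ".PERIOD...text" layout seen in the top value above and pulls only the first wind speed and first sea height per period; other advisory styles (WHAT/WHERE/WHEN blocks) would need their own patterns. All names are illustrative.

```python
import re

# A period starts with ".NAME..." and runs until the next " .NAME" or the end.
PERIOD_RE = re.compile(
    r"\.(?P<period>[A-Z][A-Z ]*)\.\.\.(?P<body>.*?)(?=\s\.[A-Z]|$)", re.S
)
KT_RE = re.compile(r"(\d+) kt")
SEAS_RE = re.compile(r"Seas (\d+) ft")

def parse_marine_forecast(text):
    """Return one dict per forecast period with the first wind and sea values."""
    rows = []
    for match in PERIOD_RE.finditer(text):
        body = match.group("body").strip()
        kt = KT_RE.search(body)
        seas = SEAS_RE.search(body)
        rows.append({
            "period": match.group("period"),
            "first_kt": int(kt.group(1)) if kt else None,
            "seas_ft": int(seas.group(1)) if seas else None,
            "text": body,
        })
    return rows

# Excerpt from the top_value string above
sample = (
    ".TONIGHT...N wind 25 kt. Seas 5 ft. Heavy freezing spray. "
    ".WED...N wind 15 kt diminishing to 10 kt in the morning. "
    "Seas 3 ft in the morning then 2 ft or less."
)
periods = parse_marine_forecast(sample)
for row in periods:
    print(row["period"], row["first_kt"], row["seas_ft"])
```

Taking only the first number per period is a simplification; a phrase such as "diminishing to 10 kt" carries a second value that this sketch ignores.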
severity
categorical · label
A 5-level severity classification, led by Moderate (208) and Minor (187), with Severe (144) close behind and Extreme appearing only 5 times. The Unknown bucket (19) plus the 8 nulls (1.4%) means 27 rows, about 4.7%, lack a clean severity signal. An entropy ratio of 0.77 shows the distribution is reasonably spread rather than collapsed onto one class. Treatment: Treat as ordinal (Minor, Moderate, Severe, Extreme in ascending order).
- n: 571
- nulls: 8 (1.4%)
- unique: 5
- top_value: Moderate
- top_rate: 0.3694
- cardinality: 5
- entropy: 1.788
- entropy_ratio: 0.7698
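An ordinal treatment could be sketched as a simple rank lookup. The ordering Minor, Moderate, Severe, Extreme is an assumption based on the usual reading of these labels, and mapping 'Unknown' and nulls to None is one possible choice rather than a rule from this profile.

```python
# Assumed ascending order of the four informative levels
SEVERITY_RANK = {"Minor": 1, "Moderate": 2, "Severe": 3, "Extreme": 4}

def encode_severity(value):
    """Map a severity label to its rank; 'Unknown', nulls, and typos become None."""
    return SEVERITY_RANK.get(value)

labels = ["Moderate", "Minor", "Severe", "Unknown", None, "Extreme"]
ranks = [encode_severity(v) for v in labels]
print(ranks)  # [2, 1, 3, None, None, 4]
```

Keeping the missing cases as None, instead of assigning them a rank, avoids implying that 'Unknown' sits somewhere on the scale.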
certainty
categorical · label
A 4-level categorical certainty label attached to each alert record. The distribution is extremely concentrated: 'Likely' covers 89.3% of the 563 non-null values (503 rows), while 'Observed', 'Unknown', and 'Possible' together account for only 60 rows. Null rate is negligible (1.4%) and entropy ratio is just 0.33, so this column carries little discriminative signal on its own. Treatment: One-hot encode but expect low signal; consider collapsing rare levels or dropping due to severe class imbalance.
- n: 571
- nulls: 8 (1.4%)
- unique: 4
- top_value: Likely
- top_rate: 0.8934
- cardinality: 4
- entropy: 0.6569
- entropy_ratio: 0.3285
urgency
categorical · feature
This is a small-cardinality categorical flag describing the timing/urgency of an event, with 5 levels. The distribution is severely imbalanced: 'Expected' covers 90.6% of the 563 non-null values (510 rows), leaving 'Unknown', 'Future', 'Past', and 'Immediate' as rare tails (6 to 18 rows each). An entropy ratio of 0.27 confirms the column carries little information, and 1.4% of values are null. Treatment: Collapse rare levels into an 'Other' bucket or binarize as Expected vs not before modelling.
- n: 571
- nulls: 8 (1.4%)
- unique: 5
- top_value: Expected
- top_rate: 0.9059
- cardinality: 5
- entropy: 0.6276
- entropy_ratio: 0.2703
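Collapsing rare levels can be sketched as below. The column is rebuilt from the counts in the urgency table, and the threshold of 50 is an arbitrary illustrative choice, not a value from this profile; nulls are left untouched.

```python
from collections import Counter

def collapse_rare(values, min_count, other="Other"):
    """Replace levels seen fewer than min_count times with `other`; keep nulls."""
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is None or counts[v] >= min_count else other
        for v in values
    ]

# Rebuild the urgency column from the reported counts (8 nulls included)
urgency = (
    ["Expected"] * 510 + ["Unknown"] * 18 + ["Future"] * 18
    + ["Past"] * 11 + ["Immediate"] * 6 + [None] * 8
)
collapsed = Counter(collapse_rare(urgency, min_count=50))
print(collapsed["Expected"], collapsed["Other"], collapsed[None])
# prints: 510 53 8
```

The same helper would apply to the rare tail of the event column, with a threshold chosen to suit the model.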
areaDesc
categorical · metadata · long_tail
Free-text geographic descriptions for weather alerts, with 441 unique values across 571 rows (entropy ratio 0.98) and the most common value 'Glacier Bay' appearing only 9 times (1.6%). Many entries concatenate multiple zones with semicolons (e.g., 'Greater Lake Tahoe Area; Greater Lake Tahoe Area', which even repeats itself), so the field mixes single-region and compound-region strings. Null rate is low at 1.4%, but the long tail makes this unsuitable as a categorical feature without normalization. Treatment: Split on ';' and normalize to canonical zone names before any grouping or joining.
- n: 571
- nulls: 8 (1.4%)
- unique: 441
- top_value: Glacier Bay
- top_rate: 0.01599
- cardinality: 441
- entropy: 8.608
- entropy_ratio: 0.9799
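The split step might be sketched as follows. It only trims whitespace and drops repeated zones; mapping zone names to a canonical list would need a lookup table that this profile does not provide. The function name is illustrative.

```python
def split_area_desc(area_desc):
    """Split an areaDesc string on ';' into unique, trimmed zone names."""
    if not area_desc:
        return []
    zones = [zone.strip() for zone in area_desc.split(";")]
    # dict.fromkeys removes duplicates while preserving first-seen order
    return list(dict.fromkeys(zone for zone in zones if zone))

print(split_area_desc("Greater Lake Tahoe Area; Greater Lake Tahoe Area"))
# ['Greater Lake Tahoe Area']
```

Exploding the resulting lists into one row per zone would then allow grouping by zone instead of by the raw string.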
sent
unknown · other · skipped
The column 'sent' was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. The only signals are 571 rows with a 0.0 null rate, suggesting the field is fully populated but otherwise opaque. Without type or cardinality information, no further characterisation is possible. Treatment: Re-profile with type inference enabled before deciding on downstream handling.
- n: 571
- nulls: 0 (0.0%)
- unique: —
effective
unknown · other · skipped
The column 'effective' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. We only know it has 571 rows and zero nulls; uniqueness, type, and value distribution are all unreported. Treatment: Re-profile or inspect manually to determine type before any downstream use.
- n: 571
- nulls: 0 (0.0%)
- unique: —
onset
unknown · other · skipped
The column is named "onset" with 571 non-null entries and a null rate of 0.0, but saturn skipped profiling, so its kind is unknown and no distributional stats were emitted. The name suggests the onset time or date of the alerted event, yet without dtype, uniqueness, or value statistics this is unverified. No surprises can be flagged because the evidence payload is empty beyond the row count. Treatment: Re-profile with an explicit type hint (likely datetime) before deciding on use.
- n: 571
- nulls: 0 (0.0%)
- unique: —
expires
unknown · other · skipped
The column is named "expires" and contains 571 non-null entries, but saturn skipped profiling, so its type and value distribution are unknown. No uniqueness, range, or format statistics are available to confirm whether it holds dates, durations, or flags. The name suggests a timestamp or expiry indicator, but this is unverified by the evidence. Treatment: Re-profile with an explicit parser to determine whether this is a timestamp before using it downstream.
- n: 571
- nulls: 0 (0.0%)
- unique: —
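For the skipped columns (sent, effective, onset, expires), one way to test the timestamp hypothesis is to attempt a parse and measure how many values succeed. The sketch below assumes ISO 8601 strings, which is a guess since the profile shows no actual values; the sample list is hypothetical and only illustrates the check.

```python
from datetime import datetime

def try_parse_iso(value):
    """Return a datetime if value is an ISO 8601 string, else None."""
    try:
        return datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None

def parse_rate(values):
    """Fraction of values that parse as ISO 8601 timestamps."""
    if not values:
        return 0.0
    return sum(try_parse_iso(v) is not None for v in values) / len(values)

# Hypothetical values, not taken from the dataset
sample_values = ["2025-02-17T04:12:00-09:00", "not a timestamp", None]
rate = parse_rate(sample_values)
print(f"{rate:.2f}")
```

A parse rate near 1.0 on the real column would support converting it to a datetime type; a low rate would point to a different format that needs its own parser.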
latitude
numeric · feature · null_rate · outliers
This is a geographic latitude column, with values spanning 25.76 to 69.65 and a median of 35.45, consistent with northern-hemisphere coordinates. The column is almost entirely empty (97.2% null) with only 16 unique values across 571 rows, and 4 of the 16 populated values (25%) are flagged as outliers, with a right skew of 1.51 pulling the mean up to 40.46. Treatment: Drop or treat as sparse metadata; too few non-null values to model directly.
- n: 571
- nulls: 555 (97.2%)
- unique: 16
- min: 25.76
- max: 69.65
- mean: 40.46
- median: 35.45
- std: 11.58
- q1: 34.96
- q3: 41
- iqr: 6.043
- skew: 1.508
- kurtosis: 1.297
- n_outliers: 4
- outlier_rate: 0.25
- zero_rate: 0
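The profile does not say which rule flagged the outliers. Assuming the common Tukey rule (values beyond 1.5 × IQR from the quartiles), the fences implied by the reported q1 and q3 can be sketched as:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for latitude
lower, upper = tukey_fences(34.96, 41.0)
print(round(lower, 2), round(upper, 2))  # 25.9 50.06
```

Under that assumption, both the reported minimum (25.76) and maximum (69.65) fall outside the fences. With only 16 populated values, such flags say more about the tiny sample than about data quality.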
longitude
numeric · feature · null_rate · outliers
Geographic longitude coordinates, but only 16 of 571 rows carry a value (null rate 0.972), making this column nearly empty. The populated values span -120.609 to 18.955 with median -95.307, suggesting a mix of North American and possibly European points, and 2 outliers (12.5% of non-nulls) plus a skew of 1.28 hint at a few eastern-hemisphere entries pulling the distribution right. Treatment: Drop or impute given 97.2% nulls; only usable if paired with latitude on the few populated rows.
- n: 571
- nulls: 555 (97.2%)
- unique: 16
- min: -120.6
- max: 18.96
- mean: -84.42
- median: -95.31
- std: 45.23
- q1: -120.2
- q3: -78.65
- iqr: 41.52
- skew: 1.283
- kurtosis: 0.3248
- n_outliers: 2
- outlier_rate: 0.125
- zero_rate: 0
event_type
categorical · label · long_tail · null_rate
This is a categorical event_type label naming rare atmospheric or geological phenomena (e.g., Ball Lightning, Waterspout, Volcanic Lightning). It is almost entirely empty: null_rate is 0.986 across n=571, leaving only 8 populated rows, each a distinct value with frequency 1, so entropy_ratio is 1.0. The column is effectively a sparse free-list of unique tags rather than a usable category. Treatment: Drop or retain only as a sparse annotation; too few non-null rows to model.
- n: 571
- nulls: 563 (98.6%)
- unique: 8
- top_value: Ball Lightning
- top_rate: 0.125
- cardinality: 8
- entropy: 3
- entropy_ratio: 1
date
unknown · timestamp · skipped
This column is named "date" and contains 571 non-null values, but saturn skipped detailed profiling, so no type, uniqueness, or distribution stats are available. Without parsed values it is impossible to confirm whether entries are timestamps, formatted strings, or something else. The only firm signals are the full population (null_rate 0.0) and the row count of 571. Treatment: Parse to a proper datetime type and re-profile before use.
- n: 571
- nulls: 0 (0.0%)
- unique: —
state
categorical · metadata · long_tail · null_rate
Geographic state field mixing US state abbreviations (FL, OK, CA, IA, NY) with an 'International' bucket. The column is essentially empty: 98.6% null with only 8 non-null values across 571 rows, and 'International' accounts for 3 of those 8. With just 6 unique values and near-uniform distribution among the populated rows (entropy ratio 0.93), there is too little signal here to support analysis. Treatment: Drop or collapse to a binary 'is_international' flag; null rate of 0.986 makes it unusable as-is.
- n: 571
- nulls: 563 (98.6%)
- unique: 6
- top_value: International
- top_rate: 0.375
- cardinality: 6
- entropy: 2.406
- entropy_ratio: 0.9306
country
categorical · metadata · long_tail · null_rate
Country-of-origin field, but it is effectively empty: 98.6% of the 571 rows are null and only 8 records carry a value across 4 distinct countries (USA dominates at 5 of 8, or 62.5%). With such sparse coverage, the entropy ratio of 0.77 and the long tail are statistically meaningless. This column cannot support any country-level analysis as-is. Treatment: Drop or set aside; null rate is too high to use as a feature.
- n: 571
- nulls: 563 (98.6%)
- unique: 4
- top_value: USA
- top_rate: 0.625
- cardinality: 4
- entropy: 1.549
- entropy_ratio: 0.7744
magnitude
categorical · feature · long_tail · null_rate
Categorical magnitude/intensity field, almost entirely empty: 98.6% null across 571 rows, leaving only 8 populated entries spread across 5 distinct values. The non-null content is also inconsistent, mixing placeholders ("N/A", "Unknown") with structured scales ("F0-F1", "EF-3 equivalent") and free text ("140 mph winds"), so even the present data is not directly comparable. Treatment: Drop or set aside; too sparse and inconsistently coded to model without manual normalisation.
- n: 571
- nulls: 563 (98.6%)
- unique: 5
- top_value: N/A
- top_rate: 0.5
- cardinality: 5
- entropy: 2
- entropy_ratio: 0.8614
source
categorical · metadata · long_tail · null_rate
This appears to be a citation/provenance field naming the agency or publication that supplied each record (e.g., NOAA offices, meteorological institutes, journals). It is almost entirely empty: 98.6% null, with only 8 non-null values across 571 rows, and every observed source occurs exactly once (top_rate 0.125, entropy_ratio 1.0). With no repetition, the column carries no categorical signal in its current state. Treatment: Drop or retain as provenance metadata only; too sparse and unique for modelling.
- n: 571
- nulls: 563 (98.6%)
- unique: 8
- top_value: Journal of Geophysical Research
- top_rate: 0.125
- cardinality: 8
- entropy: 3
- entropy_ratio: 1