saturn·

quirky atmospheric real

source /home/coolhand/html/datavis/data_trove/data/quirky/atmospheric_real.json · 571 rows · 19 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 571 weather alert records with 19 columns mixing NWS-style alert metadata (event, severity, urgency, certainty, areaDesc, headline) with sparse atmospheric event annotations (country, event_type, magnitude, source, state). The alert fields are well-populated and led by 'Small Craft Advisory' (149 of 571) and 'Winter Weather Advisory' (95), while certainty is overwhelmingly 'Likely' (89.3% of non-null rows) and urgency is 'Expected' (90.6% of non-null rows), so those two risk dimensions vary little. Severity is the most balanced operational field, split across Moderate (208), Minor (187), and Severe (144). Note that the curiosity-style columns (country, event_type, magnitude, source, state) are 98.6% null and each populated in only 8 rows, so treat them as a separate mini-dataset rather than primary signal.

citing: row_count · column_count · columns.event · columns.severity · columns.urgency · columns.certainty · columns.country · columns.event_type · columns.magnitude

Schema

19 columns
Per-column summary.

column       kind         nulls   unique  alerts
event        categorical  1.4%    35      -
headline     categorical  1.6%    305     long_tail
description  categorical  0.0%    527     long_tail
severity     categorical  1.4%    5       -
certainty    categorical  1.4%    4       -
urgency      categorical  1.4%    5       -
areaDesc     categorical  1.4%    441     long_tail
sent         unknown      0.0%    -       skipped
effective    unknown      0.0%    -       skipped
onset        unknown      0.0%    -       skipped
expires      unknown      0.0%    -       skipped
latitude     numeric      97.2%   16      null_rate outliers
longitude    numeric      97.2%   16      null_rate outliers
event_type   categorical  98.6%   8       long_tail null_rate
date         unknown      0.0%    -       skipped
state        categorical  98.6%   6       long_tail null_rate
country      categorical  98.6%   4       long_tail null_rate
magnitude    categorical  98.6%   5       long_tail null_rate
source       categorical  98.6%   8       long_tail null_rate

event

categorical label
This column captures NWS-style weather alert types, with 35 distinct events across 571 rows and a 1.4% null rate. 'Small Craft Advisory' leads at 26.5% of the 563 non-null rows (149 occurrences), followed by 'Winter Weather Advisory' (95) and 'Winter Storm Warning' (60), suggesting a marine/winter-weighted sample. Entropy ratio of 0.72 indicates moderate spread but a clear long tail of rarer event types. Treatment: Use as a categorical label; consider grouping rare events into an 'Other' bucket before modelling. high · anthropic:claude-opus-4-7
n: 571
nulls: 8 (1.4%)
unique: 35
top_value: Small Craft Advisory
top_rate: 0.2647
cardinality: 35
entropy: 3.699
entropy_ratio: 0.7212
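
The 'Other' bucket suggested for event could be sketched as below. This is a minimal illustration, not part of the profiler: the function name bucket_rare, the min_count threshold, and the toy sample are all assumptions.

```python
from collections import Counter

def bucket_rare(values, min_count=10, other="Other"):
    """Replace labels seen fewer than min_count times with `other`.

    None (a missing event) is passed through unchanged.
    """
    counts = Counter(v for v in values if v is not None)
    return [
        v if v is None or counts[v] >= min_count else other
        for v in values
    ]

# Toy sample for illustration only, not the real column.
sample = (
    ["Small Craft Advisory"] * 3
    + ["Winter Weather Advisory"] * 2
    + ["Gale Warning", None]
)
bucketed = bucket_rare(sample, min_count=2)
```

The threshold is a modelling choice; a value tuned on the real 35-level distribution would replace the hypothetical default.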

headline

categorical free_text long_tail
This column holds NWS-style alert headlines that pack advisory type, issuing/expiry timestamps, and originating forecast office into a single string (e.g. 'Small Craft Advisory issued February 17 at 4:12AM AKST until February 18 at 5:00PM AKST by NWS Anchorage AK'). Cardinality is high at 305 unique values across 571 rows with entropy ratio 0.943, yet the top headline still repeats 19 times because identical advisories cover multiple zones. Small Craft Advisories out of NWS Anchorage AK dominate the top of the distribution, and a long_tail alert is flagged. Treatment: Parse into structured fields (event type, issue/expiry timestamps, office) rather than using the raw string. high · anthropic:claude-opus-4-7
n: 571
nulls: 9 (1.6%)
unique: 305
top_value: Small Craft Advisory issued February 17 at 4:12AM AKST until February 18 at 5:00PM AKST by NWS Anchorage AK
top_rate: 0.03381
cardinality: 305
entropy: 7.782
entropy_ratio: 0.943
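
One way to parse the headline into structured fields is a single regular expression. The sketch below assumes every headline follows the "<event> issued <time> until <time> by <office>" shape seen in the top value; headlines that deviate return None. The name parse_headline is hypothetical.

```python
import re

# Assumed headline shape, inferred from the top value only.
HEADLINE = re.compile(
    r"^(?P<event>.+?) issued (?P<issued>.+?) until (?P<until>.+?) by (?P<office>.+)$"
)

def parse_headline(headline):
    """Return a dict of event/issued/until/office, or None if no match."""
    match = HEADLINE.match(headline or "")
    return match.groupdict() if match else None

example = (
    "Small Craft Advisory issued February 17 at 4:12AM AKST "
    "until February 18 at 5:00PM AKST by NWS Anchorage AK"
)
fields = parse_headline(example)
```

The timestamps are left as strings here because the headline omits the year; resolving them would need the sent or effective column.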

description

categorical free_text long_tail
This column holds full-text NWS weather advisories — marine forecasts, gale/small craft warnings, fire weather watches, and high wind warnings — with structured sections like .TONIGHT/.WED, WHAT/WHERE/WHEN/IMPACTS embedded in the prose. It is near-unique (527 distinct of 571, entropy ratio 0.9945) with a long_tail alert, and the most common string repeats only 4 times (top_rate 0.0070). Treating these as categorical levels would be useless; they are documents, and several appear to be recurring boilerplate templates with swapped wind/sea numbers. Treatment: Parse the structured fields (WHAT/WHERE/WHEN, wind/seas) with regex or tokenize and embed; do not one-hot. high · anthropic:claude-opus-4-7
n: 571
nulls: 0 (0.0%)
unique: 527
top_value: Southeast Alaska Inside Waters from Dixon Entrance to Skagway Wind forecasts reflect the predominant speed and direction expected. Sea forecasts represent the average of the highest one-third of the combined windwave and swell height. .TONIGHT...N wind 25 kt. Seas 5 ft. Heavy freezing spray. .WED...N wind 15 kt diminishing to 10 kt in the morning. Seas 3 ft in the morning then 2 ft or less. Light freezing spray in the early morning. .WED NIGHT...N wind 10 kt. Seas 2 ft or less. .THU...N wind 10 kt. Seas 2 ft or less. Light freezing spray. .THU NIGHT...N wind 15 kt. Seas 2 ft or less. Snow. .FRI...N gale to 35 kt. Seas 5 ft. .SAT...N gale to 45 kt. Seas 2 ft or less. .SUN...N gale to 35 kt. Seas 2 ft or less.
top_rate: 0.007005
cardinality: 527
entropy: 8.992
entropy_ratio: 0.9945
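
The regex route for description might look like the sketch below, which splits a marine forecast on its ".PERIOD..." markers and pulls the first wind speed and sea height from each period. It assumes the marker and unit conventions visible in the top value; other advisory types (WHAT/WHERE/WHEN) would need their own patterns. The name parse_marine_periods is hypothetical.

```python
import re

# A period marker looks like ".TONIGHT..." or ".WED NIGHT...".
PERIOD_MARKER = re.compile(r"\.([A-Z][A-Z ]+)\.\.\.")

def parse_marine_periods(text):
    """Return (period, first wind speed in kt, first sea height in ft) tuples."""
    parts = PERIOD_MARKER.split(text)
    rows = []
    # split() yields [prefix, label1, body1, label2, body2, ...]
    for label, body in zip(parts[1::2], parts[2::2]):
        wind = re.search(r"(\d+) kt", body)
        seas = re.search(r"Seas (\d+) ft", body)
        rows.append((
            label,
            int(wind.group(1)) if wind else None,
            int(seas.group(1)) if seas else None,
        ))
    return rows

# Excerpt of the top value shown in the stats for this column.
excerpt = (
    ".TONIGHT...N wind 25 kt. Seas 5 ft. Heavy freezing spray. "
    ".WED...N wind 15 kt diminishing to 10 kt in the morning. "
    "Seas 3 ft in the morning then 2 ft or less. "
    ".WED NIGHT...N wind 10 kt. Seas 2 ft or less."
)
periods = parse_marine_periods(excerpt)
```

Only the first number per period is captured, so "15 kt diminishing to 10 kt" is recorded as 15.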

severity

categorical label
A 5-level severity classification, led by Moderate (208) and Minor (187), with Severe (144) close behind and Extreme appearing only 5 times. The Unknown bucket (19) plus the 8 nulls (1.4%) means 27 rows, roughly 4.7%, lack a clean severity signal. Entropy ratio of 0.77 shows the distribution is reasonably spread rather than collapsed onto one class. Treatment: Treat as ordinal (Minor < Moderate < Severe < Extreme). high · anthropic:claude-opus-4-7
n: 571
nulls: 8 (1.4%)
unique: 5
top_value: Moderate
top_rate: 0.3694
cardinality: 5
entropy: 1.788
entropy_ratio: 0.7698
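
An ordinal encoding for severity could be sketched as follows. The rank order Minor, Moderate, Severe, Extreme is an assumption based on how these levels are conventionally ordered in alert feeds, and mapping 'Unknown' to None is a choice made here rather than something the report prescribes. The counts are copied from the summary above.

```python
# Assumed ordering of the four informative levels.
SEVERITY_RANK = {"Minor": 1, "Moderate": 2, "Severe": 3, "Extreme": 4}

def encode_severity(value):
    """Map a severity label to its rank; 'Unknown' and None become None."""
    return SEVERITY_RANK.get(value)

# Counts reported for this column, plus the 8 nulls.
counts = {"Moderate": 208, "Minor": 187, "Severe": 144, "Unknown": 19, "Extreme": 5}
n_rows, n_nulls = 571, 8

# Share of rows with no usable severity (Unknown or null).
unusable_share = (counts["Unknown"] + n_nulls) / n_rows
```

Here unusable_share works out to 27 / 571, a little under 5% of rows.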

certainty

categorical label
A 4-level categorical certainty label attached to each alert record. The distribution is extremely concentrated: 'Likely' covers 89.3% of the 563 non-null rows (503), while 'Observed', 'Unknown', and 'Possible' together account for only 60 rows. Null rate is low (1.4%) and entropy ratio is just 0.33, so this column carries little discriminative signal on its own. Treatment: One-hot encode but expect low signal; consider collapsing rare levels or dropping due to severe class imbalance. high · anthropic:claude-opus-4-7
n: 571
nulls: 8 (1.4%)
unique: 4
top_value: Likely
top_rate: 0.8934
cardinality: 4
entropy: 0.6569
entropy_ratio: 0.3285
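
The reported figures for certainty only reconcile if top_rate is taken over non-null rows rather than all 571. A quick check, using only numbers from the stats for this column (the variable names are illustrative):

```python
# Figures copied from the certainty stats.
n_rows = 571
n_nulls = 8
top_rate = 0.8934

non_null = n_rows - n_nulls                # rows that carry a certainty value
likely_rows = round(top_rate * non_null)   # implied count for 'Likely'
other_rows = non_null - likely_rows        # Observed + Unknown + Possible

print(non_null, likely_rows, other_rows)   # 563 503 60
```

The remainder of 60 matches the count quoted for the three rare levels, which would not be the case with 571 as the denominator.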

urgency

categorical feature
This is a small-cardinality categorical flag describing the timing/urgency of an event, with 5 levels. The distribution is severely imbalanced: 'Expected' covers 90.6% of the 563 non-null rows (510), leaving 'Unknown', 'Future', 'Past', and 'Immediate' as rare tails (6-18 rows each). Entropy ratio of 0.27 confirms the column carries little information, and 1.4% of values are null. Treatment: Collapse rare levels into an 'Other' bucket or binarize as Expected vs not before modelling. high · anthropic:claude-opus-4-7
n: 571
nulls: 8 (1.4%)
unique: 5
top_value: Expected
top_rate: 0.9059
cardinality: 5
entropy: 0.6276
entropy_ratio: 0.2703
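
The "Expected vs not" option for urgency is small enough to sketch directly. Keeping nulls as None instead of folding them into False is an assumption made here; the helper name is_expected and the sample are hypothetical.

```python
def is_expected(urgency):
    """True for 'Expected', False for any other level, None when missing."""
    if urgency is None:
        return None
    return urgency == "Expected"

# Toy sample using the level names listed above.
sample = ["Expected", "Immediate", None, "Future", "Expected"]
flags = [is_expected(u) for u in sample]
```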

areaDesc

categorical metadata long_tail
Free-text geographic descriptions for weather alerts, with 441 unique values across 571 rows (entropy ratio 0.98) and the most common value 'Glacier Bay' appearing only 9 times (1.6%). Many entries concatenate multiple zones with semicolons (e.g., 'Greater Lake Tahoe Area; Greater Lake Tahoe Area' which even repeats itself), so the field mixes single-region and compound-region strings. Null rate is low at 1.4%, but the long tail makes this unsuitable as a categorical feature without normalization. Treatment: Split on ';' and normalize to canonical zone names before any grouping or joining. high · anthropic:claude-opus-4-7
n: 571
nulls: 8 (1.4%)
unique: 441
top_value: Glacier Bay
top_rate: 0.01599
cardinality: 441
entropy: 8.608
entropy_ratio: 0.9799
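
Splitting areaDesc on ';' and dropping repeated zones could be sketched as below. Normalisation here is limited to whitespace and case-insensitive de-duplication; mapping to canonical zone names would need a reference list that this report does not include. The name split_zones is hypothetical.

```python
def split_zones(area_desc):
    """Split a compound area string into unique zone names, order preserved."""
    if not area_desc:
        return []
    seen = {}
    for part in area_desc.split(";"):
        name = " ".join(part.split())  # trim and collapse inner whitespace
        if name:
            # First spelling wins; comparison ignores case.
            seen.setdefault(name.casefold(), name)
    return list(seen.values())

zones = split_zones("Greater Lake Tahoe Area; Greater Lake Tahoe Area")
```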

sent

unknown other skipped
The column 'sent' was skipped by the profiler, so its kind is unknown and no descriptive statistics are available. The only signals are 571 rows with a 0.0 null rate, suggesting the field is fully populated but otherwise opaque. Without type or cardinality information, no further characterisation is possible. Treatment: Re-profile with type inference enabled before deciding on downstream handling. low · anthropic:claude-opus-4-7
n: 571
nulls: 0 (0.0%)
unique: (not reported)

effective

unknown other skipped
The column 'effective' was skipped by the profiler, so its kind is unknown and no descriptive statistics were computed. We only know it has 571 rows and zero nulls; uniqueness, type, and value distribution are all unreported. Treatment: Re-profile or inspect manually to determine type before any downstream use. low · anthropic:claude-opus-4-7
n: 571
nulls: 0 (0.0%)
unique: (not reported)

onset

unknown other skipped
The column is named "onset" with 571 non-null entries and a null rate of 0.0, but saturn skipped profiling so its kind is unknown and no distributional stats were emitted. The name suggests the date or time at which the alerted event begins, yet without dtype, uniqueness, or value statistics this is unverified. No surprises can be flagged because the evidence payload is empty beyond the row count. Treatment: Re-profile with an explicit type hint (likely datetime) before deciding on use. low · anthropic:claude-opus-4-7
n: 571
nulls: 0 (0.0%)
unique: (not reported)

expires

unknown other skipped
The column is named "expires" and contains 571 non-null entries, but saturn skipped profiling so its type and value distribution are unknown. No uniqueness, range, or format statistics are available to confirm whether it holds dates, durations, or flags. The name suggests a timestamp or expiry indicator, but this is unverified by the evidence. Treatment: Re-profile with an explicit parser to determine whether this is a timestamp before using it downstream. low · anthropic:claude-opus-4-7
n: 571
nulls: 0 (0.0%)
unique: (not reported)
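
Before re-profiling sent, effective, onset, and expires, a tolerant parser can test whether they hold timestamps. The sketch below assumes ISO 8601 strings with a UTC offset, which is unverified since the profiler skipped these columns; the sample value is invented for illustration and the name parse_ts is hypothetical.

```python
from datetime import datetime, timezone

def parse_ts(value):
    """Parse an ISO 8601 string; return None if it cannot be parsed.

    Offset-aware values are converted to UTC; naive values are returned as-is.
    """
    try:
        ts = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    return ts.astimezone(timezone.utc) if ts.tzinfo else ts

# Invented sample, not taken from the dataset.
parsed = parse_ts("2026-02-17T04:12:00-09:00")
```

Counting how many of the 571 values come back as None gives a quick read on whether the ISO 8601 assumption holds.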

latitude

numeric feature null_rate outliers
This is a geographic latitude column, with values spanning 25.76 to 69.65 and a median of 35.45 — consistent with northern-hemisphere coordinates. The column is almost entirely empty (97.2% null) with only 16 unique values across 571 rows, and 4 of the 16 populated values (25%) are flagged as outliers, with a right skew of 1.51 pulling the mean up to 40.46. Treatment: Drop or treat as sparse metadata; too few non-null values to model directly. high · anthropic:claude-opus-4-7
n: 571
nulls: 555 (97.2%)
unique: 16
min: 25.76
max: 69.65
mean: 40.46
median: 35.45
std: 11.58
q1: 34.96
q3: 41
iqr: 6.043
skew: 1.508
kurtosis: 1.297
n_outliers: 4
outlier_rate: 0.25
zero_rate: 0
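
The report does not state which outlier rule produced the latitude flags. Assuming the common Tukey rule (1.5 × IQR beyond the quartiles), the fences implied by the reported quartiles would be:

```python
# Quartiles copied from the latitude stats.
q1, q3, iqr = 34.96, 41.0, 6.043

# Tukey fences: an assumption, the profiler's actual rule is not documented here.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

lat_min, lat_max = 25.76, 69.65
min_is_outlier = lat_min < lower_fence
max_is_outlier = lat_max > upper_fence
```

Under that assumption both the minimum and the maximum fall outside the fences (roughly 25.90 and 50.06), which is consistent with outliers being flagged on both ends.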

longitude

numeric feature null_rate outliers
Geographic longitude coordinates, but only 16 of 571 rows carry a value (97.2% null), making this column nearly empty. The populated values span -120.609 to 18.955 with median -95.307, suggesting a mix of North American and possibly European points; 2 outliers (12.5% of non-nulls) and a skew of 1.28 point to a few eastern-hemisphere entries pulling the distribution right. Treatment: Drop or keep as sparse metadata given 97.2% nulls; do not impute, and use only when paired with latitude on the few populated rows. medium · anthropic:claude-opus-4-7
n: 571
nulls: 555 (97.2%)
unique: 16
min: -120.6
max: 18.96
mean: -84.42
median: -95.31
std: 45.23
q1: -120.2
q3: -78.65
iqr: 41.52
skew: 1.283
kurtosis: 0.3248
n_outliers: 2
outlier_rate: 0.125
zero_rate: 0
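
Pairing longitude with latitude on the populated rows could be sketched as below. Rows are assumed to be dicts keyed by column name, and the sample rows are invented; only the range check on valid coordinates is standard.

```python
def coord_pair(row):
    """Return (latitude, longitude) when both are present and in range, else None."""
    lat, lon = row.get("latitude"), row.get("longitude")
    if lat is None or lon is None:
        return None
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return (lat, lon)

# Invented rows for illustration only.
rows = [
    {"latitude": 35.45, "longitude": -95.31},
    {"latitude": None, "longitude": None},
    {"latitude": 40.0},
]
points = [p for p in map(coord_pair, rows) if p is not None]
```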

event_type

categorical label long_tail null_rate
This is a categorical event_type label naming rare atmospheric or geological phenomena (e.g., Ball Lightning, Waterspout, Volcanic Lightning). It is almost entirely empty: null_rate is 0.986 across n=571, leaving only 8 populated rows, each a distinct value with frequency 1, so entropy_ratio is 1.0. The column is effectively a sparse free-list of unique tags rather than a usable category. Treatment: Drop or retain only as a sparse annotation; too few non-null rows to model. high · anthropic:claude-opus-4-7
n: 571
nulls: 563 (98.6%)
unique: 8
top_value: Ball Lightning
top_rate: 0.125
cardinality: 8
entropy: 3
entropy_ratio: 1

date

unknown timestamp skipped
This column is named "date" and contains 571 non-null values, but saturn skipped detailed profiling so no type, uniqueness, or distribution stats are available. Without parsed values it is impossible to confirm whether entries are timestamps, formatted strings, or something else. The only firm signals are the full population (null_rate 0.0) and the row count of 571. Treatment: Parse to a proper datetime type and re-profile before use. low · anthropic:claude-opus-4-7
n: 571
nulls: 0 (0.0%)
unique: (not reported)

state

categorical metadata long_tail null_rate
Geographic state field mixing US state abbreviations (FL, OK, CA, IA, NY) with an 'International' bucket. The column is essentially empty — 98.6% null with only 8 non-null values across 571 rows, and 'International' accounts for 3 of those 8. With just 6 unique values and near-uniform distribution among the populated rows (entropy ratio 0.93), there is too little signal here to support analysis. Treatment: Drop or collapse to a binary 'is_international' flag; null rate of 0.986 makes it unusable as-is. high · anthropic:claude-opus-4-7
n: 571
nulls: 563 (98.6%)
unique: 6
top_value: International
top_rate: 0.375
cardinality: 6
entropy: 2.406
entropy_ratio: 0.9306
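
The entropy figures for state can be reproduced from the counts alone: 8 non-null rows, 'International' three times, and the five US states once each. The sketch assumes entropy is Shannon entropy in bits and that entropy_ratio divides it by log2 of the number of distinct values; that definition is inferred from the reported numbers, not documented by the profiler.

```python
from math import log2

def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# International x3, then FL, OK, CA, IA, NY x1 each.
state_counts = [3, 1, 1, 1, 1, 1]

h = entropy_bits(state_counts)
ratio = h / log2(len(state_counts))  # assumed definition of entropy_ratio

print(round(h, 3), round(ratio, 4))  # 2.406 0.9306
```

With only 8 observations the ratio is high simply because most values occur once, which is why the report treats it as uninformative.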

country

categorical metadata long_tail null_rate
Country-of-origin field, but it is effectively empty: 98.6% of the 571 rows are null and only 8 records carry a value across 4 distinct countries (USA dominates at 5 of 8, or 62.5%). With such sparse coverage, the entropy ratio of 0.77 and the long tail are statistically meaningless. This column cannot support any country-level analysis as-is. Treatment: Drop or set aside; null rate is too high to use as a feature. high · anthropic:claude-opus-4-7
n: 571
nulls: 563 (98.6%)
unique: 4
top_value: USA
top_rate: 0.625
cardinality: 4
entropy: 1.549
entropy_ratio: 0.7744

magnitude

categorical feature long_tail null_rate
Categorical magnitude/intensity field, almost entirely empty: 98.6% null across 571 rows, leaving only 8 populated entries spread across 5 distinct values. The non-null content is also inconsistent, mixing placeholders ("N/A", "Unknown") with structured scales ("F0-F1", "EF-3 equivalent") and free text ("140 mph winds"), so even the present data is not directly comparable. Treatment: Drop or set aside; too sparse and inconsistently coded to model without manual normalisation. high · anthropic:claude-opus-4-7
n: 571
nulls: 563 (98.6%)
unique: 5
top_value: N/A
top_rate: 0.5
cardinality: 5
entropy: 2
entropy_ratio: 0.8614
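
If magnitude is kept, the manual normalisation could start from a small classifier like the one below. It covers only the five values named above; the output tuple format, the placeholder set, and the name normalise_magnitude are assumptions made for this sketch.

```python
import re

PLACEHOLDERS = {"", "n/a", "unknown"}

def normalise_magnitude(raw):
    """Classify a raw magnitude string as (kind, value), or None for placeholders."""
    if raw is None or raw.strip().lower() in PLACEHOLDERS:
        return None
    mph = re.search(r"(\d+(?:\.\d+)?)\s*mph", raw, re.IGNORECASE)
    if mph:
        return ("wind_mph", float(mph.group(1)))
    # Fujita-style ratings such as "F0-F1" or "EF-3 equivalent".
    # For a range, only the lower bound is kept.
    scale = re.match(r"\s*(E?F)-?(\d)", raw, re.IGNORECASE)
    if scale:
        return (scale.group(1).upper() + "_scale", int(scale.group(2)))
    return ("unparsed", raw)

values = ["N/A", "Unknown", "F0-F1", "EF-3 equivalent", "140 mph winds"]
normalised = [normalise_magnitude(v) for v in values]
```

F and EF ratings are kept as separate kinds because the two scales are not directly interchangeable.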

source

categorical metadata long_tail null_rate
This appears to be a citation/provenance field naming the agency or publication that supplied each record (e.g., NOAA offices, meteorological institutes, journals). It is almost entirely empty: 98.6% null, with only 8 non-null values across 571 rows, and every observed source occurs exactly once (top_rate 0.125, entropy_ratio 1.0). With no repetition, the column carries no categorical signal in its current state. Treatment: Drop or retain as provenance metadata only; too sparse and unique for modelling. high · anthropic:claude-opus-4-7
n: 571
nulls: 563 (98.6%)
unique: 8
top_value: Journal of Geophysical Research
top_rate: 0.125
cardinality: 8
entropy: 3
entropy_ratio: 1