accessibility-atlas-cdc_dhds_disability_prevalence

Overview

Source: /home/coolhand/datasets/accessibility-atlas/cdc_dhds_disability_prevalence.csv

Saturn profiled 3,592 rows across 30 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

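# Notebook cell: the leading "!" runs a shell command (IPython only).
!pip install saturn-dissect
import subprocess

# Profile the CSV and write the findings to JSON.
# check=True raises CalledProcessError if saturn exits with a non-zero status.
subprocess.run([
    "saturn", "analyze", "/home/coolhand/datasets/accessibility-atlas/cdc_dhds_disability_prevalence.csv",
    "--findings", "accessibility-atlas-cdc_dhds_disability_prevalence.json",
    "--llm", "anthropic:claude-opus-4-7",
], check=True)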

Summary confidence: high

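This dataset contains 3,592 BRFSS-derived records of age-adjusted disability prevalence among U.S. adults 18+, broken out by location (65 values covering states, territories, and regional aggregates such as HHS Regions), year (2016-2022), and 8 Response values (six disability types plus "Any Disability" and "No Disability"). The core measure is Data_Value (percent prevalence), which ranges from 1.8% to 81.3% with a median of 9.1%. It is flagged as right-skewed with outliers, but the histogram is bimodal: 449 values form a separate cluster between 57.45% and 81.3%, the same count as each Response value, so the flagged rows probably belong to one category (plausibly "No Disability") rather than being anomalies. Most metadata columns (Category, Indicator, DataSource, Stratification1, etc.) are constant single-value fields and are of no use as filters. The two things worth a closer look are the distribution of Data_Value across the 8 Response values and the geographic spread via LocationDesc. Response is perfectly balanced in row counts (449 each) and LocationDesc is nearly so (entropy ratio 0.9992), so variation will come almost entirely from the prevalence values themselves.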

citing: row_count · column_count · Data_Value · Response · LocationDesc · Year · WeightedNumber · Category

Out[4]:

saturn.schema() · 30 columns

column	kind	n	null%	unique	alerts
Year	numeric	3,592	0.0%	7
LocationAbbr	categorical	3,592	0.0%	65
LocationDesc	categorical	3,592	0.0%	65
DataSource	categorical	3,592	0.0%	1	imbalance
Category	categorical	3,592	0.0%	1	imbalance
Indicator	categorical	3,592	0.0%	1	imbalance
Response	categorical	3,592	0.0%	8
Data_Value_Unit	categorical	3,592	0.0%	1	imbalance
Data_Value_Type	categorical	3,592	0.0%	1	imbalance
Data_Value	numeric	3,592	0.1%	486	outliers
Data_Value_Alt	numeric	3,592	0.1%	486	outliers
Data_Value_Footnote_Symbol	categorical	3,592	99.9%	1	null_rate imbalance
Data_Value_Footnote	categorical	3,592	99.9%	1	null_rate imbalance
Low_Confidence_Limit	numeric	3,592	0.1%	489	outliers
High_Confidence_Limit	numeric	3,592	0.1%	503	outliers
Number	numeric	3,592	0.1%	2,267	high_skew outliers
WeightedNumber	numeric	3,592	0.1%	3,580	high_skew outliers
StratificationCategory1	categorical	3,592	0.0%	1	imbalance
Stratification1	categorical	3,592	0.0%	1	imbalance
StratificationCategory2	unknown	3,592	0.0%	—	skipped
Stratification2	unknown	3,592	0.0%	—	skipped
CategoryID	categorical	3,592	0.0%	1	imbalance
IndicatorID	categorical	3,592	0.0%	1	imbalance
LocationID	numeric	3,592	0.0%	65
ResponseID	categorical	3,592	0.0%	8
DataValueTypeID	categorical	3,592	0.0%	1	imbalance
StratificationCategoryID1	categorical	3,592	0.0%	1	imbalance
StratificationID1	categorical	3,592	0.0%	1	imbalance
StratificationCategoryID2	unknown	3,592	0.0%	—	skipped
StratificationID2	unknown	3,592	0.0%	—	skipped

Fig 1.

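Data_Value · Prevalence percentages have a median of 9.1% and form two separate clusters: most values fall between 1.8% and 43.54%, and a second group of 449 values sits between 57.45% and 81.3%, with nothing in between. Treat the upper group as a distinct subpopulation, not as a long tail of outliers.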

Histogram bins for Data_Value (median: 9.1).
bin	count
1.8 – 3.788	442
3.788 – 5.775	606
5.775 – 7.763	551
7.763 – 9.75	297
9.75 – 11.74	343
11.74 – 13.73	237
13.73 – 15.71	112
15.71 – 17.7	58
17.7 – 19.69	38
19.69 – 21.68	50
21.68 – 23.66	84
23.66 – 25.65	88
25.65 – 27.64	79
27.64 – 29.62	66
29.62 – 31.61	29
31.61 – 33.6	25
33.6 – 35.59	14
35.59 – 37.57	7
37.57 – 39.56	8
39.56 – 41.55	2
41.55 – 43.54	2
43.54 – 45.52	0
45.52 – 47.51	0
47.51 – 49.5	0
49.5 – 51.49	0
51.49 – 53.48	0
53.48 – 55.46	0
55.46 – 57.45	0
57.45 – 59.44	3
59.44 – 61.42	5
61.42 – 63.41	8
63.41 – 65.4	10
65.4 – 67.39	21
67.39 – 69.38	25
69.38 – 71.36	44
71.36 – 73.35	76
73.35 – 75.34	81
75.34 – 77.33	92
77.33 – 79.31	66
79.31 – 81.3	18
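
The bins above fall into two groups separated by an empty gap from 43.54 to 57.45. As a quick check, the sketch below sums each group using the counts copied from the table; the variable names are illustrative only.

```python
# Bin counts copied from the Data_Value histogram above (40 bins, in order).
counts = [
    442, 606, 551, 297, 343, 237, 112, 58, 38, 50,
    84, 88, 79, 66, 29, 25, 14, 7, 8, 2,
    2, 0, 0, 0, 0, 0, 0, 0, 3, 5,
    8, 10, 21, 25, 44, 76, 81, 92, 66, 18,
]

# The first 21 bins end at 43.54; the last 12 bins start at 57.45.
lower_cluster = sum(counts[:21])
upper_cluster = sum(counts[28:])

print(lower_cluster, upper_cluster, lower_cluster + upper_cluster)
```

The upper group holds 449 rows, the same as the per-value count in Response, and the two groups together give 3,587, which is 3,592 rows minus the 5 nulls. That is consistent with the upper group being a single Response category (plausibly "No Disability"), though the profile alone cannot confirm which one.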

Fig 2.

Response · All 8 disability types appear in equal counts (449 each), confirming a balanced design across Cognitive, Mobility, Vision, Hearing, Self-care, Independent Living, Any, and No Disability.

Top values for Response (8 unique shown, of 8 total).
value	count	share
Cognitive Disability	449	12.5%
No Disability	449	12.5%
Mobility Disability	449	12.5%
Independent Living Disability	449	12.5%
Any Disability	449	12.5%
Vision Disability	449	12.5%
Self-care Disability	449	12.5%
Hearing Disability	449	12.5%

Fig 3.

Year · Records span 2016-2022 with 504 to 520 rows per year. Coverage is close to even but not identical, so confirm which locations are missing in 2017-2021 before doing any trend analysis.

Histogram bins for Year (median: 2019.0). The 40 equal-width bins are collapsed to the 7 years that contain rows; every other bin is empty.
year	count
2016	520
2017	512
2018	512
2019	504
2020	512
2021	512
2022	520
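
The per-year counts answer the coverage question directly. Assuming each location reports one row per Response value per year (8 values, as in Fig 2; this is an assumption about the file layout, not something the profile states), dividing by 8 gives the number of locations present in each year:

```python
# Rows per year, copied from the Year histogram above.
rows_per_year = {
    2016: 520, 2017: 512, 2018: 512, 2019: 504,
    2020: 512, 2021: 512, 2022: 520,
}
RESPONSES = 8        # assumption: one row per Response value per location-year
TOTAL_LOCATIONS = 65  # unique LocationDesc values in the schema table

locations_per_year = {year: n // RESPONSES for year, n in rows_per_year.items()}
location_years = sum(locations_per_year.values())
missing = TOTAL_LOCATIONS * len(rows_per_year) - location_years

print(locations_per_year)
print(location_years, missing)
```

Under that assumption, 2016 and 2022 cover all 65 locations, 2019 covers 63, and the remaining years cover 64, so six location-years are absent from the file.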

Fig 4.

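LocationDesc · 65 locations (states, DC, territories, and aggregates such as HHS Regions) appear, and the 20 most frequent each contribute 56 rows (7 years × 8 Response values). Because 65 × 56 = 3,640 exceeds the 3,592 rows in the file, some locations must contribute fewer, so check geographic completeness before mapping prevalence.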

Top values for LocationDesc (20 unique shown, of 65 total).
value	count	share
Pennsylvania	56	1.6%
Louisiana	56	1.6%
Arkansas	56	1.6%
Wyoming	56	1.6%
Alaska	56	1.6%
Maryland	56	1.6%
Guam	56	1.6%
Massachusetts	56	1.6%
West Virginia	56	1.6%
Utah	56	1.6%
North Dakota	56	1.6%
North Carolina	56	1.6%
Ohio	56	1.6%
South Dakota	56	1.6%
Connecticut	56	1.6%
Oregon	56	1.6%
Minnesota	56	1.6%
HHS Region 6	56	1.6%
Michigan	56	1.6%
HHS Region 8	56	1.6%

Fig 5.

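WeightedNumber · Weighted population estimates range from about 1.6K to 181M with extreme skew (kurtosis about 262). The 14 rows above 5.89e+07 are larger than any single state's adult population and probably come from a national aggregate; regional aggregates such as HHS Regions also lengthen the tail.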

Histogram bins for WeightedNumber (median: 418252.0).
bin	count
1641 – 4.532e+06	3285
4.532e+06 – 9.063e+06	156
9.063e+06 – 1.359e+07	42
1.359e+07 – 1.812e+07	40
1.812e+07 – 2.265e+07	15
2.265e+07 – 2.718e+07	4
2.718e+07 – 3.172e+07	19
3.172e+07 – 3.625e+07	12
3.625e+07 – 4.078e+07	0
4.078e+07 – 4.531e+07	0
4.531e+07 – 4.984e+07	0
4.984e+07 – 5.437e+07	0
5.437e+07 – 5.89e+07	0
5.89e+07 – 6.343e+07	1
6.343e+07 – 6.796e+07	5
6.796e+07 – 7.249e+07	0
7.249e+07 – 7.702e+07	1
7.702e+07 – 8.155e+07	0
8.155e+07 – 8.608e+07	0
8.608e+07 – 9.061e+07	0
9.061e+07 – 9.514e+07	0
9.514e+07 – 9.967e+07	0
9.967e+07 – 1.042e+08	0
1.042e+08 – 1.087e+08	0
1.087e+08 – 1.133e+08	0
1.133e+08 – 1.178e+08	0
1.178e+08 – 1.223e+08	0
1.223e+08 – 1.269e+08	0
1.269e+08 – 1.314e+08	0
1.314e+08 – 1.359e+08	0
1.359e+08 – 1.404e+08	0
1.404e+08 – 1.45e+08	0
1.45e+08 – 1.495e+08	0
1.495e+08 – 1.54e+08	0
1.54e+08 – 1.586e+08	0
1.586e+08 – 1.631e+08	0
1.631e+08 – 1.676e+08	1
1.676e+08 – 1.722e+08	2
1.722e+08 – 1.767e+08	0
1.767e+08 – 1.812e+08	4

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
Year	numeric	0.0%
LocationAbbr	categorical	0.0%
LocationDesc	categorical	0.0%
DataSource	categorical	0.0%
Category	categorical	0.0%
Indicator	categorical	0.0%
Response	categorical	0.0%
Data_Value_Unit	categorical	0.0%
Data_Value_Type	categorical	0.0%
Data_Value	numeric	0.1%
Data_Value_Alt	numeric	0.1%
Data_Value_Footnote_Symbol	categorical	99.9%
Data_Value_Footnote	categorical	99.9%
Low_Confidence_Limit	numeric	0.1%
High_Confidence_Limit	numeric	0.1%
Number	numeric	0.1%
WeightedNumber	numeric	0.1%
StratificationCategory1	categorical	0.0%
Stratification1	categorical	0.0%
StratificationCategory2	unknown	0.0%
Stratification2	unknown	0.0%
CategoryID	categorical	0.0%
IndicatorID	categorical	0.0%
LocationID	numeric	0.0%
ResponseID	categorical	0.0%
DataValueTypeID	categorical	0.0%
StratificationCategoryID1	categorical	0.0%
StratificationID1	categorical	0.0%
StratificationCategoryID2	unknown	0.0%
StratificationID2	unknown	0.0%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 8 numeric columns (values clipped to 2 decimals).
	Year	Data_Value	Data_Value_Alt	Low_Confidence_Limit	High_Confidence_Limit	Number	WeightedNumber	LocationID
Year	+1.00	+0.02	+0.02	+0.02	+0.02	+0.03	+0.04	+0.03
Data_Value	+0.02	+1.00	+1.00	+1.00	+1.00	+0.25	+0.23	-0.00
Data_Value_Alt	+0.02	+1.00	+1.00	+1.00	+1.00	+0.25	+0.23	-0.00
Low_Confidence_Limit	+0.02	+1.00	+1.00	+1.00	+1.00	+0.26	+0.23	-0.00
High_Confidence_Limit	+0.02	+1.00	+1.00	+1.00	+1.00	+0.25	+0.22	-0.00
Number	+0.03	+0.25	+0.25	+0.26	+0.25	+1.00	+0.98	-0.00
WeightedNumber	+0.04	+0.23	+0.23	+0.23	+0.22	+0.98	+1.00	-0.01
LocationID	+0.03	-0.00	-0.00	-0.00	-0.00	-0.00	-0.01	+1.00
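
The matrix shows two blocks of near-duplicate columns. A minimal sketch for listing them from the rounded values above follows; the 0.95 threshold and the helper name `redundant_pairs` are illustrative choices, not part of Saturn.

```python
cols = ["Year", "Data_Value", "Data_Value_Alt", "Low_Confidence_Limit",
        "High_Confidence_Limit", "Number", "WeightedNumber", "LocationID"]

# Values copied from the correlation table above (already rounded to 2 decimals).
matrix = [
    [1.00, 0.02, 0.02, 0.02, 0.02, 0.03, 0.04, 0.03],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.25, 0.23, -0.00],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.25, 0.23, -0.00],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.26, 0.23, -0.00],
    [0.02, 1.00, 1.00, 1.00, 1.00, 0.25, 0.22, -0.00],
    [0.03, 0.25, 0.25, 0.26, 0.25, 1.00, 0.98, -0.00],
    [0.04, 0.23, 0.23, 0.23, 0.22, 0.98, 1.00, -0.01],
    [0.03, -0.00, -0.00, -0.00, -0.00, -0.00, -0.01, 1.00],
]

def redundant_pairs(names, corr, threshold=0.95):
    """Return (col_a, col_b, r) for each distinct pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) >= threshold:
                pairs.append((names[i], names[j], corr[i][j]))
    return pairs

pairs = redundant_pairs(cols, matrix)
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:+.2f}")
```

Data_Value, Data_Value_Alt, and the two confidence limits move together, as do Number and WeightedNumber. Because the table is rounded, +1.00 here means at least 0.995, not necessarily an exact duplicate.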

Year numeric timestamp

This is a Year column spanning 2016 to 2022 with only 7 unique values across 3,592 rows and no nulls. Row counts per year are symmetric around 2019 (mean = median = 2019, skew 0) but not identical, ranging from 504 to 520. Despite being typed numeric, it functions as a low-cardinality temporal category. There are no outliers and no zero values, so the field is clean.

Treatment: Treat as an ordinal/categorical year for grouping or one-hot encoding rather than a continuous numeric feature.

anthropic:claude-opus-4-7 · confidence high

Out[13]:

saturn.columns["Year"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	7
min	2016
max	2022
mean	2019
median	2019
std	2.008
q1	2017
q3	2021
iqr	4
skew	0
kurtosis	-1.259
n_outliers	0
outlier_rate	0
zero_rate	0
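
The mean and standard deviation above can be reproduced from the per-year row counts in the Year histogram. This is a sketch only: the report does not say whether Saturn divides by n or n - 1, but both choices round to the same three decimals here.

```python
import math

# Rows per year, copied from the Year histogram in this report.
rows_per_year = {
    2016: 520, 2017: 512, 2018: 512, 2019: 504,
    2020: 512, 2021: 512, 2022: 520,
}

n = sum(rows_per_year.values())
mean = sum(year * count for year, count in rows_per_year.items()) / n

# Weighted sum of squared deviations, then sample and population std.
ss = sum(count * (year - mean) ** 2 for year, count in rows_per_year.items())
std_sample = math.sqrt(ss / (n - 1))
std_population = math.sqrt(ss / n)

print(n, mean, round(std_sample, 3), round(std_population, 3))
```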

Fig 8.

Distribution of Year. Vertical dash marks the median.

Histogram bins for Year (median: 2019.0). The 40 equal-width bins are collapsed to the 7 years that contain rows; every other bin is empty.
year	count
2016	520
2017	512
2018	512
2019	504
2020	512
2021	512
2022	520

LocationAbbr categorical foreign_key

This is a location code (e.g., PA, LA, AR, WY, GU) that serves as a geographic key. It has 65 unique values across 3,592 rows and a near-uniform distribution (entropy ratio 0.999, top_rate just 0.0156). The most frequent codes appear exactly 56 times, which matches 7 years × 8 Response values and suggests a near-balanced panel. The cardinality of 65 exceeds the 50 states because territories (GU) and regional aggregates (HHS6, HHS8) are included.

Treatment: Left-join on this code to enrich with state/territory metadata, or one-hot encode for modelling. Exclude aggregate codes such as HHS6 and HHS8 from state-level totals to avoid double counting.

anthropic:claude-opus-4-7 · confidence high

Out[16]:

saturn.columns["LocationAbbr"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	65
top_value	PA
top_rate	0.01559
cardinality	65
entropy	6.017
entropy_ratio	0.9992
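
The left-join suggested in the treatment can be sketched with a plain dictionary lookup. The `STATE_META` table and the `is_aggregate` rule (codes starting with "HHS", matching the HHS6 and HHS8 codes visible in this report) are assumptions for illustration; the full code list is not shown here, so other aggregate codes may exist.

```python
# Hypothetical lookup table; a real one would cover every code in the file.
STATE_META = {
    "PA": {"name": "Pennsylvania", "kind": "state"},
    "GU": {"name": "Guam", "kind": "territory"},
}

def is_aggregate(code):
    """Assumed rule: HHS region codes such as HHS6 are regional aggregates."""
    return code.startswith("HHS")

def enrich(rows):
    """Left-join: keep every row and attach metadata when the code is known."""
    enriched_rows = []
    for row in rows:
        code = row["LocationAbbr"]
        meta = STATE_META.get(code, {})  # unknown codes simply get no extra fields
        enriched_rows.append({**row, **meta, "is_aggregate": is_aggregate(code)})
    return enriched_rows

sample = [{"LocationAbbr": "PA"}, {"LocationAbbr": "HHS6"}, {"LocationAbbr": "WY"}]
enriched = enrich(sample)
for row in enriched:
    print(row)
```

Rows with no match are kept rather than dropped, which is the point of a left join: missing metadata shows up as absent fields instead of silently shrinking the panel.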

Fig 9.

Top values for LocationAbbr.

Top values for LocationAbbr (20 unique shown, of 65 total).
value	count	share
PA	56	1.6%
LA	56	1.6%
AR	56	1.6%
WY	56	1.6%
AK	56	1.6%
MD	56	1.6%
GU	56	1.6%
MA	56	1.6%
WV	56	1.6%
UT	56	1.6%
ND	56	1.6%
NC	56	1.6%
OH	56	1.6%
SD	56	1.6%
CT	56	1.6%
OR	56	1.6%
MN	56	1.6%
HHS6	56	1.6%
MI	56	1.6%
HHS8	56	1.6%

LocationDesc categorical feature

LocationDesc is the location name field with 65 distinct values, including states, DC, territories such as Guam, and aggregates such as HHS Region 6. The distribution is close to uniform (entropy_ratio 0.999) and the 20 most frequent values all tie at 56 occurrences, but 65 × 56 = 3,640 exceeds the 3,592 rows, so a few locations contribute fewer rows. There are no nulls, and the vocabulary is tidy and closed.

Treatment: Use as a categorical grouping key; one-hot or target-encode if modelling.

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["LocationDesc"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	65
top_value	Pennsylvania
top_rate	0.01559
cardinality	65
entropy	6.017
entropy_ratio	0.9992

Fig 10.

Top values for LocationDesc.

Top values for LocationDesc (20 unique shown, of 65 total).
value	count	share
Pennsylvania	56	1.6%
Louisiana	56	1.6%
Arkansas	56	1.6%
Wyoming	56	1.6%
Alaska	56	1.6%
Maryland	56	1.6%
Guam	56	1.6%
Massachusetts	56	1.6%
West Virginia	56	1.6%
Utah	56	1.6%
North Dakota	56	1.6%
North Carolina	56	1.6%
Ohio	56	1.6%
South Dakota	56	1.6%
Connecticut	56	1.6%
Oregon	56	1.6%
Minnesota	56	1.6%
HHS Region 6	56	1.6%
Michigan	56	1.6%
HHS Region 8	56	1.6%

DataSource categorical metadata

This column records the dataset's provenance, with every one of the 3592 rows tagged "BRFSS". Cardinality is 1 and entropy is 0, so it carries no discriminative signal.

Treatment: Drop; constant column adds no information.

anthropic:claude-opus-4-7 · confidence high

Out[22]:

saturn.columns["DataSource"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	BRFSS
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows
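
Constant columns like this one can be found mechanically before dropping them. The sketch below assumes rows are dictionaries of strings (for example from `csv.DictReader`) and that an empty string means null; `constant_columns` is a hypothetical helper, not a Saturn function.

```python
def constant_columns(rows):
    """Return the columns whose non-empty values are all identical."""
    seen = {}
    for row in rows:
        for column, value in row.items():
            if value not in ("", None):
                seen.setdefault(column, set()).add(value)
    return sorted(column for column, values in seen.items() if len(values) == 1)

# Made-up rows using column names and values that appear in this report.
sample = [
    {"DataSource": "BRFSS", "Response": "Any Disability", "Data_Value_Footnote": ""},
    {"DataSource": "BRFSS", "Response": "No Disability", "Data_Value_Footnote": "Data suppressed"},
]
constants = constant_columns(sample)
print(constants)
```

Sparse columns with a single distinct value, such as Data_Value_Footnote, are also returned by this rule, so review the list before dropping anything.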

Fig 11.

Top values for DataSource.

Top values for DataSource (1 unique shown, of 1 total).
value	count	share
BRFSS	3592	100.0%

Category categorical metadata

This column is a single-valued tag labeling every row as "Disability Estimates" across all 3592 records. With cardinality of 1, top_rate of 1.0, and entropy of 0.0, it carries no information for modelling or filtering.

Treatment: Drop; constant column with no variance.

anthropic:claude-opus-4-7 · confidence high

Out[25]:

saturn.columns["Category"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	Disability Estimates
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 12.

Top values for Category.

Top values for Category (1 unique shown, of 1 total).
value	count	share
Disability Estimates	3592	100.0%

Indicator categorical metadata

This column holds a single constant string ('Disability status and types among adults 18 years of age or older') across all 3,592 rows, with cardinality 1 and entropy 0. It carries no information for modelling and likely just labels the survey indicator the dataset was filtered to.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[28]:

saturn.columns["Indicator"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	Disability status and types among adults 18 years of age or older
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 13.

Top values for Indicator.

Top values for Indicator (1 unique shown, of 1 total).
value	count	share
Disability status and types among adults 18 years of age or older	3592	100.0%

Response categorical label

This column enumerates a disability response category, with 8 distinct values such as 'Cognitive Disability', 'No Disability', and 'Hearing Disability'. The distribution is perfectly uniform — each of the 8 values appears exactly 449 times (top_rate 0.125, entropy_ratio 1.0), indicating the dataset is balanced or pivoted by category rather than sampled organically. There are no nulls.

Treatment: Use as a categorical label; one-hot or factor encode for modelling.

anthropic:claude-opus-4-7 · confidence high

Out[31]:

saturn.columns["Response"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	8
top_value	Cognitive Disability
top_rate	0.125
cardinality	8
entropy	3
entropy_ratio	1
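
The entropy figures above can be checked by hand. The sketch assumes Saturn reports Shannon entropy in bits and defines entropy_ratio as entropy divided by log2(cardinality); the report does not state the formula, but these definitions reproduce the values shown for Response.

```python
import math

def entropy_bits(counts):
    """Shannon entropy in bits for a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

response_counts = [449] * 8  # each Response value appears 449 times
h = entropy_bits(response_counts)
ratio = h / math.log2(len(response_counts))

print(h, ratio)
```

A ratio of 1 is only reached when every category has the same count, which is why LocationAbbr and LocationDesc, at 0.9992, cannot be perfectly balanced.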

Fig 14.

Top values for Response.

Top values for Response (8 unique shown, of 8 total).
value	count	share
Cognitive Disability	449	12.5%
No Disability	449	12.5%
Mobility Disability	449	12.5%
Independent Living Disability	449	12.5%
Any Disability	449	12.5%
Vision Disability	449	12.5%
Self-care Disability	449	12.5%
Hearing Disability	449	12.5%

Data_Value_Unit categorical metadata

This column records the unit of measurement for the data values, and it is constant: every one of the 3592 rows carries the value "%". With cardinality 1, entropy 0, and top_rate 1.0, it provides no information for modelling or segmentation.

Treatment: Drop; constant column carrying no signal.

anthropic:claude-opus-4-7 · confidence high

Out[34]:

saturn.columns["Data_Value_Unit"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	%
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 15.

Top values for Data_Value_Unit.

Top values for Data_Value_Unit (1 unique shown, of 1 total).
value	count	share
%	3592	100.0%

Data_Value_Type categorical metadata

This column records the type of data value reported, but every one of the 3592 rows holds the single label "Age-adjusted Prevalence". Cardinality is 1 and entropy is 0, so the field carries no information for modelling or segmentation. It likely exists as a schema placeholder from a wider source where multiple value types are possible.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[37]:

saturn.columns["Data_Value_Type"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	Age-adjusted Prevalence
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 16.

Top values for Data_Value_Type.

Top values for Data_Value_Type (1 unique shown, of 1 total).
value	count	share
Age-adjusted Prevalence	3592	100.0%

Data_Value numeric feature

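Data_Value is a numeric measurement spanning 1.8 to 81.3 with a median of 9.1 but a mean of 18.25 (skew 1.88, kurtosis 2.09). The standard deviation (22.16) exceeds the mean, and the histogram shows why: the values form two clusters separated by an empty gap from 43.54 to 57.45. The upper cluster holds 449 rows, matching the 449 rows per Response value, and accounts for nearly all of the 450 flagged outliers (12.5% of rows). This points to a mixture of Response categories, plausibly with "No Disability" as the upper cluster, although the stats alone cannot confirm which category it is. Nulls are negligible (5 rows, 0.14%) and there are no zeros. The 486 unique values across 3,592 rows are consistent with percentages reported to one decimal place.

Treatment: Analyse Data_Value separately per Response value before modelling. A log transform or winsorizing would mostly compress or clip the upper cluster, so apply either only within a single category if skew remains.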

anthropic:claude-opus-4-7 · confidence high

Out[40]:

saturn.columns["Data_Value"].stats

stat	value
n	3,592
nulls	5 (0.1%)
unique	486
min	1.8
max	81.3
mean	18.25
median	9.1
std	22.16
q1	5.3
q3	19.95
iqr	14.65
skew	1.876
kurtosis	2.086
n_outliers	450
outlier_rate	0.1255
zero_rate	0
alert: outliers	12.5% rows beyond 1.5 IQR
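
The outlier alert can be traced back to the quartiles. The sketch assumes the usual Tukey fences (q1 - 1.5 × IQR and q3 + 1.5 × IQR), which the alert text "beyond 1.5 IQR" suggests but the report does not spell out.

```python
# Quartiles copied from the stats table above.
q1, q3 = 5.3, 19.95

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(round(iqr, 3), round(lower_fence, 3), round(upper_fence, 3))
```

The lower fence is negative and the minimum is 1.8, so no value can fall below it: all 450 flagged rows sit above roughly 41.9%.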

Fig 17.

Distribution of Data_Value. Vertical dash marks the median.

Histogram bins for Data_Value (median: 9.1).
bin	count
1.8 – 3.788	442
3.788 – 5.775	606
5.775 – 7.763	551
7.763 – 9.75	297
9.75 – 11.74	343
11.74 – 13.73	237
13.73 – 15.71	112
15.71 – 17.7	58
17.7 – 19.69	38
19.69 – 21.68	50
21.68 – 23.66	84
23.66 – 25.65	88
25.65 – 27.64	79
27.64 – 29.62	66
29.62 – 31.61	29
31.61 – 33.6	25
33.6 – 35.59	14
35.59 – 37.57	7
37.57 – 39.56	8
39.56 – 41.55	2
41.55 – 43.54	2
43.54 – 45.52	0
45.52 – 47.51	0
47.51 – 49.5	0
49.5 – 51.49	0
51.49 – 53.48	0
53.48 – 55.46	0
55.46 – 57.45	0
57.45 – 59.44	3
59.44 – 61.42	5
61.42 – 63.41	8
63.41 – 65.4	10
65.4 – 67.39	21
67.39 – 69.38	25
69.38 – 71.36	44
71.36 – 73.35	76
73.35 – 75.34	81
75.34 – 77.33	92
77.33 – 79.31	66
79.31 – 81.3	18

Data_Value_Alt numeric feature

A numeric field whose statistics are identical to those of Data_Value (range 1.8 to 81.3, median 9.1, mean 18.25, std 22.16, IQR 14.65) and whose Pearson correlation with Data_Value is +1.00, so it is most likely a duplicate or alternate encoding of the same measure. The same two-cluster shape applies: 12.5% of rows (450) are flagged as outliers, almost all of them in the 57.45 to 81.3 cluster. The 486 distinct values across 3,592 rows suggest values rounded to one decimal place.

Treatment: Confirm that it equals Data_Value row by row and, if so, drop one of the two; otherwise treat it like Data_Value.

anthropic:claude-opus-4-7 · confidence high

Out[43]:

saturn.columns["Data_Value_Alt"].stats

stat	value
n	3,592
nulls	5 (0.1%)
unique	486
min	1.8
max	81.3
mean	18.25
median	9.1
std	22.16
q1	5.3
q3	19.95
iqr	14.65
skew	1.876
kurtosis	2.086
n_outliers	450
outlier_rate	0.1255
zero_rate	0
alert: outliers	12.5% rows beyond 1.5 IQR

Fig 18.

Distribution of Data_Value_Alt. Vertical dash marks the median.

Histogram bins for Data_Value_Alt (median: 9.1).
bin	count
1.8 – 3.788	442
3.788 – 5.775	606
5.775 – 7.763	551
7.763 – 9.75	297
9.75 – 11.74	343
11.74 – 13.73	237
13.73 – 15.71	112
15.71 – 17.7	58
17.7 – 19.69	38
19.69 – 21.68	50
21.68 – 23.66	84
23.66 – 25.65	88
25.65 – 27.64	79
27.64 – 29.62	66
29.62 – 31.61	29
31.61 – 33.6	25
33.6 – 35.59	14
35.59 – 37.57	7
37.57 – 39.56	8
39.56 – 41.55	2
41.55 – 43.54	2
43.54 – 45.52	0
45.52 – 47.51	0
47.51 – 49.5	0
49.5 – 51.49	0
51.49 – 53.48	0
53.48 – 55.46	0
55.46 – 57.45	0
57.45 – 59.44	3
59.44 – 61.42	5
61.42 – 63.41	8
63.41 – 65.4	10
65.4 – 67.39	21
67.39 – 69.38	25
69.38 – 71.36	44
71.36 – 73.35	76
73.35 – 75.34	81
75.34 – 77.33	92
77.33 – 79.31	66
79.31 – 81.3	18

Data_Value_Footnote_Symbol categorical metadata

This appears to be a footnote symbol marker, almost entirely empty with a 99.86% null rate and only 5 non-null entries — all the single character '*'. With cardinality of 1 and entropy of 0, the column carries no discriminative information.

Treatment: Drop; effectively constant with 99.86% nulls.

anthropic:claude-opus-4-7 · confidence high

Out[46]:

saturn.columns["Data_Value_Footnote_Symbol"].stats

stat	value
n	3,592
nulls	3,587 (99.9%)
unique	1
top_value	*
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: null_rate	99.9% null
alert: imbalance	top value is 100.0% of rows

Fig 19.

Top values for Data_Value_Footnote_Symbol.

Top values for Data_Value_Footnote_Symbol (1 unique shown, of 1 total).
value	count	share
*	5	0.1%

Data_Value_Footnote categorical metadata

This column is a footnote/annotation field accompanying Data_Value, used to flag exceptional rows. It is effectively empty: 99.86% null, with only 5 non-null entries, all carrying the single value "Data suppressed" (cardinality 1, entropy 0). The count matches the 5 nulls in Data_Value, which suggests these are the rows whose measurements were suppressed. It carries no discriminative information on its own.

Treatment: Convert to a boolean is_suppressed flag and drop the original column.

anthropic:claude-opus-4-7 · confidence high

Out[49]:

saturn.columns["Data_Value_Footnote"].stats

stat	value
n	3,592
nulls	3,587 (99.9%)
unique	1
top_value	Data suppressed
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: null_rate	99.9% null
alert: imbalance	top value is 100.0% of rows
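
The conversion to a boolean flag described in the treatment could look like the sketch below. It assumes rows are dictionaries of strings (for example from `csv.DictReader`) with empty strings for nulls; `add_suppressed_flag` is a hypothetical helper name.

```python
def add_suppressed_flag(rows, column="Data_Value_Footnote"):
    """Return copies of rows with a boolean is_suppressed and without the footnote text."""
    flagged = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        new_row["is_suppressed"] = (row.get(column) or "").strip() == "Data suppressed"
        flagged.append(new_row)
    return flagged

# Made-up rows for illustration.
sample = [
    {"Data_Value": "9.1", "Data_Value_Footnote": ""},
    {"Data_Value": "", "Data_Value_Footnote": "Data suppressed"},
]
flagged = add_suppressed_flag(sample)
print(flagged)
```

The input rows are left unchanged, so the original footnote text stays available until the flag has been verified.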

Fig 20.

Top values for Data_Value_Footnote.

Top values for Data_Value_Footnote (1 unique shown, of 1 total).
value	count	share
Data suppressed	5	0.1%

Low_Confidence_Limit numeric feature

This is the lower bound of a confidence interval, presumably for Data_Value (Pearson correlation +1.00), ranging from 1.1 to 80.5 with a median of 8.2. Its shape mirrors Data_Value (skew 1.90, kurtosis 2.16): 12.6% of values (451 rows) are flagged as outliers, and the histogram again shows a separate upper cluster of 449 values above 54.7. Nulls are negligible (0.14%) and there are no zeros.

Treatment: Log-transform before modelling to tame the right skew, and pair with the matching upper limit.

anthropic:claude-opus-4-7 · confidence high

Out[52]:

saturn.columns["Low_Confidence_Limit"].stats

stat	value
n	3,592
nulls	5 (0.1%)
unique	489
min	1.1
max	80.5
mean	17.31
median	8.2
std	21.89
q1	4.7
q3	18.7
iqr	14
skew	1.899
kurtosis	2.159
n_outliers	451
outlier_rate	0.1257
zero_rate	0
alert: outliers	12.6% rows beyond 1.5 IQR

Fig 21.

Distribution of Low_Confidence_Limit. Vertical dash marks the median.

Histogram bins for Low_Confidence_Limit (median: 8.2).
bin	count
1.1 – 3.085	375
3.085 – 5.07	669
5.07 – 7.055	587
7.055 – 9.04	317
9.04 – 11.03	324
11.03 – 13.01	232
13.01 – 15	104
15 – 16.98	48
16.98 – 18.97	38
18.97 – 20.95	56
20.95 – 22.94	91
22.94 – 24.92	95
24.92 – 26.91	78
26.91 – 28.89	52
28.89 – 30.88	26
30.88 – 32.86	20
32.86 – 34.85	12
34.85 – 36.83	7
36.83 – 38.82	4
38.82 – 40.8	3
40.8 – 42.79	0
42.79 – 44.77	0
44.77 – 46.76	0
46.76 – 48.74	0
48.74 – 50.73	0
50.73 – 52.71	0
52.71 – 54.7	0
54.7 – 56.68	3
56.68 – 58.67	1
58.67 – 60.65	8
60.65 – 62.64	8
62.64 – 64.62	19
64.62 – 66.61	22
66.61 – 68.59	24
68.59 – 70.58	52
70.58 – 72.56	79
72.56 – 74.55	76
74.55 – 76.53	85
76.53 – 78.52	61
78.52 – 80.5	11

High_Confidence_Limit numeric feature

A numeric upper-confidence-bound feature, ranging from 2.2 to 83.0 with a median of 10.1 but a mean of 19.26, indicating a long right tail. The distribution is heavily right-skewed (skew 1.85, kurtosis 2.01) and 12.5% of values (449 rows) are flagged as outliers. With 503 unique values across 3592 rows and only 0.14% nulls, it behaves as a continuous measurement rather than a categorical bound.

Treatment: Log-transform before modelling to compress the right tail and dampen the 12.5% outlier mass.

anthropic:claude-opus-4-7 · confidence high

Out[55]:

saturn.columns["High_Confidence_Limit"].stats

stat	value
n	3,592
nulls	5 (0.1%)
unique	503
min	2.2
max	83
mean	19.26
median	10.1
std	22.4
q1	6
q3	21.5
iqr	15.5
skew	1.851
kurtosis	2.011
n_outliers	449
outlier_rate	0.1252
zero_rate	0
alert: outliers	12.5% rows beyond 1.5 IQR
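
Since the lower and upper limits are meant to be read as a pair, one simple derived feature is the interval width. The sketch below assumes rows are dictionaries of strings with empty strings for nulls; `interval_width` is a hypothetical helper and the sample values are illustrative, not a real row.

```python
def interval_width(row):
    """Width of the confidence interval in percentage points, or None if a limit is missing."""
    low = row.get("Low_Confidence_Limit")
    high = row.get("High_Confidence_Limit")
    if low in ("", None) or high in ("", None):
        return None
    return round(float(high) - float(low), 1)

complete = {"Low_Confidence_Limit": "8.2", "High_Confidence_Limit": "10.1"}
suppressed = {"Low_Confidence_Limit": "", "High_Confidence_Limit": ""}

print(interval_width(complete), interval_width(suppressed))
```

Wide intervals usually indicate small samples, so the width can help decide which estimates to treat with caution.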

Fig 22.

Distribution of High_Confidence_Limit. Vertical dash marks the median.

Histogram bins for High_Confidence_Limit (median: 10.1).
bin	count
2.2 – 4.22	407
4.22 – 6.24	580
6.24 – 8.26	534
8.26 – 10.28	290
10.28 – 12.3	347
12.3 – 14.32	257
14.32 – 16.34	137
16.34 – 18.36	70
18.36 – 20.38	46
20.38 – 22.4	49
22.4 – 24.42	76
24.42 – 26.44	92
26.44 – 28.46	75
28.46 – 30.48	72
30.48 – 32.5	34
32.5 – 34.52	21
34.52 – 36.54	26
36.54 – 38.56	11
38.56 – 40.58	7
40.58 – 42.6	4
42.6 – 44.62	3
44.62 – 46.64	0
46.64 – 48.66	0
48.66 – 50.68	0
50.68 – 52.7	0
52.7 – 54.72	0
54.72 – 56.74	0
56.74 – 58.76	0
58.76 – 60.78	3
60.78 – 62.8	3
62.8 – 64.82	8
64.82 – 66.84	10
66.84 – 68.86	20
68.86 – 70.88	26
70.88 – 72.9	49
72.9 – 74.92	78
74.92 – 76.94	96
76.94 – 78.96	88
78.96 – 80.98	55
80.98 – 83	13

Number numeric feature

This is a numeric 'Number' column, almost certainly a count (plausibly the unweighted number of survey respondents, given its 0.98 correlation with WeightedNumber) rather than an identifier: it has 2,267 unique values across 3,592 rows and a negligible null rate of 0.14% (5 rows). The distribution is severely right-skewed (skew 14.57, kurtosis 256.99): the median is 978 while the mean is 3,780 and the max reaches 327,817, with 385 outliers (10.7%) flagged. The interquartile range (467 to 2,750) is tiny relative to the max, so a handful of extreme values dominate the variance (std 15,294).

Treatment: Log-transform (or winsorize) before any distance- or variance-based modelling.

anthropic:claude-opus-4-7 · confidence high

Out[58]:

saturn.columns["Number"].stats

stat	value
n	3,592
nulls	5 (0.1%)
unique	2,267
min	31
max	327,817
mean	3,780
median	978
std	1.529e+04
q1	467
q3	2,750
iqr	2,283
skew	14.57
kurtosis	257
n_outliers	385
outlier_rate	0.1073
zero_rate	0
alert: high_skew	skew=+14.57
alert: outliers	10.7% rows beyond 1.5 IQR
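
The two options named in the treatment can be compared on the five summary points from the table above. This is a minimal sketch: the helper names are illustrative, and the winsorizing cap (the upper 1.5 × IQR fence) is one possible choice, not something the report prescribes.

```python
import math

def log10_transform(values):
    """log10 of each value; valid here because the column has no zeros or negatives."""
    return [math.log10(v) for v in values]

def winsorize(values, lower, upper):
    """Clip each value into the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

# min, q1, median, q3, max copied from the stats table above.
summary = [31, 467, 978, 2750, 327817]

logged = log10_transform(summary)
upper_fence = 2750 + 1.5 * 2283  # q3 + 1.5 * iqr
clipped = winsorize(summary, 31, upper_fence)

print([round(x, 2) for x in logged])
print(clipped)
```

The log transform keeps the ordering and relative spacing of large values, while winsorizing replaces everything above the cap with the cap itself, so the choice depends on whether the extreme counts carry meaning for the analysis.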

Fig 23.

Distribution of Number. Vertical dash marks the median.

Histogram bins for Number (median: 978.0).
bin	count
31 – 8226	3323
8226 – 1.642e+04	124
1.642e+04 – 2.461e+04	48
2.461e+04 – 3.281e+04	35
3.281e+04 – 4.1e+04	25
4.1e+04 – 4.92e+04	9
4.92e+04 – 5.739e+04	3
5.739e+04 – 6.559e+04	0
6.559e+04 – 7.378e+04	4
7.378e+04 – 8.198e+04	2
8.198e+04 – 9.017e+04	0
9.017e+04 – 9.837e+04	0
9.837e+04 – 1.066e+05	0
1.066e+05 – 1.148e+05	1
1.148e+05 – 1.23e+05	0
1.23e+05 – 1.311e+05	3
1.311e+05 – 1.393e+05	2
1.393e+05 – 1.475e+05	1
1.475e+05 – 1.557e+05	0
1.557e+05 – 1.639e+05	0
1.639e+05 – 1.721e+05	0
1.721e+05 – 1.803e+05	0
1.803e+05 – 1.885e+05	0
1.885e+05 – 1.967e+05	0
1.967e+05 – 2.049e+05	0
2.049e+05 – 2.131e+05	0
2.131e+05 – 2.213e+05	0
2.213e+05 – 2.295e+05	0
2.295e+05 – 2.377e+05	0
2.377e+05 – 2.459e+05	0
2.459e+05 – 2.541e+05	0
2.541e+05 – 2.623e+05	0
2.623e+05 – 2.705e+05	0
2.705e+05 – 2.786e+05	2
2.786e+05 – 2.868e+05	1
2.868e+05 – 2.95e+05	2
2.95e+05 – 3.032e+05	1
3.032e+05 – 3.114e+05	0
3.114e+05 – 3.196e+05	0
3.196e+05 – 3.278e+05	1

WeightedNumber numeric feature

WeightedNumber is a numeric measure with 3,580 distinct values across 3,592 rows, ranging from 1,641 to 181,223,676 with a median of 418,252 but a mean of 2,103,449. The distribution is severely right-skewed (skew 14.65, kurtosis 262.16) and 444 rows (12.4%) fall outside the 1.5 × IQR fences. The largest values exceed any single state's adult population, so the tail is probably driven by aggregate rows (national or HHS Region totals) rather than by unusual states.

Treatment: log-transform before modelling to tame the heavy right tail.

anthropic:claude-opus-4-7 · confidence high

Out[61]:

saturn.columns["WeightedNumber"].stats

stat	value
n	3,592
nulls	5 (0.1%)
unique	3,580
min	1,641
max	1.812e+08
mean	2.103e+06
median	418,252
std	9.082e+06
q1	149,677
q3	1.303e+06
iqr	1.153e+06
skew	14.65
kurtosis	262.2
n_outliers	444
outlier_rate	0.1238
zero_rate	0
alert: high_skew	skew=+14.65
alert: outliers	12.4% rows beyond 1.5 IQR

Fig 24.

Distribution of WeightedNumber. Vertical dash marks the median.

Histogram bins for WeightedNumber (median: 418252.0).
bin	count
1641 – 4.532e+06	3285
4.532e+06 – 9.063e+06	156
9.063e+06 – 1.359e+07	42
1.359e+07 – 1.812e+07	40
1.812e+07 – 2.265e+07	15
2.265e+07 – 2.718e+07	4
2.718e+07 – 3.172e+07	19
3.172e+07 – 3.625e+07	12
3.625e+07 – 4.078e+07	0
4.078e+07 – 4.531e+07	0
4.531e+07 – 4.984e+07	0
4.984e+07 – 5.437e+07	0
5.437e+07 – 5.89e+07	0
5.89e+07 – 6.343e+07	1
6.343e+07 – 6.796e+07	5
6.796e+07 – 7.249e+07	0
7.249e+07 – 7.702e+07	1
7.702e+07 – 8.155e+07	0
8.155e+07 – 8.608e+07	0
8.608e+07 – 9.061e+07	0
9.061e+07 – 9.514e+07	0
9.514e+07 – 9.967e+07	0
9.967e+07 – 1.042e+08	0
1.042e+08 – 1.087e+08	0
1.087e+08 – 1.133e+08	0
1.133e+08 – 1.178e+08	0
1.178e+08 – 1.223e+08	0
1.223e+08 – 1.269e+08	0
1.269e+08 – 1.314e+08	0
1.314e+08 – 1.359e+08	0
1.359e+08 – 1.404e+08	0
1.404e+08 – 1.45e+08	0
1.45e+08 – 1.495e+08	0
1.495e+08 – 1.54e+08	0
1.54e+08 – 1.586e+08	0
1.586e+08 – 1.631e+08	0
1.631e+08 – 1.676e+08	1
1.676e+08 – 1.722e+08	2
1.722e+08 – 1.767e+08	0
1.767e+08 – 1.812e+08	4

StratificationCategory1 categorical metadata

This column is a stratification dimension label, but every one of the 3592 rows holds the single value "Overall" (top_rate 1.0, cardinality 1, entropy 0.0). It carries no information and likely indicates this slice of the source dataset was filtered to the un-stratified aggregate.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[64]:

saturn.columns["StratificationCategory1"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	Overall
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 25.

Top values for StratificationCategory1.

Top values for StratificationCategory1 (1 unique shown, of 1 total).
value	count	share
Overall	3592	100.0%

Stratification1 categorical metadata

This column is a stratification label that takes the single value "Overall" across all 3592 rows. With cardinality 1 and entropy 0, it carries no information and cannot differentiate records. It likely indicates that this slice of the source data was not broken out by any subgroup.

Treatment: drop, constant column with a single value.

anthropic:claude-opus-4-7 · confidence high

Out[67]:

saturn.columns["Stratification1"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	Overall
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 26.

Top values for Stratification1.

Top values for Stratification1 (1 unique shown, of 1 total).
value	count	share
Overall	3592	100.0%

StratificationCategory2 unknown metadata

The profiler skipped this column because it could not assign a kind (alert: no profiler for kind=unknown), so no value-level statistics are available beyond a row count of 3592 and a null rate of 0.0%. The name suggests a secondary stratification dimension used alongside a primary category, typical of public health or survey datasets. Without unique counts or value distributions, its content cannot be characterised further.

Treatment: Inspect the raw values to establish the kind, then re-profile to check cardinality before deciding on encoding.

anthropic:claude-opus-4-7 · confidence low

Out[70]:

saturn.columns["StratificationCategory2"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown
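
Since the profiler reports no value-level stats for skipped columns, one way to inspect them by hand is sketched below. The helper `summarize_raw` and the toy series are hypothetical; nothing here reflects what the column actually contains.

```python
import pandas as pd

def summarize_raw(series: pd.Series) -> dict:
    """Count rows, nulls, blank strings, and distinct non-null values."""
    as_text = series.astype("string")
    # Empty or whitespace-only strings are not nulls, but may explain
    # why a profiler could not infer a kind for the column.
    blank = as_text.str.strip().eq("").fillna(False)
    return {
        "n": int(len(series)),
        "nulls": int(series.isna().sum()),
        "blank_strings": int(blank.sum()),
        "unique": int(series.nunique(dropna=True)),
    }

# Toy values for illustration only, not drawn from the dataset.
toy = pd.Series(["", " ", None, "Age"])
print(summarize_raw(toy))
```

Running the same helper on each of the four skipped stratification columns from the CSV would give the unique counts that the stats tables currently show as "—".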

Stratification2 unknown other

Saturn skipped detailed profiling for Stratification2, so only the row count (3592) and a 0.0 null rate are known. With no unique count, type, or value distribution available, the column's content cannot be characterised from this evidence alone. The name suggests a secondary stratification key paired with a primary Stratification1 field, but that is not confirmed by the stats.

Treatment: Re-profile or inspect raw values before deciding; do not use until kind and cardinality are established.

anthropic:claude-opus-4-7 · confidence low

Out[72]:

saturn.columns["Stratification2"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown

CategoryID categorical metadata

CategoryID is a categorical column that carries no information: every one of the 3592 rows holds the single value "DISEST", giving cardinality 1 and entropy 0. It likely encodes a fixed dataset-level tag or filter rather than a per-row attribute.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[74]:

saturn.columns["CategoryID"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	DISEST
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 27.

Top values for CategoryID.

Top values for CategoryID (1 unique shown, of 1 total).
value	count	share
DISEST	3592	100.0%

IndicatorID categorical metadata

IndicatorID is a categorical column that holds the single value "STATTYPE" across all 3592 rows, with zero nulls and cardinality of 1. Entropy is 0.0, so the field carries no information and likely functions as a constant tag identifying the indicator type for this slice of the dataset.

Treatment: Drop before modelling; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[77]:

saturn.columns["IndicatorID"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	STATTYPE
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 28.

Top values for IndicatorID.

Top values for IndicatorID (1 unique shown, of 1 total).
value	count	share
STATTYPE	3592	100.0%

LocationID numeric foreign_key

LocationID is almost certainly a categorical location key encoded as integers, with 65 distinct values across 3592 rows and no nulls. The 65 distinct values match the 65 unique values of LocationAbbr and LocationDesc, which points to an ID lookup rather than a measured quantity. Values range from 1 to 89, but the mean, median (36), and skew (0.50) describe the numbering scheme only, so treating the column as numeric would be misleading even though it was profiled as numeric.

Treatment: Cast to categorical and left-join to a location lookup table rather than using as a numeric feature.

anthropic:claude-opus-4-7 · confidence high

Out[80]:

saturn.columns["LocationID"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	65
min	1
max	89
mean	39.69
median	36
std	25.34
q1	20
q3	54
iqr	34
skew	0.5048
kurtosis	-0.7622
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 29.

Distribution of LocationID. Vertical dash marks the median.

Histogram bins for LocationID (median: 36.0).
bin	count
1 – 3.2	112
3.2 – 5.4	112
5.4 – 7.6	56
7.6 – 9.8	112
9.8 – 12	112
12 – 14.2	104
14.2 – 16.4	112
16.4 – 18.6	112
18.6 – 20.8	112
20.8 – 23	112
23 – 25.2	168
25.2 – 27.4	112
27.4 – 29.6	112
29.6 – 31.8	112
31.8 – 34	112
34 – 36.2	160
36.2 – 38.4	112
38.4 – 40.6	112
40.6 – 42.8	112
42.8 – 45	56
45 – 47.2	168
47.2 – 49.4	112
49.4 – 51.6	112
51.6 – 53.8	56
53.8 – 56	168
56 – 58.2	0
58.2 – 60.4	56
60.4 – 62.6	0
62.6 – 64.8	0
64.8 – 67	56
67 – 69.2	0
69.2 – 71.4	0
71.4 – 73.6	56
73.6 – 75.8	0
75.8 – 78	0
78 – 80.2	80
80.2 – 82.4	112
82.4 – 84.6	112
84.6 – 86.8	112
86.8 – 89	168
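
The treatment for LocationID (cast to categorical, then left-join to a lookup) might look like the following sketch. The lookup table and its `LocationName` values are hypothetical placeholders; the real lookup is not part of this report.

```python
import pandas as pd

# Toy fact rows and a hypothetical lookup; names are placeholders.
facts = pd.DataFrame({
    "LocationID": [1, 2, 36, 1],
    "Data_Value": [9.1, 12.4, 4.8, 10.0],
})
lookup = pd.DataFrame({
    "LocationID": [1, 2],
    "LocationName": ["Example location A", "Example location B"],
})

# Use string keys on both sides so the join never does integer arithmetic.
facts["LocationID"] = facts["LocationID"].astype("string")
lookup["LocationID"] = lookup["LocationID"].astype("string")

merged = facts.merge(
    lookup,
    on="LocationID",
    how="left",               # keep every fact row, matched or not
    validate="many_to_one",   # fail loudly if the lookup has duplicate keys
    indicator=True,           # adds a _merge column for auditing
)
unmatched = sorted(set(merged.loc[merged["_merge"] == "left_only", "LocationID"]))
merged["LocationID"] = merged["LocationID"].astype("category")
print(unmatched)  # ['36']
```

The `indicator` column makes it easy to list IDs that have no lookup entry before they silently turn into missing names.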

ResponseID categorical feature

ResponseID holds 8 distinct codes (Q6COG, Q6DIS2, Q6MOB, Q6IND, Q6DIS1, Q6VIS, Q6SEL, Q6HEAR), each appearing exactly 449 times across 3592 rows with no nulls. The perfectly uniform distribution and entropy ratio of 1.0 indicate this is a question/disability-domain identifier repeated across location and year records rather than a unique response key. Despite the name, it behaves as a categorical factor, not an identifier.

Treatment: Treat as a categorical factor (one-hot or group-by key); do not use as a unique row id.

anthropic:claude-opus-4-7 · confidence high

Out[83]:

saturn.columns["ResponseID"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	8
top_value	Q6COG
top_rate	0.125
cardinality	8
entropy	3
entropy_ratio	1

Fig 30.

Top values for ResponseID.

Top values for ResponseID (8 unique shown, of 8 total).
value	count	share
Q6COG	449	12.5%
Q6DIS2	449	12.5%
Q6MOB	449	12.5%
Q6IND	449	12.5%
Q6DIS1	449	12.5%
Q6VIS	449	12.5%
Q6SEL	449	12.5%
Q6HEAR	449	12.5%
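
The entropy and entropy_ratio figures above are consistent with Shannon entropy in bits, normalised by log2 of the cardinality. That reading is an assumption about how Saturn computes them; the sketch below reproduces the reported values from the counts in the table.

```python
import math
from collections import Counter

def shannon_entropy_bits(values) -> float:
    """Shannon entropy (base 2) of the empirical distribution of values."""
    counts = Counter(values)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

codes = ["Q6COG", "Q6DIS2", "Q6MOB", "Q6IND", "Q6DIS1", "Q6VIS", "Q6SEL", "Q6HEAR"]
# Rebuild the column from the table: 8 codes x 449 rows each = 3592 rows.
values = [code for code in codes for _ in range(449)]

entropy = shannon_entropy_bits(values)
ratio = entropy / math.log2(len(codes))  # assumed normalisation
print(entropy, ratio)  # 3.0 1.0
```

A ratio of 1 is the maximum possible, which is why the uniform 12.5% shares raise no imbalance alert.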

DataValueTypeID categorical metadata

DataValueTypeID is a categorical metadata field indicating the type of statistical measure reported, but every one of the 3592 rows carries the single value 'AGEADJPREV' (age-adjusted prevalence). Cardinality is 1 and entropy is 0, so the column carries no information for modelling or filtering.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[86]:

saturn.columns["DataValueTypeID"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	AGEADJPREV
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 31.

Top values for DataValueTypeID.

Top values for DataValueTypeID (1 unique shown, of 1 total).
value	count	share
AGEADJPREV	3592	100.0%

StratificationCategoryID1 categorical metadata

This column holds a single constant value "CAT1" across all 3592 rows, with zero nulls and cardinality of 1. Entropy is 0, meaning it carries no information for any downstream task. The name suggests it was meant to identify a stratification category, but only one category is represented in this slice.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[89]:

saturn.columns["StratificationCategoryID1"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	CAT1
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 32.

Top values for StratificationCategoryID1.

Top values for StratificationCategoryID1 (1 unique shown, of 1 total).
value	count	share
CAT1	3592	100.0%

StratificationID1 categorical metadata

This column holds a single constant value 'BO1' across all 3592 rows, with cardinality 1 and entropy 0. As a 'StratificationID1' it likely encodes a stratification dimension (e.g., overall/total) that was never varied in this slice. It carries no information for modelling or grouping.

Treatment: Drop; constant column with zero entropy.

anthropic:claude-opus-4-7 · confidence high

Out[92]:

saturn.columns["StratificationID1"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	1
top_value	BO1
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: imbalance	top value is 100.0% of rows

Fig 33.

Top values for StratificationID1.

Top values for StratificationID1 (1 unique shown, of 1 total).
value	count	share
BO1	3592	100.0%

StratificationCategoryID2 unknown other

This column is named StratificationCategoryID2, suggesting it holds a secondary stratification category identifier in a public-health style dataset. Saturn skipped profiling, so no uniqueness, value, or distribution stats are available beyond a row count of 3592 and a null rate of 0.0. Without further signals, its actual content and cardinality cannot be characterised here.

Treatment: Re-profile with type coercion to confirm whether this is a categorical key before use.

anthropic:claude-opus-4-7 · confidence low

Out[95]:

saturn.columns["StratificationCategoryID2"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown

StratificationID2 unknown foreign_key

StratificationID2 was skipped by the profiler, so its kind, uniqueness, and value distribution are unknown. The only confirmed signals are that it has 3592 rows and a null rate of 0.0. The name suggests a secondary stratification key (e.g., demographic subgroup) commonly paired with a StratificationCategoryID2 in CDC-style indicator tables.

Treatment: Re-profile the column to determine cardinality, then treat as a categorical join key against its stratification lookup.

anthropic:claude-opus-4-7 · confidence low

Out[97]:

saturn.columns["StratificationID2"].stats

stat	value
n	3,592
nulls	0 (0.0%)
unique	—
alert: skipped	no profiler for kind=unknown

How to cite