cms-cms_hospitals_20260121 · saturn notebook

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/cms/cms_hospitals_20260121.parquet

Saturn profiled 5,421 rows across 38 columns. The stats below are deterministic and machine-readable; the prose is a language-model interpretation of those stats (opt-in, added after the fact, never sees raw rows).

[2]:

!pip install saturn-dissect
import subprocess
subprocess.run([
    "saturn", "analyze", "/home/coolhand/html/datavis/data_trove/cache/cms/cms_hospitals_20260121.parquet",
    "--findings", "cms-cms_hospitals_20260121.json",
    "--llm", "anthropic:claude-opus-4-7",
])

Summary confidence: high

This dataset catalogs 5,421 U.S. hospitals with 38 columns covering location (city, county, state, ZIP), facility identity, ownership and type, and CMS quality-measure rollups (mortality, readmission, safety, patient experience, timely & effective care). The most interesting structural story is the quality-rating coverage: 'Hospital overall rating' is 'Not Available' for 47% of hospitals, and the various footnote columns are null for 53–83% of rows, so any analysis of star ratings has to handle a large missing slice. On the categorical side, the mix is dominated by Acute Care Hospitals (~58%) and Voluntary non-profit – Private ownership (~42%), with Texas and California leading state counts. The 'Meets criteria for birthing friendly designation' field only ever takes the value 'Y' (58% null, no 'N'), so it is effectively a flag rather than a comparator.

citing: row_count · column_count · Hospital overall rating.top_values · Hospital overall rating.top_rate · hospital_type.top_values · hospital_ownership.top_values · state.top_values · Meets criteria for birthing friendly designation.null_rate · Meets criteria for birthing friendly designation.top_value · MORT Group Footnote.null_rate · READM Group Footnote.null_rate · Safety Group Footnote.null_rate · Pt Exp Group Footnote.null_rate · emergency_services.top_values

Out[4]:

saturn.schema() · 38 columns

column	kind	n	null%	unique	alerts
facility_id	text	5,421	0.0%	5,421	near_unique one_word allcaps short_text
facility_name	text	5,421	0.0%	5,286	near_unique allcaps
address	text	5,421	0.0%	5,387	near_unique allcaps
city	text	5,421	0.0%	3,049	one_word allcaps short_text duplicates
state	categorical	5,421	0.0%	56
zip_code	numeric	5,421	0.0%	4,721
county_name	text	5,421	0.0%	1,555	one_word allcaps short_text duplicates
phone_number	text	5,421	0.0%	5,383	near_unique allcaps short_text
hospital_type	categorical	5,421	0.0%	8
hospital_ownership	categorical	5,421	0.0%	12
emergency_services	categorical	5,421	0.0%	2
Meets criteria for birthing friendly designation	categorical	5,421	58.2%	1	null_rate imbalance
Hospital overall rating	categorical	5,421	0.0%	6
Hospital overall rating footnote	categorical	5,421	52.7%	7	null_rate
MORT Group Measure Count	categorical	5,421	0.0%	2
Count of Facility MORT Measures	categorical	5,421	0.0%	8
Count of MORT Measures Better	categorical	5,421	0.0%	9
Count of MORT Measures No Different	categorical	5,421	0.0%	9
Count of MORT Measures Worse	categorical	5,421	0.0%	7
MORT Group Footnote	numeric	5,421	67.2%	4	null_rate
Safety Group Measure Count	categorical	5,421	0.0%	2
Count of Facility Safety Measures	categorical	5,421	0.0%	9
Count of Safety Measures Better	categorical	5,421	0.0%	8
Count of Safety Measures No Different	categorical	5,421	0.0%	10
Count of Safety Measures Worse	categorical	5,421	0.0%	5
Safety Group Footnote	numeric	5,421	61.8%	4	null_rate
READM Group Measure Count	categorical	5,421	0.0%	2
Count of Facility READM Measures	categorical	5,421	0.0%	12
Count of READM Measures Better	categorical	5,421	0.0%	7
Count of READM Measures No Different	categorical	5,421	0.0%	13
Count of READM Measures Worse	categorical	5,421	0.0%	9
READM Group Footnote	numeric	5,421	78.8%	3	null_rate
Pt Exp Group Measure Count	categorical	5,421	0.0%	2
Count of Facility Pt Exp Measures	categorical	5,421	0.0%	2
Pt Exp Group Footnote	numeric	5,421	58.2%	3	null_rate
TE Group Measure Count	categorical	5,421	0.0%	2
Count of Facility TE Measures	categorical	5,421	0.0%	13
TE Group Footnote	numeric	5,421	82.9%	3	null_rate high_skew outliers

Fig 1.

Hospital overall rating · Distribution of CMS star ratings — note that 'Not Available' is the single largest bucket (~47%).

Top values for Hospital overall rating (6 unique shown, of 6 total).
value	count	share
Not Available	2552	47.1%
3	937	17.3%
4	765	14.1%
2	649	12.0%
5	289	5.3%
1	229	4.2%

Fig 2.

hospital_type · Acute Care Hospitals dominate at ~58%, followed by Critical Access and Psychiatric facilities.

Top values for hospital_type (8 unique shown, of 8 total).
value	count	share
Acute Care Hospitals	3120	57.6%
Critical Access Hospitals	1375	25.4%
Psychiatric	626	11.5%
Acute Care - Veterans Administration	132	2.4%
Childrens	94	1.7%
Rural Emergency Hospital	38	0.7%
Acute Care - Department of Defense	32	0.6%
Long-term	4	0.1%

Fig 3.

hospital_ownership · Voluntary non-profit – Private hospitals make up ~42% of the dataset, with Proprietary a distant second.

Top values for hospital_ownership (12 unique shown, of 12 total).
value	count	share
Voluntary non-profit - Private	2291	42.3%
Proprietary	1067	19.7%
Government - Hospital District or Authority	521	9.6%
Government - Local	400	7.4%
Voluntary non-profit - Other	361	6.7%
Voluntary non-profit - Church	275	5.1%
Government - State	210	3.9%
Veterans Health Administration	132	2.4%
Physician	74	1.4%
Government - Federal	44	0.8%
Department of Defense	32	0.6%
Tribal	14	0.3%

Fig 4.

state · Geographic spread across 56 states/territories; Texas and California top the list.

Top values for state (20 unique shown, of 56 total).
value	count	share
TX	462	8.5%
CA	378	7.0%
FL	221	4.1%
IL	194	3.6%
OH	194	3.6%
NY	191	3.5%
PA	187	3.4%
LA	160	3.0%
GA	149	2.7%
IN	149	2.7%
MI	147	2.7%
WI	142	2.6%
KS	139	2.6%
MN	136	2.5%
OK	135	2.5%
TN	123	2.3%
MO	121	2.2%
NC	120	2.2%
IA	118	2.2%
AZ	106	2.0%

Fig 5.

emergency_services · About 83% of facilities report providing emergency services — useful as a quick filter.

Top values for emergency_services (2 unique shown, of 2 total).
value	count	share
Yes	4505	83.1%
No	916	16.9%

Fig 6.

Per-column null rate across the corpus. Columns are ordered by input position.

Per-column null rate across the corpus.
column	kind	null %
facility_id	text	0.0%
facility_name	text	0.0%
address	text	0.0%
city	text	0.0%
state	categorical	0.0%
zip_code	numeric	0.0%
county_name	text	0.0%
phone_number	text	0.0%
hospital_type	categorical	0.0%
hospital_ownership	categorical	0.0%
emergency_services	categorical	0.0%
Meets criteria for birthing friendly designation	categorical	58.2%
Hospital overall rating	categorical	0.0%
Hospital overall rating footnote	categorical	52.7%
MORT Group Measure Count	categorical	0.0%
Count of Facility MORT Measures	categorical	0.0%
Count of MORT Measures Better	categorical	0.0%
Count of MORT Measures No Different	categorical	0.0%
Count of MORT Measures Worse	categorical	0.0%
MORT Group Footnote	numeric	67.2%
Safety Group Measure Count	categorical	0.0%
Count of Facility Safety Measures	categorical	0.0%
Count of Safety Measures Better	categorical	0.0%
Count of Safety Measures No Different	categorical	0.0%
Count of Safety Measures Worse	categorical	0.0%
Safety Group Footnote	numeric	61.8%
READM Group Measure Count	categorical	0.0%
Count of Facility READM Measures	categorical	0.0%
Count of READM Measures Better	categorical	0.0%
Count of READM Measures No Different	categorical	0.0%
Count of READM Measures Worse	categorical	0.0%
READM Group Footnote	numeric	78.8%
Pt Exp Group Measure Count	categorical	0.0%
Count of Facility Pt Exp Measures	categorical	0.0%
Pt Exp Group Footnote	numeric	58.2%
TE Group Measure Count	categorical	0.0%
Count of Facility TE Measures	categorical	0.0%
TE Group Footnote	numeric	82.9%

Fig 7.

Pearson correlation across numeric columns (sampled, bounded).

Pearson correlation across 6 numeric columns (values clipped to 2 decimals).
	zip_code	MORT Group Footnote	Safety Group Footnote	READM Group Footnote	Pt Exp Group Footnote	TE Group Footnote
zip_code	+1.00	-0.01	-0.04	-0.09	-0.02	+0.03
MORT Group Footnote	-0.01	+1.00	+0.13	+0.24	+0.07	+0.09
Safety Group Footnote	-0.04	+0.13	+1.00	+0.10	+0.27	-0.00
READM Group Footnote	-0.09	+0.24	+0.10	+1.00	+0.06	+0.07
Pt Exp Group Footnote	-0.02	+0.07	+0.27	+0.06	+1.00	+0.03
TE Group Footnote	+0.03	+0.09	-0.00	+0.07	+0.03	+1.00

facility_id text identifier

This is a facility identifier: every one of the 5421 rows holds a unique 6-character, single-token, all-caps code with no nulls or duplicates. The samples are zero-padded numeric strings (e.g. 010001, 010005), suggesting a fixed-width registry code rather than free text.

Treatment: Use as a primary key for joins; do not feed into models.

anthropic:claude-opus-4-7 · confidence high
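
Before relying on facility_id as a join key, it may be worth re-checking the two properties the profile reports: fixed width and uniqueness. The sketch below is a minimal check in plain Python. The pattern (six characters drawn from digits and uppercase letters) is an assumption based on the "6-character, all-caps" description, and every sample ID other than 010001 and 010005 is made up for illustration.

```python
import re

# Assumed shape of a valid ID: exactly six digits or uppercase letters.
ID_PATTERN = re.compile(r"[0-9A-Z]{6}")


def is_valid_key(ids):
    """Return True when every ID matches the pattern and no ID repeats."""
    well_formed = all(ID_PATTERN.fullmatch(i) for i in ids)
    return well_formed and len(set(ids)) == len(ids)


# 010001 and 010005 appear in the profile; the other values are hypothetical.
sample_ids = ["010001", "010005", "010006", "010007"]
print(is_valid_key(sample_ids))
```

Reading the column as text rather than as a number keeps the leading zeros that the fixed-width format depends on.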

Out[13]:

saturn.columns["facility_id"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	5,421
len_min	6
len_max	6
len_mean	6
len_median	6
len_p95	6
word_mean	1
word_median	1
n_empty	0
n_duplicates	0
duplicate_rate	0
vocab_size	5,421
readability_flesch_mean	121.2
emoji_rate	0
url_rate	0
one_word_rate	1
allcaps_rate	1
boilerplate_rate	0
alert: near_unique	100.0% of rows are unique strings
alert: one_word	100.0% rows are a single word
alert: allcaps	100.0% rows are all-caps
alert: short_text	95th-percentile length under 20 chars

Fig 8.

Character-length distribution for facility_id.

Character-length distribution for facility_id (mean: 6.0).
chars	count
6 – 6	5421

facility_name text identifier

This column holds healthcare facility names — 'hospital', 'center', 'medical', and 'health' dominate the top words, with typical entries around 4 words and 29 characters. It is near-unique (5286 distinct values across 5421 rows) yet still shows 135 duplicates (2.5%), suggesting either shared facility names across locations or genuine repeats. Notably, 99.3% of values are all-caps, which is a formatting quirk worth normalising.

Treatment: Lowercase and normalise whitespace, then treat as a high-cardinality entity key rather than a model feature.

anthropic:claude-opus-4-7 · confidence high
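
The normalisation suggested above can be sketched in a few lines of plain Python. The function name and the sample string are hypothetical; this only illustrates lowercasing plus whitespace collapsing, not any cleaning that saturn itself performs.

```python
def normalize_name(name: str) -> str:
    """Lowercase a facility name and collapse runs of whitespace."""
    # str.split() with no arguments drops leading/trailing whitespace
    # and splits on any run of whitespace characters.
    return " ".join(name.lower().split())


# Hypothetical value, not taken from the dataset.
print(normalize_name("  SAMPLE   REGIONAL  MEDICAL CENTER "))
```

Names that still collide after normalisation are candidates for the 135 duplicates noted above, so pairing the normalised name with facility_id keeps rows distinguishable.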

Out[16]:

saturn.columns["facility_name"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	5,286
len_min	3
len_max	74
len_mean	29.21
len_median	28
len_p95	45
word_mean	3.995
word_median	4
n_empty	0
n_duplicates	135
duplicate_rate	0.0249
vocab_size	3,942
readability_flesch_mean	6.842
emoji_rate	0
url_rate	0
one_word_rate	0.001845
allcaps_rate	0.9932
boilerplate_rate	0
alert: near_unique	97.5% of rows are unique strings
alert: allcaps	99.3% rows are all-caps

Fig 9.

Character-length distribution for facility_name.

Character-length distribution for facility_name (mean: 29.21).
chars	count
3 – 5	2
5 – 7	1
7 – 8	1
8 – 10	11
10 – 12	12
12 – 14	43
14 – 15	91
15 – 17	178
17 – 19	93
19 – 21	258
21 – 23	424
23 – 24	526
24 – 26	572
26 – 28	258
28 – 30	548
30 – 31	525
31 – 33	460
33 – 35	160
35 – 37	308
37 – 38	232
38 – 40	170
40 – 42	149
42 – 44	47
44 – 46	105
46 – 47	82
47 – 49	79
49 – 51	55
51 – 53	7
53 – 54	4
54 – 56	2
56 – 58	1
58 – 60	3
60 – 62	4
62 – 63	1
63 – 65	2
65 – 67	1
67 – 69	2
69 – 70	0
70 – 72	3
72 – 74	1

address text identifier

Free-text street addresses: 5,387 unique values out of 5,421 rows (34 duplicates) with no nulls, averaging 3.75 words and 19 characters. Top tokens are street/road/avenue and cardinal directions, consistent with US-style mailing addresses. Notably 99.2% of values are ALLCAPS, suggesting upstream normalization rather than user free-form entry.

Treatment: Drop or hash for modelling; parse into components if geocoding is needed.

anthropic:claude-opus-4-7 · confidence high

Out[19]:

saturn.columns["address"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	5,387
len_min	7
len_max	50
len_mean	19.37
len_median	19
len_p95	29
word_mean	3.754
word_median	4
n_empty	0
n_duplicates	34
duplicate_rate	0.006272
vocab_size	4,996
readability_flesch_mean	79.27
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	0.9921
boilerplate_rate	0
alert: near_unique	99.4% of rows are unique strings
alert: allcaps	99.2% rows are all-caps

Fig 10.

Character-length distribution for address.

Character-length distribution for address (mean: 19.37).
chars	count
7 – 8	3
8 – 9	8
9 – 10	32
10 – 11	66
11 – 12	115
12 – 13	227
13 – 15	288
15 – 16	380
16 – 17	481
17 – 18	502
18 – 19	541
19 – 20	510
20 – 21	413
21 – 22	691
22 – 23	257
23 – 24	218
24 – 25	151
25 – 26	121
26 – 27	73
27 – 28	72
28 – 30	52
30 – 31	47
31 – 32	34
32 – 33	27
33 – 34	15
34 – 35	12
35 – 36	34
36 – 37	12
37 – 38	8
38 – 39	7
39 – 40	6
40 – 41	3
41 – 42	2
42 – 44	3
44 – 45	0
45 – 46	3
46 – 47	2
47 – 48	0
48 – 49	0
49 – 50	5

city text feature

This is a US city name field, stored almost entirely in uppercase (allcaps_rate 0.994) and dominated by single-word entries (one_word_rate 0.771, word_median 1). With 3049 unique values across 5421 rows and a 0.438 duplicate_rate, common metros like CHICAGO (34), HOUSTON (31), and COLUMBUS (23) recur but the long tail is heavy. Lengths are short and tight (len_mean 8.6, len_max 24), and there are no nulls or empties.

Treatment: Normalize case and pair with state (and county_name where needed) before using as a categorical or geocoding key.

anthropic:claude-opus-4-7 · confidence high

Out[22]:

saturn.columns["city"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	3,049
len_min	3
len_max	24
len_mean	8.611
len_median	8
len_p95	13
word_mean	1.241
word_median	1
n_empty	0
n_duplicates	2,372
duplicate_rate	0.4376
vocab_size	2,890
readability_flesch_mean	18.29
emoji_rate	0
url_rate	0
one_word_rate	0.7709
allcaps_rate	0.9943
boilerplate_rate	0
alert: one_word	77.1% rows are a single word
alert: allcaps	99.4% rows are all-caps
alert: short_text	95th-percentile length under 20 chars
alert: duplicates	43.8% duplicate strings

Fig 11.

Character-length distribution for city.

Character-length distribution for city (mean: 8.611).
chars	count
3 – 4	10
4 – 4	122
4 – 5	0
5 – 5	332
5 – 6	0
6 – 6	761
6 – 7	0
7 – 7	895
7 – 8	0
8 – 8	737
8 – 9	0
9 – 9	694
9 – 10	0
10 – 10	705
10 – 11	0
11 – 11	446
11 – 12	0
12 – 12	295
12 – 13	0
13 – 14	191
14 – 14	98
14 – 15	0
15 – 15	50
15 – 16	0
16 – 16	54
16 – 17	0
17 – 17	15
17 – 18	0
18 – 18	6
18 – 19	0
19 – 19	3
19 – 20	0
20 – 20	6
20 – 21	0
21 – 21	0
21 – 22	0
22 – 22	0
22 – 23	0
23 – 23	0
23 – 24	1

state categorical feature

This column holds US state codes (top values TX, CA, FL, IL, OH), with 56 distinct values across 5421 rows and no nulls. Cardinality slightly exceeds the 50 states, suggesting territories or DC are mixed in. Distribution is fairly even — entropy ratio 0.917 and the top state TX accounts for only 8.5% — so no single state dominates.

Treatment: One-hot or target-encode for modelling; verify the 6 extra codes beyond 50 states.

anthropic:claude-opus-4-7 · confidence high
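
One way to identify the codes beyond the 50 states is a set difference against a fixed list of state abbreviations. This is a sketch: the helper name is hypothetical, and the sample input is illustrative rather than the actual contents of the column.

```python
# The 50 US state postal abbreviations.
US_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}


def non_state_codes(codes):
    """Return the sorted distinct codes that are not one of the 50 states."""
    return sorted(set(codes) - US_STATES)


# Hypothetical sample; run this on the real state column instead.
print(non_state_codes(["TX", "CA", "DC", "PR", "TX"]))
```

Run on the full column, the result should contain six codes if the cardinality of 56 is made up of the 50 states plus extras.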

Out[25]:

saturn.columns["state"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	56
top_value	TX
top_rate	0.08522
cardinality	56
entropy	5.328
entropy_ratio	0.9174

Fig 12.

Top values for state.

Top values for state (20 unique shown, of 56 total).
value	count	share
TX	462	8.5%
CA	378	7.0%
FL	221	4.1%
IL	194	3.6%
OH	194	3.6%
NY	191	3.5%
PA	187	3.4%
LA	160	3.0%
GA	149	2.7%
IN	149	2.7%
MI	147	2.7%
WI	142	2.6%
KS	139	2.6%
MN	136	2.5%
OK	135	2.5%
TN	123	2.3%
MO	121	2.2%
NC	120	2.2%
IA	118	2.2%
AZ	106	2.0%

zip_code numeric identifier

This is almost certainly a US ZIP code field, stored numerically with values spanning 603 to 99929 across 5421 rows and 4721 unique values. The numeric framing is misleading: the mean of about 53,780 and std of about 27,060 reflect ZIP geography, not a continuous quantity. Leading zeros have likely been dropped: the minimum of 603 is consistent with a 5-digit ZIP such as 00603 (the 006xx range is in Puerto Rico), and New England ZIPs, which also begin with 0, would be affected the same way. No nulls or statistical outliers are reported.

Treatment: Cast to zero-padded 5-character strings and treat as a categorical/geographic key, not a numeric feature.

anthropic:claude-opus-4-7 · confidence high
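
The cast described above comes down to zero-padding each integer to five characters. A minimal sketch, with a hypothetical helper name:

```python
def zip_to_str(value: int) -> str:
    """Render a numeric ZIP as a zero-padded 5-character string."""
    return f"{int(value):05d}"


# The profile's minimum and maximum values.
print(zip_to_str(603))    # 00603
print(zip_to_str(99929))  # 99929
```

Once converted, the column should be kept as text so that later reads or writes do not strip the zeros again.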

Out[28]:

saturn.columns["zip_code"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	4,721
min	603
max	99,929
mean	5.378e+04
median	55,066
std	2.706e+04
q1	32,771
q3	76,104
iqr	43,333
skew	-0.1646
kurtosis	-0.9879
n_outliers	0
outlier_rate	0
zero_rate	0

Fig 13.

Distribution of zip_code. Vertical dash marks the median.

Histogram bins for zip_code (median: 55066.0).
bin	count
603 – 3086	161
3086 – 5569	73
5569 – 8052	92
8052 – 1.054e+04	57
1.054e+04 – 1.302e+04	90
1.302e+04 – 1.55e+04	101
1.55e+04 – 1.799e+04	81
1.799e+04 – 2.047e+04	108
2.047e+04 – 2.295e+04	79
2.295e+04 – 2.543e+04	81
2.543e+04 – 2.792e+04	82
2.792e+04 – 3.04e+04	191
3.04e+04 – 3.288e+04	171
3.288e+04 – 3.537e+04	168
3.537e+04 – 3.785e+04	151
3.785e+04 – 4.033e+04	180
4.033e+04 – 4.282e+04	87
4.282e+04 – 4.53e+04	154
4.53e+04 – 4.778e+04	174
4.778e+04 – 5.027e+04	180
5.027e+04 – 5.275e+04	98
5.275e+04 – 5.523e+04	160
5.523e+04 – 5.772e+04	175
5.772e+04 – 6.02e+04	147
6.02e+04 – 6.268e+04	140
6.268e+04 – 6.516e+04	120
6.516e+04 – 6.765e+04	133
6.765e+04 – 7.013e+04	145
7.013e+04 – 7.261e+04	205
7.261e+04 – 7.51e+04	191
7.51e+04 – 7.758e+04	236
7.758e+04 – 8.006e+04	204
8.006e+04 – 8.255e+04	103
8.255e+04 – 8.503e+04	132
8.503e+04 – 8.751e+04	106
8.751e+04 – 9e+04	72
9e+04 – 9.248e+04	140
9.248e+04 – 9.496e+04	146
9.496e+04 – 9.745e+04	159
9.745e+04 – 9.993e+04	148

county_name text feature

This is a US county name field, stored entirely in uppercase (allcaps_rate 1.0) and mostly single-token (one_word_rate 0.87, word_mean 1.14). Across 5421 rows there are 1555 distinct values with a 71.3% duplicate_rate, led by LOS ANGELES (88), JEFFERSON (59), and COOK (59) — consistent with common US county names recurring across states. No nulls or empties, and lengths are short and tight (median 7, max 25).

Treatment: Normalize case and pair with a state column before joining or grouping, since county names repeat across states.

anthropic:claude-opus-4-7 · confidence high
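
To illustrate why the state pairing matters, the sketch below counts rows by county name alone and then by a combined county-and-state key. The rows and the key format are hypothetical; only the idea of a composite key comes from the treatment note.

```python
from collections import Counter

# Hypothetical (county_name, state) pairs, not taken from the dataset.
rows = [
    ("JEFFERSON", "AL"),
    ("JEFFERSON", "KY"),
    ("JEFFERSON", "AL"),
    ("COOK", "IL"),
]


def county_key(county: str, state: str) -> str:
    """Build a composite key such as 'Jefferson, AL'."""
    return f"{county.strip().title()}, {state.strip().upper()}"


by_name = Counter(county for county, _ in rows)
by_key = Counter(county_key(county, state) for county, state in rows)

print(by_name["JEFFERSON"])      # 3: two different counties merged
print(by_key["Jefferson, AL"])   # 2
print(by_key["Jefferson, KY"])   # 1
```

Grouping on the bare name would merge unrelated counties, which is what the composite key avoids.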

Out[31]:

saturn.columns["county_name"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	1,555
len_min	3
len_max	25
len_mean	7.34
len_median	7
len_p95	11
word_mean	1.135
word_median	1
n_empty	0
n_duplicates	3,866
duplicate_rate	0.7132
vocab_size	1,591
readability_flesch_mean	34.44
emoji_rate	0
url_rate	0
one_word_rate	0.8733
allcaps_rate	1
boilerplate_rate	0
alert: one_word	87.3% rows are a single word
alert: allcaps	100.0% rows are all-caps
alert: short_text	95th-percentile length under 20 chars
alert: duplicates	71.3% duplicate strings

Fig 14.

Character-length distribution for county_name.

Character-length distribution for county_name (mean: 7.34).
chars	count
3 – 4	36
4 – 4	472
4 – 5	0
5 – 5	659
5 – 6	0
6 – 6	1008
6 – 7	0
7 – 7	956
7 – 8	0
8 – 8	834
8 – 9	575
9 – 10	0
10 – 10	430
10 – 11	0
11 – 11	195
11 – 12	0
12 – 12	101
12 – 13	0
13 – 13	40
13 – 14	0
14 – 15	75
15 – 15	18
15 – 16	0
16 – 16	2
16 – 17	0
17 – 17	5
17 – 18	0
18 – 18	3
18 – 19	0
19 – 20	5
20 – 20	4
20 – 21	0
21 – 21	1
21 – 22	0
22 – 22	0
22 – 23	0
23 – 23	1
23 – 24	0
24 – 24	0
24 – 25	1

phone_number text identifier

Formatted US phone numbers, every value exactly 14 characters and 2 "words" (area code in parentheses plus the rest), with top tokens like (406), (605), (402) confirming the (XXX) prefix pattern. Of 5421 rows, 5383 are unique with 38 duplicates (0.7%) and zero nulls, so the column is near-unique but not a clean key. The allcaps flag is an artifact of digits/punctuation and can be ignored.

Treatment: Drop or hash for PII; do not use as a model feature.

anthropic:claude-opus-4-7 · confidence high

Out[34]:

saturn.columns["phone_number"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	5,383
len_min	14
len_max	14
len_mean	14
len_median	14
len_p95	14
word_mean	2
word_median	2
n_empty	0
n_duplicates	38
duplicate_rate	0.00701
vocab_size	5,550
readability_flesch_mean	120.2
emoji_rate	0
url_rate	0
one_word_rate	0
allcaps_rate	1
boilerplate_rate	0
alert: near_unique	99.3% of rows are unique strings
alert: allcaps	100.0% rows are all-caps
alert: short_text	95th-percentile length under 20 chars

Fig 15.

Character-length distribution for phone_number.

Character-length distribution for phone_number (mean: 14.0).
chars	count
14 – 14	5421

hospital_type categorical feature

Categorical classifier of hospital facility type across 8 distinct values with no nulls. Acute Care Hospitals dominate at 57.6% (3120 of 5421), followed by Critical Access Hospitals (1375) and Psychiatric (626); the long tail is sparse, with Long-term appearing only 4 times. Entropy ratio of 0.55 confirms the distribution is heavily concentrated on the top category.

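Treatment: One-hot encode and consider collapsing the five types that each fall below 3% of rows (Acute Care - Veterans Administration, Childrens, Rural Emergency Hospital, Acute Care - Department of Defense, Long-term) into an 'Other' bucket.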

anthropic:claude-opus-4-7 · confidence high
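
A share-based cutoff is one simple way to collapse the sparse tail. The sketch below uses the counts from the hospital_type table; the function name, the 3% default, and the 'Other' label are illustrative choices, not part of saturn.

```python
# Counts copied from the hospital_type top-values table.
type_counts = {
    "Acute Care Hospitals": 3120,
    "Critical Access Hospitals": 1375,
    "Psychiatric": 626,
    "Acute Care - Veterans Administration": 132,
    "Childrens": 94,
    "Rural Emergency Hospital": 38,
    "Acute Care - Department of Defense": 32,
    "Long-term": 4,
}


def collapse_rare(counts, min_share=0.03, other="Other"):
    """Fold every category whose share is below min_share into one bucket."""
    total = sum(counts.values())
    collapsed = {}
    for label, n in counts.items():
        key = label if n / total >= min_share else other
        collapsed[key] = collapsed.get(key, 0) + n
    return collapsed


collapsed = collapse_rare(type_counts)
for label, n in collapsed.items():
    print(f"{label}: {n}")
```

With the 3% cutoff, three named categories remain and the rest fold into a single bucket of 300 rows, which keeps a one-hot encoding to four columns.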

Out[37]:

saturn.columns["hospital_type"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	8
top_value	Acute Care Hospitals
top_rate	0.5755
cardinality	8
entropy	1.654
entropy_ratio	0.5513
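
The entropy figures can be reproduced from the value counts. The sketch below assumes that entropy is Shannon entropy in bits and that entropy_ratio is that entropy divided by log2(cardinality); the report does not state the formulas, but the numbers it shows are consistent with this reading.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a list of category counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of categories)."""
    k = sum(1 for n in counts if n > 0)
    # A single category has zero entropy and no meaningful maximum.
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Counts from the hospital_type top-values table.
hospital_type_counts = [3120, 1375, 626, 132, 94, 38, 32, 4]
print(round(entropy_bits(hospital_type_counts), 3))
print(round(entropy_ratio(hospital_type_counts), 4))
```

The two printed values should land close to the reported 1.654 and 0.5513. A ratio near 1 means the categories are close to evenly used, while a ratio near 0 means one category dominates.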

Fig 16.

Top values for hospital_type.

Top values for hospital_type (8 unique shown, of 8 total).
value	count	share
Acute Care Hospitals	3120	57.6%
Critical Access Hospitals	1375	25.4%
Psychiatric	626	11.5%
Acute Care - Veterans Administration	132	2.4%
Childrens	94	1.7%
Rural Emergency Hospital	38	0.7%
Acute Care - Department of Defense	32	0.6%
Long-term	4	0.1%

hospital_ownership categorical feature

This column classifies each of 5,421 hospitals by ownership type across 12 categories with no nulls. Voluntary non-profit - Private dominates at 2,291 rows (42.3% top_rate), followed by Proprietary at 1,067, with a long tail running through Physician (74) and Government - Federal (44) down to Department of Defense (32) and Tribal (14). Entropy ratio of 0.72 confirms a moderately skewed but usable distribution.

Treatment: One-hot or target-encode; consider grouping rare classes (Physician, Government - Federal, Department of Defense, Tribal) into an 'Other' bucket.

anthropic:claude-opus-4-7 · confidence high

Out[40]:

saturn.columns["hospital_ownership"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	12
top_value	Voluntary non-profit - Private
top_rate	0.4226
cardinality	12
entropy	2.586
entropy_ratio	0.7215

Fig 17.

Top values for hospital_ownership.

Top values for hospital_ownership (12 unique shown, of 12 total).
value	count	share
Voluntary non-profit - Private	2291	42.3%
Proprietary	1067	19.7%
Government - Hospital District or Authority	521	9.6%
Government - Local	400	7.4%
Voluntary non-profit - Other	361	6.7%
Voluntary non-profit - Church	275	5.1%
Government - State	210	3.9%
Veterans Health Administration	132	2.4%
Physician	74	1.4%
Government - Federal	44	0.8%
Department of Defense	32	0.6%
Tribal	14	0.3%

emergency_services categorical feature

A binary Yes/No flag indicating whether emergency services are present, with no missing values across 5421 rows. The split is heavily skewed toward 'Yes' at 83.1% (4505 vs 916), giving an entropy ratio of 0.66.

Treatment: Encode as 0/1; consider class imbalance if used as a predictor or target.

anthropic:claude-opus-4-7 · confidence high
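
A minimal sketch of the 0/1 encoding, with a guard so that an unexpected value fails loudly instead of being silently mapped. The helper name is hypothetical; the counts are the ones in the emergency_services table.

```python
FLAG_MAP = {"Yes": 1, "No": 0}


def encode_flag(value: str) -> int:
    """Map 'Yes'/'No' to 1/0 and reject anything else."""
    try:
        return FLAG_MAP[value]
    except KeyError:
        raise ValueError(f"unexpected emergency_services value: {value!r}") from None


# Counts from the emergency_services top-values table.
flag_counts = {"Yes": 4505, "No": 916}
positive_share = flag_counts["Yes"] / sum(flag_counts.values())

print(encode_flag("Yes"), encode_flag("No"))
print(round(positive_share, 3))
```

The positive share of roughly 0.83 is the imbalance to keep in mind if the flag is used as a target.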

Out[43]:

saturn.columns["emergency_services"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	Yes
top_rate	0.831
cardinality	2
entropy	0.6553
entropy_ratio	0.6553

Fig 18.

Top values for emergency_services.

Top values for emergency_services (2 unique shown, of 2 total).
value	count	share
Yes	4505	83.1%
No	916	16.9%

Meets criteria for birthing friendly designation categorical feature

This is a binary flag indicating whether a facility meets criteria for a 'birthing friendly' designation, but every non-null value is 'Y' (2264 rows, top_rate 1.0, cardinality 1). The remaining 58.24% of rows are null, so the column effectively encodes presence/absence of the designation rather than a Y/N contrast. Entropy is 0.0, meaning it carries no information beyond the null pattern itself.

Treatment: Recode as a boolean (designated vs. not) from the null mask, or drop as near-constant.

anthropic:claude-opus-4-7 · confidence high
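
Recoding from the null mask can be sketched as below. It rests on an assumption the profile cannot confirm: that a null means "not designated" rather than "not assessed" or "not applicable". The helper name and sample values are hypothetical.

```python
from typing import Optional


def is_designated(value: Optional[str]) -> bool:
    """True only for an explicit 'Y'; nulls are treated as not designated."""
    # Assumption: null means the designation is absent, not unknown.
    return value == "Y"


# Hypothetical sample mirroring the column's two states: 'Y' or null.
sample = ["Y", None, "Y", None, None]
flags = [is_designated(v) for v in sample]
print(flags)
```

If nulls turn out to mean "unknown", a three-state encoding would be the safer choice.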

Out[46]:

saturn.columns["Meets criteria for birthing friendly designation"].stats

stat	value
n	5,421
nulls	3,157 (58.2%)
unique	1
top_value	Y
top_rate	1
cardinality	1
entropy	0
entropy_ratio	0
alert: null_rate	58.2% null
alert: imbalance	top value is 100.0% of rows

Fig 19.

Top values for Meets criteria for birthing friendly designation.

Top values for Meets criteria for birthing friendly designation (1 unique shown, of 1 total).
value	count	share of all rows
Y	2264	41.8%

Hospital overall rating categorical label

This is the CMS-style hospital overall star rating, encoded as strings 1-5 with a 'Not Available' sentinel covering 47.1% of 5,421 rows. The remaining ratings concentrate around 3 (937) and 4 (765), with extremes 5 (289) and 1 (229) much rarer. The dominant 'Not Available' bucket is the headline surprise — nearly half of hospitals have no rating at all.

Treatment: Recode 'Not Available' as missing and treat the remainder as an ordinal 1-5 scale.

anthropic:claude-opus-4-7 · confidence high
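
The recode described above can be sketched as a small parser that turns the sentinel into a missing value and validates the 1-5 range. The function name and the sample list are hypothetical.

```python
from typing import Optional

SENTINEL = "Not Available"


def parse_rating(raw: str) -> Optional[int]:
    """Return the star rating as an int in 1..5, or None for the sentinel."""
    if raw == SENTINEL:
        return None
    rating = int(raw)
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {raw!r}")
    return rating


# Hypothetical sample of raw cell values.
raw_ratings = ["3", "Not Available", "5", "1"]
parsed = [parse_rating(r) for r in raw_ratings]
rated_only = [r for r in parsed if r is not None]

print(parsed)
print(rated_only)
```

Keeping the missing values explicit makes it harder to average over the 47.1% of hospitals that have no rating by accident.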

Out[49]:

saturn.columns["Hospital overall rating"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	6
top_value	Not Available
top_rate	0.4708
cardinality	6
entropy	2.133
entropy_ratio	0.8252

Fig 20.

Top values for Hospital overall rating.

Top values for Hospital overall rating (6 unique shown, of 6 total).
value	count	share
Not Available	2552	47.1%
3	937	17.3%
4	765	14.1%
2	649	12.0%
5	289	5.3%
1	229	4.2%

Hospital overall rating footnote categorical metadata

Footnote codes that qualify the Hospital overall rating, with only 7 distinct values across 5421 rows. Over half the column (52.7%) is null, and among the populated rows code '16' dominates at 65.4% followed by '19' at ~31%, leaving the other codes as long-tail rarities. One compound entry ('16, 23') hints that multiple footnotes can be concatenated in a single cell.

Treatment: Treat as categorical metadata; split compound codes and either one-hot encode or drop given the high null rate.

anthropic:claude-opus-4-7 · confidence high
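
Splitting compound cells such as '16, 23' changes the per-code totals slightly. The sketch below re-counts the codes from the top-values table after splitting; the helper name is hypothetical, and a comma is assumed to be the only separator.

```python
from collections import Counter
from typing import List, Optional


def split_footnotes(cell: Optional[str]) -> List[int]:
    """Split a footnote cell like '16, 23' into integer codes."""
    if cell is None:
        return []
    return [int(part) for part in cell.split(",") if part.strip()]


# Counts from the footnote top-values table (nulls excluded).
footnote_counts = {
    "16": 1676, "19": 795, "5": 47, "22": 32,
    "17": 7, "23": 5, "16, 23": 2,
}

per_code = Counter()
for cell, n in footnote_counts.items():
    for code in split_footnotes(cell):
        per_code[code] += n

print(per_code[16], per_code[23])
```

After splitting, the two compound rows are credited to both code 16 and code 23, so those totals each grow by two.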

Out[52]:

saturn.columns["Hospital overall rating footnote"].stats

stat	value
n	5,421
nulls	2,857 (52.7%)
unique	7
top_value	16
top_rate	0.6537
cardinality	7
entropy	1.158
entropy_ratio	0.4126
alert: null_rate	52.7% null

Fig 21.

Top values for Hospital overall rating footnote.

Top values for Hospital overall rating footnote (7 unique shown, of 7 total).
value	count	share of all rows
16	1676	30.9%
19	795	14.7%
5	47	0.9%
22	32	0.6%
17	7	0.1%
23	5	0.1%
16, 23	2	0.0%

MORT Group Measure Count categorical feature

Binary categorical column where 84.1% of the 5421 rows hold the literal string "7" and the remaining 863 rows are "Not Available". This looks like a fixed mortality-group measure count (always 7 when reported) with explicit missingness encoded as a sentinel string rather than null, so null_rate is 0 despite real absence.

Treatment: Recode "Not Available" to null and convert to a binary availability flag, since the numeric value carries no variance.

anthropic:claude-opus-4-7 · confidence high

Out[55]:

saturn.columns["MORT Group Measure Count"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	7
top_rate	0.8408
cardinality	2
entropy	0.6324
entropy_ratio	0.6324

Fig 22.

Top values for MORT Group Measure Count.

Top values for MORT Group Measure Count (2 unique shown, of 2 total).
value	count	share
7	4558	84.1%
Not Available	863	15.9%

Count of Facility MORT Measures categorical feature

Counts of facility mortality measures stored as strings, with 8 distinct values across 5421 rows and no nulls. The dominant category is 'Not Available' at 32.8% (1777 rows), while the remaining values are integers 1-7, with '7' the most common numeric level at 850. The high entropy ratio (0.92) is computed over all 8 levels, sentinel included, and indicates that no single level dominates; the numeric counts 1-7 range from 393 to 850 rows each.

Treatment: Recode 'Not Available' as missing, cast remaining values to integer, then treat as ordinal.

anthropic:claude-opus-4-7 · confidence high

Out[58]:

saturn.columns["Count of Facility MORT Measures"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	8
top_value	Not Available
top_rate	0.3278
cardinality	8
entropy	2.765
entropy_ratio	0.9217

Fig 23.

Top values for Count of Facility MORT Measures.

Top values for Count of Facility MORT Measures (8 unique shown, of 8 total).
value	count	share
Not Available	1777	32.8%
7	850	15.7%
6	587	10.8%
1	495	9.1%
5	455	8.4%
3	444	8.2%
2	420	7.7%
4	393	7.2%

Count of MORT Measures Better categorical feature

Counts the number of mortality measures where a hospital scored 'better than national average', stored as strings 0-7 plus 'Not Available'. The distribution is heavily concentrated at 0 (57.8% of 5421 rows) with another 1777 rows literally encoded as 'Not Available', leaving only about 9% of facilities (508 rows) recording one or more better-than-average measures. Cardinality is just 9 with entropy ratio 0.46, so the signal is sparse and dominated by zeros and missingness sentinels.

Treatment: Recode 'Not Available' to NaN, cast remaining values to integer, and consider binarising (any-better vs none) given the heavy zero mass.

anthropic:claude-opus-4-7 · confidence high
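
The cast-then-binarise step can be worked through on the published counts. The sketch below uses the values from the top-values table for this column; the helper name is hypothetical, and "any-better" is taken to mean a count of 1 or more.

```python
from typing import Optional

# Counts from the Count of MORT Measures Better top-values table.
better_counts = {
    "0": 3136, "Not Available": 1777, "1": 297, "2": 133,
    "3": 53, "4": 15, "5": 7, "7": 2, "6": 1,
}


def to_count(raw: str) -> Optional[int]:
    """Cast a raw cell to int, mapping the sentinel to None."""
    return None if raw == "Not Available" else int(raw)


reported = 0    # rows with a real count
any_better = 0  # rows with at least one better-than-average measure
for raw, n in better_counts.items():
    value = to_count(raw)
    if value is None:
        continue
    reported += n
    if value >= 1:
        any_better += n

print(reported, any_better)
print(round(any_better / reported, 3))
```

Dividing by reported rows instead of all rows changes the picture: the any-better share is higher among hospitals that have a count at all, so the choice of denominator should be stated wherever the figure is quoted.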

Out[61]:

saturn.columns["Count of MORT Measures Better"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	9
top_value	0
top_rate	0.5785
cardinality	9
entropy	1.453
entropy_ratio	0.4583

Fig 24.

Top values for Count of MORT Measures Better.

Top values for Count of MORT Measures Better (9 unique shown, of 9 total).
value	count	share
0	3136	57.8%
Not Available	1777	32.8%
1	297	5.5%
2	133	2.5%
3	53	1.0%
4	15	0.3%
5	7	0.1%
7	2	0.0%
6	1	0.0%

Count of MORT Measures No Different categorical feature

This column appears to be a count (0-7) of mortality measures rated 'no different than national rate' per facility, but it is stored categorically with 'Not Available' as the dominant value at 32.8% of 5421 rows. Among numeric values, the distribution is fairly even across 1-7 (422-672 each), while '0' is rare at only 12 occurrences. The high entropy ratio (0.885) is computed over all 9 categories (the 8 numeric buckets plus the sentinel) and confirms that the values spread broadly across them.

Treatment: Coerce numeric strings to integers and treat 'Not Available' as an explicit missing-indicator before modelling.

anthropic:claude-opus-4-7 · confidence high

Out[64]:

saturn.columns["Count of MORT Measures No Different"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	9
top_value	Not Available
top_rate	0.3278
cardinality	9
entropy	2.806
entropy_ratio	0.8852

Fig 25.

Top values for Count of MORT Measures No Different.

Top values for Count of MORT Measures No Different (9 unique shown, of 9 total).
value	count	share
Not Available	1777	32.8%
6	672	12.4%
5	541	10.0%
1	513	9.5%
3	509	9.4%
4	503	9.3%
2	472	8.7%
7	422	7.8%
0	12	0.2%

Count of MORT Measures Worse categorical feature

A small-integer count (0-5) of mortality measures on which a hospital performed worse than the national benchmark, stored as strings alongside a 'Not Available' sentinel. The distribution is heavily concentrated: 60.2% are '0' and another 1,777 rows (about a third) are 'Not Available', leaving only 378 hospitals with one or more worse measures. The long tail is extreme — just 11 rows have 3 or more, and a single row reports 5.

Treatment: Cast to integer with 'Not Available' mapped to NaN (or a missing flag), then consider binning to 0/1+ given the sparse tail.
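
One way to sketch the 0/1+ binning, assuming string input as profiled. The toy values and the name `any_worse` are hypothetical. Note that `errors="coerce"` turns every non-numeric string into a null, not only 'Not Available', which is convenient here but can hide unexpected values.

```python
import pandas as pd

# Toy stand-in for the real column.
raw = pd.Series(
    ["0", "Not Available", "1", "3", "0", "5"],
    name="Count of MORT Measures Worse",
)

# Non-numeric strings (here only the sentinel) become NaN, then nullable integers.
worse = pd.to_numeric(raw, errors="coerce").astype("Int64")

# Binary 0 / 1+ feature. Comparing a nullable integer keeps missing rows as <NA>
# rather than forcing them to False.
any_worse = worse.gt(0).rename("any_mort_measure_worse")
```

Keeping the missing rows as `<NA>` avoids treating "no data" as "no worse measures", which would inflate the already dominant zero class.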

anthropic:claude-opus-4-7 · confidence high

Out[67]:

saturn.columns["Count of MORT Measures Worse"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	7
top_value	0
top_rate	0.6025
cardinality	7
entropy	1.294
entropy_ratio	0.4608

Fig 26.

Top values for Count of MORT Measures Worse.

Top values for Count of MORT Measures Worse (7 unique shown, of 7 total).
value	count	share
0	3266	60.2%
Not Available	1777	32.8%
1	310	5.7%
2	57	1.1%
3	7	0.1%
4	3	0.1%
5	1	0.0%

MORT Group Footnote numeric metadata

Despite being typed numeric, this column behaves like a categorical footnote code: only 4 distinct values appear across 5421 rows, ranging discretely from 5 to 23 with a median of 5 and IQR spanning 5 to 19. Two-thirds of rows (null_rate 0.672) are empty, consistent with footnotes attached only to flagged MORT group records. The bimodal-looking spread (kurtosis -1.96, near-zero skew) reinforces that these are reference codes, not measurements.

Treatment: Cast to categorical footnote code and join to a footnote lookup rather than treating as a numeric feature.
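
A hedged sketch of the lookup join. The `footnotes` table is hypothetical: its codes match those seen in the histogram for this column, but the descriptions are placeholders and not CMS wording. The column name `footnote_code` is also an assumption.

```python
import pandas as pd

# Toy frame: codes observed in the profile (5, 19, 22, 23) plus nulls.
df = pd.DataFrame({"MORT Group Footnote": [5.0, None, 19.0, 22.0, None, 23.0]})

# Hypothetical lookup table; replace the placeholder text with the official footnote definitions.
footnotes = pd.DataFrame({
    "footnote_code": pd.array([5, 19, 22, 23], dtype="Int64"),
    "footnote_text": [
        "placeholder text for code 5",
        "placeholder text for code 19",
        "placeholder text for code 22",
        "placeholder text for code 23",
    ],
})

# Float codes become nullable integers so 5.0 joins cleanly against 5.
df["footnote_code"] = df["MORT Group Footnote"].astype("Int64")

# Left join keeps every hospital row; rows without a footnote get a null description.
labelled = df.merge(footnotes, on="footnote_code", how="left", validate="many_to_one")

# Treat the code as a label, not a measurement.
labelled["footnote_code"] = labelled["footnote_code"].astype("category")
```

`validate="many_to_one"` makes the merge fail if the lookup ever contains a duplicated code, which would otherwise silently multiply rows.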

anthropic:claude-opus-4-7 · confidence high

Out[70]:

saturn.columns["MORT Group Footnote"].stats

stat	value
n	5,421
nulls	3,643 (67.2%)
unique	4
min	5
max	23
mean	11.58
median	5
std	7.057
q1	5
q3	19
iqr	14
skew	0.1488
kurtosis	-1.959
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	67.2% null

Fig 27.

Distribution of MORT Group Footnote. Vertical dash marks the median.

Histogram bins for MORT Group Footnote (median: 5.0). Only non-empty bins are listed; the other 36 bins have count 0.
bin	count
5 – 5.45	950
18.95 – 19.4	795
21.65 – 22.1	32
22.55 – 23	1

Safety Group Measure Count categorical feature

This is a categorical column with only two values: "8" (84.1% of 5421 rows) and "Not Available" (the remaining 863 rows). Despite the name suggesting a count, the field is effectively a flag indicating whether the safety group has the standard 8 measures or no data at all. The complete absence of any other counts is unusual for a 'count' field.

Treatment: Recode as a binary available/missing indicator before modelling.
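
A small sketch of this recode, assuming string input as profiled. It first checks that the column really is constant when present, so the flag loses no information. The toy data and the name `has_safety_group` are illustrative.

```python
import pandas as pd

# Toy stand-in for the real column.
raw = pd.Series(["8", "8", "Not Available", "8"], name="Safety Group Measure Count")

present = raw.ne("Not Available")

# Guard: the recode is only lossless if a single value appears when data is present.
distinct_present = raw[present].nunique()
if distinct_present != 1:
    raise ValueError(f"expected one value when present, found {distinct_present}")

has_safety_group = present.rename("has_safety_group")
```

The guard matters if a later data release introduces other group sizes, in which case the column would become a real count again.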

anthropic:claude-opus-4-7 · confidence high

Out[73]:

saturn.columns["Safety Group Measure Count"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	8
top_rate	0.8408
cardinality	2
entropy	0.6324
entropy_ratio	0.6324

Fig 28.

Top values for Safety Group Measure Count.

Top values for Safety Group Measure Count (2 unique shown, of 2 total).
value	count	share
8	4558	84.1%
Not Available	863	15.9%

Count of Facility Safety Measures categorical feature

This column reports the count of facility safety measures, stored as a categorical with 9 distinct values (1–8 plus 'Not Available'). The dominant value is 'Not Available' at 38.1% of 5421 rows, which means missingness is encoded as a string rather than a null (null_rate is 0.0). Among reported counts, '7' (733) and '2' (519) lead, while '4' (223) is the rarest, giving a fairly even spread (entropy_ratio 0.868).

Treatment: Recode 'Not Available' as null and cast remaining values to integer before modelling.

anthropic:claude-opus-4-7 · confidence high

Out[76]:

saturn.columns["Count of Facility Safety Measures"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	9
top_value	Not Available
top_rate	0.3809
cardinality	9
entropy	2.753
entropy_ratio	0.8684

Fig 29.

Top values for Count of Facility Safety Measures.

Top values for Count of Facility Safety Measures (9 unique shown, of 9 total).
value	count	share
Not Available	2065	38.1%
7	733	13.5%
2	519	9.6%
6	460	8.5%
8	453	8.4%
1	443	8.2%
3	290	5.3%
5	235	4.3%
4	223	4.1%

Count of Safety Measures Better categorical feature

This categorical column counts how many safety measures were rated better than the national benchmark, with values 0-6 stored as strings alongside a 'Not Available' sentinel. 'Not Available' is the most common value at 38.1% (2065 of 5421), effectively acting as a hidden null, and the remaining counts decay sharply from 1548 zeros down to just 3 sixes. The entropy ratio of 0.70 across 8 categories reflects this concentration at the low end.

Treatment: Recode 'Not Available' to null and cast remaining levels to integer before modelling.

anthropic:claude-opus-4-7 · confidence high

Out[79]:

saturn.columns["Count of Safety Measures Better"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	8
top_value	Not Available
top_rate	0.3809
cardinality	8
entropy	2.11
entropy_ratio	0.7033

Fig 30.

Top values for Count of Safety Measures Better.

Top values for Count of Safety Measures Better (8 unique shown, of 8 total).
value	count	share
Not Available	2065	38.1%
0	1548	28.6%
1	1052	19.4%
2	430	7.9%
3	216	4.0%
4	93	1.7%
5	14	0.3%
6	3	0.1%

Count of Safety Measures No Different categorical feature

This is a low-cardinality count field (10 distinct values) capturing how many safety measures were rated 'No Different', with integer values 0-8 stored as strings alongside a 'Not Available' sentinel. The main surprise is that 38.1% of rows (2065/5421) are 'Not Available', making missingness the modal outcome despite a 0% null rate. Among reported counts, the distribution is fairly even across 1-6 (434-656 each), drops to 167 rows at 7, and has rare extremes at 0 (20) and 8 (10).

Treatment: Cast numeric strings to int and recode 'Not Available' as an explicit missing indicator before modelling.

anthropic:claude-opus-4-7 · confidence high

Out[82]:

saturn.columns["Count of Safety Measures No Different"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	10
top_value	Not Available
top_rate	0.3809
cardinality	10
entropy	2.685
entropy_ratio	0.8083

Fig 31.

Top values for Count of Safety Measures No Different.

Top values for Count of Safety Measures No Different (10 unique shown, of 10 total).
value	count	share
Not Available	2065	38.1%
5	656	12.1%
2	551	10.2%
4	527	9.7%
1	509	9.4%
6	482	8.9%
3	434	8.0%
7	167	3.1%
0	20	0.4%
8	10	0.2%

Count of Safety Measures Worse categorical feature

This is a low-cardinality count of safety measures rated 'worse', taking only 5 distinct values across 5421 rows with no nulls. Most facilities (54.3%) report 0, and a substantial 2065 rows carry the literal string 'Not Available' rather than a numeric value, mixing missingness into the value domain. Actual counts above 0 are rare (365 ones, 44 twos, 6 threes), giving a heavy zero-and-missing skew.

Treatment: Recode 'Not Available' to NaN, cast remainder to integer, and treat as a low-count ordinal or binary (>0) feature.

anthropic:claude-opus-4-7 · confidence high

Out[85]:

saturn.columns["Count of Safety Measures Worse"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	5
top_value	0
top_rate	0.5425
cardinality	5
entropy	1.338
entropy_ratio	0.5764

Fig 32.

Top values for Count of Safety Measures Worse.

Top values for Count of Safety Measures Worse (5 unique shown, of 5 total).
value	count	share
0	2941	54.3%
Not Available	2065	38.1%
1	365	6.7%
2	44	0.8%
3	6	0.1%

Safety Group Footnote numeric metadata

This appears to be a footnote code attached to safety group records, stored numerically but acting as a categorical flag with only 4 distinct values ranging from 5 to 23. The column is sparsely populated, with 61.8% nulls, suggesting footnotes apply only to a minority of rows. The bimodal-leaning distribution (median 5, Q3 19, kurtosis -1.81) reinforces that these are discrete code categories rather than a true measurement.

Treatment: Cast to categorical and treat nulls as 'no footnote' before any modelling.
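
A minimal sketch of this treatment in pandas, assuming the column loads as floats with nulls (as the numeric typing suggests). The label 'no footnote' and the variable names are illustrative choices.

```python
import pandas as pd

# Toy stand-in: codes seen in the histogram (5, 19, 22, 23) plus nulls.
codes = pd.Series([5.0, None, 19.0, None, 22.0, 23.0], name="Safety Group Footnote")

as_category = (
    codes.astype("Int64")        # 5.0 -> 5, nulls stay <NA>
         .astype("string")       # codes become labels: "5", "19", ...
         .fillna("no footnote")  # nulls become an explicit level
         .astype("category")
)
```

Converting through strings means the result cannot be averaged or scaled by accident, which is the main risk of leaving the codes numeric.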

anthropic:claude-opus-4-7 · confidence high

Out[88]:

saturn.columns["Safety Group Footnote"].stats

stat	value
n	5,421
nulls	3,350 (61.8%)
unique	4
min	5
max	23
mean	10.69
median	5
std	6.95
q1	5
q3	19
iqr	14
skew	0.4116
kurtosis	-1.809
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	61.8% null

Fig 33.

Distribution of Safety Group Footnote. Vertical dash marks the median.

Histogram bins for Safety Group Footnote (median: 5.0). Only non-empty bins are listed; the other 36 bins have count 0.
bin	count
5 – 5.45	1238
18.95 – 19.4	795
21.65 – 22.1	32
22.55 – 23	6

READM Group Measure Count categorical feature

A binary categorical field that records the count of measures in a readmission group, but stored as strings: 84.08% of 5421 rows are "11" and the remaining 863 rows are "Not Available". With only 2 distinct values and no nulls, this acts as a presence flag rather than a true count.

Treatment: Recode to a boolean availability flag since the numeric value is constant when present.

anthropic:claude-opus-4-7 · confidence high

Out[91]:

saturn.columns["READM Group Measure Count"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	11
top_rate	0.8408
cardinality	2
entropy	0.6324
entropy_ratio	0.6324

Fig 34.

Top values for READM Group Measure Count.

Top values for READM Group Measure Count (2 unique shown, of 2 total).
value	count	share
11	4558	84.1%
Not Available	863	15.9%

Count of Facility READM Measures categorical feature

This column appears to be the count of hospital readmission (READM) measures reported per facility, stored as strings rather than integers. Values span 12 categories: "1" through "11" plus a sizeable "Not Available" bucket, which is the most common value at 21.2% (1,150 of 5,421 rows). Distribution across the numeric levels is fairly even (entropy ratio 0.965). There are no nulls, but the string "Not Available" effectively acts as missingness.

Treatment: Recode "Not Available" to NaN and cast remaining values to integer before modelling.

anthropic:claude-opus-4-7 · confidence high

Out[94]:

saturn.columns["Count of Facility READM Measures"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	12
top_value	Not Available
top_rate	0.2121
cardinality	12
entropy	3.459
entropy_ratio	0.965

Fig 35.

Top values for Count of Facility READM Measures.

Top values for Count of Facility READM Measures (12 unique shown, of 12 total).
value	count	share
Not Available	1150	21.2%
11	498	9.2%
8	466	8.6%
6	438	8.1%
9	425	7.8%
3	375	6.9%
2	374	6.9%
7	358	6.6%
5	347	6.4%
4	335	6.2%
10	333	6.1%
1	322	5.9%

Count of READM Measures Better categorical feature

This column counts how many readmission measures a provider scored 'better' on, stored as strings ranging from '0' to '5' alongside a 'Not Available' sentinel. The distribution is heavily concentrated at '0' (61.5%, 3332 of 5421 rows), and 'Not Available' is the second most common value at 1150 rows, exceeding any nonzero count. Only 41 rows score 3 or higher, so meaningful positive signal is rare.

Treatment: Cast numerics to int, encode 'Not Available' as a missing flag, and consider collapsing the long tail (3-5) into a single bucket.
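
The tail collapse could be sketched with `pd.cut`, assuming string input as profiled. The bucket labels, bin edges and variable names are illustrative, and the choice of 3 as the cut point simply follows the treatment above.

```python
import pandas as pd

# Toy stand-in for the real column.
raw = pd.Series(
    ["0", "1", "Not Available", "2", "3", "5", "4"],
    name="Count of READM Measures Better",
)

# The sentinel becomes NaN; a separate flag records where that happened.
numeric = pd.to_numeric(raw, errors="coerce")
missing_flag = numeric.isna()

# Half-integer edges put each count in its own bucket; everything from 3 up shares one.
bucket = pd.cut(
    numeric,
    bins=[-0.5, 0.5, 1.5, 2.5, float("inf")],
    labels=["0", "1", "2", "3+"],
)
```

`pd.cut` returns an ordered categorical and leaves missing rows as NaN, so the ordering 0 < 1 < 2 < 3+ is preserved for models that can use it.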

anthropic:claude-opus-4-7 · confidence high

Out[97]:

saturn.columns["Count of READM Measures Better"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	7
top_value	0
top_rate	0.6146
cardinality	7
entropy	1.51
entropy_ratio	0.5379

Fig 36.

Top values for Count of READM Measures Better.

Top values for Count of READM Measures Better (7 unique shown, of 7 total).
value	count	share
0	3332	61.5%
Not Available	1150	21.2%
1	737	13.6%
2	161	3.0%
3	28	0.5%
4	10	0.2%
5	3	0.1%

Count of READM Measures No Different categorical feature

This is a count of hospital readmission measures where performance was 'no different' from the national rate, stored as strings ranging from '0' to '11' alongside a 'Not Available' sentinel. The sentinel is the most common value at 21.2% (1150 of 5421). The 13 distinct values are spread fairly evenly (entropy ratio 0.92): counts 1 through 9 each cover 372-497 rows, while 10 (249), 11 (81) and 0 (3) are less common.

Treatment: Cast to integer after replacing 'Not Available' with NaN, then treat as ordinal numeric.

anthropic:claude-opus-4-7 · confidence high

Out[100]:

saturn.columns["Count of READM Measures No Different"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	13
top_value	Not Available
top_rate	0.2121
cardinality	13
entropy	3.408
entropy_ratio	0.9211

Fig 37.

Top values for Count of READM Measures No Different.

Top values for Count of READM Measures No Different (13 unique shown, of 13 total).
value	count	share
Not Available	1150	21.2%
7	497	9.2%
8	491	9.1%
6	480	8.9%
2	428	7.9%
3	428	7.9%
5	426	7.9%
9	418	7.7%
4	398	7.3%
1	372	6.9%
10	249	4.6%
11	81	1.5%
0	3	0.1%

Count of READM Measures Worse categorical feature

This appears to be a count of readmission measures rated 'worse' per hospital, stored as a categorical/string column with 9 distinct values ranging from '0' to '7' plus 'Not Available'. The distribution is heavily concentrated at zero (55.1% of 5,421 rows) and 'Not Available' accounts for 1,150 rows, which is a substantial missing-data signal masquerading as a category. Higher counts are rare, with only 31 rows at 4 or above.

Treatment: Cast numeric levels to integer, recode 'Not Available' as null, then treat as ordinal or count feature.

anthropic:claude-opus-4-7 · confidence high

Out[103]:

saturn.columns["Count of READM Measures Worse"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	9
top_value	0
top_rate	0.5512
cardinality	9
entropy	1.758
entropy_ratio	0.5545

Fig 38.

Top values for Count of READM Measures Worse.

Top values for Count of READM Measures Worse (9 unique shown, of 9 total).
value	count	share
0	2988	55.1%
Not Available	1150	21.2%
1	839	15.5%
2	308	5.7%
3	105	1.9%
4	26	0.5%
6	2	0.0%
5	2	0.0%
7	1	0.0%

READM Group Footnote numeric metadata

This appears to be a footnote/flag code attached to a readmission metric, encoded numerically with only 3 distinct values (5, 19, and 22 based on the quartiles and max). The column is overwhelmingly empty at a 78.79% null rate, meaning footnotes apply to a small minority of records. Despite being stored as numeric, the values are categorical codes — the mean of 15.15 and std of 6.37 have no real interpretive meaning.

Treatment: Cast to categorical footnote codes and treat nulls as 'no footnote' rather than imputing.

anthropic:claude-opus-4-7 · confidence high

Out[106]:

saturn.columns["READM Group Footnote"].stats

stat	value
n	5,421
nulls	4,271 (78.8%)
unique	3
min	5
max	22
mean	15.15
median	19
std	6.366
q1	5
q3	19
iqr	14
skew	-0.9528
kurtosis	-1.051
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	78.8% null

Fig 39.

Distribution of READM Group Footnote. Vertical dash marks the median.

Histogram bins for READM Group Footnote (median: 19.0). Only non-empty bins are listed; the other 30 bins have count 0.
bin	count
5 – 5.515	323
18.91 – 19.42	795
21.48 – 22	32

Pt Exp Group Measure Count categorical metadata

Binary categorical with only two values: "8" (84.1% of 5421 rows) and "Not Available" (the remaining 863). The literal string "Not Available" stands in for missing data, so the column is effectively a constant of 8 with a 15.9% missingness flag rather than a true feature. Entropy ratio of 0.63 confirms the low information content.

Treatment: Recode "Not Available" to null and drop, or keep only as a binary missingness indicator.

anthropic:claude-opus-4-7 · confidence high

Out[109]:

saturn.columns["Pt Exp Group Measure Count"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	8
top_rate	0.8408
cardinality	2
entropy	0.6324
entropy_ratio	0.6324

Fig 40.

Top values for Pt Exp Group Measure Count.

Top values for Pt Exp Group Measure Count (2 unique shown, of 2 total).
value	count	share
8	4558	84.1%
Not Available	863	15.9%

Count of Facility Pt Exp Measures categorical feature

This column reports the count of facility patient experience measures, but it is effectively binary: every one of the 5421 rows is either the literal string "8" (58.2%) or "Not Available" (41.8%). The high entropy ratio of 0.98 reflects that near 50/50 split rather than any real numeric variation. The surprise is that a supposed count has only one non-null numeric level, so it carries no granularity beyond a presence/absence flag.

Treatment: Recode as a binary has_measures flag (8 vs Not Available) rather than treating as a numeric count.

anthropic:claude-opus-4-7 · confidence high

Out[112]:

saturn.columns["Count of Facility Pt Exp Measures"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	8
top_rate	0.5818
cardinality	2
entropy	0.9806
entropy_ratio	0.9806

Fig 41.

Top values for Count of Facility Pt Exp Measures.

Top values for Count of Facility Pt Exp Measures (2 unique shown, of 2 total).
value	count	share
8	3154	58.2%
Not Available	2267	41.8%

Pt Exp Group Footnote numeric metadata

This is a footnote code attached to a 'Pt Exp Group' (likely patient experience group) metric, encoded numerically but with only 3 distinct values (5, 19, 22) across 5421 rows. It is null 58.18% of the time, which is expected for footnote columns that flag exceptions on a minority of rows. The bimodal-looking spread (median 5, Q3 19, max 22) and negative kurtosis (-1.66) confirm it behaves as a sparse categorical flag rather than a continuous measure.

Treatment: Cast to categorical footnote codes and treat nulls as 'no footnote' rather than imputing numerically.

anthropic:claude-opus-4-7 · confidence high

Out[115]:

saturn.columns["Pt Exp Group Footnote"].stats

stat	value
n	5,421
nulls	3,154 (58.2%)
unique	3
min	5
max	22
mean	10.15
median	5
std	6.806
q1	5
q3	19
iqr	14
skew	0.571
kurtosis	-1.658
n_outliers	0
outlier_rate	0
zero_rate	0
alert: null_rate	58.2% null

Fig 42.

Distribution of Pt Exp Group Footnote. Vertical dash marks the median.

Histogram bins for Pt Exp Group Footnote (median: 5.0). Only non-empty bins are listed; the other 37 bins have count 0.
bin	count
5 – 5.425	1440
18.6 – 19.02	795
21.57 – 22	32

TE Group Measure Count categorical metadata

A binary categorical field where 84.1% of the 5421 rows take the literal string "12" and the remaining 863 rows are "Not Available". Despite the name suggesting a count, it is stored as a string with only 2 distinct values and no nulls, so "Not Available" is functioning as an in-band missing marker rather than a true category.

Treatment: Recode "Not Available" to null and collapse to a boolean indicator, since the only real value is 12.

anthropic:claude-opus-4-7 · confidence high

Out[118]:

saturn.columns["TE Group Measure Count"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	2
top_value	12
top_rate	0.8408
cardinality	2
entropy	0.6324
entropy_ratio	0.6324

Fig 43.

Top values for TE Group Measure Count.

Top values for TE Group Measure Count (2 unique shown, of 2 total).
value	count	share
12	4558	84.1%
Not Available	863	15.9%

Count of Facility TE Measures categorical feature

This column reports the count of Facility TE (Timely & Effective) Measures per row, stored as strings with 13 distinct values across 5,421 records. The most common value is the sentinel "Not Available" at 17.1% (928 rows), with numeric counts from 1 to 12 mixed in as text. The entropy ratio of 0.93 indicates the values are spread fairly evenly across the buckets, although the lowest counts are uncommon (1 appears in 53 rows, 2 in 163).

Treatment: Coerce to integer with "Not Available" mapped to NaN, then treat as an ordinal/numeric feature.
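
Since the same sentinel pattern recurs across the "Count of ..." columns in this profile, the coercion can be applied in one pass. This is a sketch under the assumption that every such column uses the exact string "Not Available". The toy frame, the `SENTINEL` constant and the " (missing)" suffix are illustrative.

```python
import pandas as pd

SENTINEL = "Not Available"

# Toy frame with two count columns and one unrelated column.
df = pd.DataFrame({
    "Count of Facility TE Measures": ["10", SENTINEL, "4"],
    "Count of READM Measures Worse": ["0", "2", SENTINEL],
    "state": ["TX", "CA", "NY"],
})

# Select by name prefix so unrelated columns are left untouched.
count_cols = [c for c in df.columns if c.startswith("Count of ")]

for col in count_cols:
    is_missing = df[col].eq(SENTINEL)
    df[f"{col} (missing)"] = is_missing
    # Default errors="raise": an unexpected string stops the loop instead of vanishing.
    df[col] = pd.to_numeric(df[col].mask(is_missing)).astype("Int64")
```

Selecting by prefix is a convenience. An explicit column list would be safer if the naming convention changes between data releases.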

anthropic:claude-opus-4-7 · confidence high

Out[121]:

saturn.columns["Count of Facility TE Measures"].stats

stat	value
n	5,421
nulls	0 (0.0%)
unique	13
top_value	Not Available
top_rate	0.1712
cardinality	13
entropy	3.458
entropy_ratio	0.9343

Fig 44.

Top values for Count of Facility TE Measures.

Top values for Count of Facility TE Measures (13 unique shown, of 13 total).
value	count	share
Not Available	928	17.1%
10	759	14.0%
11	724	13.4%
9	543	10.0%
8	391	7.2%
5	351	6.5%
12	347	6.4%
6	337	6.2%
7	284	5.2%
4	272	5.0%
3	269	5.0%
2	163	3.0%
1	53	1.0%

TE Group Footnote numeric metadata

This appears to be a footnote code column for a 'TE Group' classification, stored numerically but functioning as a categorical reference (only 3 unique values across 5421 rows). It is mostly empty (82.88% null), and among populated rows the value 19 dominates so heavily that q1, median, and q3 all equal 19, producing a zero IQR and a strong negative skew of -2.43. The 133 flagged outliers (14.3% of the 928 non-null rows) are simply the minority codes, 5 (101 rows) and 22 (32 rows), being measured against a degenerate distribution.

Treatment: Cast to categorical footnote code and exclude from numeric modelling.
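
To see why the outlier alert fires here, the sketch below rebuilds the non-null values from the histogram counts and applies a standard 1.5 × IQR fence. It assumes Saturn computes the fence on non-null values only, which the reported rate of 133 out of 928 suggests but the report does not state.

```python
import numpy as np

# Non-null values rebuilt from the histogram: 101 fives, 795 nineteens, 32 twenty-twos.
values = np.repeat([5, 19, 22], [101, 795, 32])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# With IQR == 0 both fences collapse onto the dominant code.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

print(q1, q3, iqr, int(is_outlier.sum()), round(float(is_outlier.mean()), 4))
```

Because both fences sit at 19, every row with any other code is flagged. The alert therefore reflects the coding scheme rather than anomalous hospitals.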

anthropic:claude-opus-4-7 · confidence high

Out[124]:

saturn.columns["TE Group Footnote"].stats

stat	value
n	5,421
nulls	4,493 (82.9%)
unique	3
min	5
max	22
mean	17.58
median	19
std	4.432
q1	19
q3	19
iqr	0
skew	-2.43
kurtosis	4.12
n_outliers	133
outlier_rate	0.1433
zero_rate	0
alert: null_rate	82.9% null
alert: high_skew	skew=-2.43
alert: outliers	14.3% rows beyond 1.5 IQR

Fig 45.

Distribution of TE Group Footnote. Vertical dash marks the median.

Histogram bins for TE Group Footnote (median: 19.0). Only non-empty bins are listed; the other 27 bins have count 0.
bin	count
5 – 5.567	101
18.6 – 19.17	795
21.43 – 22	32


How to cite