cms cms hospitals 20260121
Reading
This dataset catalogs 5,421 U.S. hospitals with 38 columns covering location (city, county, state, ZIP), facility identity, ownership and type, and CMS quality-measure rollups (mortality, readmission, safety, patient experience, timely & effective care). The most interesting structural story is the quality-rating coverage: 'Hospital overall rating' holds the sentinel string 'Not Available' for 47% of hospitals, and the footnote columns are null for 53–83% of rows, so any analysis of star ratings has to handle a large missing slice. On the categorical side, the mix is dominated by Acute Care Hospitals (~58%) and Voluntary non-profit - Private ownership (~42%), with Texas (8.5%) and California (7.0%) leading state counts. The 'Meets criteria for birthing friendly designation' field only ever takes the value 'Y' (58% null, no 'N'), so it is effectively a presence flag rather than a Y/N comparator.
citing: row_count · column_count · Hospital overall rating.top_values · Hospital overall rating.top_rate · hospital_type.top_values · hospital_ownership.top_values · state.top_values · Meets criteria for birthing friendly designation.null_rate · Meets criteria for birthing friendly designation.top_value · MORT Group Footnote.null_rate · READM Group Footnote.null_rate · Safety Group Footnote.null_rate · Pt Exp Group Footnote.null_rate · emergency_services.top_values
Charts the summary said to look at first
Hospital overall rating
| value | count | share |
|---|---|---|
| Not Available | 2552 | 47.1% |
| 3 | 937 | 17.3% |
| 4 | 765 | 14.1% |
| 2 | 649 | 12.0% |
| 5 | 289 | 5.3% |
| 1 | 229 | 4.2% |
hospital_type
| value | count | share |
|---|---|---|
| Acute Care Hospitals | 3120 | 57.6% |
| Critical Access Hospitals | 1375 | 25.4% |
| Psychiatric | 626 | 11.5% |
| Acute Care - Veterans Administration | 132 | 2.4% |
| Childrens | 94 | 1.7% |
| Rural Emergency Hospital | 38 | 0.7% |
| Acute Care - Department of Defense | 32 | 0.6% |
| Long-term | 4 | 0.1% |
hospital_ownership
| value | count | share |
|---|---|---|
| Voluntary non-profit - Private | 2291 | 42.3% |
| Proprietary | 1067 | 19.7% |
| Government - Hospital District or Authority | 521 | 9.6% |
| Government - Local | 400 | 7.4% |
| Voluntary non-profit - Other | 361 | 6.7% |
| Voluntary non-profit - Church | 275 | 5.1% |
| Government - State | 210 | 3.9% |
| Veterans Health Administration | 132 | 2.4% |
| Physician | 74 | 1.4% |
| Government - Federal | 44 | 0.8% |
| Department of Defense | 32 | 0.6% |
| Tribal | 14 | 0.3% |
state (top 20 values)
| value | count | share |
|---|---|---|
| TX | 462 | 8.5% |
| CA | 378 | 7.0% |
| FL | 221 | 4.1% |
| IL | 194 | 3.6% |
| OH | 194 | 3.6% |
| NY | 191 | 3.5% |
| PA | 187 | 3.4% |
| LA | 160 | 3.0% |
| GA | 149 | 2.7% |
| IN | 149 | 2.7% |
| MI | 147 | 2.7% |
| WI | 142 | 2.6% |
| KS | 139 | 2.6% |
| MN | 136 | 2.5% |
| OK | 135 | 2.5% |
| TN | 123 | 2.3% |
| MO | 121 | 2.2% |
| NC | 120 | 2.2% |
| IA | 118 | 2.2% |
| AZ | 106 | 2.0% |
emergency_services
| value | count | share |
|---|---|---|
| Yes | 4505 | 83.1% |
| No | 916 | 16.9% |
Schema
38 columns
| Column | Type | Null % | Unique | Alerts |
|---|---|---|---|---|
| facility_id | text | 0.0% | 5,421 | near_unique, one_word, allcaps, short_text |
| facility_name | text | 0.0% | 5,286 | near_unique, allcaps |
| address | text | 0.0% | 5,387 | near_unique, allcaps |
| city | text | 0.0% | 3,049 | one_word, allcaps, short_text, duplicates |
| state | categorical | 0.0% | 56 | |
| zip_code | numeric | 0.0% | 4,721 | |
| county_name | text | 0.0% | 1,555 | one_word, allcaps, short_text, duplicates |
| phone_number | text | 0.0% | 5,383 | near_unique, allcaps, short_text |
| hospital_type | categorical | 0.0% | 8 | |
| hospital_ownership | categorical | 0.0% | 12 | |
| emergency_services | categorical | 0.0% | 2 | |
| Meets criteria for birthing friendly designation | categorical | 58.2% | 1 | null_rate, imbalance |
| Hospital overall rating | categorical | 0.0% | 6 | |
| Hospital overall rating footnote | categorical | 52.7% | 7 | null_rate |
| MORT Group Measure Count | categorical | 0.0% | 2 | |
| Count of Facility MORT Measures | categorical | 0.0% | 8 | |
| Count of MORT Measures Better | categorical | 0.0% | 9 | |
| Count of MORT Measures No Different | categorical | 0.0% | 9 | |
| Count of MORT Measures Worse | categorical | 0.0% | 7 | |
| MORT Group Footnote | numeric | 67.2% | 4 | null_rate |
| Safety Group Measure Count | categorical | 0.0% | 2 | |
| Count of Facility Safety Measures | categorical | 0.0% | 9 | |
| Count of Safety Measures Better | categorical | 0.0% | 8 | |
| Count of Safety Measures No Different | categorical | 0.0% | 10 | |
| Count of Safety Measures Worse | categorical | 0.0% | 5 | |
| Safety Group Footnote | numeric | 61.8% | 4 | null_rate |
| READM Group Measure Count | categorical | 0.0% | 2 | |
| Count of Facility READM Measures | categorical | 0.0% | 12 | |
| Count of READM Measures Better | categorical | 0.0% | 7 | |
| Count of READM Measures No Different | categorical | 0.0% | 13 | |
| Count of READM Measures Worse | categorical | 0.0% | 9 | |
| READM Group Footnote | numeric | 78.8% | 3 | null_rate |
| Pt Exp Group Measure Count | categorical | 0.0% | 2 | |
| Count of Facility Pt Exp Measures | categorical | 0.0% | 2 | |
| Pt Exp Group Footnote | numeric | 58.2% | 3 | null_rate |
| TE Group Measure Count | categorical | 0.0% | 2 | |
| Count of Facility TE Measures | categorical | 0.0% | 13 | |
| TE Group Footnote | numeric | 82.9% | 3 | null_rate, high_skew, outliers |
facility_id
text identifier near_unique one_word allcaps short_text
This is a facility identifier: every one of the 5421 rows holds a unique 6-character, single-token, all-caps code with no nulls or duplicates. The samples are zero-padded numeric strings (e.g. 010001, 010005), suggesting a fixed-width registry code rather than free text. Treatment: Use as a primary key for joins; do not feed into models.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 5,421
- len_min: 6
- len_max: 6
- len_mean: 6
- len_median: 6
- len_p95: 6
- word_mean: 1
- word_median: 1
- n_empty: 0
- n_duplicates: 0
- duplicate_rate: 0
- vocab_size: 5,421
- readability_flesch_mean: 121.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 1
- allcaps_rate: 1
- boilerplate_rate: 0
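Before relying on this column as a join key, it may be worth confirming the two properties the profile reports: fixed width and uniqueness. The following is a minimal sketch; the helper name `is_valid_key` is hypothetical, and the two sample IDs are the ones quoted in the profile.

```python
def is_valid_key(ids, width=6):
    """Return True if every ID is a string of `width` characters and no ID repeats."""
    ids = list(ids)
    fixed_width = all(isinstance(i, str) and len(i) == width for i in ids)
    unique = len(set(ids)) == len(ids)
    return fixed_width and unique


sample_ids = ["010001", "010005"]
print(is_valid_key(sample_ids))               # True
print(is_valid_key(sample_ids + ["010001"]))  # False: repeated ID
print(is_valid_key(["10001"]))                # False: leading zero was lost
```

The last case is the one to watch for: loading the column as an integer silently drops the zero padding and breaks joins.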
facility_name
text identifier near_unique allcaps
This column holds healthcare facility names: 'hospital', 'center', 'medical', and 'health' dominate the top words, with typical entries around 4 words and 29 characters. It is near-unique (5286 distinct values across 5421 rows) yet still shows 135 duplicates (2.5%), suggesting either shared facility names across locations or genuine repeats. Notably, 99.3% of values are all-caps, which is a formatting quirk worth normalising. Treatment: Lowercase and normalise whitespace, then treat as a high-cardinality entity key rather than a model feature.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 5,286
- len_min: 3
- len_max: 74
- len_mean: 29.21
- len_median: 28
- len_p95: 45
- word_mean: 3.995
- word_median: 4
- n_empty: 0
- n_duplicates: 135
- duplicate_rate: 0.0249
- vocab_size: 3,942
- readability_flesch_mean: 6.842
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.001845
- allcaps_rate: 0.9932
- boilerplate_rate: 0
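The suggested normalisation could be sketched as below. The function name `normalise_name` and the sample facility name are made up for illustration; `str.casefold` is used as a slightly more aggressive variant of lowercasing.

```python
def normalise_name(name):
    """Collapse runs of whitespace to single spaces and lowercase the result."""
    return " ".join(name.split()).casefold()


# Hypothetical input, not taken from the dataset.
print(normalise_name("  EXAMPLE   MEDICAL  CENTER "))  # example medical center
```

Because 135 names repeat, the normalised name is better paired with `facility_id` than used as a key on its own.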
address
text identifier near_unique allcaps
Free-text street addresses: 5,387 unique values out of 5,421 rows (34 duplicates) with no nulls, averaging 3.75 words and 19 characters. Top tokens are street/road/avenue and cardinal directions, consistent with US-style mailing addresses. Notably 99.2% of values are ALLCAPS, suggesting upstream normalization rather than user free-form entry. Treatment: Drop or hash for modelling; parse into components if geocoding is needed.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 5,387
- len_min: 7
- len_max: 50
- len_mean: 19.37
- len_median: 19
- len_p95: 29
- word_mean: 3.754
- word_median: 4
- n_empty: 0
- n_duplicates: 34
- duplicate_rate: 0.006272
- vocab_size: 4,996
- readability_flesch_mean: 79.27
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 0.9921
- boilerplate_rate: 0
city
text feature one_word allcaps short_text duplicates
This is a US city name field, stored almost entirely in uppercase (allcaps_rate 0.994) and dominated by single-word entries (one_word_rate 0.771, word_median 1). With 3049 unique values across 5421 rows and a 0.438 duplicate_rate, common metros like CHICAGO (34), HOUSTON (31), and COLUMBUS (23) recur but the long tail is heavy. Lengths are short and tight (len_mean 8.6, len_max 24), and there are no nulls or empties. Treatment: Normalize case and pair with state/country before using as a categorical or geocoding key.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 3,049
- len_min: 3
- len_max: 24
- len_mean: 8.611
- len_median: 8
- len_p95: 13
- word_mean: 1.241
- word_median: 1
- n_empty: 0
- n_duplicates: 2,372
- duplicate_rate: 0.4376
- vocab_size: 2,890
- readability_flesch_mean: 18.29
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.7709
- allcaps_rate: 0.9943
- boilerplate_rate: 0
state
categorical feature
This column holds US state codes (top values TX, CA, FL, IL, OH), with 56 distinct values across 5421 rows and no nulls. Cardinality slightly exceeds the 50 states, suggesting territories or DC are mixed in. Distribution is fairly even, with an entropy ratio of 0.917 and the top state TX accounting for only 8.5%, so no single state dominates. Treatment: One-hot or target-encode for modelling; verify the 6 extra codes beyond 50 states.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 56
- top_value: TX
- top_rate: 0.08522
- cardinality: 56
- entropy: 5.328
- entropy_ratio: 0.9174
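One way to carry out the "verify the 6 extra codes" step is a set difference against the 50 state abbreviations. This is a sketch under assumptions: the helper `non_state_codes` is hypothetical, and the codes `DC` and `PR` in the example are illustrative inputs, since the profile does not list which six extra codes actually occur.

```python
# The 50 US state postal abbreviations.
US_STATES = frozenset("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY
""".split())


def non_state_codes(codes):
    """Return the sorted codes that are not one of the 50 states."""
    return sorted(set(codes) - US_STATES)


# Illustrative input only; substitute the real `state` column.
print(non_state_codes(["TX", "CA", "DC", "PR", "TX"]))  # ['DC', 'PR']
```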
zip_code
numeric identifier
This is almost certainly a US ZIP code field, stored numerically with values spanning 603 to 99929 across 5421 rows and 4721 unique values. The numeric framing is misleading: the mean of 53780 and std of 27064 reflect ZIP geography, not a continuous quantity, and leading-zero ZIPs (e.g. New England) have likely been truncated given the minimum of 603. No nulls or statistical outliers are reported. Treatment: Cast to zero-padded 5-character strings and treat as a categorical/geographic key, not a numeric feature.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 4,721
- min: 603
- max: 99,929
- mean: 5.378e+04
- median: 55,066
- std: 2.706e+04
- q1: 32,771
- q3: 76,104
- iqr: 43,333
- skew: -0.1646
- kurtosis: -0.9879
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
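The zero-padding treatment can be sketched in a couple of lines. The function name `zip_to_str` is an assumption; the two inputs are the minimum and maximum reported in the profile.

```python
def zip_to_str(zip_value):
    """Render a numerically stored ZIP as a zero-padded 5-character string."""
    return str(int(zip_value)).zfill(5)


print(zip_to_str(603))    # 00603
print(zip_to_str(99929))  # 99929
```

Reading the column as text at load time avoids the truncation in the first place and makes this repair unnecessary.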
county_name
text feature one_word allcaps short_text duplicates
This is a US county name field, stored entirely in uppercase (allcaps_rate 1.0) and mostly single-token (one_word_rate 0.87, word_mean 1.14). Across 5421 rows there are 1555 distinct values with a 71.3% duplicate_rate, led by LOS ANGELES (88), JEFFERSON (59), and COOK (59), consistent with common US county names recurring across states. No nulls or empties, and lengths are short and tight (median 7, max 25). Treatment: Normalize case and pair with a state column before joining or grouping, since county names repeat across states.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 1,555
- len_min: 3
- len_max: 25
- len_mean: 7.34
- len_median: 7
- len_p95: 11
- word_mean: 1.135
- word_median: 1
- n_empty: 0
- n_duplicates: 3,866
- duplicate_rate: 0.7132
- vocab_size: 1,591
- readability_flesch_mean: 34.44
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0.8733
- allcaps_rate: 1
- boilerplate_rate: 0
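Pairing county with state can be sketched as a composite key. The helper `county_key` and the three sample rows are illustrative assumptions, not rows from the dataset.

```python
from collections import Counter


def county_key(state, county):
    """Build a (STATE, COUNTY) key with whitespace collapsed and case normalised."""
    return (state.strip().upper(), " ".join(county.split()).upper())


# Hypothetical rows: the same county name in two different states.
rows = [("AL", "JEFFERSON"), ("KY", "JEFFERSON"), ("al", " Jefferson ")]
by_key = Counter(county_key(s, c) for s, c in rows)
print(by_key[("AL", "JEFFERSON")])  # 2
print(by_key[("KY", "JEFFERSON")])  # 1
```

Grouping on the county name alone would have merged all three rows into a single bucket.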
phone_number
text identifier near_unique allcaps short_text
Formatted US phone numbers, every value exactly 14 characters and 2 "words" (area code in parentheses plus the rest), with top tokens like (406), (605), (402) confirming the (XXX) prefix pattern. Of 5421 rows, 5383 are unique with 38 duplicates (0.7%) and zero nulls, so the column is near-unique but not a clean key. The allcaps flag is an artifact of digits/punctuation and can be ignored. Treatment: Drop or hash for PII; do not use as a model feature.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 5,383
- len_min: 14
- len_max: 14
- len_mean: 14
- len_median: 14
- len_p95: 14
- word_mean: 2
- word_median: 2
- n_empty: 0
- n_duplicates: 38
- duplicate_rate: 0.00701
- vocab_size: 5,550
- readability_flesch_mean: 120.2
- emoji_rate: 0
- url_rate: 0
- one_word_rate: 0
- allcaps_rate: 1
- boilerplate_rate: 0
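If the area code is worth keeping before the full number is dropped, a pattern check along these lines may help. The exact layout `(XXX) XXX-XXXX` is an assumption inferred from the 14-character length and the `(XXX)` prefix; the profile does not show a full value, and the sample number below is fictional.

```python
import re

# Assumed layout: "(XXX) XXX-XXXX", which is 14 characters and two whitespace-separated tokens.
PHONE_RE = re.compile(r"^\((\d{3})\) \d{3}-\d{4}$")


def area_code(phone):
    """Return the 3-digit area code, or None if the value does not fit the assumed layout."""
    match = PHONE_RE.match(phone)
    return match.group(1) if match else None


print(area_code("(406) 555-0100"))  # 406
print(area_code("406-555-0100"))    # None
```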
hospital_type
categorical feature
Categorical classifier of hospital facility type across 8 distinct values with no nulls. Acute Care Hospitals dominate at 57.6% (3120 of 5421), followed by Critical Access Hospitals (1375) and Psychiatric (626); the long tail is sparse, with Long-term appearing only 4 times. Entropy ratio of 0.55 confirms the distribution is heavily concentrated on the top category. Treatment: One-hot encode and consider collapsing the five rare types (<3% each) into an 'Other' bucket.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 8
- top_value: Acute Care Hospitals
- top_rate: 0.5755
- cardinality: 8
- entropy: 1.654
- entropy_ratio: 0.5513
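The entropy figures in these stat lists appear to be Shannon entropy in bits, with the ratio obtained by dividing by log2 of the cardinality. That reading is an inference from the reported numbers rather than something the profile states. The sketch below reproduces the hospital_type values from the counts in the chart table; the function names are hypothetical.

```python
import math


def entropy_bits(counts):
    """Shannon entropy in bits of a categorical distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    return entropy_bits(counts) / math.log2(k) if k > 1 else 0.0


# Counts from the hospital_type table.
hospital_type_counts = [3120, 1375, 626, 132, 94, 38, 32, 4]
print(round(entropy_bits(hospital_type_counts), 3))   # 1.654
print(round(entropy_ratio(hospital_type_counts), 3))  # 0.551
```

A ratio near 1 means the categories are close to equally likely; 0.55 here reflects how much mass sits on Acute Care Hospitals.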
hospital_ownership
categorical feature
This column classifies each of 5,421 hospitals by ownership type across 12 categories with no nulls. Voluntary non-profit - Private dominates at 2,291 rows (42.3% top_rate), followed by Proprietary at 1,067, with a long tail down to Department of Defense (32) and Tribal (14). Entropy ratio of 0.72 confirms a moderately skewed but usable distribution. Treatment: One-hot or target-encode; consider grouping rare classes (e.g. Physician, Government - Federal, Department of Defense, Tribal) into an 'Other' bucket.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 12
- top_value: Voluntary non-profit - Private
- top_rate: 0.4226
- cardinality: 12
- entropy: 2.586
- entropy_ratio: 0.7215
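Grouping rare classes can be sketched as a threshold on share. The 3% cut-off, the `Other` label, and the function name `collapse_rare` are assumptions chosen for illustration; the counts are those in the hospital_ownership table.

```python
from collections import Counter


def collapse_rare(counts, min_share=0.03, other_label="Other"):
    """Merge every category whose share is below `min_share` into one bucket."""
    total = sum(counts.values())
    collapsed = Counter()
    for value, n in counts.items():
        key = value if n / total >= min_share else other_label
        collapsed[key] += n
    return collapsed


ownership_counts = {
    "Voluntary non-profit - Private": 2291,
    "Proprietary": 1067,
    "Government - Hospital District or Authority": 521,
    "Government - Local": 400,
    "Voluntary non-profit - Other": 361,
    "Voluntary non-profit - Church": 275,
    "Government - State": 210,
    "Veterans Health Administration": 132,
    "Physician": 74,
    "Government - Federal": 44,
    "Department of Defense": 32,
    "Tribal": 14,
}
collapsed = collapse_rare(ownership_counts)
print(len(collapsed))      # 8
print(collapsed["Other"])  # 296
```

At this threshold Veterans Health Administration (2.4%) also lands in the bucket; a lower cut-off would keep it separate.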
emergency_services
categorical feature
A binary Yes/No flag indicating whether emergency services are present, with no missing values across 5421 rows. The split is heavily skewed toward 'Yes' at 83.1% (4505 vs 916), giving an entropy ratio of 0.66. Treatment: Encode as 0/1; consider class imbalance if used as a predictor or target.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: Yes
- top_rate: 0.831
- cardinality: 2
- entropy: 0.6553
- entropy_ratio: 0.6553
Meets criteria for birthing friendly designation
categorical feature null_rate imbalance
This is a binary flag indicating whether a facility meets criteria for a 'birthing friendly' designation, but every non-null value is 'Y' (2264 rows, top_rate 1.0, cardinality 1). The remaining 58.24% of rows are null, so the column effectively encodes presence/absence of the designation rather than a Y/N contrast. Entropy is 0.0, meaning it carries no information beyond the null pattern itself. Treatment: Recode as a boolean (designated vs. not) from the null mask, or drop as near-constant.
- n: 5,421
- nulls: 3,157 (58.2%)
- unique: 1
- top_value: Y
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
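Recoding from the null mask could look like the sketch below. It assumes nulls arrive as `None`, an empty string, or a float NaN depending on the loader; the function name is hypothetical.

```python
def is_birthing_friendly(value):
    """True only for an explicit 'Y'; None, empty strings and NaN all map to False."""
    return value is not None and str(value).strip().upper() == "Y"


print(is_birthing_friendly("Y"))           # True
print(is_birthing_friendly(None))          # False
print(is_birthing_friendly(float("nan")))  # False
```

Note that False here means "no designation recorded", which is not the same as a confirmed 'N'.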
Hospital overall rating
categorical label
This is the CMS-style hospital overall star rating, encoded as strings 1-5 with a 'Not Available' sentinel covering 47.1% of 5,421 rows. The remaining ratings concentrate around 3 (937) and 4 (765), with extremes 5 (289) and 1 (229) much rarer. The dominant 'Not Available' bucket is the headline surprise: nearly half of hospitals have no rating at all. Treatment: Recode 'Not Available' as missing and treat the remainder as an ordinal 1-5 scale.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 6
- top_value: Not Available
- top_rate: 0.4708
- cardinality: 6
- entropy: 2.133
- entropy_ratio: 0.8252
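A minimal sketch of the recoding, using the counts from the Hospital overall rating table. The function name `parse_rating` is an assumption; the mean is computed over rated hospitals only, which is one reasonable way to handle the missing slice but not the only one.

```python
def parse_rating(raw):
    """Map '1'..'5' to an int and the 'Not Available' sentinel to None."""
    text = str(raw).strip()
    if text == "Not Available":
        return None
    rating = int(text)
    if not 1 <= rating <= 5:
        raise ValueError(f"unexpected rating: {raw!r}")
    return rating


rating_counts = {"Not Available": 2552, "3": 937, "4": 765, "2": 649, "5": 289, "1": 229}
rated = {parse_rating(k): n for k, n in rating_counts.items() if parse_rating(k) is not None}
n_rated = sum(rated.values())
mean_rating = sum(r * n for r, n in rated.items()) / n_rated
print(n_rated)                # 2869
print(round(mean_rating, 2))  # 3.08
```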
Hospital overall rating footnote
categorical metadata null_rate
Footnote codes that qualify the Hospital overall rating, with only 7 distinct values across 5421 rows. Over half the column (52.7%) is null, and among the populated rows code '16' dominates at 65.4% followed by '19' at ~31%, leaving the other codes as long-tail rarities. One compound entry ('16, 23') hints that multiple footnotes can be concatenated in a single cell. Treatment: Treat as categorical metadata; split compound codes and either one-hot encode or drop given the high null rate.
- n: 5,421
- nulls: 2,857 (52.7%)
- unique: 7
- top_value: 16
- top_rate: 0.6537
- cardinality: 7
- entropy: 1.158
- entropy_ratio: 0.4126
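Splitting compound cells such as '16, 23' could be sketched as follows. The comma separator is taken from that one example in the profile; other separators are not handled, and the function name is hypothetical.

```python
def split_footnotes(cell):
    """Split a footnote cell like '16, 23' into a list of integer codes."""
    if cell is None:
        return []
    return [int(part) for part in str(cell).split(",") if part.strip()]


print(split_footnotes("16, 23"))  # [16, 23]
print(split_footnotes("16"))      # [16]
print(split_footnotes(None))      # []
```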
MORT Group Measure Count
categorical feature
Binary categorical column where 84.1% of the 5421 rows hold the literal string "7" and the remaining 863 rows are "Not Available". This looks like a fixed mortality-group measure count (always 7 when reported) with explicit missingness encoded as a sentinel string rather than null, so null_rate is 0 despite real absence. Treatment: Recode "Not Available" to null and convert to a binary availability flag, since the numeric value carries no variance.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: 7
- top_rate: 0.8408
- cardinality: 2
- entropy: 0.6324
- entropy_ratio: 0.6324
Count of Facility MORT Measures
categorical feature
Counts of facility mortality measures stored as strings, with 8 distinct values across 5421 rows and no nulls. The dominant category is 'Not Available' at 32.8% (1777 rows), while the remaining values are integers 1-7, with '7' the most common numeric level at 850. High entropy ratio (0.92) indicates the non-missing counts are spread fairly evenly across 1-7. Treatment: Recode 'Not Available' as missing, cast remaining values to integer, then treat as ordinal.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 8
- top_value: Not Available
- top_rate: 0.3278
- cardinality: 8
- entropy: 2.765
- entropy_ratio: 0.9217
Count of MORT Measures Better
categorical feature
Counts the number of mortality measures where a hospital scored 'better than national average', stored as strings 0-7 plus 'Not Available'. The distribution is heavily concentrated at 0 (57.8% of 5421 rows) with another 1777 rows literally encoded as 'Not Available', leaving only ~10% of facilities recording one or more better-than-average measures. Cardinality is just 9 with entropy ratio 0.46, so the signal is sparse and dominated by zeros and missingness sentinels. Treatment: Recode 'Not Available' to NaN, cast remaining values to integer, and consider binarising (any-better vs none) given the heavy zero mass.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 9
- top_value: 0
- top_rate: 0.5785
- cardinality: 9
- entropy: 1.453
- entropy_ratio: 0.4583
Count of MORT Measures No Different
categorical feature
This column appears to be a count (0-7) of mortality measures rated 'no different than national rate' per facility, but it's stored categorically with 'Not Available' as the dominant value at 32.8% of 5421 rows. Among numeric values, the distribution is fairly even across 1-7 (422-672 each), while '0' is rare at only 12 occurrences. The high entropy ratio (0.885) confirms the non-null values spread broadly across the 8 numeric buckets. Treatment: Coerce numeric strings to integers and treat 'Not Available' as an explicit missing-indicator before modelling.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 9
- top_value: Not Available
- top_rate: 0.3278
- cardinality: 9
- entropy: 2.806
- entropy_ratio: 0.8852
Count of MORT Measures Worse
categorical feature
A small-integer count (0-5) of mortality measures on which a hospital performed worse than the national benchmark, stored as strings alongside a 'Not Available' sentinel. The distribution is heavily concentrated: 60.2% are '0' and another 1,777 rows (about a third) are 'Not Available', leaving only 378 hospitals with one or more worse measures. The long tail is extreme: just 11 rows have 3 or more, and a single row reports 5. Treatment: Cast to integer with 'Not Available' mapped to NaN (or a missing flag), then consider binning to 0/1+ given the sparse tail.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 7
- top_value: 0
- top_rate: 0.6025
- cardinality: 7
- entropy: 1.294
- entropy_ratio: 0.4608
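The 0/1+ binning with a preserved missing state might be sketched as a three-valued flag. The function name `any_worse` and the sample values are illustrative assumptions, not rows from the dataset.

```python
from collections import Counter

SENTINEL = "Not Available"


def any_worse(raw):
    """Return None when the count is missing, else True if at least one measure is worse."""
    text = str(raw).strip()
    if text == SENTINEL:
        return None
    return int(text) >= 1


# Hypothetical sample of raw cell values.
sample = ["0", "0", "2", "Not Available", "1", "5"]
binned = Counter(any_worse(v) for v in sample)
print(binned[True], binned[False], binned[None])  # 3 2 1
```

Keeping `None` separate from `False` avoids treating hospitals without data as hospitals with no worse-than-benchmark measures.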
MORT Group Footnote
numeric metadata null_rate
Despite being typed numeric, this column behaves like a categorical footnote code: only 4 distinct values appear across 5421 rows, ranging discretely from 5 to 23 with a median of 5 and IQR spanning 5 to 19. Two-thirds of rows (null_rate 0.672) are empty, consistent with footnotes attached only to flagged MORT group records. The bimodal-looking spread (kurtosis -1.96, near-zero skew) reinforces that these are reference codes, not measurements. Treatment: Cast to categorical footnote code and join to a footnote lookup rather than treating as a numeric feature.
- n: 5,421
- nulls: 3,643 (67.2%)
- unique: 4
- min: 5
- max: 23
- mean: 11.58
- median: 5
- std: 7.057
- q1: 5
- q3: 19
- iqr: 14
- skew: 0.1488
- kurtosis: -1.959
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
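Casting the numeric footnote to a categorical code can be sketched as below. It assumes the loader produced floats (such as `5.0`) with NaN for nulls, which is a common outcome for a sparse numeric column; the function name and the `no footnote` label are hypothetical, and no footnote meanings are assumed.

```python
import math


def footnote_label(value):
    """Turn a numeric footnote (e.g. 5.0) into a string code; nulls become 'no footnote'."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "no footnote"
    return str(int(value))


print(footnote_label(5.0))           # 5
print(footnote_label(23))            # 23
print(footnote_label(float("nan")))  # no footnote
```

The resulting string codes can then be joined to whatever footnote lookup table accompanies the source data.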
Safety Group Measure Count
categorical feature
This is a categorical column with only two values: "8" (84.1% of 5421 rows) and "Not Available" (the remaining 863 rows). Despite the name suggesting a count, the field is effectively a flag indicating whether the safety group has the standard 8 measures or no data at all. The complete absence of any other counts is unusual for a 'count' field. Treatment: Recode as a binary available/missing indicator before modelling.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: 8
- top_rate: 0.8408
- cardinality: 2
- entropy: 0.6324
- entropy_ratio: 0.6324
Count of Facility Safety Measures
categorical feature
This column reports the count of facility safety measures, stored as a categorical with 9 distinct values (1–8 plus 'Not Available'). The dominant value is 'Not Available' at 38.1% of 5421 rows, which means missingness is encoded as a string rather than a null (null_rate is 0.0). Among reported counts, '7' (733) and '2' (519) lead, while '4' (223) is the rarest, giving a fairly even spread (entropy_ratio 0.868). Treatment: Recode 'Not Available' as null and cast remaining values to integer before modelling.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 9
- top_value: Not Available
- top_rate: 0.3809
- cardinality: 9
- entropy: 2.753
- entropy_ratio: 0.8684
Count of Safety Measures Better
categorical feature
This is a categorical column counting how many safety measures were rated 'Better', with values 0-6 stored as strings alongside a 'Not Available' sentinel. 'Not Available' dominates at 38.1% (2065 of 5421), effectively acting as a hidden null, and the remaining counts decay sharply from 1548 zeros down to just 3 sixes. Entropy ratio of 0.70 across 8 categories reflects this concentration in the low end. Treatment: Recode 'Not Available' to null and cast remaining levels to integer before modelling.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 8
- top_value: Not Available
- top_rate: 0.3809
- cardinality: 8
- entropy: 2.11
- entropy_ratio: 0.7033
Count of Safety Measures No Different
categorical feature
This is a low-cardinality count field (10 distinct values) capturing how many safety measures were rated 'No Different', with integer values 0-8 stored as strings alongside a 'Not Available' sentinel. The dominant surprise is that 38.1% of rows (2065/5421) are 'Not Available', making missingness the modal outcome despite a 0% null rate. Among reported counts, the distribution is fairly even across 1-6 (434-656 each), with 0 (20) and 8 (10) being rare extremes. Treatment: Cast numeric strings to int and recode 'Not Available' as an explicit missing indicator before modelling.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 10
- top_value: Not Available
- top_rate: 0.3809
- cardinality: 10
- entropy: 2.685
- entropy_ratio: 0.8083
Count of Safety Measures Worse
categorical feature
This is a low-cardinality count of safety measures rated 'worse', taking only 5 distinct values across 5421 rows with no nulls. Most facilities (54.3%) report 0, and a substantial 2065 rows carry the literal string 'Not Available' rather than a numeric value, mixing missingness into the value domain. Actual counts above 0 are rare (365 ones, 44 twos, 6 threes), giving a heavy zero-and-missing skew. Treatment: Recode 'Not Available' to NaN, cast remainder to integer, and treat as a low-count ordinal or binary (>0) feature.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 5
- top_value: 0
- top_rate: 0.5425
- cardinality: 5
- entropy: 1.338
- entropy_ratio: 0.5764
Safety Group Footnote
numeric metadata null_rate
This appears to be a footnote code attached to safety group records, stored numerically but acting as a categorical flag with only 4 distinct values ranging from 5 to 23. The column is sparsely populated, with 61.8% nulls, suggesting footnotes apply only to a minority of rows. The bimodal-leaning distribution (median 5, Q3 19, kurtosis -1.81) reinforces that these are discrete code categories rather than a true measurement. Treatment: Cast to categorical and treat nulls as 'no footnote' before any modelling.
- n: 5,421
- nulls: 3,350 (61.8%)
- unique: 4
- min: 5
- max: 23
- mean: 10.69
- median: 5
- std: 6.95
- q1: 5
- q3: 19
- iqr: 14
- skew: 0.4116
- kurtosis: -1.809
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
READM Group Measure Count
categorical feature
A binary categorical field that records the count of measures in a readmission group, but stored as strings: 84.08% of 5421 rows are "11" and the remaining 863 rows are "Not Available". With only 2 distinct values and no nulls, this acts as a presence flag rather than a true count. Treatment: Recode to a boolean availability flag since the numeric value is constant when present.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: 11
- top_rate: 0.8408
- cardinality: 2
- entropy: 0.6324
- entropy_ratio: 0.6324
Count of Facility READM Measures
categorical feature
This column appears to be the count of hospital readmission (READM) measures reported per facility, stored as strings rather than integers. The sampled values run from "2" through "11" plus a sizeable "Not Available" bucket, which is the largest single category at 21.2% (1,150 of 5,421 rows); a cardinality of 12 implies eleven numeric levels, so one further level outside that sampled range must also be present. Distribution across the numeric levels is fairly even (entropy ratio 0.965), with no nulls but the string "Not Available" effectively acting as missingness. Treatment: Recode "Not Available" to NaN and cast remaining values to integer before modelling.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 12
- top_value: Not Available
- top_rate: 0.2121
- cardinality: 12
- entropy: 3.459
- entropy_ratio: 0.965
Count of READM Measures Better
categorical feature
This column counts how many readmission measures a provider scored 'better' on, stored as strings ranging from '0' to '5' alongside a 'Not Available' sentinel. The distribution is heavily concentrated at '0' (61.5%, 3332 of 5421 rows), and 'Not Available' is the second most common value at 1150 rows, exceeding any nonzero count. Only 41 rows score 3 or higher, so meaningful positive signal is rare. Treatment: Cast numerics to int, encode 'Not Available' as a missing flag, and consider collapsing the long tail (3-5) into a single bucket.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 7
- top_value: 0
- top_rate: 0.6146
- cardinality: 7
- entropy: 1.51
- entropy_ratio: 0.5379
Count of READM Measures No Different
categorical feature
This is a count of hospital readmission measures where performance was 'no different' than national, stored as strings alongside a 'Not Available' sentinel. The sampled values span '1'–'9', and the 13 distinct values imply 12 numeric levels in total. The sentinel is the largest bucket at 21.2% (1150 of 5421), and the values are spread fairly evenly (entropy ratio 0.92), with the most common numeric counts landing in the 370–500 range (not every level can reach 370, since 12 × 370 exceeds the 4,271 non-sentinel rows). Treatment: Cast to integer after replacing 'Not Available' with NaN, then treat as ordinal numeric.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 13
- top_value: Not Available
- top_rate: 0.2121
- cardinality: 13
- entropy: 3.408
- entropy_ratio: 0.9211
Count of READM Measures Worse
categorical feature
This appears to be a count of readmission measures rated 'worse' per hospital, stored as a categorical/string column with 9 distinct values ranging from '0' to '7' plus 'Not Available'. The distribution is heavily concentrated at zero (55.1% of 5,421 rows) and 'Not Available' accounts for 1,150 rows, which is a substantial missing-data signal masquerading as a category. Higher counts are rare, with only 31 rows at 4 or above. Treatment: Cast numeric levels to integer, recode 'Not Available' as null, then treat as ordinal or count feature.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 9
- top_value: 0
- top_rate: 0.5512
- cardinality: 9
- entropy: 1.758
- entropy_ratio: 0.5545
READM Group Footnote
numeric metadata null_rate
This appears to be a footnote/flag code attached to a readmission metric, encoded numerically with only 3 distinct values (5, 19, and 22 based on the quartiles and max). The column is overwhelmingly empty at a 78.79% null rate, meaning footnotes apply to a small minority of records. Despite being stored as numeric, the values are categorical codes, so the mean of 15.15 and std of 6.37 have no real interpretive meaning. Treatment: Cast to categorical footnote codes and treat nulls as 'no footnote' rather than imputing.
- n: 5,421
- nulls: 4,271 (78.8%)
- unique: 3
- min: 5
- max: 22
- mean: 15.15
- median: 19
- std: 6.366
- q1: 5
- q3: 19
- iqr: 14
- skew: -0.9528
- kurtosis: -1.051
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
Pt Exp Group Measure Count
categorical metadata
Binary categorical with only two values: "8" (84.1% of 5421 rows) and "Not Available" (the remaining 863). The literal string "Not Available" stands in for missing data, so the column is effectively a constant of 8 with a 15.9% missingness flag rather than a true feature. Entropy ratio of 0.63 confirms the low information content. Treatment: Recode "Not Available" to null and drop, or keep only as a binary missingness indicator.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: 8
- top_rate: 0.8408
- cardinality: 2
- entropy: 0.6324
- entropy_ratio: 0.6324
Count of Facility Pt Exp Measures
categorical feature
This column reports the count of facility patient experience measures, but it is effectively binary: every one of the 5421 rows is either the literal string "8" (58.2%) or "Not Available" (41.8%). The high entropy ratio of 0.98 reflects that near 50/50 split rather than any real numeric variation. The surprise is that a supposed count has only one non-null numeric level, so it carries no granularity beyond a presence/absence flag. Treatment: Recode as a binary has_measures flag (8 vs Not Available) rather than treating as a numeric count.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: 8
- top_rate: 0.5818
- cardinality: 2
- entropy: 0.9806
- entropy_ratio: 0.9806
Pt Exp Group Footnote
numeric metadata null_rate
This is a footnote code attached to a 'Pt Exp Group' (likely patient experience group) metric, encoded numerically but with only 3 distinct values (5, ~19, 22) across 5421 rows. It is null 58.18% of the time, which is expected for footnote columns that flag exceptions on a minority of rows. The bimodal-looking spread (median 5, Q3 19, max 22) and negative kurtosis (-1.66) confirm it behaves as a sparse categorical flag rather than a continuous measure. Treatment: Cast to categorical footnote codes and treat nulls as 'no footnote' rather than imputing numerically.
- n: 5,421
- nulls: 3,154 (58.2%)
- unique: 3
- min: 5
- max: 22
- mean: 10.15
- median: 5
- std: 6.806
- q1: 5
- q3: 19
- iqr: 14
- skew: 0.571
- kurtosis: -1.658
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
TE Group Measure Count
categorical metadata
A binary categorical field where 84.1% of the 5421 rows take the literal string "12" and the remaining 863 rows are "Not Available". Despite the name suggesting a count, it is stored as a string with only 2 distinct values and no nulls, so "Not Available" is functioning as an in-band missing marker rather than a true category. Treatment: Recode "Not Available" to null and collapse to a boolean indicator, since the only real value is 12.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 2
- top_value: 12
- top_rate: 0.8408
- cardinality: 2
- entropy: 0.6324
- entropy_ratio: 0.6324
Count of Facility TE Measures
categorical feature
This column reports the count of Facility TE (Timely & Effective) Measures per row, stored as strings with 13 distinct values across 5,421 records. The most common value is the sentinel "Not Available" at 17.1% (928 rows), with numeric counts ranging at least from 4 to 12 mixed in as text. Entropy ratio of 0.93 indicates the non-null values are spread fairly evenly across the count buckets. Treatment: Coerce to integer with "Not Available" mapped to NaN, then treat as an ordinal/numeric feature.
- n: 5,421
- nulls: 0 (0.0%)
- unique: 13
- top_value: Not Available
- top_rate: 0.1712
- cardinality: 13
- entropy: 3.458
- entropy_ratio: 0.9343
TE Group Footnote
numeric metadata null_rate high_skew outliers
This appears to be a footnote code column for a 'TE Group' classification, stored numerically but functioning as a categorical reference (only 3 unique values across 5421 rows). It is mostly empty (82.88% null), and among populated rows the value 19 dominates so heavily that q1, median, and q3 all equal 19, producing a zero IQR and a strong negative skew of -2.43. The 133 flagged outliers (14.3%) are simply the minority codes (down to 5) being measured against a degenerate distribution. Treatment: Cast to categorical footnote code and exclude from numeric modelling.
- n: 5,421
- nulls: 4,493 (82.9%)
- unique: 3
- min: 5
- max: 22
- mean: 17.58
- median: 19
- std: 4.432
- q1: 19
- q3: 19
- iqr: 0
- skew: -2.43
- kurtosis: 4.12
- n_outliers: 133
- outlier_rate: 0.1433
- zero_rate: 0
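The degenerate-IQR effect described above can be reproduced on a toy sample. This sketch assumes the profiler uses the common 1.5 × IQR (Tukey) fence rule, which the report does not confirm; the ten-value sample is made up and only mimics the shape of the column (mostly 19, with a few 5s and 22s).

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside [q1 - k*IQR, q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Toy data: eight rows with code 19, one with 5, one with 22.
codes = [19] * 8 + [5, 22]
print(tukey_outliers(codes))  # [5, 22]
```

With q1 = q3 = 19 the fences collapse onto a single point, so every other code is flagged regardless of how far it is from 19. That is why the outlier alert here says nothing about data quality.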