saturn

/home/coolhand/html/datavis/data_trove/geographic/counties_simplified.geojson 3,234 rows sample n=3,234 seed 42 2026-06-22T01:06:32+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/geographic/counties_simplified.geojson
Total rows	3,234
Profiled sample	3,234
Columns	18
Generated	2026-06-22T01:06:32+00:00

Per-column null rate across the corpus.
column	kind	null %
STATEFP	categorical	0.0%
COUNTYFP	categorical	0.0%
COUNTYNS	text	0.0%
GEOID	text	0.0%
NAME	text	0.0%
NAMELSAD	text	0.0%
LSAD	categorical	0.0%
CLASSFP	categorical	0.0%
MTFCC	categorical	0.0%
CSAFP	categorical	61.2%
CBSAFP	categorical	40.8%
METDIVFP	categorical	96.6%
FUNCSTAT	categorical	0.0%
ALAND	numeric	0.0%
AWATER	numeric	0.0%
INTPTLAT	text	0.0%
INTPTLON	text	0.0%
geometry_type	categorical	0.0%

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is a US county-level geographic reference file containing 3,234 county and county-equivalent records across 56 state FIPS codes, with spatial attributes, area measurements, and metropolitan area classifications. The most notable pattern is name reuse: 40.5% of rows (1,311) repeat a NAME already used by another county, and 'Washington' alone appears 31 times, reflecting the historic reuse of patriotic and presidential names across states. Two numeric columns, ALAND (land area) and AWATER (water area), show extreme right skew, with 11.2% and 14.1% of rows respectively flagged as outliers. A small number of counties are vastly larger or wetter than the median, which warrants attention in any area-weighted analysis. Additionally, 40.8% of counties have no CBSAFP code (no core-based statistical area assignment), indicating a large group of rural, non-metro counties that could easily be overlooked in urban-focused analyses.

COUNTYNS high anthropic:default

COUNTYNS is the county's ANSI/GNIS feature identifier: an 8-character, zero-padded numeric code from the Geographic Names Information System maintained by the U.S. Geological Survey, distinct from the FIPS codes in STATEFP and COUNTYFP. Every one of the 3,234 rows carries a distinct value (duplicate_rate 0.0, n_unique 3,234) with no nulls, and all values are exactly 8 characters long (len_min = len_max = 8), consistent with the fixed-width GNIS format. The perfect uniqueness and fixed length make this a reliable surrogate key for county-level joins to official geographic reference tables.

GEOID high anthropic:default

GEOID is a US Census geographic identifier column containing 5-digit FIPS county codes (e.g., '31109', '48247'), where the first two digits encode the state and the last three encode the county. Every value is exactly 5 characters long (len_min=5, len_max=5, len_mean=5.0) and unique across all 3,234 rows, with zero nulls or duplicates, which is consistent with a primary key for county-level geographic records. The allcaps_rate of 1.0 is a classifier artifact: these are numeric strings, not alphabetic text. With n_unique=3,234 there is one row per county, so the dataset likely covers a near-complete set of US counties (there are ~3,243 counties/equivalents in the US).
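The GEOID layout described above (two state digits followed by three county digits) can be sketched in a few lines of Python. This is a minimal illustration under that assumption; `split_geoid` and `build_geoid` are hypothetical helper names, not part of saturn or any Census library.

```python
def split_geoid(geoid: str) -> tuple[str, str]:
    """Split a 5-character county GEOID into (STATEFP, COUNTYFP)."""
    if len(geoid) != 5 or not geoid.isdigit():
        raise ValueError(f"expected 5 digits, got {geoid!r}")
    return geoid[:2], geoid[2:]


def build_geoid(statefp: str, countyfp: str) -> str:
    """Rebuild a GEOID, restoring any zero padding that was lost."""
    return statefp.zfill(2) + countyfp.zfill(3)


# '48247' is one of the sample values listed for the GEOID column.
state, county = split_geoid("48247")
print(state, county)

# Zero padding matters: integer-typed codes lose their leading zeros.
print(build_geoid("5", "3"))
```

Keeping these codes as strings avoids the common failure where a state code such as '05' is read as the integer 5 and no longer joins against GEOID.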

INTPTLAT high anthropic:default

INTPTLAT is the internal point latitude coordinate for geographic entities (a standard Census Bureau field name), stored as a fixed-width text string rather than a numeric type. Every one of the 3,234 rows is unique, all values are exactly 11 characters long (e.g. '+40.7835474'), and the duplicate rate is 0.0, consistent with high-precision coordinates, one per county. The 'allcaps' and 'one_word' flags are artifacts of profiling a coordinate as a string column: the values are numeric, and the leading '+' sign likely forced text treatment.
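Converting these fixed-width strings to numbers is straightforward, since Python's built-in `float()` accepts both the leading sign and the zero padding. The sketch below assumes decimal-degree values in the format shown in the samples; `parse_lat` and `parse_lon` are hypothetical helper names.

```python
def parse_lat(text: str) -> float:
    """Parse a signed latitude string such as '+40.7835474'."""
    value = float(text)  # float() tolerates the leading '+'
    if not -90.0 <= value <= 90.0:
        raise ValueError(f"latitude out of range: {text!r}")
    return value


def parse_lon(text: str) -> float:
    """Parse a signed, zero-padded longitude string such as '-096.6886584'."""
    value = float(text)  # zero padding after the sign is also accepted
    if not -180.0 <= value <= 180.0:
        raise ValueError(f"longitude out of range: {text!r}")
    return value


# First sample values listed for INTPTLAT and INTPTLON.
print(parse_lat("+40.7835474"), parse_lon("-096.6886584"))
```

The range checks are a cheap guard against swapped columns, which is an easy mistake when both fields arrive as text.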

INTPTLON high anthropic:default

INTPTLON is the internal point longitude field, a standard Census Bureau coordinate column storing the longitude of a representative point within each geographic entity. Every one of the 3,234 rows is unique, all values are exactly 12 characters long (mean, median, min, and max all equal 12), and the duplicate rate is 0.0, consistent with precise decimal-degree coordinates stored as fixed-format strings. The ten sampled values are all negative (Western Hemisphere), from about -078 to -099 degrees. The profile reports no minimum or maximum for text columns, so the full longitude range, including any territories with positive longitudes, is not established here.

NAME high anthropic:default

This column contains names of U.S. counties or county-equivalent administrative divisions, dominated by patriotic and presidential surnames (Washington, Jefferson, Franklin, Lincoln, Jackson, Madison) and other widely reused names (Union, Lake, Montgomery, Marion). The duplicate rate of 40.5% (1,311 out of 3,234 rows) is expected given that common county names repeat across states, but analysts should note that this column alone cannot serve as a unique identifier. The vocabulary of 1,958 tokens across 3,234 rows and a median length of 7 characters confirm that these are short, single-word labels in most cases (93.1% one-word rate).
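The reported figures are consistent with n_duplicates being computed as rows minus unique values (3,234 - 1,923 = 1,311). That is a different quantity from the share of rows whose value occurs more than once. The sketch below illustrates the distinction on a small made-up list; `duplicate_stats` is a hypothetical helper and the definition is an assumption inferred from the numbers, not taken from saturn's source.

```python
from collections import Counter


def duplicate_stats(values: list[str]) -> dict[str, float]:
    """Compare 'repeat' rows with rows that share their value at all."""
    counts = Counter(values)
    n_rows = len(values)
    n_unique = len(counts)
    n_duplicates = n_rows - n_unique  # rows beyond the first occurrence
    n_shared = sum(c for c in counts.values() if c > 1)  # rows with a twin
    return {
        "n_duplicates": n_duplicates,
        "duplicate_rate": n_duplicates / n_rows,
        "n_shared": n_shared,
        "shared_rate": n_shared / n_rows,
    }


# Made-up toy data, not drawn from the profiled file.
toy = ["Washington", "Washington", "Washington", "Jefferson", "Jefferson", "Lake"]
print(duplicate_stats(toy))

# The same arithmetic applied to the reported NAME totals.
print(3234 - 1923, round((3234 - 1923) / 3234, 3))
```

On the toy list the duplicate rate is 0.5 while five of six rows share a name, so the share of counties with a non-unique name is always at least as large as the reported duplicate rate.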

NAMELSAD high anthropic:default

NAMELSAD is a US Census Legal/Statistical Area Description field containing the full human-readable name of county-equivalent geographic units (e.g., 'Washington County', 'Jefferson Parish'). The word 'county' appears in 3,007 of 3,234 rows, with 'municipio' (78) and 'parish' (64) indicating Puerto Rico and Louisiana records respectively. The 39.1% duplicate rate (1,265 duplicates across 1,969 unique values) is fully expected — common county names like 'Washington County' (30 occurrences) repeat across different US states. The multi-word structure (mean 2.08 words) and short length (median 14 chars) are consistent with standardized geographic labels.

ALAND high anthropic:default

ALAND is a US Census land area field (measured in square meters), here holding one value per county or county equivalent (n=3,234). The distribution is extremely right-skewed (skew=27.13, kurtosis=976.66): while the median is ~1.56 billion m², the max reaches 377 billion m², roughly 241× the median, indicating a small number of very large geographic units (e.g., western US counties). 362 values (11.2%) are flagged as outliers under the 1.5 IQR rule, consistent with the well-known size disparity between densely subdivided eastern counties and sprawling western ones.
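The outlier thresholds implied by the reported quartiles can be recomputed directly. This sketch assumes the conventional Tukey fences (q1 - 1.5·IQR and q3 + 1.5·IQR), which matches the "beyond 1.5 IQR" flag but is not confirmed as saturn's exact rule; `tukey_fences` is a hypothetical helper name.

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and minimum as reported for ALAND (square meters).
ALAND_Q1 = 1_078_544_021
ALAND_Q3 = 2_368_055_605
ALAND_MIN = 82_093

lower, upper = tukey_fences(ALAND_Q1, ALAND_Q3)
print(f"lower fence: {lower:,.0f} m^2")
print(f"upper fence: {upper:,.0f} m^2")
print("low-side outliers possible:", ALAND_MIN < lower)
```

Under that assumption the lower fence is negative while the smallest ALAND value is positive, so all flagged rows would sit above the upper fence of about 4.3 billion m².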

AWATER high anthropic:default

AWATER is almost certainly a US Census TIGER/Line water area field, representing the total water surface area (in square meters) of each county or county equivalent. All 3,234 rows are unique and non-null, consistent with one record per geographic entity. The distribution is extremely right-skewed (skew 13.33, kurtosis 215.85): the median is ~19.5 million m² while the mean balloons to ~220 million m², and the maximum reaches ~26 billion m², about 118× the mean, with 456 outliers (14.1% of rows) driven by large water-heavy units like coastal counties or Great Lakes-adjacent areas. Only 1 record has a zero value, which is plausible for fully land-locked units.

CBSAFP high anthropic:default

CBSAFP is a Core Based Statistical Area (CBSA) FIPS code, a U.S. Census geographic identifier linking records to metropolitan or micropolitan statistical areas. With 939 unique codes across the 1,916 non-null rows, the distribution is notably flat (entropy_ratio 0.94), and 602 codes appear only once, meaning records are spread thinly across many areas rather than concentrated. The 40.75% null rate most likely marks counties outside any defined CBSA (rural areas), so it is a meaningful geographic signal rather than simple missingness. The most frequent value, '41980' (the San Juan metropolitan area in Puerto Rico), appears only 40 times: about 2.1% of non-null rows, or 1.2% of all rows, confirming that no single area dominates.
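The two percentages quoted for the top CBSAFP value differ only in their denominator. The reported numbers are consistent with top_rate being computed over non-null rows and the table's share column over all rows; that reading is inferred from the figures, and the function names below are hypothetical.

```python
def top_rate(top_count: int, rows: int, nulls: int) -> float:
    """Share of the top value among non-null rows."""
    return top_count / (rows - nulls)


def share_of_all(top_count: int, rows: int) -> float:
    """Share of the top value among all rows, nulls included."""
    return top_count / rows


ROWS = 3234

# (column, top count, null count) as reported in the profile.
for column, count, nulls in [
    ("CBSAFP", 40, 1318),
    ("CSAFP", 48, 1978),
    ("METDIVFP", 23, 3124),
]:
    print(
        column,
        round(top_rate(count, ROWS, nulls), 3),
        f"{share_of_all(count, ROWS):.1%}",
    )
```

For sparsely populated columns such as METDIVFP the gap is large (0.209 against 0.7%), so it is worth stating which denominator a quoted percentage uses.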

CLASSFP high anthropic:default

CLASSFP is the FIPS class code, which describes the legal or statistical standing of each county-level entity. The distribution is severely imbalanced: 'H1' (an active county or statistically equivalent entity) accounts for 96.3% of the 3,234 rows, with the remaining four codes (C7, H6, H4, H5) collectively covering only 119 records. Of these, C7 marks incorporated places that are independent of any county (independent cities). The entropy ratio of 0.128 confirms near-minimal informational content, meaning this column carries little discriminatory power as a feature in its current form.

FUNCSTAT high anthropic:default

FUNCSTAT is a U.S. Census functional status code, classifying geographic or administrative entities by their operational state (e.g., 'A' = active, 'F' = fictitious, 'C' = consolidated). The distribution is severely imbalanced: 'A' accounts for 96.35% of the 3,234 records, while the remaining 6 categories together cover only 118 rows — with 'G' appearing just once. Entropy ratio of 0.107 confirms near-minimal informational diversity. Minority classes may warrant special handling but will be extremely difficult to model as targets.

MTFCC high anthropic:default

MTFCC is a MAF/TIGER Feature Class Code, a U.S. Census Bureau classification code for geographic features. Every single one of the 3,234 rows carries the identical value 'G4020' (the feature class for a county or equivalent feature), with zero nulls and an entropy of 0.0, so this column is entirely constant across the dataset. It carries no discriminatory signal whatsoever.

geometry_type high anthropic:default

This column classifies the geometric representation type of spatial features, distinguishing between 'Polygon' and 'MultiPolygon' geometries across 3,234 records. The severe class imbalance is the standout signal: 'Polygon' dominates at 98.36% (3,181 records), leaving 'MultiPolygon' as a rare minority at only 53 occurrences (1.64%). The near-zero entropy (0.121) confirms this column carries almost no information variance, which limits its predictive utility.
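The entropy figures in this report can be reproduced from the value counts. The sketch below assumes Shannon entropy in bits and an entropy ratio normalized by log2 of the cardinality; that definition is inferred from the reported numbers (for example 5.337 / log2(56) ≈ 0.919 for STATEFP) rather than taken from saturn's documentation, and the function names are hypothetical.

```python
import math


def entropy_bits(counts: list[int]) -> float:
    """Shannon entropy, in bits, of a categorical distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts: list[int]) -> float:
    """Entropy divided by its maximum, log2(number of categories)."""
    k = len(counts)
    if k < 2:
        return 0.0  # a constant column has no spread to measure
    return entropy_bits(counts) / math.log2(k)


# Value counts as reported for geometry_type: Polygon, MultiPolygon.
geometry_counts = [3181, 53]
print(round(entropy_bits(geometry_counts), 3))
print(round(entropy_ratio(geometry_counts), 3))

# A single-valued column such as MTFCC.
print(entropy_ratio([3234]))
```

With only two categories the maximum entropy is exactly 1 bit, which is why geometry_type reports the same value for entropy and entropy_ratio.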

CSAFP medium anthropic:default

CSAFP is the Combined Statistical Area (CSA) FIPS code: three-digit numeric strings with 175 distinct values, identifying groupings of adjacent metropolitan and micropolitan areas. The 61.16% null rate means nearly two-thirds of the 3,234 rows carry no value, which most likely marks counties that do not belong to any CSA rather than missing data. The distribution is nearly flat across all 175 codes (entropy ratio 0.936): the top value '490' appears 48 times, which is 3.8% of non-null rows and 1.5% of all rows, so no single area dominates.

COUNTYFP high anthropic:default

COUNTYFP is a FIPS county code — a standardized 3-digit zero-padded numeric string used in US geographic identifiers. With 330 unique values across 3,234 rows and a high entropy ratio of 0.85, codes are broadly distributed with near-uniform frequency: the most common value ('003') appears only 50 times (~1.5% top_rate). The sequential odd-number pattern in top values (001, 003, 005, 007…) is characteristic of FIPS county numbering conventions, confirming this is a geographic lookup key rather than a raw feature.

LSAD high anthropic:default

LSAD (Legal/Statistical Area Description) is a Census Bureau code that classifies geographic entities by type — values like '06' (county), '13', '15', '25' are standard LSAD codes. The distribution is severely dominated by code '06', which accounts for 3,007 of 3,234 rows (92.98%), indicating this dataset is overwhelmingly composed of one entity type (most likely counties). With only 11 unique values and near-zero entropy ratio (0.156), this column carries very little discriminative information despite being semantically meaningful.

STATEFP high anthropic:default

STATEFP is the U.S. Census Bureau FIPS state code, a two-digit numeric string identifying each U.S. state or territory. With exactly 56 unique values and 3,234 rows it likely represents one record per county or similar sub-state geographic unit. The top value '48' (Texas, 254 rows) alone accounts for 7.85% of all records — consistent with Texas having the most counties of any U.S. state — while values like '13' (Georgia, 159) and '51' (Virginia, 133) also rank high, reflecting those states' large county counts. The entropy ratio of 0.919 indicates a fairly even spread across states despite Texas's dominance.

Numeric correlation

Pearson correlation across 2 numeric columns (values clipped to 2 decimals).
	ALAND	AWATER
ALAND	+1.00	+0.58
AWATER	+0.58	+1.00

STATEFP categorical

rows	3,234
null	0 (0.0%)
unique	56
top_value	48
top_rate	0.079
cardinality	56
entropy	5.337
entropy_ratio	0.919

Top values for STATEFP (20 unique shown, of 56 total).
value	count	share
48	254	7.9%
13	159	4.9%
51	133	4.1%
21	120	3.7%
29	115	3.6%
20	105	3.2%
17	102	3.2%
37	100	3.1%
19	99	3.1%
47	95	2.9%
31	93	2.9%
18	92	2.8%
39	88	2.7%
27	87	2.7%
26	83	2.6%
28	82	2.5%
72	78	2.4%
40	77	2.4%
05	75	2.3%
55	72	2.2%

COUNTYFP categorical

rows	3,234
null	0 (0.0%)
unique	330
top_value	003
top_rate	0.015
cardinality	330
entropy	7.118
entropy_ratio	0.851

Top values for COUNTYFP (20 unique shown, of 330 total).
value	count	share
003	50	1.5%
001	50	1.5%
005	50	1.5%
009	49	1.5%
007	48	1.5%
013	48	1.5%
011	47	1.5%
015	47	1.5%
019	46	1.4%
017	46	1.4%
027	45	1.4%
023	45	1.4%
021	45	1.4%
025	43	1.3%
031	42	1.3%
029	42	1.3%
033	41	1.3%
037	40	1.2%
035	40	1.2%
039	39	1.2%

COUNTYNS text

100.0% of rows are unique strings; 100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars

rows	3,234
null	0 (0.0%)
unique	3,234
len_min	8
len_max	8
len_mean	8.000
len_median	8.000
len_p95	8.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	3,234
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	1.000
boilerplate_rate	0.000

Character-length distribution for COUNTYNS (mean: 8.0).
chars	count
8	3234

Sample values (first 10)

00835876
01493928
01558262
01383811
01074062
00465192
00835845
00758561
01639754
01383909

GEOID text

100.0% of rows are unique strings; 100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars

rows	3,234
null	0 (0.0%)
unique	3,234
len_min	5
len_max	5
len_mean	5.000
len_median	5.000
len_p95	5.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	3,234
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	1.000
boilerplate_rate	0.000

Character-length distribution for GEOID (mean: 5.0).
chars	count
5	3234

Sample values (first 10)

31109
51135
54009
48051
39099
19005
31047
29217
47077
48247

NAME text

93.1% of rows are a single word; 95th-percentile length under 20 chars; 40.5% duplicate strings

rows	3,234
null	0 (0.0%)
unique	1,923
len_min	3
len_max	21
len_mean	7.040
len_median	7.000
len_p95	11.000
word_mean	1.074
word_median	1.000
n_empty	0
n_duplicates	1,311
duplicate_rate	0.405
vocab_size	1,958
readability_flesch_mean	31.966
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.931
allcaps_rate	0.000
boilerplate_rate	0.000

Character-length distribution for NAME (mean: 7.04).
chars	count
3	27
4	257
5	470
6	696
7	614
8	501
9	292
10	211
11	60
12	48
13	22
14	14
15	8
16	5
17	3
18	1
19	1
20	3
21	1

Sample values (first 10)

Lancaster
Nottoway
Brooke
Burleson
Mahoning
Allamakee
Dawson
Vernon
Henderson
Jim Hogg

NAMELSAD text

95th-percentile length under 20 chars; 39.1% duplicate strings

rows	3,234
null	0 (0.0%)
unique	1,969
len_min	4
len_max	33
len_mean	14.123
len_median	14.000
len_p95	18.000
word_mean	2.079
word_median	2.000
n_empty	0
n_duplicates	1,265
duplicate_rate	0.391
vocab_size	1,965
readability_flesch_mean	32.291
emoji_rate	0.000
url_rate	0.000
one_word_rate	3.09e-04
allcaps_rate	0.000
boilerplate_rate	0.000

Character-length distribution for NAMELSAD (mean: 14.12).
chars	count
4 – 5	1
5 – 5	0
5 – 6	0
6 – 7	0
7 – 8	0
8 – 8	0
8 – 9	0
9 – 10	0
10 – 11	29
11 – 11	256
11 – 12	0
12 – 13	465
13 – 13	683
13 – 14	590
14 – 15	0
15 – 16	496
16 – 16	297
16 – 17	223
17 – 18	0
18 – 18	67
18 – 19	51
19 – 20	0
20 – 21	23
21 – 21	16
21 – 22	14
22 – 23	0
23 – 24	7
24 – 24	4
24 – 25	5
25 – 26	0
26 – 26	2
26 – 27	1
27 – 28	0
28 – 29	1
29 – 29	1
29 – 30	0
30 – 31	0
31 – 32	1
32 – 32	0
32 – 33	1

Sample values (first 10)

Lancaster County
Nottoway County
Brooke County
Burleson County
Mahoning County
Allamakee County
Dawson County
Vernon County
Henderson County
Jim Hogg County

LSAD categorical

rows	3,234
null	0 (0.0%)
unique	11
top_value	06
top_rate	0.930
cardinality	11
entropy	0.539
entropy_ratio	0.156

Top values for LSAD (11 unique shown, of 11 total).
value	count	share
06	3007	93.0%
13	78	2.4%
15	64	2.0%
25	40	1.2%
04	13	0.4%
05	11	0.3%
12	6	0.2%
00	5	0.2%
03	4	0.1%
10	3	0.1%
07	3	0.1%

CLASSFP categorical

top value is 96.3% of rows

rows	3,234
null	0 (0.0%)
unique	5
top_value	H1
top_rate	0.963
cardinality	5
entropy	0.296
entropy_ratio	0.128

Top values for CLASSFP (5 unique shown, of 5 total).
value	count	share
H1	3115	96.3%
C7	41	1.3%
H6	38	1.2%
H4	29	0.9%
H5	11	0.3%

MTFCC categorical

top value is 100.0% of rows

rows	3,234
null	0 (0.0%)
unique	1
top_value	G4020
top_rate	1.000
cardinality	1
entropy	0.000
entropy_ratio	0.000

Top values for MTFCC (1 unique shown, of 1 total).
value	count	share
G4020	3234	100.0%

CSAFP categorical

61.2% null

rows	3,234
null	1,978 (61.2%)
unique	175
top_value	490
top_rate	0.038
cardinality	175
entropy	6.977
entropy_ratio	0.936

Top values for CSAFP (20 unique shown, of 175 total).
value	count	share
490	48	1.5%
122	42	1.3%
548	41	1.3%
408	31	1.0%
545	22	0.7%
312	22	0.7%
378	21	0.6%
176	19	0.6%
148	19	0.6%
206	19	0.6%
178	18	0.6%
294	18	0.6%
198	17	0.5%
476	17	0.5%
170	16	0.5%
428	16	0.5%
400	16	0.5%
350	15	0.5%
184	14	0.4%
174	14	0.4%

CBSAFP categorical

602 singleton categories; 40.8% null

rows	3,234
null	1,318 (40.8%)
unique	939
top_value	41980
top_rate	0.021
cardinality	939
entropy	9.278
entropy_ratio	0.940

Top values for CBSAFP (20 unique shown, of 939 total).
value	count	share
41980	40	1.2%
12060	29	0.9%
47900	25	0.8%
35620	23	0.7%
47260	19	0.6%
40060	17	0.5%
17140	16	0.5%
33460	15	0.5%
41180	15	0.5%
16980	14	0.4%
28140	14	0.4%
34980	13	0.4%
16740	11	0.3%
37980	11	0.3%
19100	11	0.3%
26900	11	0.3%
19740	10	0.3%
18140	10	0.3%
12940	10	0.3%
31140	10	0.3%

METDIVFP categorical

96.6% null

rows	3,234
null	3,124 (96.6%)
unique	31
top_value	47894
top_rate	0.209
cardinality	31
entropy	4.361
entropy_ratio	0.880

Top values for METDIVFP (20 unique shown, of 31 total).
value	count	share
47894	23	0.7%
35614	11	0.3%
19124	7	0.2%
35084	6	0.2%
16984	5	0.2%
47664	5	0.2%
23844	4	0.1%
23104	4	0.1%
35154	4	0.1%
14454	3	0.1%
15804	3	0.1%
48864	3	0.1%
20994	3	0.1%
33874	3	0.1%
35004	2	0.1%
37964	2	0.1%
29404	2	0.1%
41884	2	0.1%
23224	2	0.1%
36084	2	0.1%

FUNCSTAT categorical

top value is 96.4% of rows

rows	3,234
null	0 (0.0%)
unique	7
top_value	A
top_rate	0.964
cardinality	7
entropy	0.301
entropy_ratio	0.107

Top values for FUNCSTAT (7 unique shown, of 7 total).
value	count	share
A	3116	96.4%
F	43	1.3%
C	33	1.0%
N	27	0.8%
S	11	0.3%
B	3	0.1%
G	1	0.0%

ALAND numeric

skew=+27.13; 11.2% of rows beyond 1.5 IQR

rows	3,234
null	0 (0.0%)
unique	3,234
min	82,093
max	377,038,917,450
mean	2,832,701,709
median	1,563,349,650
std	9,186,156,810
q1	1,078,544,021
q3	2,368,055,605
iqr	1,289,511,584
skew	27.126
kurtosis	976.658
n_outliers	362
outlier_rate	0.112
zero_rate	0.000

Histogram bins for ALAND (median: 1563349650.5).
bin	count
8.209e+04 – 9.426e+09	3096
9.426e+09 – 1.885e+10	97
1.885e+10 – 2.828e+10	22
2.828e+10 – 3.77e+10	3
3.77e+10 – 4.713e+10	4
4.713e+10 – 5.656e+10	3
5.656e+10 – 6.598e+10	5
6.598e+10 – 7.541e+10	0
7.541e+10 – 8.483e+10	0
8.483e+10 – 9.426e+10	1
9.426e+10 – 1.037e+11	0
1.037e+11 – 1.131e+11	1
1.131e+11 – 1.225e+11	0
1.225e+11 – 1.32e+11	0
1.32e+11 – 1.414e+11	0
1.414e+11 – 1.508e+11	0
1.508e+11 – 1.602e+11	0
1.602e+11 – 1.697e+11	0
1.697e+11 – 1.791e+11	0
1.791e+11 – 1.885e+11	0
1.885e+11 – 1.979e+11	0
1.979e+11 – 2.074e+11	0
2.074e+11 – 2.168e+11	0
2.168e+11 – 2.262e+11	0
2.262e+11 – 2.356e+11	1
2.356e+11 – 2.451e+11	0
2.451e+11 – 2.545e+11	0
2.545e+11 – 2.639e+11	0
2.639e+11 – 2.734e+11	0
2.734e+11 – 2.828e+11	0
2.828e+11 – 2.922e+11	0
2.922e+11 – 3.016e+11	0
3.016e+11 – 3.111e+11	0
3.111e+11 – 3.205e+11	0
3.205e+11 – 3.299e+11	0
3.299e+11 – 3.393e+11	0
3.393e+11 – 3.488e+11	0
3.488e+11 – 3.582e+11	0
3.582e+11 – 3.676e+11	0
3.676e+11 – 3.77e+11	1

AWATER numeric

skew=+13.33; 14.1% of rows beyond 1.5 IQR

rows	3,234
null	0 (0.0%)
unique	3,234
min	0.000
max	25,989,695,209
mean	220,188,953
median	19,505,620
std	1,225,718,213
q1	7,043,836
q3	61,199,722
iqr	54,155,886
skew	13.326
kurtosis	215.855
n_outliers	456
outlier_rate	0.141
zero_rate	3.09e-04

Histogram bins for AWATER (median: 19505620.5).
bin	count
0 – 6.497e+08	3040
6.497e+08 – 1.299e+09	92
1.299e+09 – 1.949e+09	30
1.949e+09 – 2.599e+09	25
2.599e+09 – 3.249e+09	14
3.249e+09 – 3.898e+09	3
3.898e+09 – 4.548e+09	3
4.548e+09 – 5.198e+09	7
5.198e+09 – 5.848e+09	1
5.848e+09 – 6.497e+09	3
6.497e+09 – 7.147e+09	1
7.147e+09 – 7.797e+09	0
7.797e+09 – 8.447e+09	1
8.447e+09 – 9.096e+09	0
9.096e+09 – 9.746e+09	0
9.746e+09 – 1.04e+10	0
1.04e+10 – 1.105e+10	1
1.105e+10 – 1.17e+10	1
1.17e+10 – 1.235e+10	0
1.235e+10 – 1.299e+10	2
1.299e+10 – 1.364e+10	0
1.364e+10 – 1.429e+10	3
1.429e+10 – 1.494e+10	2
1.494e+10 – 1.559e+10	1
1.559e+10 – 1.624e+10	0
1.624e+10 – 1.689e+10	0
1.689e+10 – 1.754e+10	0
1.754e+10 – 1.819e+10	0
1.819e+10 – 1.884e+10	0
1.884e+10 – 1.949e+10	0
1.949e+10 – 2.014e+10	0
2.014e+10 – 2.079e+10	0
2.079e+10 – 2.144e+10	1
2.144e+10 – 2.209e+10	0
2.209e+10 – 2.274e+10	1
2.274e+10 – 2.339e+10	0
2.339e+10 – 2.404e+10	0
2.404e+10 – 2.469e+10	0
2.469e+10 – 2.534e+10	1
2.534e+10 – 2.599e+10	1

INTPTLAT text

100.0% of rows are unique strings; 100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars

rows	3,234
null	0 (0.0%)
unique	3,234
len_min	11
len_max	11
len_mean	11.000
len_median	11.000
len_p95	11.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	3,234
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	1.000
boilerplate_rate	0.000

Character-length distribution for INTPTLAT (mean: 11.0).
chars	count
11	3234

Sample values (first 10)

+40.7835474
+37.1411668
+40.2726454
+30.4934867
+41.0108798
+43.2749637
+40.8678400
+37.8501957
+35.6539945
+27.0532315

INTPTLON text

100.0% of rows are unique strings; 100.0% of rows are a single word; 100.0% of rows are all-caps; 95th-percentile length under 20 chars

rows	3,234
null	0 (0.0%)
unique	3,234
len_min	12
len_max	12
len_mean	12.000
len_median	12.000
len_p95	12.000
word_mean	1.000
word_median	1.000
n_empty	0
n_duplicates	0
duplicate_rate	0.000
vocab_size	3,234
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	1.000
allcaps_rate	1.000
boilerplate_rate	0.000

Character-length distribution for INTPTLON (mean: 12.0).
chars	count
12	3234

Sample values (first 10)

-096.6886584
-078.0538655
-080.5786910
-096.6220912
-080.7703956
-091.3827510
-099.8155833
-094.3415972
-088.3876742
-098.7475716

geometry_type categorical

top value is 98.4% of rows

rows	3,234
null	0 (0.0%)
unique	2
top_value	Polygon
top_rate	0.984
cardinality	2
entropy	0.121
entropy_ratio	0.121

Top values for geometry_type (2 unique shown, of 2 total).
value	count	share
Polygon	3181	98.4%
MultiPolygon	53	1.6%
