saturn

/home/coolhand/html/datavis/data_trove/cache/accessibility/ssa_sa_fywl.csv | 1,093 rows | sample n=1,093 | seed 42 | 2026-05-01T17:12:33+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/cache/accessibility/ssa_sa_fywl.csv
Total rows	1,093
Profiled sample	1,093
Columns	30
Generated	2026-05-01T17:12:33+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset medium anthropic:claude-opus-4-7

This appears to be the SSA-SA-FYWL dataset (Social Security Administration state/area fiscal-year workload data) with 1,093 rows and 30 columns, but the headers were not parsed correctly: most columns carry placeholder names like `_duplicated_*`, and several columns hold metadata constants (file name, update date 3/13/2023, date type 'FY'). The real header row appears to have been read as data, which would account for the single stray label in each column ('File Name', 'Region Code', 'Date Type') and means the distinct counts below each include one header string. The most informative fields are the geographic and time dimensions: `_duplicated_2` holds 53 distinct values, with each listed state code appearing 21 times; `_duplicated_1` holds 10 region codes plus the 'Region Code' label, led by ATL (168 rows); and `_duplicated_4` holds 22 distinct values, with the listed years from 2001 onward appearing 52 times each. That is consistent with a balanced panel of 52 state/area codes by 21 fiscal years (52 × 21 = 1,092 rows, plus the header row). Many numeric measures (e.g. `_duplicated_22`, `_duplicated_12`, `_duplicated_10`) were ingested as text or categorical strings of decimal numbers, so they should be retyped before analysis. Start by fixing headers and dtypes, then look at the region/state/year structure to confirm the panel layout.
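A minimal sketch of the header and dtype fix, using only the Python standard library. It assumes the file's first line is the free-text note and the second line is the real header row; that layout is inferred from the stray labels in the profile, not confirmed against the file. The inline sample, the column names 'State Code', 'Fiscal Year' and 'Measure', and the helpers `to_number` and `load_rows` are all hypothetical.

```python
import csv
import io

# Hypothetical three-column excerpt; the real file has 30 columns.
SAMPLE = (
    "**Please note** 2021 data ...,,\n"
    "State Code,Fiscal Year,Measure\n"
    "AK ,2001,5.50\n"
    "AL ,2001,\n"
)

def to_number(text):
    """Return int or float for numeric strings, None for blanks, else the stripped text."""
    text = text.strip()
    if text == "":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text

def load_rows(handle, note_lines=1):
    """Skip the leading note line(s), use the next row as the header, retype every cell."""
    reader = csv.reader(handle)
    for _ in range(note_lines):
        next(reader)
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, map(to_number, row))) for row in reader]

rows = load_rows(io.StringIO(SAMPLE))
print(rows[0])
```

Stripping before casting also removes the trailing space flagged on the state codes. Blank cells become `None` rather than 0 so that missing values stay distinguishable from genuine zeros.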

**Please note** 2021 data in columns H, K, R, and U are populated with 2020 data until current data is released. high anthropic:claude-opus-4-7

This column is effectively a constant file-name tag ("SSA-SA-FYWL.csv" appears 1092 of 1093 times, top_rate 0.999) with a single stray "File Name" value that looks like a header row leaked into the data. The column header itself is a free-text note about 2021 data being backfilled with 2020 data, suggesting this is provenance metadata rather than a feature. Entropy is essentially zero (0.0106), so it carries no discriminative signal.
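The note refers to columns by spreadsheet letter, while this profile refers to them by placeholder name. The sketch below maps one to the other, assuming the profile lists columns in file order; the helper `column_index` and the `<note>` / `<unnamed>` labels are made up for illustration.

```python
def column_index(letter):
    """Convert a spreadsheet column letter (A, B, ..., Z, AA, ...) to a 1-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index

# Column names in the order this profile lists them (assumed to be file order).
names = ["<note>", "<unnamed>"] + [f"_duplicated_{i}" for i in range(28)]

# Letters named in the note as carrying 2020 values for 2021.
backfilled = {letter: names[column_index(letter) - 1] for letter in "HKRU"}
print(backfilled)
```

Under that ordering assumption, H, K, R and U correspond to `_duplicated_5`, `_duplicated_8`, `_duplicated_15` and `_duplicated_18`.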

(unnamed column) high anthropic:claude-opus-4-7

Binary categorical column with 1093 rows and only 2 distinct values, but it is effectively a constant: "2" appears 1092 times (top_rate 0.999) while "File Version" appears once. The lone "File Version" string alongside numeric "2" suggests a stray header row leaked into the data. Entropy of 0.0106 confirms there is virtually no information here.

_duplicated_0 high anthropic:claude-opus-4-7

This appears to be a duplicated date column where 1092 of 1093 rows hold the single value '3/13/2023', with the lone other entry being the literal string 'Update Date' — almost certainly a header row that leaked into the data. Entropy is effectively zero (0.0106) and the top rate is 0.999, so the column carries no discriminative signal. The 'Update Date' value also confirms a parsing/ingest issue worth fixing upstream.

_duplicated_1 high anthropic:claude-opus-4-7

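Three-letter region codes (ATL, DEN, BOS, PHL, CHI, DAL, SEA, SFO, KCM, NYC) that appear to abbreviate regional office cities, across 1093 rows with no nulls. Of the 11 unique values, 10 are codes covering 1,092 rows; the eleventh is the literal string 'Region Code' (1 row), which looks like a header row that leaked into the data. The distribution is moderately uneven rather than heavily skewed: entropy ratio is 0.947, and the top value ATL covers 15.4% of rows while NYC covers 5.8%. The placeholder name `_duplicated_1` more plausibly reflects the header-parsing problem than a true duplicate of another column.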
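The entropy figures quoted in this profile are consistent with Shannon entropy in bits, and the entropy ratio with that entropy divided by log2 of the number of distinct values. This is inferred from the reported numbers, not from saturn's source; the function name `entropy_stats` is made up for this sketch. The counts are the ones listed under Top values for `_duplicated_1`.

```python
import math

def entropy_stats(counts):
    """Return (entropy in bits, entropy / log2(k)) for a list of category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    k = len(probs)
    ratio = entropy / math.log2(k) if k > 1 else 0.0
    return entropy, ratio

# _duplicated_1: ten region codes plus the stray 'Region Code' label.
region_counts = [168, 126, 126, 126, 126, 105, 84, 84, 84, 63, 1]
region_entropy, region_ratio = entropy_stats(region_counts)

# A near-constant column: 1,092 copies of one value and one stray label.
const_entropy, const_ratio = entropy_stats([1092, 1])

print(round(region_entropy, 3), round(region_ratio, 3))
print(round(const_entropy, 4), round(const_ratio, 4))
```

Under these assumptions the sketch should land close to the reported 3.277 and 0.947 for `_duplicated_1`, and close to 0.011 for the near-constant columns.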

_duplicated_2 high anthropic:claude-opus-4-7

This column holds two-letter US state/territory abbreviations with a trailing space (e.g. 'AK ', 'AL ', 'AR '), with 53 distinct values across 1093 rows and no nulls. The distribution is almost perfectly uniform (entropy_ratio 0.996), and each of the top 20 values appears exactly 21 times (1.92%), so the data looks like a regular grid of state codes repeated once per fiscal year. If one of the 53 values is a leaked header label, as in the neighbouring columns, that leaves 52 codes (52 × 21 = 1,092 rows), consistent with the 50 states plus DC and one other area. The trailing whitespace in every value is a data-hygiene flag.

_duplicated_3 high anthropic:claude-opus-4-7

A binary categorical column completely dominated by the value 'FY' (1092 of 1093 rows, top_rate 0.999), with a single stray 'Date Type' entry. Entropy is effectively zero (0.0106), and the name '_duplicated_3' suggests this is a residual from a duplicated header or pivot artifact rather than a real feature. The lone 'Date Type' value looks like a header row that leaked into the data.

_duplicated_4 high anthropic:claude-opus-4-7

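This column holds 22 distinct strings with zero nulls; the top 20 are the years 2001 through 2020, each appearing exactly 52 times across 1,093 rows. The near-uniform distribution (entropy ratio 0.986, top rate just 0.0476) fits a balanced panel, and the count of 52 matches the number of state/area codes implied by `_duplicated_2`, so this reads as one fiscal-year observation per state rather than weekly data (the date-type column is a constant 'FY'). Twenty-one years at 52 rows each would account for 1,092 rows, which suggests the 22nd value is a header label that leaked into the data. The '_duplicated_4' name is an ingest placeholder and more plausibly reflects the header-parsing problem than a true duplicate column.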
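One way to test a panel reading of these counts is to tally (state, year) pairs and check that every combination occurs exactly once. The sketch below does that on a tiny made-up example; `check_balanced_panel` and the sample pairs are hypothetical, and the row arithmetic at the end uses the counts reported in this profile (21 rows per state code, 52 rows per year).

```python
from collections import Counter
from itertools import product

def check_balanced_panel(pairs):
    """Return (missing, repeated) combinations for an iterable of (unit, period) pairs."""
    seen = Counter(pairs)
    units = {u for u, _ in seen}
    periods = {p for _, p in seen}
    missing = sorted(set(product(units, periods)) - set(seen))
    repeated = sorted(pair for pair, n in seen.items() if n > 1)
    return missing, repeated

# Made-up excerpt: two states, two fiscal years, one combination absent.
pairs = [("AK", 2001), ("AK", 2002), ("AL", 2001)]
missing, repeated = check_balanced_panel(pairs)
print(missing, repeated)

# Row arithmetic from the profile: 52 codes x 21 years, plus one leaked header row.
expected_rows = 52 * 21 + 1
print(expected_rows)
```

An empty `missing` and an empty `repeated` on the real data would confirm one row per state per fiscal year.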

_duplicated_5 high anthropic:claude-opus-4-7

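Stored as text, but the values are short numeric tokens (length 6-21, mean 6.85, one word in 99.9% of rows). Cardinality is near-unique (1037 distinct out of 1093) with 56 repeated rows (5.1% duplicate rate). In a state-by-year workload panel these are more plausibly counts than identifiers. The repeats also have a likely explanation: if the profile follows file order, this is the eighth column (H in spreadsheet lettering), one of the four that the file's note says carry 2020 values for 2021, and one backfilled year would contribute 52 repeats. The column name '_duplicated_5' is a placeholder generated during ingest.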
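The duplicate figures in the text columns are consistent with n_duplicates being the number of non-null rows minus the number of distinct values, and duplicate_rate being that count over total rows. That definition is inferred from the reported numbers rather than taken from saturn's documentation; `duplicate_stats` is a hypothetical helper and the sample list is made up.

```python
def duplicate_stats(values):
    """Return (n_unique, n_duplicates, duplicate_rate), treating None as null."""
    present = [v for v in values if v is not None]
    n_unique = len(set(present))
    n_duplicates = len(present) - n_unique
    rate = n_duplicates / len(values) if values else 0.0
    return n_unique, n_duplicates, rate

# Made-up sample: '0' repeats twice beyond its first occurrence, plus one null.
sample = ["407208", "0", "0", "802274", "0", None]
sample_stats = duplicate_stats(sample)
print(sample_stats)

# Reported counts for _duplicated_5: 1,093 rows, 1,037 distinct, no nulls.
reported_duplicates = 1093 - 1037
reported_rate = round(reported_duplicates / 1093, 3)
print(reported_duplicates, reported_rate)
```

The same arithmetic reproduces the reported 56 duplicates and 0.051 rate for `_duplicated_5`.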

_duplicated_6 high anthropic:claude-opus-4-7

Almost every value is a single all-caps token of 5-6 characters (len_mean 5.68, one_word_rate 0.999), with 1090 unique values across 1093 rows and only 3 duplicates. Top tokens are mostly numeric strings like '91371', '18795', '158314', suggesting this is an identifier or numeric code column rather than natural text — though a stray header-like fragment ('ssa', 'disability', 'beneficiaries', 'age', '18-64*') hints the source file had embedded header rows mixed into the data.

_duplicated_7 medium anthropic:claude-opus-4-7

Column is typed categorical but holds 511 distinct numeric strings like "5.50", "5.07", "4.90" across 1093 rows, suggesting a continuous measurement (price, rating, or similar) stored as text. Distribution is nearly flat: entropy ratio is 0.968 and the most common value covers only 1.01% of rows. The "_duplicated_7" name implies this is a redundant copy of another column produced during a join or pivot.

_duplicated_8 high anthropic:claude-opus-4-7

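Single-token short strings (length 6-26, mean 6.84, ~1 word each) that are overwhelmingly numeric: top values like '468802', '2702811', '1646445' are integers stored as text. With 1041 unique values across 1093 rows and 52 duplicates, this looks like a count measure stored as text rather than an identifier. The 52 repeats equal the number of rows per fiscal year, and if the profile follows file order this is the eleventh column (K in spreadsheet lettering), one of the four that the file's note says carry 2020 values for 2021, so the repeats are plausibly the backfilled year. The 'allcaps' and Flesch=121.22 signals are artifacts of digit-only tokens; no URLs, emojis, or boilerplate appear.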

_duplicated_9 high anthropic:claude-opus-4-7

Almost certainly an identifier-like code column: 1081 unique values across 1093 rows, single-token entries averaging 4.85 characters, and the top repeated values are short numeric strings like '4190' and '8630'. The 99.9% allcaps and one_word rates plus max length of 14 suggest compact alphanumeric codes rather than prose. The 12 duplicates (1.1%) are minor but worth checking given the column is otherwise near-unique.

_duplicated_10 medium anthropic:claude-opus-4-7

Stored as a categorical but the values are numeric strings clustered tightly around 1.0 (top values include '0.97', '1.11', '1.01', '1.04', '0.92'), suggesting a ratio, multiplier, or normalised index. Distribution is highly diffuse with 199 distinct values across 1093 rows and an entropy ratio of 0.929, so no single bucket dominates (top_rate just 0.023). The '_duplicated_10' name implies this column is a redundant copy from an upstream join.

_duplicated_11 high anthropic:claude-opus-4-7

Almost certainly a short alphanumeric code column: 1062 distinct values across 1093 rows, 99.9% one-word and 99.9% all-caps, lengths between 3 and 30 characters with a median of 4. Top tokens are bare numeric strings like '6632' and '1573', each appearing only 2-3 times, suggesting ID-like codes rather than categories. The '_duplicated_11' name and 31 duplicates (2.8%) hint this is a copy of another column with minor collisions.

_duplicated_12 medium anthropic:claude-opus-4-7

This column holds 69 distinct numeric-looking strings (e.g. '0.38', '0.34', '0.32') across 1093 rows with no nulls, suggesting a decimal ratio or rate stored as text. The distribution is fairly flat — top value '0.38' covers only 5.0% and entropy ratio is 0.905 — so no single value dominates. The '_duplicated_12' name signals it is a duplicate of another column, which is the main thing to flag.

_duplicated_13 high anthropic:claude-opus-4-7

This column holds short, single-token uppercase strings that are almost entirely unique (1079 unique out of 1093), with lengths between 4 and 24 characters and a median of 5. The top-frequency tokens are all numeric strings ('17955', '5808', etc.) appearing only twice each, suggesting this is a near-unique identifier code rather than natural text. The 'allcaps' and 'one_word' rates near 99.9% confirm a structured code format, and the column name '_duplicated_13' hints it was auto-generated during a join or pivot.

_duplicated_14 high anthropic:claude-opus-4-7

This column, labelled `_duplicated_14`, holds 1093 numeric-looking strings (e.g. "31.13", "44.89") with 883 unique values and no nulls — almost certainly a continuous measurement that was ingested as categorical. Entropy ratio of 0.99 and a top frequency of just 4 (0.37%) confirm near-uniqueness; the `long_tail` alert and the `_duplicated_` prefix suggest it is a redundant copy of another numeric column.

_duplicated_15 medium anthropic:claude-opus-4-7

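This column holds short single-token numeric strings (one_word_rate 0.999, len_mean 6.4, max 24) stored as text rather than integers, with 1019 unique values across 1093 rows. The value '0' appears 21 times while every other top value occurs only twice, suggesting '0' is a sentinel, a default, or a genuine zero for one state across the 21 fiscal years. The 6.8% duplicate rate (74 rows) would be largely accounted for by those repeated zeros plus 52 rows of 2020 values reused for 2021: if the profile follows file order, this is the 18th column (R in spreadsheet lettering), which the file's note lists as backfilled. The name '_duplicated_15' is an ingest placeholder.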

_duplicated_16 high anthropic:claude-opus-4-7

Despite being typed as text, this column is dominated by short single-token numeric strings (one_word_rate 0.999, len_mean 4.54, max 38) with 1057 unique values across 1093 rows. The top tokens are bare integers like "0" (21 occurrences), "1358", "840", suggesting an ID or numeric code stored as text rather than natural language. The allcaps_rate of 0.98 is an artifact of digits/non-letter content, and the column name `_duplicated_16` implies it was auto-generated during a column-name collision.

_duplicated_17 medium anthropic:claude-opus-4-7

Stored as categorical strings but the values are numeric ('0.00', '1.68', '0.58', '1.07'), suggesting a small-magnitude continuous measurement that was read as text. Cardinality is high (272 unique across 1093 rows) with very flat distribution: top value '0.00' covers only 1.92% and entropy ratio is 0.949. The '_duplicated_17' name implies this is a duplicate of another column produced during a join or concat.

_duplicated_18 high anthropic:claude-opus-4-7

Despite being typed as text, this column holds single-token numeric strings (one_word_rate 0.999, word_mean 1.00, len_mean 6.4) with 1021 unique values across 1093 rows and one null, so it reads as a high-cardinality numeric measure stored as text. The value '0' appears 20 times while every other top value occurs at most twice, hinting at '0' as a sentinel or placeholder. The 71 duplicates match 19 repeated zeros plus 52 rows, the size of one fiscal year; if the profile follows file order, this is the 21st column (U in spreadsheet lettering), which the file's note lists as carrying 2020 values for 2021. The 'allcaps' alert is a quirk of digit-only strings rather than meaningful casing.

_duplicated_19 high anthropic:claude-opus-4-7

Despite being typed as text, this column is essentially short numeric tokens — 99.9% are single words with mean length 4.05 characters and a max of 32. With 1018 unique values across 1093 rows and the most common entry '0' appearing only 21 times, it behaves like a high-cardinality numeric identifier stored as strings. The 'allcaps' alert (97.99%) is an artifact of digits having no lowercase form rather than a meaningful signal.

_duplicated_20 medium anthropic:claude-opus-4-7

Despite being typed categorical, the listed values are two-decimal numeric strings (the top 20 range from 0.00 to 0.71), suggesting a proportion or rate that was stored as text; one of the 156 distinct values may be a leaked header label, as in other columns. The distribution is nearly flat (entropy ratio 0.907), with the modal value '0.30' covering only 2.6% of 1093 rows and no nulls. The column name '_duplicated_20' is a placeholder assigned during ingestion.

_duplicated_21 high anthropic:claude-opus-4-7

This column is labelled `_duplicated_21`, suggesting saturn detected it as a duplicate of another field; values appear to be numeric strings stored as categorical. With 957 unique values across 1093 rows and an entropy ratio of 0.9885, it is nearly an identifier — the only meaningful concentration is `"0"` at 21 occurrences (1.92%), likely a sentinel or default. The long_tail alert and near-unique cardinality mean it carries almost no categorical signal as-is.
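The long-tail alert on this column appears to correspond to the singleton count in its stats block (852 singleton categories among 957 distinct values). A small sketch of how such a count can be derived with `collections.Counter`; the helper name `singleton_summary` and the sample values are illustrative only.

```python
from collections import Counter

def singleton_summary(values):
    """Return (n_distinct, n_singletons, share of distinct values seen exactly once)."""
    counts = Counter(values)
    n_distinct = len(counts)
    n_singletons = sum(1 for n in counts.values() if n == 1)
    share = n_singletons / n_distinct if n_distinct else 0.0
    return n_distinct, n_singletons, share

# Made-up sample: '0' and '1321' repeat, the other three values occur once.
sample = ["0", "0", "0", "1321", "1321", "352", "597", "777"]
summary = singleton_summary(sample)
print(summary)

# Reported for _duplicated_21: 852 singletons among 957 distinct values.
reported_share = round(852 / 957, 3)
print(reported_share)
```

A singleton share this high is what one would expect from a continuous measure read as categories, which is another reason to retype the column before analysis.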

_duplicated_22 medium anthropic:claude-opus-4-7

This column holds 70 distinct short decimal strings clustered tightly around 0.16–0.25, suggesting a numeric ratio or rate (perhaps a proportion or probability) that has been stored as text. Distribution is fairly even with the top value '0.18' taking only 7.0% of rows and entropy ratio 0.84, so no single bucket dominates. The 'categorical' kind plus the '_duplicated_22' name hint that saturn detected this as a duplicate of another column and parsed it as strings rather than floats.

_duplicated_23 high anthropic:claude-opus-4-7

Despite the text kind, almost every value is a single short token (word_mean 1.004, len_mean 4.05); the 37-character maximum is probably a leaked header label. The top values are all numeric strings like "0", "406", "404". With 1028 unique values across 1093 rows and a 5.9% duplicate_rate in which "0" (21 occurrences) is the largest single contributor, this looks like a numeric identifier or count stored as text. The allcaps_rate of 0.98 is a quirk of digit-only strings being flagged as uppercase.

_duplicated_24 high anthropic:claude-opus-4-7

Despite being typed categorical, the values are numeric strings (e.g. '0.00', '47.52', '51.82'), suggesting a monetary or measurement field that was read as text. With 900 unique values across 1093 rows, 6 nulls (0.5%), and entropy ratio 0.9874, it is nearly unique; the only meaningful concentration is '0.00' at 1.37% (15 rows). The '_duplicated_24' name is an ingest placeholder and more plausibly reflects the header-parsing problem than a repeated copy of another column.

_duplicated_25 high anthropic:claude-opus-4-7

Almost every value is a single short ALLCAPS token (one_word_rate 0.999, allcaps_rate 0.999, len_mean 4.9, word_mean 1.0), and 1088 of 1093 rows are unique with only 5 duplicates. The top tokens are mostly numeric strings like '3584' or '14860', suggesting this is a near-unique short code rather than natural text. The column name '_duplicated_25' hints it was auto-generated from a duplicated source column during profiling.

_duplicated_26 high anthropic:claude-opus-4-7

Single-token, all-caps strings averaging 4.57 characters with 1069 unique values across 1093 rows — almost certainly an identifier or short code column. The top values are all numeric strings (e.g., '2280', '2086') appearing 2-3 times each, suggesting these are numeric IDs stored as text rather than meaningful tokens. The 99.9% one-word and all-caps rates plus near-unique cardinality rule out free text.

_duplicated_27 high anthropic:claude-opus-4-7

Stored as categorical strings but every observed value parses as a two-decimal number (e.g. '37.60', '41.85'), so this is almost certainly a numeric measurement — possibly a price, rate or score — that was ingested as text. With 873 unique values across 1093 rows and entropy ratio 0.989, it is near-unique; the most frequent value '37.60' appears just 4 times (top rate 0.37%). The '_duplicated_27' name suggests it is a duplicate of another column produced upstream.

Please note 2021 data in columns H, K, R, and U are populated with 2020 data until current data is released. categorical

top value is 99.9% of rows

rows	1,093
null	0 (0.0%)
unique	2
top_value	SSA-SA-FYWL.csv
top_rate	0.999
cardinality	2
entropy	0.011
entropy_ratio	0.011

Top values (rank 1–20)

SSA-SA-FYWL.csv — 1,092
File Name — 1

(unnamed column) categorical

top value is 99.9% of rows

rows	1,093
null	0 (0.0%)
unique	2
top_value	2
top_rate	0.999
cardinality	2
entropy	0.011
entropy_ratio	0.011

Top values (rank 1–20)

2 — 1,092
File Version — 1

_duplicated_0 categorical

top value is 99.9% of rows

rows	1,093
null	0 (0.0%)
unique	2
top_value	3/13/2023
top_rate	0.999
cardinality	2
entropy	0.011
entropy_ratio	0.011

Top values (rank 1–20)

3/13/2023 — 1,092
Update Date — 1

_duplicated_1 categorical

rows	1,093
null	0 (0.0%)
unique	11
top_value	ATL
top_rate	0.154
cardinality	11
entropy	3.277
entropy_ratio	0.947

Top values (rank 1–20)

ATL — 168
DEN — 126
BOS — 126
PHL — 126
CHI — 126
DAL — 105
SEA — 84
SFO — 84
KCM — 84
NYC — 63
Region Code — 1

_duplicated_2 categorical

rows	1,093
null	0 (0.0%)
unique	53
top_value	AK
top_rate	0.019
cardinality	53
entropy	5.706
entropy_ratio	0.996

Top values (rank 1–20)

AK — 21
AL — 21
AR — 21
AZ — 21
CA — 21
CO — 21
CT — 21
DC — 21
DE — 21
FL — 21
GA — 21
HI — 21
IA — 21
ID — 21
IL — 21
IN — 21
KS — 21
KY — 21
LA — 21
MA — 21

_duplicated_3 categorical

top value is 99.9% of rows

rows	1,093
null	0 (0.0%)
unique	2
top_value	FY
top_rate	0.999
cardinality	2
entropy	0.011
entropy_ratio	0.011

Top values (rank 1–20)

FY — 1,092
Date Type — 1

_duplicated_4 categorical

rows	1,093
null	0 (0.0%)
unique	22
top_value	2001
top_rate	0.048
cardinality	22
entropy	4.399
entropy_ratio	0.986

Top values (rank 1–20)

2001 — 52
2002 — 52
2003 — 52
2004 — 52
2005 — 52
2006 — 52
2007 — 52
2008 — 52
2009 — 52
2010 — 52
2011 — 52
2012 — 52
2013 — 52
2014 — 52
2015 — 52
2016 — 52
2017 — 52
2018 — 52
2019 — 52
2020 — 52

_duplicated_5 text

99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,037
len_min	6
len_max	21
len_mean	6.846
len_median	7.000
len_p95	8.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	56
duplicate_rate	0.051
vocab_size	1,039
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

407208
5197780
802274
4470992
5075318
3483629
862241
384373
830302
4629213

_duplicated_6 text

99.7% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,090
len_min	5
len_max	40
len_mean	5.683
len_median	6.000
len_p95	6.000
word_mean	1.005
word_median	1.000
n_empty	0
n_duplicates	3
duplicate_rate	2.74e-03
vocab_size	1,094
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

12791
293982
31643
288035
262109
159132
35481
29628
77327
203305

_duplicated_7 categorical

rows	1,093
null	0 (0.0%)
unique	511
top_value	5.50
top_rate	0.010
cardinality	511
entropy	8.710
entropy_ratio	0.968

Top values (rank 1–20)

5.50 — 11
5.07 — 9
4.90 — 9
4.19 — 8
5.08 — 8
4.70 — 8
4.96 — 7
5.29 — 7
4.11 — 6
4.55 — 6
5.18 — 6
4.45 — 6
6.18 — 6
4.98 — 6
5.63 — 6
7.16 — 6
5.33 — 5
5.15 — 5
5.45 — 5
4.71 — 5

_duplicated_8 text

95.2% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,041
len_min	6
len_max	26
len_mean	6.835
len_median	7.000
len_p95	8.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	52
duplicate_rate	0.048
vocab_size	1,043
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

394417
4903798
770616
4182957
4813209
3318849
826760
354780
752975
4425908

_duplicated_9 text

98.9% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,081
len_min	3
len_max	14
len_mean	4.854
len_median	5.000
len_p95	6.000
word_mean	1.001
word_median	1.000
n_empty	0
n_duplicates	12
duplicate_rate	0.011
vocab_size	1,082
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

3487
42558
6257
37868
53646
22580
7902
4431
8934
41298

_duplicated_10 categorical

rows	1,093
null	0 (0.0%)
unique	199
top_value	0.97
top_rate	0.023
cardinality	199
entropy	7.097
entropy_ratio	0.929

Top values (rank 1–20)

0.97 — 25
1.11 — 24
1.01 — 23
1.04 — 19
0.92 — 19
1.08 — 18
1.02 — 17
1.12 — 16
1.07 — 16
1.15 — 16
0.96 — 15
1.00 — 14
1.13 — 14
1.10 — 14
0.89 — 13
1.23 — 13
0.94 — 13
0.90 — 13
1.05 — 13
0.85 — 13

_duplicated_11 text

97.2% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,062
len_min	3
len_max	30
len_mean	4.498
len_median	4.000
len_p95	5.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	31
duplicate_rate	0.028
vocab_size	1,064
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

1573
15940
2107
14243
18698
8951
3441
1938
2851
16121

_duplicated_12 categorical

rows	1,093
null	0 (0.0%)
unique	69
top_value	0.38
top_rate	0.050
cardinality	69
entropy	5.527
entropy_ratio	0.905

Top values (rank 1–20)

0.38 — 55
0.34 — 45
0.32 — 43
0.40 — 41
0.35 — 39
0.44 — 36
0.37 — 35
0.31 — 35
0.36 — 33
0.33 — 33
0.39 — 33
0.46 — 32
0.43 — 31
0.48 — 30
0.45 — 30
0.42 — 29
0.30 — 29
0.41 — 27
0.52 — 26
0.54 — 25

_duplicated_13 text

98.7% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,079
len_min	4
len_max	24
len_mean	4.849
len_median	5.000
len_p95	6.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	14
duplicate_rate	0.013
vocab_size	1,081
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

3369
41577
6096
37132
46098
22283
7360
4370
8916
39609

_duplicated_14 categorical

707 singleton categories

rows	1,093
null	0 (0.0%)
unique	883
top_value	31.13
top_rate	3.66e-03
cardinality	883
entropy	9.686
entropy_ratio	0.990

Top values (rank 1–20)

31.13 — 4
44.89 — 3
33.20 — 3
47.46 — 3
30.73 — 3
35.51 — 3
41.78 — 3
40.12 — 3
36.06 — 3
29.74 — 3
36.98 — 3
37.02 — 3
38.32 — 3
29.63 — 3
36.17 — 3
30.34 — 3
32.50 — 3
36.14 — 3
32.47 — 3
31.93 — 3

_duplicated_15 text

99.9% rows are a single word; 98.0% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,019
len_min	1
len_max	24
len_mean	6.415
len_median	6.000
len_p95	7.000
word_mean	1.003
word_median	1.000
n_empty	0
n_duplicates	74
duplicate_rate	0.068
vocab_size	1,022
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.980
boilerplate_rate	0.000

Sample values (first 10)

188453
1870106
299867
1366857
1847182
1301219
304573
1860793
250404
1751532

_duplicated_16 text

96.7% of rows are unique strings; 99.9% rows are a single word; 98.0% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,057
len_min	1
len_max	38
len_mean	4.535
len_median	5.000
len_p95	5.000
word_mean	1.004
word_median	1.000
n_empty	0
n_duplicates	36
duplicate_rate	0.033
vocab_size	1,061
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.980
boilerplate_rate	0.000

Sample values (first 10)

970
22782
1214
21199
23598
10390
1730
1356
3878
19821

_duplicated_17 categorical

rows	1,093
null	0 (0.0%)
unique	272
top_value	0.00
top_rate	0.019
cardinality	272
entropy	7.671
entropy_ratio	0.949

Top values (rank 1–20)

0.00 — 21
1.68 — 13
0.58 — 12
1.07 — 12
1.08 — 12
1.24 — 12
1.15 — 12
0.64 — 12
1.52 — 11
1.42 — 11
1.18 — 11
1.70 — 11
1.81 — 11
1.20 — 10
1.09 — 10
1.44 — 10
1.11 — 10
0.94 — 10
1.78 — 10
1.56 — 10

_duplicated_18 text

99.9% rows are a single word; 98.1% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	1 (0.1%)
unique	1,021
len_min	1
len_max	26
len_mean	6.407
len_median	6.000
len_p95	7.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	71
duplicate_rate	0.065
vocab_size	1,023
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.981
boilerplate_rate	0.000

Sample values (first 10)

187483
1847324
298653
1345658
863746
1289462
302843
1838723
246526
1731711

_duplicated_19 text

99.9% rows are a single word; 98.0% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,018
len_min	1
len_max	32
len_mean	4.047
len_median	4.000
len_p95	5.000
word_mean	1.004
word_median	1.000
n_empty	0
n_duplicates	75
duplicate_rate	0.069
vocab_size	1,022
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.980
boilerplate_rate	0.000

Sample values (first 10)

416
6917
343
5908
9872
2045
541
337
914
8064

_duplicated_20 categorical

rows	1,093
null	0 (0.0%)
unique	156
top_value	0.30
top_rate	0.026
cardinality	156
entropy	6.609
entropy_ratio	0.907

Top values (rank 1–20)

0.30 — 28
0.35 — 26
0.33 — 26
0.37 — 24
0.45 — 24
0.36 — 23
0.61 — 23
0.42 — 22
0.00 — 21
0.43 — 21
0.38 — 20
0.40 — 19
0.48 — 19
0.32 — 19
0.41 — 18
0.39 — 18
0.58 — 18
0.18 — 18
0.57 — 17
0.71 — 17

_duplicated_21 categorical

852 singleton categories

rows	1,093
null	0 (0.0%)
unique	957
top_value	0
top_rate	0.019
cardinality	957
entropy	9.789
entropy_ratio	0.989

Top values (rank 1–20)

0 — 21
1321 — 4
352 — 3
597 — 3
777 — 3
580 — 3
1184 — 3
1353 — 3
710 — 3
3128 — 3
463 — 3
227 — 3
1043 — 2
421 — 2
2891 — 2
5079 — 2
3228 — 2
299 — 2
3337 — 2
238 — 2

_duplicated_22 categorical

rows	1,093
null	0 (0.0%)
unique	70
top_value	0.18
top_rate	0.070
cardinality	70
entropy	5.138
entropy_ratio	0.838

Top values (rank 1–20)

0.18 — 77
0.20 — 75
0.21 — 64
0.22 — 62
0.25 — 61
0.17 — 61
0.23 — 49
0.19 — 46
0.24 — 45
0.16 — 43
0.15 — 41
0.26 — 36
0.14 — 27
0.12 — 26
0.27 — 26
0.13 — 25
0.11 — 25
0.10 — 22
0.00 — 21
0.29 — 20

_duplicated_23 text

99.9% rows are a single word; 98.0% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,028
len_min	1
len_max	37
len_mean	4.053
len_median	4.000
len_p95	5.000
word_mean	1.004
word_median	1.000
n_empty	0
n_duplicates	65
duplicate_rate	0.059
vocab_size	1,032
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.980
boilerplate_rate	0.000

Sample values (first 10)

404
6979
365
5876
8430
2164
510
314
939
8006

_duplicated_24 categorical

756 singleton categories

rows	1,093
null	6 (0.5%)
unique	900
top_value	0.00
top_rate	0.014
cardinality	900
entropy	9.690
entropy_ratio	0.987

Top values (rank 1–20)

0.00 — 15
47.52 — 5
51.82 — 4
47.04 — 4
54.24 — 4
48.91 — 4
51.90 — 3
48.89 — 3
37.97 — 3
51.35 — 3
44.18 — 3
63.06 — 3
40.64 — 3
38.66 — 3
57.98 — 3
30.92 — 3
53.94 — 3
60.20 — 3
39.15 — 3
30.05 — 3

_duplicated_25 text

99.5% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,088
len_min	4
len_max	18
len_mean	4.906
len_median	5.000
len_p95	6.000
word_mean	1.001
word_median	1.000
n_empty	0
n_duplicates	5
duplicate_rate	4.57e-03
vocab_size	1,089
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

3773
48556
6461
43008
54528
24447
7870
4684
9855
47615

_duplicated_26 text

97.8% of rows are unique strings; 99.9% rows are a single word; 99.9% rows are all-caps; 95th-percentile length under 20 chars

rows	1,093
null	0 (0.0%)
unique	1,069
len_min	3
len_max	28
len_mean	4.574
len_median	5.000
len_p95	5.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	24
duplicate_rate	0.022
vocab_size	1,071
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.999
boilerplate_rate	0.000

Sample values (first 10)

1856
18872
2295
17664
22008
10304
3748
2163
3393
19458

_duplicated_27 categorical

693 singleton categories

rows	1,093
null	0 (0.0%)
unique	873
top_value	37.60
top_rate	3.66e-03
cardinality	873
entropy	9.662
entropy_ratio	0.989

Top values (rank 1–20)

37.60 — 4
36.60 — 4
41.85 — 4
38.47 — 4
49.19 — 3
32.63 — 3
42.28 — 3
29.96 — 3
42.14 — 3
38.12 — 3
33.04 — 3
40.70 — 3
40.45 — 3
33.84 — 3
30.27 — 3
31.35 — 3
39.43 — 3
33.77 — 3
30.69 — 3
31.39 — 3
