saturn

/home/coolhand/html/datavis/data_trove/data/wild/animal_attacks/shark_attacks_gsaf.csv 6,462 rows sample n=6,462 seed 42 2026-06-21T23:26:09+00:00

Overview

Source	/home/coolhand/html/datavis/data_trove/data/wild/animal_attacks/shark_attacks_gsaf.csv
Total rows	6,462
Profiled sample	6,462
Columns	24
Generated	2026-06-21T23:26:09+00:00

Per-column null rate across the corpus.
column	kind	null %
index	numeric	0.0%
Case Number	text	0.0%
Date	text	0.0%
Year	numeric	0.0%
Type	categorical	0.1%
Country	categorical	0.8%
Area	categorical	7.2%
Location	text	8.4%
Activity	text	8.5%
Name	text	3.3%
Unnamed: 9	categorical	99.6%
Age	categorical	44.4%
Injury	text	0.4%
Fatal (Y/N)	categorical	8.5%
Time	categorical	52.5%
Species	text	45.2%
Investigator or Source	text	0.3%
pdf	text	52.6%
href formula	text	52.6%
href	text	52.6%
Case Number.1	text	52.6%
Case Number.2	text	52.6%
original order	numeric	52.6%
Unnamed: 23	categorical	100.0%
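
A minimal sketch of how a per-column null rate like the table above could be computed with the standard library. What saturn itself counts as null is not stated in this report, so the `NULL_TOKENS` set, the `null_rates` helper, and the three-row sample are all assumptions made for illustration.

```python
import csv
import io

# Assumption: these tokens count as null; the profiler's own rule is not documented here.
NULL_TOKENS = {"", "nan", "null", "none"}

def null_rates(rows, columns):
    """Return {column: fraction of rows whose value is missing}."""
    total = len(rows)
    rates = {}
    for col in columns:
        missing = sum(
            1 for row in rows
            if (row.get(col) or "").strip().lower() in NULL_TOKENS
        )
        rates[col] = missing / total if total else 0.0
    return rates

# Hypothetical three-row sample standing in for shark_attacks_gsaf.csv.
sample = io.StringIO("Year,Time,Species\n2017,11h00,\n1926,,White shark\n1992,,\n")
reader = csv.DictReader(sample)
rows = list(reader)
rates = null_rates(rows, reader.fieldnames)
for col, rate in rates.items():
    print(f"{col}\t{rate:.1%}")
```

Pointing `csv.DictReader` at the real file instead of the in-memory sample would give rates comparable to the table, provided the null rule matches the profiler's.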

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:default.

Dataset high anthropic:default

This dataset is the Global Shark Attack File (GSAF), containing 6,462 records of shark attack incidents spanning centuries of documented cases. The first thing to examine is the attack outcome: 'N' (non-fatal) accounts for 4,439 rows, about 75% of non-null values and 68.7% of all rows, while 1,400 rows are recorded as fatal ('Y'). The 'Injury' column has 823 entries marked simply 'FATAL', which is worth cross-checking against the fatality flag for consistency. A second priority is the geographic and activity breakdown: the USA leads with 2,310 cases (36%), Florida alone accounts for 1,076, and surfing (1,025) and swimming (932) are by far the most frequently recorded activities, which reflects incident counts rather than per-participant risk. The 'Year' column carries a data quality warning: a maximum value of 3019 and high kurtosis signal outliers that should be cleaned before any time-series analysis.

Case Number.1 medium anthropic:default

This column appears to be a duplicate or alternate version of a case number field, with values formatted as date-like codes (e.g., '1966.12.26', '1923.00.00.a') suggesting archival case identifiers tied to dates with alphabetic suffixes for disambiguation. A striking 52.62% null rate makes this column unreliable for most analyses, and the near-unique flag (3,054 unique values across 6,462 rows) combined with only 8 true duplicates confirms it functions as a quasi-identifier. The allcaps rate of 78.97% is notable given that values appear to be alphanumeric codes rather than natural language, and the '.1' suffix in the column name strongly suggests this is a duplicated column from a merge or pivot operation.

Case Number.2 high anthropic:default

This column appears to be a structured case number identifier, likely encoding a date-based reference system (e.g., '1966.12.26', '1915.07.06.a.r') typical of archival, legal, or historical record catalogues. With 52.62% null rate across 6,462 rows and only 3,055 unique values out of 3,058 vocabulary size, the column is near-unique but severely incomplete. The 78.97% all-caps rate combined with date-like tokens and alphabetic suffixes (a, b, r) suggests a custom alphanumeric coding scheme rather than free text. Only 7 duplicate values exist, making this effectively an identifier where present.

href high anthropic:default

This column contains URLs linking to PDF source documents in a shark attack file directory (sharkattackfile.net/spreadsheets/pdf_directory/), serving as citation or evidence links for individual incident records. Over half the rows are null (52.62%), indicating many records lack a linked source document. Nearly all non-null values are single-token URLs (one_word_rate 0.988, url_rate 0.9997), with very few duplicates (11 duplicates out of 3,051 unique values), consistent with per-incident citation links. The high null rate is the key analyst concern — roughly half of incidents have no associated PDF reference.

href formula high anthropic:default

This column contains hyperlink formulas pointing to PDF source documents on sharkattackfile.net, each URL referencing a dated incident report (e.g., '1935.06.05.r-solomonislands.pdf'). Over half the rows (52.62%) are null, meaning many records lack a linked source document. Values are nearly all single-token URLs (one_word_rate 0.9879, url_rate 0.9997), and the extremely negative Flesch readability score (-820.18) confirms these are machine-generated URL strings, not natural text. Only 11 duplicate values exist across 3,051 unique entries, suggesting most cited PDFs are distinct incident references.

Case Number high anthropic:default

This column is a case identifier, with values that appear to encode dates in YYYY.MM.DD format (e.g., '2019.10.08'), suggesting case numbers tied to filing or incident dates. With 6,442 unique values out of 6,462 rows and a null rate of 0.0003, it is near-unique and functions as a primary key. The 18 duplicate values (duplicate_rate 0.0028) are a mild anomaly worth investigating — one value '2012.09.02.b' hints that suffixes are used to disambiguate same-date cases, implying the deduplication logic is not fully consistent. The allcaps_rate of 0.748 suggests a mix of formatting styles across records.
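
A hedged sketch of how the date-like structure described above could be parsed and checked for repeats. The regular expression (YYYY.MM.DD plus optional single-letter suffixes) is inferred from the sample values in this report, not from any GSAF specification, and the repeated entry in the example list is hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: YYYY.MM.DD followed by optional dot-separated single-letter suffixes.
CASE_RE = re.compile(r"^(\d{4})\.(\d{2})\.(\d{2})((?:\.[A-Za-z])*)$")

def parse_case_number(value):
    """Split a case number into year, month, day, and suffixes; None if it does not match."""
    match = CASE_RE.match(value.strip())
    if not match:
        return None
    year, month, day, suffix = match.groups()
    return {
        "year": int(year),
        "month": int(month) or None,  # 00 is treated as "unknown" (assumption)
        "day": int(day) or None,
        "suffixes": [s.lower() for s in suffix.split(".") if s],
    }

def duplicated(values):
    """Return the values that occur more than once."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)

# The last entry is repeated on purpose to illustrate duplicate detection.
cases = ["2020.01.05", "1990.00.00", "1896.12.17.R", "2012.09.02.b", "2012.09.02.b"]
parsed = [parse_case_number(c) for c in cases]
print(parsed[2])
print(duplicated(cases))
```

Values that return `None` are the ones worth inspecting by hand, since they break the assumed pattern.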

Species high anthropic:default

This column records shark species along with incident validity notes, with values ranging from specific species ('White shark', 'Tiger shark', 'Bull shark') to free-text qualifiers like 'Shark involvement not confirmed' and 'Invalid'. The null rate is severe at 45.25%, and 58.56% of non-null values are duplicates (2,072 of 3,538), which is expected for a species label with only 1,466 unique values. The multilingual alert deserves caution: 2,582 values are classified as English, while 14 are tagged German, 18 Finnish, 11 Chinese, and 8 Turkish among others. For short strings such as species names and size notes, these tags are more plausibly language-detection false positives than genuinely non-English entries, so spot-check them before acting on the alert. The mix of species names, size descriptions ("4' shark", "1.8 m [6'] shark"), and incident-status phrases means this column is semantically heterogeneous and will require parsing or splitting before use.

pdf high anthropic:default

This column contains PDF filenames or partial file paths, evidenced by the '.pdf' suffixes in the top words and a mean token count of about 1 word per value. Over half the rows (52.55%) are null, and with 3,054 unique values among 3,066 non-null rows the column is near-unique, functioning more like a document reference key than a descriptive field. The strongly negative Flesch readability score (−66.81) is consistent with structured filename strings rather than natural language. The 12 duplicates suggest that a few documents are referenced by more than one record.

Injury high anthropic:default

This column describes the outcome or nature of injuries in free text, ranging from 'FATAL' to specific anatomical bite locations (e.g., 'Left foot bitten', 'Leg bitten'). The dominant value is 'FATAL', appearing 823 times, by far the most frequent entry. Two signals stand out: a high duplicate rate of 41.9% (2,695 duplicates among 6,433 non-null rows) driven by repetitive categorical-style phrases, and an all-caps rate of 13.1% suggesting inconsistent data entry conventions. Additionally, 496 entries are tagged as German alongside 3,812 tagged as English; whether this reflects multilingual sourcing or detector noise on short phrases should be checked before any text-based analysis.

Activity high anthropic:default

This column captures the water-based activity a person was engaged in at the time of the incident, dominated by Surfing (1,025) and Swimming (932) with a tail of descriptive phrases. Despite being labelled 'text', it behaves largely as a categorical label: 62.9% of values are single words, only 1,516 unique values exist across 6,462 rows, and the duplicate rate is 74.3%, indicating a loosely controlled vocabulary rather than a strict enum. The median string length of 8 characters versus a maximum of 254 suggests a mix of clean category entries and occasional free-text annotations, which may require normalisation before use.

Investigator or Source high anthropic:default

This column records the investigator or data source credited for each shark attack incident, typically formatted as an abbreviated name plus an organizational affiliation (e.g., 'C. Moore, GSAF'). GSAF (Global Shark Attack File) dominates the top entries and appears in 983 word tokens, making it the primary contributing organization. The duplicate rate of 22.7% (1,464 duplicates among the non-null rows) is expected given a finite set of investigators filing multiple reports. The multilingual alert, with 23 languages detected in the sample, is notable: 'en' accounts for 4,457 entries while 'es' (134), 'fr' (100), 'it' (62), and 'de' (56) likely reflect international source attribution or non-English investigator names. The high cardinality (4,979 unique values out of 6,462 rows) suggests many entries are one-off source combinations rather than standardized identifiers.

Location high anthropic:default

This column captures geographic incident locations, predominantly structured as 'City, County' strings, with top values like 'New Smyrna Beach, Volusia County' (181 occurrences) and dominant words 'county', 'beach', 'island', 'bay'. The duplicate rate is notably high at about 29.9% (1,769 duplicates among 5,917 non-null rows), reflecting repeated incidents at the same hotspot locations rather than data error. The multilingual alert is triggered by automated language detection misclassifying short geographic names (e.g., 'Durban', 'Boa Viagem, Recife') as non-English: 3,746 values are detected as English, with the remainder split across 30 other 'languages' due to short-string ambiguity. The null rate is 8.43%, which may represent unknown or offshore incident locations.

Time high anthropic:default

This column captures time-of-day information, but it is stored inconsistently: values mix coarse labels ('Afternoon', 'Morning') with specific clock times in 'HHhMM' format ('11h00', '16h30'), yielding 366 unique values across 6,462 rows. The null rate is severe at 52.49%, meaning over half of all records are missing a time entirely. The top value 'Afternoon' accounts for only 6.29% of non-null values (193 rows, 3.0% of all rows), and the entropy ratio is 0.77, indicating a long tail of rarely seen time strings, likely from data entry inconsistency across sources or time periods.
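
One possible way to normalise the two styles of Time value is sketched here. The bucket boundaries, the 'Evening' label, and the `normalize_time` helper are assumptions for illustration; the report does not define how coarse labels relate to clock times.

```python
import re

CLOCK_RE = re.compile(r"^(\d{1,2})h(\d{2})$")
KNOWN_LABELS = {"morning", "afternoon", "evening", "night"}

def bucket_for_hour(hour):
    """Map an hour to a coarse label. The boundaries are assumed, not from the source."""
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 21:
        return "Evening"
    return "Night"

def normalize_time(value):
    """Return (minutes_since_midnight or None, coarse bucket or None)."""
    if value is None:
        return (None, None)
    text = value.strip()
    match = CLOCK_RE.match(text)
    if match:
        hour, minute = int(match.group(1)), int(match.group(2))
        if hour < 24 and minute < 60:
            return (hour * 60 + minute, bucket_for_hour(hour))
        return (None, None)
    if text.lower() in KNOWN_LABELS:
        return (None, text.capitalize())
    return (None, None)

for raw in ["16h30", "11h00", "Afternoon", "Night", "sometime after lunch"]:
    print(raw, "->", normalize_time(raw))
```

Strings that map to `(None, None)` are the long tail flagged above and would need their own rules or manual review.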

Date high anthropic:default

This column contains free-form date annotations stored in highly inconsistent formats: bare years (e.g., '1957', '1942'), structured dates ('05-Oct-2003'), and vague phrases ('Before 1958', 'No date', 'ca.', 'summer', 'late'). The top word 'reported' appears 559 times, which suggests many values follow a pattern like 'Reported [date]' and is a red flag for downstream parsing. With a 14.1% duplicate rate (909 duplicates against 5,552 unique values) and a maximum length of 64 characters, this column cannot be safely cast to a datetime type without substantial normalization work.
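
A minimal parsing sketch for the formats named above, assuming English month abbreviations and the default C locale for `strptime`. The `parse_gsaf_date` helper and its `precision` field are illustrative names, and vague phrases are deliberately left unparsed rather than guessed.

```python
from datetime import datetime

# Formats tried in order, from most to least precise.
FORMATS = (("%d-%b-%Y", "day"), ("%b-%Y", "month"), ("%Y", "year"))

def parse_gsaf_date(value):
    """Parse a raw Date string; keep track of precision and a 'Reported' prefix."""
    text = value.strip()
    reported = text.lower().startswith("reported")
    if reported:
        text = text[len("reported"):].strip()
    for fmt, precision in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return {"date": parsed.date(), "precision": precision, "reported": reported}
    # Anything else (e.g. 'Before 1958', 'No date') stays unparsed.
    return {"date": None, "precision": None, "reported": reported}

for raw in ["05-Jan-2020", "Nov-1942", "1957", "Reported 17-Dec-1896", "Before 1958"]:
    print(raw, "->", parse_gsaf_date(raw))
```

Month-only and year-only values come back anchored to the first day of the period, so the `precision` field should be carried along into any later analysis.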

original order medium anthropic:default

This column appears to be a positional or sequence index assigned to records, likely reflecting the original sort order of items in a source dataset. The most striking issue is a 52.62% null rate: over half the rows carry no value, which is highly anomalous for an ordering field and suggests either a join that left many rows unmatched or that ordering was only recorded for a subset of records. With 3,061 unique values and roughly 3,062 non-null rows implied by the null rate, the non-null values are almost entirely distinct, so the field still behaves as a sequence key where it is present. The distribution is mildly right-skewed (skew 0.99) with notable leptokurtosis (3.55) and 27 outliers, hinting at a few unusually large order values relative to the bulk.

Age high anthropic:default

This column represents age stored as a categorical (string) type rather than a numeric type, covering 154 distinct values across 6,462 rows. The most striking issue is a 44.43% null rate, flagged as an alert, meaning nearly half the records lack an age value. The distribution skews young: the top values cluster tightly between ages 15 and 25, with '17' the most frequent at only 4.46% of non-null values (160 rows, 2.5% of all rows), which suggests that recorded victims are concentrated among teenagers and young adults. The high entropy ratio of 0.80 confirms values are spread broadly across the 154 categories despite the youth concentration.

Year high anthropic:default

This column records the year of each incident for 6,462 records, with the bulk of values centered around a median of 1980 and an IQR of 1943–2006. Two signals stand out: a minimum of 0.0 (a zero rate of nearly 2%, almost certainly sentinel placeholders for a missing year) and a maximum of 3019.0, which is a data-entry error (likely a typo for a 4-digit year such as 2019). The 266 flagged outliers (~4.1%) drive extreme negative skew (−6.55) and extraordinary kurtosis (42.54), masking what is otherwise a fairly clean temporal distribution.
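
A small cleaning sketch for the two Year problems described above. The bounds are assumptions: `min_year=1` drops only the zero placeholder, and `max_year=2026` is taken from the report's generation date; the few very early years visible in the histogram may be genuine historical records and are kept.

```python
def clean_year(value, min_year=1, max_year=2026):
    """Return the year as an int, or None for missing, sentinel, or implausible values."""
    try:
        year = int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None
    if year < min_year or year > max_year:
        return None
    return year

raw_years = ["1980.0", "0.0", 3019.0, None, "2017", "n/a"]
cleaned = [clean_year(y) for y in raw_years]
print(cleaned)
```

Rather than rewriting 3019 to 2019 automatically, this sketch nulls it out, leaving the correction to someone who can check the source record.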

Area high anthropic:default

This column represents geographic sub-national regions (states, provinces) drawn from multiple countries: the US (Florida, Hawaii, California, South Carolina), Australia (New South Wales, Queensland, Western Australia), and South Africa (KwaZulu-Natal, Western Cape Province, Eastern Cape Province). Florida dominates at 16.7% of all rows (17.9% of non-null values), while 810 unique values, 540 of them singletons, signal a severe long-tail distribution where the vast majority of areas appear only rarely. The 7.16% null rate and high geographic diversity across at least three countries confirm that this dataset is multinational in scope.

Name high anthropic:default

This is the 'Name' field of the shark attack file, and it is heavily contaminated with non-name entries: the most frequent value is 'male' (579 occurrences), followed by 'female' (106), 'boy' (23), '2 males' (19), and 'boat' (14), indicating that gender and role descriptors (including 'sailor' and 'a sailor') were freely mixed with actual proper names. The duplicate rate of 14.5% (908 duplicates against 5,339 unique values) and a 3.33% null rate further confirm that this column is inconsistently populated and not a clean identifier.

index high anthropic:default

This column is a row index — a sequential integer identifier running from 0 to 6461 with 6462 unique values and no nulls, perfectly matching the row count. Its distribution is exactly uniform (skew = 0.0, kurtosis ≈ −1.2, mean = median = 3230.5, zero outliers), confirming it was generated as a positional index rather than carrying any domain meaning. The single 'zero' (zero_rate ≈ 0.00015) is simply row 0. There is no analytical signal here.

Type high anthropic:default

This column classifies shark attack incidents by the nature of the encounter, with 12 distinct categories across 6,462 records. It is heavily dominated by 'Unprovoked' at 73% of all records, creating notable class imbalance. A data quality issue is the fragmentation of watercraft-related incidents across three near-synonymous labels, 'Watercraft' (142), 'Boat' (109), and 'Boating' (92), plus a single misspelled 'Boatomg'; these are almost certainly the same category and should be consolidated. The 'Invalid' category (552 records, ~8.5%) also warrants attention as it may represent records that should be excluded from incident analysis.
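
The consolidation suggested above could look like the following sketch. The alias map and the choice of 'Watercraft' as the canonical label are assumptions; the counts are copied from the Type value table in this report.

```python
from collections import Counter

# Assumed canonical label for all boat-related variants.
TYPE_ALIASES = {
    "watercraft": "Watercraft",
    "boat": "Watercraft",
    "boating": "Watercraft",
    "boatomg": "Watercraft",  # misspelling that appears once in the Type table
}

def normalize_type(value):
    """Map near-synonymous Type labels onto one canonical label."""
    if value is None:
        return None
    text = value.strip()
    return TYPE_ALIASES.get(text.lower(), text)

# Counts taken from the Type table (only the rows relevant here).
raw_counts = {"Unprovoked": 4716, "Watercraft": 142, "Boat": 109, "Boating": 92, "Boatomg": 1}
merged = Counter()
for label, count in raw_counts.items():
    merged[normalize_type(label)] += count
print(merged)
```

After merging, the watercraft group totals 344 records, which makes it the fifth-largest category instead of three smaller ones.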

Country high anthropic:default

This column records the country of origin or occurrence for each record, with 205 distinct country values across 6,462 rows. The distribution is heavily skewed: USA alone accounts for 36% of records (2,310), followed by AUSTRALIA at 21% (1,374) and SOUTH AFRICA at 9% (585), meaning these three countries together represent roughly two-thirds of the dataset. The entropy ratio of 0.51 confirms moderate concentration despite 205 unique values, and the near-zero null rate (0.79%) means coverage is excellent.

Fatal (Y/N) high anthropic:default

This column is a binary fatality flag for incidents, expected to hold only 'Y' or 'N' values. The dominant value is 'N' (4,439 occurrences, 68.7% of all rows and 75.0% of non-null values), with 'Y' accounting for 1,400 cases (21.7% of all rows, 23.7% of non-null values). There are data quality issues: 5 rows contain clearly erroneous values ('F' twice, 'M', '2017', and lowercase 'y'), suggesting data entry errors or row misalignment, and 71 rows are labeled 'UNKNOWN'. The 8.46% null rate adds further incompleteness.
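
A sketch of normalising the flag to a three-state value, using the value counts from the Fatal (Y/N) table. Treating lowercase 'y' as 'Y' and mapping 'UNKNOWN', 'F', 'M', and '2017' to unresolved are assumptions about intent, not facts from the source.

```python
from collections import Counter

def normalize_fatal(value):
    """Return True (fatal), False (non-fatal), or None (null, unknown, or unrecognised)."""
    if value is None:
        return None
    text = value.strip().upper()
    if text == "Y":
        return True
    if text == "N":
        return False
    return None

# Value counts copied from the table; the None key stands for the 547 null rows.
raw_counts = {"N": 4439, "Y": 1400, "UNKNOWN": 71, "F": 2, "M": 1, "2017": 1, "y": 1, None: 547}

resolved = Counter()
for value, count in raw_counts.items():
    resolved[normalize_fatal(value)] += count

known = resolved[True] + resolved[False]
print(f"fatal: {resolved[True]}, non-fatal: {resolved[False]}, unresolved: {resolved[None]}")
print(f"fatal share of resolved rows: {resolved[True] / known:.1%}")
```

Keeping unresolved rows as `None` rather than folding them into 'N' avoids understating the fatality share.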

Numeric correlation

Pearson correlation across 3 numeric columns (values clipped to 2 decimals).
	index	Year	original order
index	+1.00	-0.40	-0.97
Year	-0.40	+1.00	+0.38
original order	-0.97	+0.38	+1.00
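
For reference, a plain-Python sketch of the Pearson coefficient reported in the matrix above. Dropping pairs with a missing value on either side is an assumption about how the profiler handles the roughly 52.6% nulls in 'original order'; the toy inputs are illustrative only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Toy data: a perfectly decreasing pair, and a pair with one missing value.
r_down = pearson([1, 2, 3], [6, 4, 2])
r_gap = pearson([1, 2, None, 4], [2, 4, 9, 8])
print(round(r_down, 2), round(r_gap, 2))
```

A value near -0.97 between 'index' and 'original order' would mean one is close to the reverse ordering of the other on the rows where both exist.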

Languages detected

Per-string language detection across text columns (sampled).

Per-language counts (total 16,164 detected strings).
lang	count	share
en	14597	90.3%
de	627	3.9%
es	219	1.4%
fr	177	1.1%
it	123	0.8%
pt	80	0.5%
nl	45	0.3%
ru	31	0.2%
fi	26	0.2%
sv	23	0.1%
zh	22	0.1%
pl	18	0.1%
ja	16	0.1%
id	15	0.1%
ca	14	0.1%
ceb	13	0.1%
tr	13	0.1%
eu	13	0.1%
vi	12	0.1%
cy	10	0.1%
ro	9	0.1%
ms	8	0.0%
eo	7	0.0%
sq	7	0.0%
af	5	0.0%
hu	5	0.0%
hr	4	0.0%
jbo	4	0.0%
war	4	0.0%
cs	4	0.0%
lv	3	0.0%
sh	3	0.0%
sw	2	0.0%
tl	1	0.0%
th	1	0.0%
no	1	0.0%
sl	1	0.0%
uk	1	0.0%

index numeric

rows	6,462
null	0 (0.0%)
unique	6,462
min	0.000
max	6,461
mean	3,230
median	3,230
std	1,866
q1	1,615
q3	4,846
iqr	3,230
skew	0.000
kurtosis	-1.200
n_outliers	0
outlier_rate	0.000
zero_rate	1.55e-04

Histogram bins for index (median: 3230.5).
bin	count
0 – 161.5	162
161.5 – 323.1	162
323.1 – 484.6	161
484.6 – 646.1	162
646.1 – 807.6	161
807.6 – 969.2	162
969.2 – 1131	161
1131 – 1292	162
1292 – 1454	161
1454 – 1615	162
1615 – 1777	161
1777 – 1938	162
1938 – 2100	161
2100 – 2261	162
2261 – 2423	161
2423 – 2584	162
2584 – 2746	161
2746 – 2907	162
2907 – 3069	161
3069 – 3230	162
3230 – 3392	162
3392 – 3554	161
3554 – 3715	162
3715 – 3877	161
3877 – 4038	162
4038 – 4200	161
4200 – 4361	162
4361 – 4523	161
4523 – 4684	162
4684 – 4846	161
4846 – 5007	162
5007 – 5169	161
5169 – 5330	162
5330 – 5492	161
5492 – 5653	162
5653 – 5815	161
5815 – 5976	162
5976 – 6138	161
6138 – 6299	162
6299 – 6461	162
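
The alternating 162/161 counts follow from splitting 6,462 consecutive integers into 40 equal-width bins. The sketch below reproduces them; that saturn uses exactly this rule (half-open bins, with the maximum placed in the last bin) is an assumption, and `equal_width_histogram` is an illustrative name.

```python
def equal_width_histogram(values, bins=40):
    """Count values in `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin `bins`; clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1) if width else 0
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

edges, counts = equal_width_histogram(list(range(6462)))
print(round(edges[1], 3), counts[:4], counts[-1])
```

Because the bin width (161.525) is not an integer, neighbouring bins capture either 161 or 162 integers, which is why a perfectly uniform index still shows alternating counts.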

Case Number text

99.7% of rows are unique strings; 99.9% of rows are a single word; 74.8% of rows are all-caps; 95th-percentile length under 20 chars

rows	6,462
null	2 (0.0%)
unique	6,442
len_min	6
len_max	18
len_mean	10.627
len_median	10.000
len_p95	12.000
word_mean	1.001
word_median	1.000
n_empty	0
n_duplicates	18
duplicate_rate	2.79e-03
vocab_size	6,445
readability_flesch_mean	121.220
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.999
allcaps_rate	0.748
boilerplate_rate	0.000

Character-length distribution for Case Number (mean: 10.626934984520124).
chars	count
6	4
7	122
8	0
9	0
10	4150
11	14
12	2128
13	14
14	24
15	2
16	1
17	0
18	1

Sample values (first 10)

2020.01.05
1942.11.01
1910.03.08
1926.08.24
1992.06.17
1830.07.26
1990.00.00
1896.12.17.R
1925.11.22
2017.10.21

Date text

87.3% of rows are a single word

rows	6,462
null	1 (0.0%)
unique	5,552
len_min	4
len_max	64
len_mean	11.425
len_median	11.000
len_p95	20.000
word_mean	1.155
word_median	1.000
n_empty	0
n_duplicates	909
duplicate_rate	0.141
vocab_size	5,496
readability_flesch_mean	89.927
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.873
allcaps_rate	0.051
boilerplate_rate	0.000

Character-length distribution for Date (mean: 11.425475932518186).
chars	count
4 – 6	308
6 – 7	0
7 – 8	347
8 – 10	31
10 – 12	5088
12 – 13	33
13 – 14	43
14 – 16	4
16 – 18	7
18 – 19	5
19 – 20	552
20 – 22	5
22 – 24	5
24 – 25	7
25 – 26	8
26 – 28	0
28 – 30	1
30 – 31	0
31 – 32	3
32 – 34	1
34 – 36	2
36 – 37	1
37 – 38	2
38 – 40	0
40 – 42	1
42 – 43	1
43 – 44	1
44 – 46	0
46 – 48	0
48 – 49	1
49 – 50	0
50 – 52	0
52 – 54	0
54 – 55	2
55 – 56	1
56 – 58	0
58 – 60	0
60 – 61	0
61 – 62	0
62 – 64	1

Sample values (first 10)

05-Jan-2020
Nov-1942
Mar-1910
23-Jul-1926
17-Jun-1992
26-Jul-1830
19-Dec-1989
Reported 17-Dec-1896
22-Nov-1925
21-Oct-2017

Year numeric

skew=-6.55

rows	6,462
null	3 (0.0%)
unique	252
min	0.000
max	3,019
mean	1,930
median	1,980
std	278.316
q1	1,943
q3	2,006
iqr	63.000
skew	-6.554
kurtosis	42.541
n_outliers	266
outlier_rate	0.041
zero_rate	0.019

Histogram bins for Year (median: 1980.0).
bin	count
0 – 75.47	126
75.47 – 150.9	1
150.9 – 226.4	0
226.4 – 301.9	0
301.9 – 377.4	0
377.4 – 452.8	0
452.8 – 528.3	1
528.3 – 603.8	0
603.8 – 679.3	0
679.3 – 754.8	0
754.8 – 830.2	0
830.2 – 905.7	0
905.7 – 981.2	0
981.2 – 1057	0
1057 – 1132	0
1132 – 1208	0
1208 – 1283	0
1283 – 1359	0
1359 – 1434	0
1434 – 1510	0
1510 – 1585	4
1585 – 1660	6
1660 – 1736	7
1736 – 1811	37
1811 – 1887	371
1887 – 1962	1975
1962 – 2038	3930
2038 – 2113	0
2113 – 2189	0
2189 – 2264	0
2264 – 2340	0
2340 – 2415	0
2415 – 2491	0
2491 – 2566	0
2566 – 2642	0
2642 – 2717	0
2717 – 2793	0
2793 – 2868	0
2868 – 2944	0
2944 – 3019	1

Type categorical

rows	6,462
null	5 (0.1%)
unique	12
top_value	Unprovoked
top_rate	0.730
cardinality	12
entropy	1.457
entropy_ratio	0.406

Top values for Type (12 unique shown, of 12 total).
value	count	share
Unprovoked	4716	73.0%
Provoked	593	9.2%
Invalid	552	8.5%
Sea Disaster	239	3.7%
Watercraft	142	2.2%
Boat	109	1.7%
Boating	92	1.4%
Questionable	10	0.2%
Unconfirmed	1	0.0%
Unverified	1	0.0%
Under investigation	1	0.0%
Boatomg	1	0.0%

Country categorical

rows	6,462
null	51 (0.8%)
unique	205
top_value	USA
top_rate	0.360
cardinality	205
entropy	3.909
entropy_ratio	0.509

Top values for Country (20 unique shown, of 205 total).
value	count	share
USA	2310	35.7%
AUSTRALIA	1374	21.3%
SOUTH AFRICA	585	9.1%
NEW ZEALAND	135	2.1%
PAPUA NEW GUINEA	135	2.1%
BAHAMAS	115	1.8%
BRAZIL	113	1.7%
MEXICO	95	1.5%
ITALY	71	1.1%
PHILIPPINES	62	1.0%
FIJI	62	1.0%
REUNION	60	0.9%
NEW CALEDONIA	56	0.9%
CUBA	46	0.7%
SPAIN	44	0.7%
MOZAMBIQUE	44	0.7%
EGYPT	42	0.6%
INDIA	40	0.6%
JAPAN	34	0.5%
CROATIA	34	0.5%

Area categorical

540 singleton categories

rows	6,462
null	463 (7.2%)
unique	810
top_value	Florida
top_rate	0.179
cardinality	810
entropy	6.163
entropy_ratio	0.638

Top values for Area (20 unique shown, of 810 total).
value	count	share
Florida	1076	16.7%
New South Wales	498	7.7%
Queensland	325	5.0%
Hawaii	312	4.8%
California	294	4.5%
KwaZulu-Natal	215	3.3%
Western Australia	197	3.0%
Western Cape Province	195	3.0%
Eastern Cape Province	165	2.6%
South Carolina	163	2.5%
North Carolina	111	1.7%
South Australia	104	1.6%
Victoria	92	1.4%
Texas	75	1.2%
Pernambuco	74	1.1%
Torres Strait	72	1.1%
North Island	70	1.1%
New Jersey	55	0.9%
Tasmania	42	0.6%
South Island	41	0.6%

Location text

31 languages detected in sample; 29.9% duplicate strings

rows	6,462
null	545 (8.4%)
unique	4,148
len_min	3
len_max	119
len_mean	22.755
len_median	21.000
len_p95	47.000
word_mean	3.540
word_median	3.000
n_empty	0
n_duplicates	1,769
duplicate_rate	0.299
vocab_size	4,483
readability_flesch_mean	53.399
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.149
allcaps_rate	5.07e-04
boilerplate_rate	0.000

Character-length distribution for Location (mean: 22.75477437890823).
chars	count
3 – 6	106
6 – 9	542
9 – 12	758
12 – 15	720
15 – 18	445
18 – 20	358
20 – 23	352
23 – 26	393
26 – 29	524
29 – 32	293
32 – 35	505
35 – 38	203
38 – 41	152
41 – 44	124
44 – 46	124
46 – 49	95
49 – 52	54
52 – 55	52
55 – 58	26
58 – 61	17
61 – 64	21
64 – 67	14
67 – 70	8
70 – 73	5
73 – 76	2
76 – 78	4
78 – 81	6
81 – 84	1
84 – 87	2
87 – 90	1
90 – 93	1
93 – 96	0
96 – 99	1
99 – 102	1
102 – 104	2
104 – 107	0
107 – 110	3
110 – 113	1
113 – 116	0
116 – 119	1

Sample values (first 10)

Esperance
Mornington Island, Gulf of Carpentaria
Tripoli
Rosebud
Boa Viagem, Recife
Sydney Harbor
North Jetty Park, Fort Pierce, St Lucie County
Caibarien Harbor
Middle Brighton, Port Phillip
Gars Garabulli

Activity text

62.9% of rows are a single word; 74.3% duplicate strings

rows	6,462
null	552 (8.5%)
unique	1,516
len_min	1
len_max	254
len_mean	16.208
len_median	8.000
len_p95	49.000
word_mean	2.497
word_median	1.000
n_empty	0
n_duplicates	4,394
duplicate_rate	0.743
vocab_size	2,244
readability_flesch_mean	39.558
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.629
allcaps_rate	5.08e-04
boilerplate_rate	0.000

Character-length distribution for Activity (mean: 16.207952622673435).
chars	count
1 – 7	2033
7 – 14	2150
14 – 20	468
20 – 26	376
26 – 33	228
33 – 39	177
39 – 45	134
45 – 52	84
52 – 58	42
58 – 64	51
64 – 71	35
71 – 77	25
77 – 83	20
83 – 90	10
90 – 96	9
96 – 102	14
102 – 109	10
109 – 115	6
115 – 121	5
121 – 128	1
128 – 134	1
134 – 140	7
140 – 146	4
146 – 153	4
153 – 159	3
159 – 165	0
165 – 172	2
172 – 178	1
178 – 184	0
184 – 191	2
191 – 197	3
197 – 203	0
203 – 210	0
210 – 216	1
216 – 222	0
222 – 229	0
229 – 235	2
235 – 241	1
241 – 248	0
248 – 254	1

Sample values (first 10)

Scuba diving
Swimming
Swimming
Launching rowboat through the surf
Surfing
Wreck of the USS Somers
Swimming
Sea Disaster
Fishing
Spearfishing

Name text

rows	6,462
null	215 (3.3%)
unique	5,339
len_min	1
len_max	221
len_mean	14.830
len_median	13.000
len_p95	35.000
word_mean	2.453
word_median	2.000
n_empty	0
n_duplicates	908
duplicate_rate	0.145
vocab_size	6,536
readability_flesch_mean	51.734
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.166
allcaps_rate	6.88e-03
boilerplate_rate	0.000

Character-length distribution for Name (mean: 14.830158476068513).
chars	count
1 – 6	939
6 – 12	1355
12 – 18	2774
18 – 23	488
23 – 28	223
28 – 34	128
34 – 40	94
40 – 45	47
45 – 50	66
50 – 56	34
56 – 62	25
62 – 67	26
67 – 72	20
72 – 78	6
78 – 84	3
84 – 89	8
89 – 94	2
94 – 100	1
100 – 106	1
106 – 111	2
111 – 116	1
116 – 122	0
122 – 128	0
128 – 133	1
133 – 138	1
138 – 144	1
144 – 150	0
150 – 155	0
155 – 160	0
160 – 166	0
166 – 172	0
172 – 177	0
177 – 182	0
182 – 188	0
188 – 194	0
194 – 199	0
199 – 204	0
204 – 210	0
210 – 216	0
216 – 221	1

Sample values (first 10)

Peter ___
sailor
Jules Antoine
Mrs. Hoskin
Siale Sime
male
Todd R. Wenke
male
male
Susan Peteka

Unnamed: 9 categorical

99.6% null

rows	6,462
null	6,434 (99.6%)
unique	2
top_value	M
top_rate	0.857
cardinality	2
entropy	0.592
entropy_ratio	0.592

Top values for Unnamed: 9 (2 unique shown, of 2 total).
value	count	share
M	24	0.4%
F	4	0.1%
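
The `entropy` and `entropy_ratio` figures in these categorical cards are consistent with Shannon entropy in bits, divided by log2 of the cardinality. This is inferred from the numbers in the report rather than documented by the tool, and the function names below are illustrative. The sketch checks it against the two-value column above.

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of category counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def entropy_ratio(counts):
    """Entropy divided by its maximum, log2(number of non-empty categories)."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0
    return entropy_bits(counts) / math.log2(k)

# Counts for 'Unnamed: 9' from the table above: M = 24, F = 4.
h = entropy_bits([24, 4])
ratio = entropy_ratio([24, 4])
print(round(h, 3), round(ratio, 3))
```

With only two categories the maximum entropy is 1 bit, so the entropy and the ratio coincide, matching the card.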

Age categorical

44.4% null

rows	6,462
null	2,871 (44.4%)
unique	154
top_value	17
top_rate	0.045
cardinality	154
entropy	5.827
entropy_ratio	0.802

Top values for Age (20 unique shown, of 154 total).
value	count	share
17	160	2.5%
18	155	2.4%
20	146	2.3%
19	145	2.2%
16	140	2.2%
15	140	2.2%
21	122	1.9%
22	118	1.8%
25	111	1.7%
24	109	1.7%
14	104	1.6%
13	97	1.5%
26	85	1.3%
23	83	1.3%
28	82	1.3%
30	80	1.2%
29	80	1.2%
27	79	1.2%
12	75	1.2%
32	72	1.1%

Injury text

10 languages detected in sample; 13.1% of rows are all-caps; 41.9% duplicate strings

rows	6,462
null	29 (0.4%)
unique	3,738
len_min	5
len_max	234
len_mean	31.529
len_median	25.000
len_p95	82.000
word_mean	5.414
word_median	4.000
n_empty	0
n_duplicates	2,695
duplicate_rate	0.419
vocab_size	2,550
readability_flesch_mean	53.742
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.149
allcaps_rate	0.131
boilerplate_rate	0.000

Character-length distribution for Injury (mean: 31.52868024249961).
chars	count
5 – 11	1206
11 – 16	740
16 – 22	851
22 – 28	788
28 – 34	636
34 – 39	429
39 – 45	382
45 – 51	251
51 – 57	238
57 – 62	200
62 – 68	132
68 – 74	128
74 – 79	100
79 – 85	73
85 – 91	55
91 – 97	47
97 – 102	45
102 – 108	31
108 – 114	13
114 – 120	22
120 – 125	15
125 – 131	6
131 – 137	9
137 – 142	10
142 – 148	4
148 – 154	2
154 – 160	2
160 – 165	0
165 – 171	3
171 – 177	3
177 – 182	3
182 – 188	1
188 – 194	0
194 – 200	2
200 – 205	2
205 – 211	1
211 – 217	0
217 – 223	1
223 – 228	0
228 – 234	2

Sample values (first 10)

FATAL
Hip bitten
Right hand severely bitten by netted shark PROVOKED INCIDENT
Arm injured
No details
FATAL
No injury, board broken in two
FATAL
Thigh lacerated
Minor injuries

Fatal (Y/N) categorical

rows	6,462
null	547 (8.5%)
unique	7
top_value	N
top_rate	0.750
cardinality	7
entropy	0.890
entropy_ratio	0.317

Top values for Fatal (Y/N) (7 unique shown, of 7 total).
value	count	share
N	4439	68.7%
Y	1400	21.7%
UNKNOWN	71	1.1%
F	2	0.0%
M	1	0.0%
2017	1	0.0%
y	1	0.0%

Time categorical

199 singleton categories; 52.5% null

rows	6,462
null	3,392 (52.5%)
unique	366
top_value	Afternoon
top_rate	0.063
cardinality	366
entropy	6.559
entropy_ratio	0.770

Top values for Time (20 unique shown, of 366 total).
value	count	share
Afternoon	193	3.0%
11h00	131	2.0%
Morning	126	1.9%
12h00	113	1.7%
15h00	111	1.7%
16h00	106	1.6%
14h00	102	1.6%
16h30	79	1.2%
17h30	77	1.2%
13h00	75	1.2%
17h00	74	1.1%
14h30	73	1.1%
18h00	72	1.1%
15h30	67	1.0%
11h30	65	1.0%
13h30	64	1.0%
10h00	63	1.0%
Night	63	1.0%
09h00	55	0.9%
10h30	51	0.8%

Species text

15 languages detected in sample; 45.2% null; 58.6% duplicate strings

rows	6,462
null	2,924 (45.2%)
unique	1,466
len_min	3
len_max	194
len_mean	22.951
len_median	17.000
len_p95	50.000
word_mean	4.445
word_median	4.000
n_empty	0
n_duplicates	2,072
duplicate_rate	0.586
vocab_size	1,105
readability_flesch_mean	88.631
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.041
allcaps_rate	2.83e-04
boilerplate_rate	0.000

Character-length distribution for Species (mean: 22.95110231769361).
chars	count
3 – 8	105
8 – 13	855
13 – 17	833
17 – 22	448
22 – 27	324
27 – 32	285
32 – 36	120
36 – 41	127
41 – 46	115
46 – 51	170
51 – 56	32
56 – 60	25
60 – 65	21
65 – 70	13
70 – 75	17
75 – 79	11
79 – 84	7
84 – 89	4
89 – 94	6
94 – 98	4
98 – 103	2
103 – 108	0
108 – 113	1
113 – 118	1
118 – 122	0
122 – 127	2
127 – 132	1
132 – 137	1
137 – 141	1
141 – 146	1
146 – 151	0
151 – 156	2
156 – 161	1
161 – 165	1
165 – 170	0
170 – 175	0
175 – 180	0
180 – 184	1
184 – 189	0
189 – 194	1

Sample values (first 10)

White shark
Questionable incident
Questionable incident
Carpet shark, 5'
Bull shark
Invalid
1 m shark
Shark involvement prior to death unconfirmed
"Attacked by a number of sharks"
2 m shark

Investigator or Source text

23 languages detected in sample; 22.7% duplicate strings

rows	6,462
null	19 (0.3%)
unique	4,979
len_min	3
len_max	210
len_mean	32.237
len_median	26.000
len_p95	77.000
word_mean	4.792
word_median	3.000
n_empty	0
n_duplicates	1,464
duplicate_rate	0.227
vocab_size	7,898
readability_flesch_mean	73.623
emoji_rate	0.000
url_rate	2.33e-03
one_word_rate	0.026
allcaps_rate	0.018
boilerplate_rate	0.000

Character-length distribution for Investigator or Source (mean: 32.23731181126804).
chars	count
3 – 8	152
8 – 13	461
13 – 19	1164
19 – 24	844
24 – 29	1098
29 – 34	816
34 – 39	400
39 – 44	315
44 – 50	203
50 – 55	174
55 – 60	150
60 – 65	142
65 – 70	116
70 – 75	59
75 – 81	61
81 – 86	45
86 – 91	39
91 – 96	40
96 – 101	30
101 – 106	23
106 – 112	23
112 – 117	18
117 – 122	14
122 – 127	14
127 – 132	9
132 – 138	8
138 – 143	5
143 – 148	5
148 – 153	4
153 – 158	3
158 – 163	4
163 – 169	0
169 – 174	0
174 – 179	1
179 – 184	0
184 – 189	1
189 – 194	1
194 – 200	0
200 – 205	0
205 – 210	1

Sample values (first 10)

B. Myatt, GSAF
M. Murphy; V.M. Coppleson (1962), pp.207-208
The Sun, 4/3/1910; Authenticity questioned by G.H. Balazs in J. Borg, p.70
NY Herald Tribune, 7/25/1926; A. De Maddalena; Anon. (1926a), Anon. (1926b); C. Moore, GSAF
Charlotte Observer, 6/24/1992, p.1C & 8/8/1992, p.2C
C. Black, GSAF; Sydney Gazette, 1/22/1831
Courier-Mail, 11/24/1989, p.3; J. West, ASAF
The Star, 12/17/1896
V.M. Coppleson.W2, (1933); V.M. Coppleson (1958), pp.111 & 241; West Australia, 1/5/1967; A. Sharpe, pp.129-130; H. Edwards, pp.131-133
New Zealand Herald, 10/22/2017

pdf text

99.6% of rows are unique strings; 98.7% of rows are a single word; 52.6% null

rows	6,462
null	3,396 (52.6%)
unique	3,054
len_min	10
len_max	41
len_mean	23.731
len_median	23.000
len_p95	31.000
word_mean	1.022
word_median	1.000
n_empty	0
n_duplicates	12
duplicate_rate	3.91e-03
vocab_size	3,098
readability_flesch_mean	-66.809
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.987
allcaps_rate	3.26e-04
boilerplate_rate	0.000

Character-length distribution for pdf (mean: 23.73091976516634).
chars	count
10 – 11	2
11 – 12	0
12 – 12	0
12 – 13	1
13 – 14	0
14 – 15	4
15 – 15	0
15 – 16	4
16 – 17	0
17 – 18	14
18 – 19	49
19 – 19	139
19 – 20	277
20 – 21	0
21 – 22	438
22 – 22	473
22 – 23	391
23 – 24	0
24 – 25	275
25 – 26	229
26 – 26	159
26 – 27	134
27 – 28	0
28 – 29	112
29 – 29	86
29 – 30	72
30 – 31	0
31 – 32	60
32 – 32	50
32 – 33	31
33 – 34	27
34 – 35	0
35 – 36	13
36 – 36	8
36 – 37	6
37 – 38	0
38 – 39	3
39 – 39	2
39 – 40	4
40 – 41	3

Sample values (first 10)

2020.01.12-Malten.pdf
1900.09.05-Hartman.pdf
1872.11.30.R-MalayPirates.pdf
1885.04.16.R-GermanShip.pdf
1948.12.26-Keys.pdf
ND-0094-HaeNyeo.pdf
1947.05.13.R-Kenya.pdf
1857.05.05-Dunn.pdf
1884.08.18-Rylor.pdf
1971.11.25.R-Chan.pdf

href formula text

99.6% of rows are unique strings; 98.8% of rows are a single word; 100.0% of rows contain a URL; 52.6% null

rows	6,462
null	3,400 (52.6%)
unique	3,051
len_min	64
len_max	95
len_mean	77.729
len_median	77.000
len_p95	85.000
word_mean	1.019
word_median	1.000
n_empty	0
n_duplicates	11
duplicate_rate	3.59e-03
vocab_size	3,089
readability_flesch_mean	-820.177
emoji_rate	0.000
url_rate	1.000
one_word_rate	0.988
allcaps_rate	0.000
boilerplate_rate	0.000

Character-length distribution for href formula (mean: 77.72893533638145).
chars	count
64 – 65	1
65 – 66	0
66 – 66	0
66 – 67	1
67 – 68	0
68 – 69	4
69 – 69	0
69 – 70	5
70 – 71	0
71 – 72	14
72 – 73	49
73 – 73	139
73 – 74	276
74 – 75	0
75 – 76	437
76 – 76	473
76 – 77	392
77 – 78	0
78 – 79	275
79 – 80	228
80 – 80	159
80 – 81	134
81 – 82	0
82 – 83	112
83 – 83	86
83 – 84	72
84 – 85	0
85 – 86	60
86 – 86	48
86 – 87	31
87 – 88	27
88 – 89	0
89 – 90	13
90 – 90	8
90 – 91	6
91 – 92	0
92 – 93	3
93 – 93	2
93 – 94	4
94 – 95	3

Sample values (first 10)

http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.12-Malten.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1900.08.21-Burriss.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1872.11.30.R-MalayPirates.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1885.04.16.R-GermanShip.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1948.12.14.a-Jeppeson.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/ND-0094-HaeNyeo.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1947.04.06-Watt.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1856.11.25.R-Fiji.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1884.08.18-Rylor.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1971.09.25-Horner.pdf

href text

99.6% of rows are unique strings; 98.8% of rows are a single word; 100.0% of rows contain a URL; 52.6% null

rows	6,462
null	3,400 (52.6%)
unique	3,051
len_min	64
len_max	135
len_mean	77.887
len_median	77.000
len_p95	86.000
word_mean	1.020
word_median	1.000
n_empty	0
n_duplicates	11
duplicate_rate	3.59e-03
vocab_size	3,091
readability_flesch_mean	-824.407
emoji_rate	0.000
url_rate	1.000
one_word_rate	0.988
allcaps_rate	0.000
boilerplate_rate	0.000

Character-length distribution for href (mean: 77.88667537557153).
chars	count
64 – 66	1
66 – 68	1
68 – 69	4
69 – 71	19
71 – 73	49
73 – 75	415
75 – 76	908
76 – 78	666
78 – 80	224
80 – 82	290
82 – 84	199
84 – 85	131
85 – 87	80
87 – 89	27
89 – 91	21
91 – 92	9
92 – 94	6
94 – 96	3
96 – 98	0
98 – 100	0
100 – 101	0
101 – 103	0
103 – 105	0
105 – 107	0
107 – 108	0
108 – 110	0
110 – 112	0
112 – 114	0
114 – 115	0
115 – 117	0
117 – 119	0
119 – 121	0
121 – 123	0
123 – 124	0
124 – 126	0
126 – 128	0
128 – 130	1
130 – 131	4
131 – 133	2
133 – 135	2

Sample values (first 10)

http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.12-Malten.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1900.08.21-Burriss.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1872.11.30.R-MalayPirates.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1885.04.16.R-GermanShip.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1948.12.14.a-Jeppeson.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/ND-0094-HaeNyeo.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1947.04.06-Watt.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1856.11.25.R-Fiji.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1884.08.18-Rylor.pdf
http://sharkattackfile.net/spreadsheets/pdf_directory/1971.09.25-Horner.pdf
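
The pdf, href formula, and href columns all appear to carry the same file name, with the two href columns prefixing it with a directory URL. They are not guaranteed to agree row by row: href has a len_max of 135 against 95 for href formula, and its histogram shows 9 values in the 128 to 135 character range. A minimal Python sketch of a row-level consistency check follows. The helper names `pdf_name_from_href` and `href_matches_pdf` are hypothetical, and the check assumes the file name is always the last path segment of the URL.

```python
from posixpath import basename
from typing import Optional
from urllib.parse import urlsplit


def pdf_name_from_href(href: str) -> str:
    """Return the last path segment of a URL, e.g. '2020.01.12-Malten.pdf'."""
    return basename(urlsplit(href.strip()).path)


def href_matches_pdf(href: Optional[str], pdf: Optional[str]) -> Optional[bool]:
    """Compare an href cell with a pdf cell; None if either cell is missing."""
    if not href or not pdf:
        return None
    return pdf_name_from_href(href) == pdf.strip()


# Example using a value that appears in both sample lists above.
url = "http://sharkattackfile.net/spreadsheets/pdf_directory/2020.01.12-Malten.pdf"
print(pdf_name_from_href(url))
print(href_matches_pdf(url, "2020.01.12-Malten.pdf"))
```

Rows where the check returns `False` are candidates for manual review; rows returning `None` fall in the roughly 52.6% of records where these columns are empty.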

Case Number.1 text

99.7% of rows are unique strings; 99.8% of rows are a single word; 79.0% of rows are all-caps; 52.6% null; 95th-percentile length under 20 chars

rows	6,462
null	3,400 (52.6%)
unique	3,054
len_min	7
len_max	18
len_mean	10.591
len_median	10.000
len_p95	12.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	8
duplicate_rate	2.61e-03
vocab_size	3,057
readability_flesch_mean	121.215
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.998
allcaps_rate	0.790
boilerplate_rate	0.000

Character-length distribution for Case Number.1 (mean: 10.59079033311561).
chars	count
7	120
8	0
9	4
10	1884
11	7
12	1011
13	8
14	24
15	2
16	1
17	0
18	1

Sample values (first 10)

2020.01.12
1900.08.21
1872.11.30.R
1885.04.16.R
1948.12.14.a
ND.0094
1947.04.06
1856.11.25.R
1884.08.18
1971.09.25

Case Number.2 text

99.8% of rows are unique strings; 99.8% of rows are a single word; 79.0% of rows are all-caps; 52.6% null; 95th-percentile length under 20 chars

rows	6,462
null	3,400 (52.6%)
unique	3,055
len_min	7
len_max	18
len_mean	10.591
len_median	10.000
len_p95	12.000
word_mean	1.002
word_median	1.000
n_empty	0
n_duplicates	7
duplicate_rate	2.29e-03
vocab_size	3,058
readability_flesch_mean	121.215
emoji_rate	0.000
url_rate	0.000
one_word_rate	0.998
allcaps_rate	0.790
boilerplate_rate	0.000

Character-length distribution for Case Number.2 (mean: 10.59079033311561).
chars	count
7	120
8	0
9	4
10	1884
11	7
12	1011
13	8
14	24
15	2
16	1
17	0
18	1

Sample values (first 10)

2020.01.12
1900.08.21
1872.11.30.R
1885.04.16.R
1948.12.14.a
ND.0094
1947.04.06
1856.11.25.R
1884.08.18
1971.09.25

original order numeric

52.6% null

rows	6,462
null	3,400 (52.6%)
unique	3,061
min	3.000
max	6,502
mean	1,564
median	1,534
std	988.410
q1	768.250
q3	2,299
iqr	1,530
skew	0.988
kurtosis	3.551
n_outliers	27
outlier_rate	8.82e-03
zero_rate	0.000

Histogram bins for original order (median: 1533.5).
bin	count
3 – 165.5	163
165.5 – 327.9	162
327.9 – 490.4	163
490.4 – 652.9	162
652.9 – 815.4	163
815.4 – 977.8	162
977.8 – 1140	163
1140 – 1303	162
1303 – 1465	163
1465 – 1628	162
1628 – 1790	163
1790 – 1953	162
1953 – 2115	163
2115 – 2278	162
2278 – 2440	163
2440 – 2603	162
2603 – 2765	163
2765 – 2928	162
2928 – 3090	110
3090 – 3252	0
3252 – 3415	0
3415 – 3577	0
3577 – 3740	0
3740 – 3902	0
3902 – 4065	0
4065 – 4227	0
4227 – 4390	0
4390 – 4552	0
4552 – 4715	0
4715 – 4877	0
4877 – 5040	0
5040 – 5202	0
5202 – 5365	0
5365 – 5527	0
5527 – 5690	0
5690 – 5852	0
5852 – 6015	0
6015 – 6177	0
6177 – 6340	0
6340 – 6502	27
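
The histogram is flat up to about 3,090, empty from there to 6,340, and ends with 27 values in the last bin, which matches the reported n_outliers of 27. That count is consistent with the common Tukey rule (values beyond 1.5 × IQR from the quartiles), although the report does not say which rule saturn applies, so treat the rule as an assumption. A short Python sketch using the displayed (rounded) quartiles, with `tukey_fences` as a hypothetical helper name:

```python
from typing import Tuple


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles as displayed for 'original order'; q3 is shown rounded to 2,299.
lower, upper = tukey_fences(768.25, 2299.0)
print(f"lower fence: {lower:.1f}, upper fence: {upper:.1f}")

# The upper fence lands inside the empty stretch of the histogram, so only the
# 27 values in the final 6340 to 6502 bin would be flagged under this rule.
print(3090 < upper < 6340)
```

The lower fence is negative, below the column minimum of 3, so no low-end values would be flagged under this rule.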

Unnamed: 23 categorical

2 singleton categories; 100.0% null

rows	6,462
null	6,460 (100.0%)
unique	2
top_value	Teramo
top_rate	0.500
cardinality	2
entropy	1.000
entropy_ratio	1.000

Top values for Unnamed: 23 (2 unique shown, of 2 total).
value	count	share
Teramo	1	0.0%
change filename	1	0.0%
