saturn

/home/coolhand/html/datavis/data_trove/cache/parking/parking_violations_sample_20260119.json | 10,000 rows | sample n=10,000 | seed 42 | 2026-05-01T23:03:34+00:00

Overview

Source: /home/coolhand/html/datavis/data_trove/cache/parking/parking_violations_sample_20260119.json
Total rows: 10,000
Profiled sample: 10,000
Columns: 40
Generated: 2026-05-01T23:03:34+00:00

Insights opt-in

Model-generated narrative. These are opinions, not facts — the stats below are what saturn measured. Generated by: anthropic:claude-opus-4-7.

Dataset high anthropic:claude-opus-4-7

This is a 10,000-row sample of NYC parking violations with 40 fields covering ticket metadata, vehicle attributes, and location/precinct codes. The violation mix is dominated by one category: 'PHTO SCHOOL ZN SPEED VIOLATION' accounts for 4,416 tickets (about 52% of non-null descriptions), and the same count of 4,416 appears for issuing_agency 'V' and law_section '1180', so this one category likely drives both skews. Geographically, registration_state is heavily NY (6,935) with NJ and PA trailing, and violation_county splits across Queens, Manhattan, Brooklyn, and the Bronx but with inconsistent codes (e.g., 'QN' vs 'Qns', 'BK' vs 'Kings'). Watch out for heavy nulls and placeholder zeros: meter_number, unregistered_vehicle, time_first_observed, and violation_post_code are >70% null, while issuer_precinct, feet_from_curb, and street_code* are dominated by '0' sentinel values. vehicle_color also has unnormalized variants ('BK'/'BLK'/'BLACK', 'GY'/'GREY'/'GRY') that will need cleanup before any analysis.
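A minimal sketch of how the placeholder values named in this report could be mapped to missing before analysis. The sentinel table is assembled only from values the column narratives call out and is not exhaustive; `SENTINELS` and `mask_sentinels` are hypothetical names, not part of saturn.

```python
# Hypothetical sentinel table, built from placeholders named in this report.
SENTINELS = {
    "registration_state": {"99"},
    "plate_type": {"999"},
    "street_code1": {"0"},
    "street_code2": {"0"},
    "street_code3": {"0"},
    "violation_precinct": {"0"},
    "issuer_precinct": {"0"},
    "vehicle_year": {"0"},
    "date_first_observed": {"0"},
    "time_first_observed": {"00000"},
    "meter_number": {"-"},
}


def mask_sentinels(row, sentinels=SENTINELS):
    """Return a copy of row with known placeholder values replaced by None."""
    return {
        col: None if val in sentinels.get(col, ()) else val
        for col, val in row.items()
    }


row = {"registration_state": "99", "street_code1": "0", "vehicle_year": "2024"}
print(mask_sentinels(row))
```

Keeping the table per column matters here because the same token can be a placeholder in one column and a legitimate value in another.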

summons_number high anthropic:claude-opus-4-7

This is a unique 10-digit numeric identifier per row (likely a parking/traffic summons number), with all 10,000 values distinct and uniformly 10 characters long. There are no nulls, no duplicates, and every value is a single token, consistent with a primary key rather than a feature. The allcaps flag is a quirk of the detector treating digit-only strings as uppercase.

plate_id high anthropic:claude-opus-4-7

This is almost certainly a license/plate identifier: 9,519 unique values across 10,000 rows, all single-token, 99.99% uppercase, with lengths between 2 and 10 characters (mean 6.88). Notably, 'blankplate' appears 34 times and the duplicate rate is 4.81% (481 records), suggesting placeholder values and a small amount of legitimate plate reuse. No nulls or empties, but the placeholder token will pollute any join or uniqueness assumption.

registration_state high anthropic:claude-opus-4-7

Two-letter US state codes identifying where a vehicle is registered, dominated by NY at 69.35% of 10,000 rows with NJ, PA, CT, and FL trailing — consistent with an NYC-area enforcement dataset. Cardinality is 50 with entropy ratio 0.36, and one suspicious non-state token "99" appears 88 times, likely a sentinel for unknown/out-of-country plates.

plate_type high anthropic:claude-opus-4-7

This column encodes vehicle plate type codes (e.g., PAS, OMT, COM), almost certainly from a parking or traffic dataset. The distribution is heavily dominated by passenger plates, with PAS accounting for 9072 of 10000 rows (top_rate 0.9072) and entropy_ratio of just 0.146 across 27 categories. A small but notable 77 rows carry the placeholder code '999', which likely represents missing or unknown plate types despite null_rate being 0.

issue_date high anthropic:claude-opus-4-7

This is an issue_date column stored as ISO-8601 timestamps but profiled as categorical, with 687 distinct dates across 10,000 rows and no nulls. The distribution is severely concentrated: 65.42% of rows fall on 2025-12-28, with 2025-12-30 (1,594) and 2025-12-29 (356) accounting for most of the rest, leaving a long tail of 2026 dates with single- or low-double-digit counts. The clustering at end-of-2025 suggests a bulk-issue or backfill event rather than organic daily issuance.

violation_code high anthropic:claude-opus-4-7

Categorical violation_code with 62 distinct codes stored as strings and zero nulls across 10,000 rows. Distribution is heavily concentrated: code '36' alone accounts for 44.16% of records, followed by '21' at 14.63%, giving an entropy ratio of 0.52, roughly half the maximum for this cardinality. The top 10 codes cover 8,877 rows (88.8%), leaving a long tail of rare codes.

vehicle_body_type high anthropic:claude-opus-4-7

Column holds short vehicle body-type codes (SUBN, 4DSD, SDN, VAN, PICK...) typical of DMV/parking-violation feeds. Distribution is heavily concentrated: SUBN alone covers 51.9% of non-null rows (5,120 of 9,869) and the top two codes (SUBN and 4DSD) account for about 73%, yet there are 81 distinct codes producing a long tail. Entropy ratio of 0.41 confirms the lopsided spread, and nulls are minor at 1.31%.
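The stats for this column list 5,120 SUBN rows, 131 nulls, and a top_rate of 0.519. A small sketch, assuming top_rate is computed over non-null rows (which is what those numbers imply), shows how much the choice of denominator matters. `top_rate` is a hypothetical helper, not saturn's implementation.

```python
def top_rate(top_count, n_rows, n_null, exclude_nulls=True):
    """Share of rows held by the most frequent value."""
    denom = n_rows - n_null if exclude_nulls else n_rows
    return top_count / denom if denom else 0.0


# Counts taken from the vehicle_body_type stats in this report.
print(round(top_rate(5120, 10_000, 131), 3))                       # 0.519
print(round(top_rate(5120, 10_000, 131, exclude_nulls=False), 3))  # 0.512
```

The gap is small here because only 1.3% of rows are null; for columns that are 40% or more null, the two rates diverge sharply.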

vehicle_make high anthropic:claude-opus-4-7

This is a categorical vehicle make field with 126 distinct values across 10,000 rows and a low 0.78% null rate. Values appear truncated to 5 characters (TOYOT, NISSA, ME/BE, CHEVR, HYUND, SUBAR), which can make distinct brands collide and complicates joins. HONDA leads at 13.4% of non-null rows, followed closely by TOYOT at 13.1%, and an entropy ratio of 0.67 indicates a long tail beyond the top 10.

issuing_agency high anthropic:claude-opus-4-7

Single-character codes for the agency that issued each record, with 20 distinct values and no nulls across 10,000 rows. Distribution is heavily concentrated: 'V' alone accounts for 44.16% and the top four codes (V, T, S, P) cover 98.2% of rows, while codes like M and O fall to single or low double digits. Entropy ratio of 0.46 confirms the imbalance.

street_code1 high anthropic:claude-opus-4-7

This is `street_code1`, a short numeric street identifier stored as text (len_max 5, one_word_rate 1.0, 1420 distinct codes across 10000 rows). The dominant surprise is that 4817 rows — nearly half — carry the placeholder value "0", driving the 0.858 duplicate_rate; the next most frequent code ("13610") only appears 52 times. No nulls, but the "0" sentinel effectively functions as a missing/unknown marker.

street_code2 high anthropic:claude-opus-4-7

This looks like a secondary street code stored as text, with 1349 unique short tokens (max length 5, mean 2.94 chars) and every value a single word. The column is dominated by '0', which accounts for 5005 of 10000 rows, with another 359 rows holding '40404'; combined with an 86.51% duplicate rate this leaves very little discriminating signal. No nulls, but the heavy '0' mass likely encodes a missing/sentinel state rather than a real code.

street_code3 high anthropic:claude-opus-4-7

Stored as text but the values are short numeric codes (len_max 5, one_word_rate 1.0, 1317 unique tokens), consistent with a street or address code lookup. The distribution is dominated by '0' (5235/10000) with '40404' a distant second at 359, driving an 86.83% duplicate_rate. The allcaps_rate of 0.4753 is an artifact of digit-only strings being counted as uppercase.

vehicle_expiration_date high anthropic:claude-opus-4-7

Stored as text but functionally a YYYYMMDD vehicle expiration date, with lengths clustered at 8-10 characters and one token per row. Over half the rows (5,492) are the sentinel '0.00000000' and another 556 are '88888888', plus 126 entries like '20258888' that mix a real year with a placeholder month/day, so roughly 62% of values (6,174 rows) are not real dates. Genuine dates such as 20260930 and 20260228 appear only in the dozens, and the column is 89.6% duplicates across just 1,040 unique values.
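One way to separate real dates from the placeholders described above is to accept only 8-digit strings that form a valid calendar date. This is a sketch under that assumption; `parse_expiration` is a hypothetical helper name.

```python
from datetime import date, datetime


def parse_expiration(raw):
    """Parse a YYYYMMDD string; return None for sentinels and impossible dates."""
    if raw is None or len(raw) != 8 or not raw.isdigit():
        return None  # e.g. '0.00000000' (10 chars, contains '.')
    try:
        return datetime.strptime(raw, "%Y%m%d").date()
    except ValueError:
        return None  # e.g. '88888888' or '20258888' (month/day out of range)


for raw in ["20260730", "0.00000000", "88888888", "20258888", "20260888"]:
    print(raw, parse_expiration(raw))
```

Partially filled values such as '20258888' are dropped entirely here; if the year alone is useful, it would need to be extracted separately.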

violation_location high anthropic:claude-opus-4-7

This appears to be a violation_location code, likely a precinct or zone identifier stored as a zero-padded string (e.g., '0018', '0047') mixed with non-padded variants ('115', '114'), suggesting inconsistent formatting across sources. Cardinality is 87 with a very high entropy ratio (0.929), so values are spread evenly with no dominant location: the top code accounts for only 5.1% of non-null rows (284 of 5,515). Most striking: 44.85% of rows are null, which is flagged as an alert and severely limits usability.
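The padded and unpadded variants can be put on one footing by stripping leading zeros, after which the top violation_location codes ('0018', '115', '0047') line up with the violation_precinct values ('18', '115', '47') reported further down. A minimal sketch, with `normalize_precinct` as a hypothetical helper:

```python
def normalize_precinct(raw):
    """Strip zero padding so '0018' and '18' compare equal; None stays None."""
    if raw is None:
        return None
    stripped = raw.strip()
    if not stripped.isdigit():
        return stripped  # leave unexpected tokens untouched for later review
    return str(int(stripped))


print([normalize_precinct(v) for v in ["0018", "115", "0047", None]])
# ['18', '115', '47', None]
```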

violation_precinct high anthropic:claude-opus-4-7

This column encodes the NYPD precinct where the violation occurred, stored as a string with 88 distinct codes. The dominant surprise is that 44.85% of rows (4,485) carry the value '0', which is almost certainly a sentinel for missing/non-applicable precinct rather than a real precinct number; the count equals the number of nulls in violation_location. Legitimate precincts like 18, 115, and 114 follow far behind. Entropy ratio of 0.665 reflects this heavy concentration on the sentinel.

issuer_precinct high anthropic:claude-opus-4-7

This is the precinct number of the issuing officer, with 121 distinct codes across 10,000 rows and no nulls. The distribution is dominated by the value "0" at 64.18% of rows, which is almost certainly a sentinel/placeholder rather than a real precinct, leaving genuine precincts (e.g., 18, 115, 103) as long-tail minorities. Entropy ratio of 0.446 confirms the heavy concentration on that single code.

issuer_code high anthropic:claude-opus-4-7

This is a categorical issuer identifier stored as text, with 1,420 distinct codes across 10,000 rows and an 85.8% duplicate rate. The dominant value '0' accounts for 5,314 rows (over half the column), suggesting it is a sentinel or 'unknown issuer' placeholder rather than a real code. Remaining values look like 6-digit numeric IDs (len_max 6, len_p95 6), but the median length of 1 confirms how heavily the '0' bucket skews the distribution.

issuer_command high anthropic:claude-opus-4-7

This column holds short alphanumeric codes (e.g., T302, T401, EPIU, MTTF) that look like identifiers for the issuer's command (unit), with 228 distinct values across 10,000 rows. It is sparsely populated (44.16% null), yet an entropy_ratio of 0.83 over the non-null portion shows the codes spread fairly evenly, with the most common value T302 covering only 6.7% of non-null rows. No single code dominates, so missingness is a more striking signal than concentration.

issuer_squad medium anthropic:claude-opus-4-7

A categorical code labelled issuer_squad with 23 distinct values, dominated by the placeholder-looking '0000' which accounts for 41.6% of non-nulls (1513 rows) while the rest are single-letter tags like F, N, A, L. Most striking: 63.6% of rows are null, and entropy_ratio of 0.73 suggests the remaining values are reasonably spread but anchored by that '0000' bucket which may represent unassigned issuers. The mix of a numeric-looking code with letter codes hints at inconsistent encoding conventions.

violation_time high anthropic:claude-opus-4-7

This column encodes the time of a violation as a fixed 5-character token like '0839A' or '1200P' (HHMM plus A/P meridiem indicator), with every value uppercase and exactly one word. Across 10,000 rows there are only 1,432 distinct values and an 85.7% duplicate rate, which is expected for clock times rather than a data quality issue. Null rate is negligible (0.0004) and the format is strikingly uniform — len_min, len_median, and len_max all equal 5.
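The 'HHMM' plus 'A'/'P' layout described above can be converted to a proper time value as sketched below. The 12-hour range check is an assumption about the encoding, and `parse_violation_time` is a hypothetical helper name.

```python
import re
from datetime import time

_TIME_RE = re.compile(r"^(\d{2})(\d{2})([AP])$")


def parse_violation_time(raw):
    """Convert 'HHMM' + 'A'/'P' (e.g. '0839A') to datetime.time, or None if invalid."""
    if raw is None:
        return None
    m = _TIME_RE.match(raw.strip().upper())
    if not m:
        return None
    hour, minute, meridiem = int(m.group(1)), int(m.group(2)), m.group(3)
    # Assumption: a 12-hour clock, so hour 00 or 13+ is treated as invalid.
    if not (1 <= hour <= 12 and minute <= 59):
        return None
    hour = hour % 12 + (12 if meridiem == "P" else 0)
    return time(hour, minute)


for raw in ["0839A", "1200P", "1230P", "0840P"]:
    print(raw, parse_violation_time(raw))
```

Returning None instead of raising keeps the four null rows and any malformed tokens from stopping a batch conversion; they can be counted afterwards.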

violation_county high anthropic:claude-opus-4-7

This is a categorical county/borough code for NYC violations, with 13 distinct values across 10,000 rows and a 2.71% null rate. The top value 'QN' (Queens) covers 18.2% of non-null rows, but the codes are inconsistent: Queens appears as 'QN', 'Q', and 'Qns'; Brooklyn as 'BK', 'K', and 'Kings'; the Bronx as 'BX' and 'Bronx'; and, most likely, Manhattan as 'NY' and 'MN' and Staten Island as 'ST', 'R', and 'Rich'. The true cardinality is therefore closer to 5 than 13. Entropy ratio of 0.897 reflects this fragmentation rather than genuine diversity.
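A sketch of collapsing the observed codes into boroughs, using the counts from the violation_county top-values table in this report. The mapping is an interpretation, not something the source confirms: 'NY' is read as New York County (Manhattan), 'K' as Kings (Brooklyn), 'R' and 'Rich' as Richmond (Staten Island), and 'ST' as Staten Island. `BOROUGH` is a hypothetical name.

```python
from collections import Counter

# Assumed mapping from observed codes to boroughs (see caveats above).
BOROUGH = {
    "QN": "Queens", "Q": "Queens", "Qns": "Queens",
    "NY": "Manhattan", "MN": "Manhattan",
    "BK": "Brooklyn", "K": "Brooklyn", "Kings": "Brooklyn",
    "BX": "Bronx", "Bronx": "Bronx",
    "ST": "Staten Island", "R": "Staten Island", "Rich": "Staten Island",
}

# Counts copied from the violation_county top-values table.
observed = {
    "QN": 1773, "NY": 1472, "BK": 1233, "BX": 1027, "K": 936, "Q": 902,
    "Kings": 737, "Qns": 414, "ST": 403, "Bronx": 395, "MN": 314,
    "R": 116, "Rich": 7,
}

by_borough = Counter()
for code, n in observed.items():
    by_borough[BOROUGH[code]] += n

for borough, n in by_borough.most_common():
    print(f"{borough}: {n}")
```

Under this mapping Queens and Brooklyn each hold roughly 3,000 of the 9,729 non-null rows, a much flatter picture than the raw 'QN' top_rate suggests.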

violation_in_front_of_or_opposite high anthropic:claude-opus-4-7

A categorical position code indicating where a violation occurred relative to a reference point: most likely 'F' (in front of) and 'O' (opposite), plus 'I' (meaning unclear, possibly inside) and a single 'R'. Nearly half the rows (46.85%) are null, and among the populated rows 'F' dominates at 66.85%. The 'R' code appears just once in 10,000 rows, suggesting either a data-entry error or a rare legitimate category worth investigating.

street_name high anthropic:claude-opus-4-7

Street/intersection labels, mostly directional roadway descriptors like 'SB CROSS BAY BLVD @' with cardinal prefixes (sb/wb/nb/eb) and '@' separators indicating cross-streets — consistent with NYC traffic or transit data. Values are heavily duplicated (6881 of 10000, 3115 unique) and 77% all-caps, with strings capped at 20 chars suggesting upstream truncation. Language detection flags Japanese (614) and Chinese (59), but these are almost certainly false positives from short uppercase abbreviations rather than true multilingual content.
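The direction-prefix and '@' conventions described above can be split into parts with a regular expression. This is a sketch that assumes the layout is an optional two-letter direction (NB/SB/EB/WB), a street, and an optional '@' cross-street; `split_street_name` is a hypothetical helper.

```python
import re

# Assumed layout: [direction ]street[ @ cross-street]
_STREET_RE = re.compile(
    r"^(?:(?P<direction>[NSEW]B)\s+)?(?P<street>[^@]*?)\s*(?:@\s*(?P<cross>.*))?$"
)


def split_street_name(raw):
    """Return (direction, street, cross_street); missing parts are None."""
    m = _STREET_RE.match(raw.strip())
    if m is None:
        return None, raw, None  # unexpected shape: pass the value through
    cross = m.group("cross")
    cross = cross.strip() if cross else None
    return m.group("direction"), m.group("street").strip(), cross or None


for raw in ["WB HYLAN BLVD @ LUTE", "SB CROSS BAY BLVD @", "BEACH 145 ST"]:
    print(split_street_name(raw))
```

Because values are capped at 20 characters, the cross-street part is often cut short or empty, so it is better treated as a hint than as a join key.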

intersecting_street high anthropic:claude-opus-4-7

This column holds short uppercase street-name fragments (mean length 9.3 chars, ~2.4 words) describing an intersecting street, likely paired with a primary address elsewhere. Nearly half the rows are null (48.1%) and 72.8% of the non-null values are duplicates, with common tokens like 'ST' (189), 'AVE' (84), and 'RD' dominating — many top values are bare suffixes ('ST', 'AVE', 'T', 'E') suggesting truncation or partial captures. Vocabulary is small (1,053 words across 1,413 unique values) and 80.9% of entries are all-caps, consistent with a municipal data-entry convention.

date_first_observed high anthropic:claude-opus-4-7

Stored as a categorical YYYYMMDD string capturing the date a record was first observed, with 147 distinct values across 10,000 rows. The column is overwhelmingly dominated by the sentinel '0' (98.27%), leaving only 173 rows with real dates, mostly in late 2025 through 2026. Entropy ratio of 0.035 confirms almost no information content as-is.

law_section high anthropic:claude-opus-4-7

This is a binary categorical field encoding a law/statute section, taking only two codes ('408' and '1180') across all 10,000 rows with no nulls. The split is fairly balanced at 55.84%/44.16%, yielding near-maximal entropy (0.99 of the possible 1.0). Despite the numeric-looking values, with only 2 distinct levels it behaves as a flag rather than a continuous code.
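The entropy figures quoted for this column can be reproduced as Shannon entropy in bits divided by log2 of the number of categories. The sketch below assumes that definition; it is not taken from saturn's code, and `entropy_ratio` is a hypothetical helper. Fed the two law_section counts from this report, it returns roughly 0.990, matching the reported value.

```python
import math


def entropy_ratio(counts):
    """Shannon entropy in bits divided by log2(k), the maximum for k categories."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    k = len(probs)
    if k < 2:
        return 0.0  # a constant column carries no information
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(k)


# law_section counts from this report: '408' and '1180'.
print(round(entropy_ratio([5584, 4416]), 3))
```

With only two categories the maximum entropy is exactly 1 bit, which is why entropy and entropy_ratio coincide for this column.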

sub_division high anthropic:claude-opus-4-7

Categorical sub-division code with 76 distinct values across 10,000 rows and almost no nulls (0.02%). The distribution is heavily concentrated: 'B' alone covers 44.4% of rows and the top two values ('B' and 'd1') together exceed 58%, giving an entropy ratio of just 0.52. The mix of single-letter codes ('B','C','D') with letter-digit codes ('d1','E2','C3') — including a lowercase 'd1' alongside uppercase letters — suggests inconsistent coding conventions worth normalising.

days_parking_in_effect high anthropic:claude-opus-4-7

This is a 7-character day-of-week mask indicating which days parking rules are in effect, with each position likely corresponding to Sun-Sat (Y=in effect, B=likely a holiday/exempt marker, space=off). Over half the rows are null (50.59%) and among the populated values 'YYYYYYY' (every day) dominates at 42.38% followed by 'BBBBBBB' at 1449 occurrences, giving low entropy ratio of 0.46 across 21 distinct patterns. The mix of Y, B, and spaces in the same field is unusual and suggests an encoded multi-flag rather than a clean categorical.
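If the mask is read the way the paragraph above guesses, decoding it is a positional lookup. The Sun-Sat ordering is an assumption carried over from that guess, the second example mask is illustrative rather than taken from the data, and `days_in_effect` is a hypothetical helper.

```python
# Assumed position order; the narrative only says the mask likely runs Sun-Sat.
DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")


def days_in_effect(mask):
    """Return the days flagged 'Y' in a 7-character mask, or None if malformed."""
    if mask is None or len(mask) != 7:
        return None
    return [day for day, flag in zip(DAYS, mask) if flag == "Y"]


print(days_in_effect("YYYYYYY"))  # every position flagged
print(days_in_effect(" YYYYY "))  # illustrative mask, not from the report
print(days_in_effect("BBBBBBB"))  # 'B' is not treated as in effect here
```

Since the meaning of 'B' is unknown, the sketch ignores it rather than guessing; a separate flag for all-'B' masks may be more faithful once the code is documented.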

from_hours_in_effect high anthropic:claude-opus-4-7

Start time of a parking/traffic regulation's effective window, encoded as a clock string like '0830A' with a sentinel 'ALL' meaning 24-hour applicability. The column is null 68.3% of the time, and among the 3,170 populated rows 'ALL' alone accounts for 45.6% (1,445), so actual start times are a minority signal. Cardinality is just 26 with entropy ratio 0.63, and the populated values cluster heavily in morning hours (0700A–1130A).

to_hours_in_effect high anthropic:claude-opus-4-7

This appears to be the end-time of a parking or traffic regulation's effective window, encoded as a clock string (e.g., '0100P', '1100A') with 'ALL' meaning the rule applies all day. Two-thirds of rows are null (null_rate 0.683), and among the 3,170 populated values 'ALL' dominates at 45.6%, leaving the 30 actual time codes thinly spread (entropy_ratio 0.62). The mix of sentinel ('ALL') and time-of-day strings in the same field is the main gotcha.
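Because 'ALL' and clock strings share the field, one option is to split each value into an all-day flag and a clock token before any time parsing. A minimal sketch; `split_hours` is a hypothetical helper and the all-day reading of 'ALL' follows the interpretation given above.

```python
def split_hours(raw):
    """Return (all_day, clock) so the 'ALL' sentinel never mixes with clock strings."""
    if raw is None:
        return None, None
    token = raw.strip().upper()
    if token == "ALL":
        return True, None
    return False, token


for raw in ["ALL", "0100P", None]:
    print(split_hours(raw))
```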

vehicle_color high anthropic:claude-opus-4-7

This is a vehicle color code field, almost certainly from a parking or traffic citation feed. The encoding is inconsistent: short codes (GY, BK, WH) dominate but overlapping long forms (WHITE, BLACK, GREY) and alternate abbreviations (BLK, GRY) appear as separate categories, inflating cardinality to 99 across only 10000 rows. About 9.4% of values are null and the top code GY covers 22.95% of rows.

unregistered_vehicle high anthropic:claude-opus-4-7

This column appears to be a binary flag for whether a vehicle was unregistered, but it carries no information: 84.87% of rows are null and the remaining 1513 rows all hold the single value "0". Cardinality is 1 and entropy is 0, so the field is effectively constant where populated.

vehicle_year high anthropic:claude-opus-4-7

Vehicle model year stored as a categorical string with 39 distinct values across 10000 rows and no nulls. The dominant value is "0" at 23.29% of rows, which almost certainly encodes missing/unknown rather than a real year; legitimate years span recent ones with 2024 (759) and 2025 (696) leading. Entropy ratio of 0.78 reflects a fairly even spread across the remaining year codes.

meter_number high anthropic:claude-opus-4-7

Likely a parking meter identifier, but it is effectively empty: 84.79% of rows are null and 99.47% of the non-null values are the placeholder "-". Only 7 distinct values exist across 10,000 rows, with the genuine-looking numeric IDs (e.g. 309485, 103553) appearing at most twice. Entropy ratio of 0.022 confirms there is almost no information here.

house_number high anthropic:claude-opus-4-7

Almost certainly a street address house-number fragment, with values that are short single tokens (word_mean 1.00, len_mean 3.36, max length 7) and a vocabulary of 2,345 over 10,000 rows. Surprisingly, the most frequent values are cardinal-direction letters N/E/W/S (161/136/122/120) rather than digits; these likely belong in a separate directional prefix field. Nearly half the column is null (47.7%) and 55.1% of non-null values are duplicates, so the column is sparse and moderately repetitive rather than low-cardinality.

time_first_observed high anthropic:claude-opus-4-7

Looks like a time-of-first-observation field encoded as HHMMx where x is A/P (e.g., '0230A', '0930P'). It is 78.45% null and, among the 2,155 non-null rows, the sentinel '00000' dominates at 1,926 occurrences (top_rate 0.8937), leaving only 229 rows with real timestamps spread across 206 distinct values. Entropy ratio of 0.169 reflects how heavily the placeholder dominates the populated rows.

violation_description high anthropic:claude-opus-4-7

This column records the parking/traffic violation type as a short text code, with 74 distinct values across 10,000 rows. It's heavily concentrated: 'PHTO SCHOOL ZN SPEED VIOLATION' alone covers 52% of non-null rows, and 15.13% of values are null. The label formatting is inconsistent — some entries are numbered codes like '14-No Standing' while others are free-form like 'Fire Hydrant' or 'Detached Trailer', suggesting multiple source systems or schema versions merged together.

violation_post_code high anthropic:claude-opus-4-7

This appears to be a violation post code categorical field, likely a sub-classifier within a citation or inspection record. It has 53 unique codes with high entropy (ratio 0.89), and the top code "99" only covers 14.5% of non-null rows. Two notable surprises: 78.74% of rows are null, and the code values are heterogeneous in format—numeric ("99", "01", "311"), single letters ("B", "U"), and tokens like "SPCL"—suggesting multiple coding schemes coexist.

violation_legal_code high anthropic:claude-opus-4-7

This column appears to be a violation legal code indicator, but it carries no information: every one of the 4,416 non-null rows contains the single value "T" (top_rate 1.0, cardinality 1, entropy 0.0). On top of that, 55.84% of rows are null. There is nothing here to discriminate records.

Errors during insight pass (1)
  • column:feet_from_curb:anthropic:claude-opus-4-7: ProviderRateLimitError — Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of concurrent connections has exceeded your rate limit. Please try again later or contact sales at https://claude.com/contact-sales to discuss your options for a rate limit increase.'}, 'request_id': 'req_011Cacguy5aoU3RjynzW1sTb'}

Languages detected

Per-string language detection across text columns (sampled).

summons_number text

100.0% of rows are unique strings; 100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars
rows: 10,000
null: 0 (0.0%)
unique: 10,000
len_min: 10
len_max: 10
len_mean: 10.000
len_median: 10.000
len_p95: 10.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 0
duplicate_rate: 0.000
vocab_size: 10,000
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 1499918446
  2. 4976273052
  3. 4976262716
  4. 4976263125
  5. 9252903896
  6. 4976252231
  7. 9253525745
  8. 4976257137
  9. 4976263332
  10. 1499713034

plate_id text

95.2% of rows are unique strings; 100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars
rows: 10,000
null: 0 (0.0%)
unique: 9,519
len_min: 2
len_max: 10
len_mean: 6.879
len_median: 7.000
len_p95: 7.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 481
duplicate_rate: 0.048
vocab_size: 9,519
readability_flesch_mean: 100.493
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. LPG5377
  2. NSXB18
  3. BL54422
  4. LKN2862
  5. MWJM2015
  6. M15UEZ
  7. MHC7554
  8. GJK2986
  9. JMT5468
  10. 85997MD

registration_state categorical

rows: 10,000
null: 0 (0.0%)
unique: 50
top_value: NY
top_rate: 0.694
cardinality: 50
entropy: 2.034
entropy_ratio: 0.360
Top values (rank 1–20)
  1. NY — 6,935
  2. NJ — 917
  3. PA — 470
  4. CT — 247
  5. FL — 235
  6. VA — 220
  7. ME — 125
  8. MA — 124
  9. 99 — 88
  10. GA — 65
  11. NC — 64
  12. TN — 56
  13. MD — 53
  14. TX — 36
  15. OH — 31
  16. IN — 30
  17. IL — 28
  18. CA — 27
  19. SC — 26
  20. MI — 24

plate_type categorical

rows: 10,000
null: 0 (0.0%)
unique: 27
top_value: PAS
top_rate: 0.907
cardinality: 27
entropy: 0.696
entropy_ratio: 0.146
Top values (rank 1–20)
  1. PAS — 9,072
  2. OMT — 330
  3. COM — 245
  4. SRF — 103
  5. OMS — 84
  6. 999 — 77
  7. ORG — 17
  8. TRL — 11
  9. MED — 8
  10. RGL — 8
  11. MOT — 7
  12. NYS — 6
  13. PSD — 6
  14. SPO — 4
  15. VAS — 3
  16. ITP — 3
  17. OMR — 2
  18. TRC — 2
  19. TOW — 2
  20. APP — 2

issue_date categorical

368 singleton categories
rows: 10,000
null: 0 (0.0%)
unique: 687
top_value: 2025-12-28T00:00:00.000
top_rate: 0.654
cardinality: 687
entropy: 2.765
entropy_ratio: 0.293
Top values (rank 1–20)
  1. 2025-12-28T00:00:00.000 — 6,542
  2. 2025-12-30T00:00:00.000 — 1,594
  3. 2025-12-29T00:00:00.000 — 356
  4. 2026-06-26T00:00:00.000 — 14
  5. 2026-09-27T00:00:00.000 — 13
  6. 2026-09-25T00:00:00.000 — 12
  7. 2025-12-31T00:00:00.000 — 12
  8. 2026-06-27T00:00:00.000 — 11
  9. 2026-10-25T00:00:00.000 — 10
  10. 2026-08-31T00:00:00.000 — 10
  11. 2026-08-27T00:00:00.000 — 10
  12. 2026-07-26T00:00:00.000 — 10
  13. 2026-06-30T00:00:00.000 — 10
  14. 2026-08-25T00:00:00.000 — 9
  15. 2026-10-29T00:00:00.000 — 8
  16. 2026-10-17T00:00:00.000 — 8
  17. 2026-10-05T00:00:00.000 — 8
  18. 2026-09-23T00:00:00.000 — 8
  19. 2026-07-27T00:00:00.000 — 8
  20. 2026-07-24T00:00:00.000 — 8

violation_code categorical

rows: 10,000
null: 0 (0.0%)
unique: 62
top_value: 36
top_rate: 0.442
cardinality: 62
entropy: 3.100
entropy_ratio: 0.521
Top values (rank 1–20)
  1. 36 — 4,416
  2. 21 — 1,463
  3. 40 — 862
  4. 14 — 858
  5. 46 — 360
  6. 20 — 244
  7. 98 — 242
  8. 19 — 168
  9. 16 — 133
  10. 66 — 131
  11. 71 — 116
  12. 74 — 114
  13. 70 — 104
  14. 50 — 96
  15. 17 — 79
  16. 78 — 75
  17. 51 — 66
  18. 80 — 56
  19. 67 — 47
  20. 13 — 38

vehicle_body_type categorical

rows: 10,000
null: 131 (1.3%)
unique: 81
top_value: SUBN
top_rate: 0.519
cardinality: 81
entropy: 2.599
entropy_ratio: 0.410
Top values (rank 1–20)
  1. SUBN — 5,120
  2. 4DSD — 2,067
  3. SDN — 670
  4. VAN — 275
  5. SPOR — 233
  6. PICK — 194
  7. TRLR — 175
  8. SEDA — 115
  9. UT — 111
  10. SW — 110
  11. 2DSD — 94
  12. 4D — 82
  13. DELV — 77
  14. SEMI — 53
  15. TAXI — 52
  16. SU — 52
  17. SD — 51
  18. P-U — 46
  19. TRAC — 41
  20. CONV — 26

vehicle_make categorical

rows: 10,000
null: 78 (0.8%)
unique: 126
top_value: HONDA
top_rate: 0.134
cardinality: 126
entropy: 4.661
entropy_ratio: 0.668
Top values (rank 1–20)
  1. HONDA — 1,331
  2. TOYOT — 1,302
  3. NISSA — 770
  4. FORD — 603
  5. BMW — 559
  6. ME/BE — 521
  7. JEEP — 450
  8. CHEVR — 449
  9. HYUND — 365
  10. SUBAR — 273
  11. KIA — 268
  12. LEXUS — 268
  13. MAZDA — 257
  14. AUDI — 242
  15. ACURA — 221
  16. VOLKS — 199
  17. DODGE — 175
  18. TESLA — 152
  19. INFIN — 151
  20. GMC — 134

issuing_agency categorical

rows: 10,000
null: 0 (0.0%)
unique: 20
top_value: V
top_rate: 0.442
cardinality: 20
entropy: 2.007
entropy_ratio: 0.464
Top values (rank 1–20)
  1. V — 4,416
  2. T — 2,131
  3. S — 1,946
  4. P — 1,325
  5. K — 41
  6. N — 38
  7. A — 23
  8. Y — 17
  9. M — 13
  10. O — 9
  11. C — 9
  12. 8 — 8
  13. 3 — 6
  14. X — 5
  15. 9 — 3
  16. W — 3
  17. L — 2
  18. R — 2
  19. F — 2
  20. U — 1

street_code1 text

100.0% rows are a single word; 51.7% rows are all-caps; 95th-percentile length under 20 chars; 85.8% duplicate strings
rows: 10,000
null: 0 (0.0%)
unique: 1,420
len_min: 1
len_max: 5
len_mean: 3.026
len_median: 4.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,580
duplicate_rate: 0.858
vocab_size: 1,420
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.517
boilerplate_rate: 0.000
Sample values (first 10)
  1. 35115
  2. 0
  3. 0
  4. 0
  5. 34610
  6. 0
  7. 11010
  8. 0
  9. 0
  10. 18070

street_code2 text

100.0% rows are a single word; 49.6% rows are all-caps; 95th-percentile length under 20 chars; 86.5% duplicate strings
rows: 10,000
null: 0 (0.0%)
unique: 1,349
len_min: 1
len_max: 5
len_mean: 2.938
len_median: 1.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,651
duplicate_rate: 0.865
vocab_size: 1,349
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.496
boilerplate_rate: 0.000
Sample values (first 10)
  1. 35290
  2. 0
  3. 0
  4. 0
  5. 10410
  6. 0
  7. 34390
  8. 0
  9. 0
  10. 40404

street_code3 text

100.0% rows are a single word; 47.5% rows are all-caps; 95th-percentile length under 20 chars; 86.8% duplicate strings
rows: 10,000
null: 0 (0.0%)
unique: 1,317
len_min: 1
len_max: 5
len_mean: 2.850
len_median: 1.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,683
duplicate_rate: 0.868
vocab_size: 1,317
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.475
boilerplate_rate: 0.000
Sample values (first 10)
  1. 0
  2. 0
  3. 0
  4. 0
  5. 10510
  6. 0
  7. 34430
  8. 0
  9. 0
  10. 40404

vehicle_expiration_date text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars; 89.6% duplicate strings
rows: 10,000
null: 0 (0.0%)
unique: 1,040
len_min: 8
len_max: 10
len_mean: 9.098
len_median: 10.000
len_p95: 10.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,960
duplicate_rate: 0.896
vocab_size: 1,040
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 20260730
  2. 0.00000000
  3. 0.00000000
  4. 0.00000000
  5. 20260888
  6. 0.00000000
  7. 88888888
  8. 0.00000000
  9. 0.00000000
  10. 0.00000000

violation_location categorical

44.9% null
rows: 10,000
null: 4,485 (44.9%)
unique: 87
top_value: 0018
top_rate: 0.051
cardinality: 87
entropy: 5.988
entropy_ratio: 0.929
Top values (rank 1–20)
  1. 0018 — 284
  2. 115 — 220
  3. 114 — 178
  4. 103 — 162
  5. 0047 — 140
  6. 0075 — 139
  7. 0077 — 135
  8. 0062 — 130
  9. 0072 — 129
  10. 110 — 127
  11. 0001 — 125
  12. 112 — 124
  13. 0061 — 120
  14. 0084 — 106
  15. 0034 — 105
  16. 0020 — 104
  17. 109 — 102
  18. 0067 — 96
  19. 0073 — 96
  20. 108 — 93

violation_precinct categorical

rows: 10,000
null: 0 (0.0%)
unique: 88
top_value: 0
top_rate: 0.449
cardinality: 88
entropy: 4.295
entropy_ratio: 0.665
Top values (rank 1–20)
  1. 0 — 4,485
  2. 18 — 284
  3. 115 — 220
  4. 114 — 178
  5. 103 — 162
  6. 47 — 140
  7. 75 — 139
  8. 77 — 135
  9. 62 — 130
  10. 72 — 129
  11. 110 — 127
  12. 1 — 125
  13. 112 — 124
  14. 61 — 120
  15. 84 — 106
  16. 34 — 105
  17. 20 — 104
  18. 109 — 102
  19. 67 — 96
  20. 73 — 96

issuer_precinct categorical

rows: 10,000
null: 0 (0.0%)
unique: 121
top_value: 0
top_rate: 0.642
cardinality: 121
entropy: 3.088
entropy_ratio: 0.446
Top values (rank 1–20)
  1. 0 — 6,418
  2. 18 — 282
  3. 115 — 161
  4. 103 — 139
  5. 61 — 115
  6. 62 — 113
  7. 1 — 104
  8. 109 — 90
  9. 70 — 87
  10. 114 — 81
  11. 19 — 80
  12. 14 — 80
  13. 20 — 78
  14. 110 — 75
  15. 63 — 73
  16. 102 — 68
  17. 6 — 65
  18. 60 — 59
  19. 75 — 57
  20. 10 — 57

issuer_code text

100.0% rows are a single word; 46.9% rows are all-caps; 95th-percentile length under 20 chars; 85.8% duplicate strings
rows: 10,000
null: 0 (0.0%)
unique: 1,420
len_min: 1
len_max: 6
len_mean: 3.338
len_median: 1.000
len_p95: 6.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,580
duplicate_rate: 0.858
vocab_size: 1,420
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 0.469
boilerplate_rate: 0.000
Sample values (first 10)
  1. 976628
  2. 0
  3. 0
  4. 0
  5. 365882
  6. 0
  7. 376939
  8. 0
  9. 0
  10. 972724

issuer_command categorical

44.2% null
rows: 10,000
null: 4,416 (44.2%)
unique: 228
top_value: T302
top_rate: 0.067
cardinality: 228
entropy: 6.523
entropy_ratio: 0.833
Top values (rank 1–20)
  1. T302 — 374
  2. T401 — 301
  3. T103 — 230
  4. T402 — 223
  5. T106 — 187
  6. T301 — 169
  7. T102 — 145
  8. EPIU — 142
  9. MTTF — 133
  10. KN02 — 105
  11. T105 — 104
  12. KN08 — 95
  13. QW01 — 95
  14. MN12 — 88
  15. BX12 — 86
  16. T303 — 85
  17. T201 — 84
  18. MN07 — 83
  19. QW02 — 63
  20. KN04 — 63

issuer_squad categorical

63.6% null
rows: 10,000
null: 6,361 (63.6%)
unique: 23
top_value: 0000
top_rate: 0.416
cardinality: 23
entropy: 3.309
entropy_ratio: 0.731
Top values (rank 1–20)
  1. 0000 — 1,513
  2. F — 331
  3. N — 245
  4. A — 214
  5. L — 180
  6. J — 122
  7. M — 120
  8. Q — 110
  9. R — 100
  10. H — 99
  11. E — 99
  12. X — 95
  13. Y — 85
  14. S — 71
  15. B — 70
  16. P — 44
  17. U — 29
  18. D — 28
  19. C — 25
  20. G — 22

violation_time text

100.0% rows are a single word; 100.0% rows are all-caps; 95th-percentile length under 20 chars; 85.7% duplicate strings
rows: 10,000
null: 4 (0.0%)
unique: 1,432
len_min: 5
len_max: 5
len_mean: 5.000
len_median: 5.000
len_p95: 5.000
word_mean: 1.000
word_median: 1.000
n_empty: 0
n_duplicates: 8,564
duplicate_rate: 0.857
vocab_size: 1,433
readability_flesch_mean: 121.220
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 1.000
allcaps_rate: 1.000
boilerplate_rate: 0.000
Sample values (first 10)
  1. 0840P
  2. 0259P
  3. 1127A
  4. 1135A
  5. 0622P
  6. 0725A
  7. 0453P
  8. 0939A
  9. 1139A
  10. 1230P

violation_county categorical

rows: 10,000
null: 271 (2.7%)
unique: 13
top_value: QN
top_rate: 0.182
cardinality: 13
entropy: 3.320
entropy_ratio: 0.897
Top values (rank 1–20)
  1. QN — 1,773
  2. NY — 1,472
  3. BK — 1,233
  4. BX — 1,027
  5. K — 936
  6. Q — 902
  7. Kings — 737
  8. Qns — 414
  9. ST — 403
  10. Bronx — 395
  11. MN — 314
  12. R — 116
  13. Rich — 7

violation_in_front_of_or_opposite categorical

46.9% null
rows: 10,000
null: 4,685 (46.9%)
unique: 4
top_value: F
top_rate: 0.668
cardinality: 4
entropy: 1.229
entropy_ratio: 0.615
Top values (rank 1–20)
  1. F — 3,553
  2. O — 1,140
  3. I — 621
  4. R — 1

street_name text

28 languages detected in sample; 77.3% rows are all-caps; 68.8% duplicate strings
rows: 10,000
null: 4 (0.0%)
unique: 3,115
len_min: 2
len_max: 20
len_mean: 14.872
len_median: 16.000
len_p95: 20.000
word_mean: 3.513
word_median: 3.000
n_empty: 0
n_duplicates: 6,881
duplicate_rate: 0.688
vocab_size: 1,760
readability_flesch_mean: 62.546
emoji_rate: 0.000
url_rate: 0.000
one_word_rate: 0.014
allcaps_rate: 0.773
boilerplate_rate: 0.000
Sample values (first 10)
  1. BEACH 145 ST
  2. WB HYLAN BLVD @ LUTE
  3. NB KNAPP ST @ ALLEN
  4. EB NORTHERN BLVD @ A
  5. E 45th St
  6. EB FOREST AVE @ CRYS
  7. 10th Ave
  8. SB WATERS PL @ BRONX
  9. WB SEAGIRT BLVD @ B
  10. S/W C/O 173 ST

intersecting_street text

80.9% rows are all-caps 48.1% null 95th-percentile length under 20 chars 72.8% duplicate strings
rows10,000
null4,813 (48.1%)
unique1,413
len_min1
len_max20
len_mean9.271
len_median8.000
len_p9519.000
word_mean2.423
word_median2.000
n_empty0
n_duplicates3,774
duplicate_rate0.728
vocab_size1,053
readability_flesch_mean83.035
emoji_rate0.000
url_rate0.000
one_word_rate0.114
allcaps_rate0.809
boilerplate_rate0.000
Sample values (first 10)
  1. 42 ST
  2. I
  3. N ST
  4. OOD RD
  5. SHAD CREEK RD
  6. @ 27TH ST
  7. GERBOARD RD
  8. ALL AVE
  9. LAMEDA AVE
  10. STEINWAY

date_first_observed categorical

125 singleton categories top value is 98.3% of rows
rows10,000
null0 (0.0%)
unique147
top_value0
top_rate0.983
cardinality147
entropy0.249
entropy_ratio0.035
Top values (rank 1–20)
  1. 0 — 9,827
  2. 20251229 — 6
  3. 20261006 — 3
  4. 20260901 — 3
  5. 20261106 — 2
  6. 20261101 — 2
  7. 20260927 — 2
  8. 20260925 — 2
  9. 20260919 — 2
  10. 20260911 — 2
  11. 20260907 — 2
  12. 20260829 — 2
  13. 20250828 — 2
  14. 20260629 — 2
  15. 20240601 — 2
  16. 20260520 — 2
  17. 20260426 — 2
  18. 20260328 — 2
  19. 20260327 — 2
  20. 20260314 — 2
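
Because `0` fills 98.3% of this column, it reads as a placeholder rather than a date. A minimal sketch of a parser follows, assuming non-zero values are `YYYYMMDD` strings and that `0` means "not recorded"; both points are inferred from the list above, and `parse_observed_date` is a hypothetical helper name.

```python
from datetime import date, datetime
from typing import Optional


def parse_observed_date(raw: Optional[str]) -> Optional[date]:
    """Parse a YYYYMMDD string; treat '0', blanks, and malformed values as missing."""
    if raw is None:
        return None
    text = raw.strip()
    # '0' is assumed to be a sentinel for "no date recorded".
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


print(parse_observed_date("20251229"))  # 2025-12-29
print(parse_observed_date("0"))         # None
```

Several listed values, such as 20261106, fall after the report's generation date of 2026-05-01, so a plausibility check on the parsed dates may be worth adding.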

law_section categorical

rows10,000
null0 (0.0%)
unique2
top_value408
top_rate0.558
cardinality2
entropy0.990
entropy_ratio0.990
Top values (rank 1–20)
  1. 408 — 5,584
  2. 1180 — 4,416
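
The entropy figures in this report are consistent with Shannon entropy in bits, with `entropy_ratio` equal to entropy divided by log2 of the cardinality. The sketch below reproduces the two numbers for this column from the counts above; it is a check against the printed values, not saturn's actual code, and the function names are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy_bits(counts: Sequence[int]) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return entropy + 0.0  # turns a negative zero into plain 0.0


def entropy_ratio(counts: Sequence[int]) -> float:
    """Entropy divided by its maximum, log2(number of observed categories)."""
    categories = sum(1 for c in counts if c > 0)
    if categories < 2:
        return 0.0
    return shannon_entropy_bits(counts) / math.log2(categories)


law_section_counts = [5584, 4416]  # counts for '408' and '1180'
print(round(shannon_entropy_bits(law_section_counts), 3))
print(round(entropy_ratio(law_section_counts), 3))
```

With only two categories the maximum entropy is 1 bit, which is why entropy and entropy_ratio coincide for this column.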

sub_division categorical

rows10,000
null2 (0.0%)
unique76
top_valueB
top_rate0.444
cardinality76
entropy3.227
entropy_ratio0.517
Top values (rank 1–20)
  1. B — 4,444
  2. d1 — 1,428
  3. C — 781
  4. E2 — 779
  5. F1 — 314
  6. F2 — 240
  7. D — 222
  8. C3 — 183
  9. K2 — 124
  10. J2 — 123
  11. J6 — 122
  12. k4 — 120
  13. J3 — 109
  14. e2 — 94
  15. C4 — 84
  16. E5 — 81
  17. E3 — 61
  18. c — 61
  19. K6 — 57
  20. n8 — 56

days_parking_in_effect categorical

50.6% null
rows10,000
null5,059 (50.6%)
unique21
top_valueYYYYYYY
top_rate0.424
cardinality21
entropy2.025
entropy_ratio0.461
Top values (rank 1–20)
  1. YYYYYYY — 2,094
  2. BBBBBBB — 1,449
  3. Y Y — 771
  4. Y — 495
  5. Y Y Y — 25
  6. YYYYY — 23
  7. YYYYYY — 21
  8. YYYYYBB — 19
  9. YYYYYYB — 8
  10. Y Y — 7
  11. YY YY — 6
  12. BYBBYBB — 4
  13. BBBBBYB — 3
  14. YBBYBBB — 3
  15. Y Y — 3
  16. BYBBBBB — 2
  17. BBYBBBB — 2
  18. YBBBBBB — 2
  19. BBBYBBB — 2
  20. BYBYBYB — 1

from_hours_in_effect categorical

68.3% null
rows10,000
null6,830 (68.3%)
unique26
top_valueALL
top_rate0.456
cardinality26
entropy2.956
entropy_ratio0.629
Top values (rank 1–20)
  1. ALL — 1,445
  2. 1130A — 303
  3. 0830A — 242
  4. 0930A — 220
  5. 0900A — 161
  6. 0800A — 156
  7. 0730A — 119
  8. 0700A — 116
  9. 1200A — 102
  10. 1100A — 83
  11. 0300A — 61
  12. 0200P — 51
  13. 0900P — 31
  14. 1000A — 17
  15. 1200P — 14
  16. 0600A — 13
  17. 0400P — 8
  18. 1130P — 8
  19. 0200A — 8
  20. 0100P — 5

to_hours_in_effect categorical

68.3% null
rows10,000
null6,830 (68.3%)
unique31
top_valueALL
top_rate0.456
cardinality31
entropy3.079
entropy_ratio0.621
Top values (rank 1–20)
  1. ALL — 1,445
  2. 0100P — 299
  3. 1100A — 219
  4. 1000A — 212
  5. 1030A — 154
  6. 0800A — 118
  7. 0600A — 117
  8. 0300A — 97
  9. 0930A — 95
  10. 0700P — 85
  11. 1230P — 85
  12. 0830A — 45
  13. 0500A — 35
  14. 0900A — 27
  15. 1200A — 23
  16. 0600P — 22
  17. 0130P — 19
  18. 0400P — 15
  19. 0500P — 11
  20. 1000P — 11

vehicle_color categorical

rows10,000
null943 (9.4%)
unique99
top_valueGY
top_rate0.230
cardinality99
entropy3.676
entropy_ratio0.554
Top values (rank 1–20)
  1. GY — 2,079
  2. BK — 1,784
  3. WH — 1,579
  4. BL — 631
  5. RD — 348
  6. WHITE — 347
  7. BLK — 275
  8. BLACK — 273
  9. GREY — 239
  10. GRY — 167
  11. GR — 148
  12. BLUE — 131
  13. RED — 124
  14. GRAY — 113
  15. SILVE — 99
  16. WHT — 65
  17. YW — 62
  18. BR — 60
  19. WHI — 57
  20. BLU — 46
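
Many of the values above are abbreviations of the same color. A minimal sketch of a lookup-based cleanup follows, covering only the variants visible in the top 20. The canonical names and the reading of each abbreviation are assumptions, `GR` is left unmapped because it could mean green or gray, and `normalize_color` is a hypothetical helper name.

```python
from typing import Optional

# Assumed groupings of the variants listed above; extend after reviewing the full column.
_COLOR_GROUPS = {
    "GRAY": ["GY", "GREY", "GRY", "GRAY"],
    "BLACK": ["BK", "BLK", "BLACK"],
    "WHITE": ["WH", "WHITE", "WHT", "WHI"],
    "BLUE": ["BL", "BLUE", "BLU"],
    "RED": ["RD", "RED"],
    "SILVER": ["SILVE"],
    "YELLOW": ["YW"],
    "BROWN": ["BR"],
}

# Invert the groups into a flat variant -> canonical lookup.
CANONICAL_COLOR = {
    variant: canonical
    for canonical, variants in _COLOR_GROUPS.items()
    for variant in variants
}


def normalize_color(raw: Optional[str]) -> Optional[str]:
    """Return the canonical color, or the cleaned input when no mapping is known."""
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    return CANONICAL_COLOR.get(cleaned, cleaned)


print(normalize_color("blk"))    # BLACK
print(normalize_color("SILVE"))  # SILVER
print(normalize_color("GR"))     # GR (ambiguous, left as-is)
```

Returning the cleaned input for unknown values keeps the remaining long-tail variants visible for manual review instead of silently dropping them.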

unregistered_vehicle categorical

84.9% null top value is 100.0% of non-null rows
rows10,000
null8,487 (84.9%)
unique1
top_value0
top_rate1.000
cardinality1
entropy0.000
entropy_ratio0.000
Top values (rank 1–20)
  1. 0 — 1,513

vehicle_year categorical

rows10,000
null0 (0.0%)
unique39
top_value0
top_rate0.233
cardinality39
entropy4.134
entropy_ratio0.782
Top values (rank 1–20)
  1. 0 — 2,329
  2. 2024 — 759
  3. 2025 — 696
  4. 2023 — 569
  5. 2019 — 512
  6. 2022 — 489
  7. 2021 — 472
  8. 2018 — 448
  9. 2020 — 442
  10. 2017 — 431
  11. 2016 — 372
  12. 2015 — 358
  13. 2014 — 285
  14. 2013 — 283
  15. 2012 — 250
  16. 2011 — 199
  17. 2010 — 178
  18. 2008 — 168
  19. 2026 — 136
  20. 2007 — 119

meter_number categorical

4 singleton categories 84.8% null top value is 99.5% of non-null rows
rows10,000
null8,479 (84.8%)
unique7
top_value-
top_rate0.995
cardinality7
entropy0.061
entropy_ratio0.022
Top values (rank 1–20)
  1. - — 1,513
  2. 309485 — 2
  3. 103553 — 2
  4. 109720 — 1
  5. 424475 — 1
  6. 106506 — 1
  7. 101339 — 1

feet_from_curb categorical

top value is 97.8% of rows
rows10,000
null0 (0.0%)
unique11
top_value0
top_rate0.978
cardinality11
entropy0.217
entropy_ratio0.063
Top values (rank 1–20)
  1. 0 — 9,783
  2. 1 — 42
  3. 2 — 40
  4. 3 — 35
  5. 5 — 30
  6. 4 — 18
  7. 6 — 15
  8. 10 — 12
  9. 8 — 12
  10. 7 — 10
  11. 9 — 3

house_number text

99.8% rows are a single word 78.9% rows are all-caps 47.7% null 95th-percentile length under 20 chars 55.1% duplicate strings
rows10,000
null4,771 (47.7%)
unique2,350
len_min1
len_max7
len_mean3.360
len_median3.000
len_p955.000
word_mean1.002
word_median1.000
n_empty0
n_duplicates2,879
duplicate_rate0.551
vocab_size2,345
readability_flesch_mean116.144
emoji_rate0.000
url_rate0.000
one_word_rate0.998
allcaps_rate0.789
boilerplate_rate0.000
Sample values (first 10)
  1. 25-94
  2. 1615
  3. 2250
  4. 37-12
  5. 71-17
  6. 9224
  7. 18
  8. 227
  9. 20
  10. 900

time_first_observed categorical

187 singleton categories 78.5% null
rows10,000
null7,845 (78.5%)
unique207
top_value00000
top_rate0.894
cardinality207
entropy1.299
entropy_ratio0.169
Top values (rank 1–20)
  1. 00000 — 1,926
  2. 0230A — 3
  3. 0930P — 3
  4. 0505P — 3
  5. 1115P — 3
  6. 0900P — 2
  7. 1150P — 2
  8. 0920A — 2
  9. 1000A — 2
  10. 0235A — 2
  11. 0645P — 2
  12. 0252A — 2
  13. 0110A — 2
  14. 0955A — 2
  15. 0905P — 2
  16. 0456P — 2
  17. 1149A — 2
  18. 1251A — 2
  19. 1030P — 2
  20. 0119A — 2

violation_description categorical

rows10,000
null1,513 (15.1%)
unique74
top_valuePHTO SCHOOL ZN SPEED VIOLATION
top_rate0.520
cardinality74
entropy2.808
entropy_ratio0.452
Top values (rank 1–20)
  1. PHTO SCHOOL ZN SPEED VIOLATION — 4,416
  2. No Parking Street Cleaning — 1,428
  3. 14-No Standing — 598
  4. 40-Fire Hydrant — 463
  5. 20A-No Parking (Non-COM) — 131
  6. 19-No Stand (bus stop) — 123
  7. 16A-No Std (Com Veh) Non-COM — 117
  8. 46A-Double Parking (Non-COM) — 106
  9. Detached Trailer — 105
  10. Fire Hydrant — 94
  11. 71A-Insp Sticker Expired (NYS) — 77
  12. 70A-Reg. Sticker Expired (NYS) — 66
  13. No Standing — 61
  14. Missing Equipment — 56
  15. 50-Crosswalk — 53
  16. 74-Missing Display Plate — 39
  17. 13-No Stand (taxi stand) — 38
  18. 17-No Stand (exc auth veh) — 38
  19. Double Parking — 32
  20. 98-Obstructing Driveway — 31
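
Some descriptions above carry a code prefix (`40-Fire Hydrant`) while others name what looks like the same violation without one (`Fire Hydrant`). A minimal sketch of splitting the prefix from the label follows, assuming the prefix is one or two digits, an optional capital letter, and a hyphen. The pattern is inferred from the values listed here, and `split_description` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed prefix shape: 1-2 digits, optional capital letter, then a hyphen.
_CODE_PREFIX_RE = re.compile(r"^(\d{1,2}[A-Z]?)-(.+)$")


def split_description(raw: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Split '40-Fire Hydrant' into ('40', 'Fire Hydrant'); code is None if absent."""
    if raw is None:
        return (None, None)
    text = raw.strip()
    match = _CODE_PREFIX_RE.match(text)
    if match:
        return (match.group(1), match.group(2).strip())
    return (None, text)


print(split_description("40-Fire Hydrant"))           # ('40', 'Fire Hydrant')
print(split_description("20A-No Parking (Non-COM)"))  # ('20A', 'No Parking (Non-COM)')
print(split_description("Fire Hydrant"))              # (None, 'Fire Hydrant')
```

Grouping on the label alone would merge pairs such as `40-Fire Hydrant` and `Fire Hydrant`; whether they are truly the same violation is something to confirm before combining them.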

violation_post_code categorical

78.7% null
rows10,000
null7,874 (78.7%)
unique53
top_value99
top_rate0.145
cardinality53
entropy5.089
entropy_ratio0.889
Top values (rank 1–20)
  1. 99 — 309
  2. 01 — 207
  3. 311 — 135
  4. SPCL — 85
  5. 06 — 71
  6. 10 — 66
  7. B — 57
  8. 17 — 56
  9. U — 53
  10. 15 — 49
  11. 04 — 48
  12. 05 — 48
  13. 16 — 45
  14. 11 — 42
  15. A — 40
  16. I — 35
  17. 10-P — 35
  18. 38 — 34
  19. 03-A — 34
  20. 12 — 33

violation_legal_code categorical

55.8% null top value is 100.0% of non-null rows
rows10,000
null5,584 (55.8%)
unique1
top_valueT
top_rate1.000
cardinality1
entropy0.000
entropy_ratio0.000
Top values (rank 1–20)
  1. T — 4,416