api auth
Reading
This dataset contains 502 API request logs across 11 columns, capturing usage telemetry such as response time, status code, endpoint, and user agent. Traffic is dominated by a single API ('linguistic-api' at 99.6%) and a single method (GET), and every request comes from one IP (127.0.0.1), so the interesting variation lives in endpoint, response_time_ms, status_code, and user_agent. Response times are heavily skewed: the median is just 3 ms but the mean is 163 ms with a max of 1238 ms, and 78 of the 329 non-null values (about 24%) are outliers. The remaining 173 rows (34.5%) are null, which is worth investigating. Status codes split between 200 and 429, hinting at rate-limiting behavior. The endpoint column has a long tail of 209 distinct paths, with /api/languages and /api/search leading.
citing: response_time_ms · api_name · endpoint · status_code · user_agent · method · ip_address · timestamp
Charts the summary said to look at first
response_time_ms: histogram
| bin (ms) | count |
|---|---|
| 2 – 70.67 | 253 |
| 70.67 – 139.3 | 20 |
| 139.3 – 208 | 5 |
| 208 – 276.7 | 0 |
| 276.7 – 345.3 | 0 |
| 345.3 – 414 | 0 |
| 414 – 482.7 | 0 |
| 482.7 – 551.3 | 0 |
| 551.3 – 620 | 0 |
| 620 – 688.7 | 1 |
| 688.7 – 757.3 | 0 |
| 757.3 – 826 | 0 |
| 826 – 894.7 | 0 |
| 894.7 – 963.3 | 36 |
| 963.3 – 1032 | 9 |
| 1032 – 1101 | 2 |
| 1101 – 1169 | 1 |
| 1169 – 1238 | 2 |
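The bin edges in the table above are consistent with 18 equal-width bins spanning the reported min (2) and max (1238) of response_time_ms. The sketch below reproduces that binning in plain Python. It assumes equal-width bins with the maximum value placed in the last bin; the profiler's actual binning rule is not stated, and `equal_width_edges` and `bin_index` are hypothetical helper names.

```python
def equal_width_edges(lo, hi, n_bins):
    """Return n_bins + 1 evenly spaced edges from lo to hi."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_index(value, lo, hi, n_bins):
    """Index of the bin holding value; the max value falls in the last bin."""
    if not lo <= value <= hi:
        raise ValueError("value outside [lo, hi]")
    width = (hi - lo) / n_bins
    return min(int((value - lo) / width), n_bins - 1)


# Min, max, and bin count taken from the histogram table.
edges = equal_width_edges(2, 1238, 18)
print([round(e, 2) for e in edges[:4]])
```

Under these assumptions the median value of 3 ms lands in the first bin, which is why that bin alone holds 253 of the 329 non-null values.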
status_code: histogram
| bin (status code) | count |
|---|---|
| 200 – 210.4 | 329 |
| 210.4 – 220.8 | 0 |
| 220.8 – 231.2 | 0 |
| 231.2 – 241.6 | 0 |
| 241.6 – 252 | 0 |
| 252 – 262.5 | 0 |
| 262.5 – 272.9 | 0 |
| 272.9 – 283.3 | 0 |
| 283.3 – 293.7 | 0 |
| 293.7 – 304.1 | 0 |
| 304.1 – 314.5 | 0 |
| 314.5 – 324.9 | 0 |
| 324.9 – 335.3 | 0 |
| 335.3 – 345.7 | 0 |
| 345.7 – 356.1 | 0 |
| 356.1 – 366.5 | 0 |
| 366.5 – 377 | 0 |
| 377 – 387.4 | 0 |
| 387.4 – 397.8 | 0 |
| 397.8 – 408.2 | 0 |
| 408.2 – 418.6 | 0 |
| 418.6 – 429 | 173 |
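The 429 bin holds 173 rows, the same number as the nulls reported for response_time_ms. Whether these are the same rows cannot be read from the profile alone. The sketch below shows one way to check this on the raw rows, assuming each row is available as a dict with `status_code` and `response_time_ms` keys; `null_rate_by_status` is a hypothetical helper and the rows shown are toy data, not values from the dataset.

```python
def null_rate_by_status(rows):
    """Map status_code -> share of rows whose response_time_ms is None."""
    totals, nulls = {}, {}
    for row in rows:
        code = row["status_code"]
        totals[code] = totals.get(code, 0) + 1
        if row["response_time_ms"] is None:
            nulls[code] = nulls.get(code, 0) + 1
    return {code: nulls.get(code, 0) / totals[code] for code in totals}


# Toy rows for illustration only.
toy_rows = [
    {"status_code": 200, "response_time_ms": 3},
    {"status_code": 200, "response_time_ms": 21},
    {"status_code": 429, "response_time_ms": None},
    {"status_code": 429, "response_time_ms": None},
]
rates = null_rate_by_status(toy_rows)
print(rates)
```

A null rate of 1.0 for 429 and 0.0 for 200 on the real data would indicate that throttled requests simply do not record a response time.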
endpoint: top 20 values
| endpoint | count | share |
|---|---|---|
| /api/languages | 58 | 11.6% |
| /api/search | 36 | 7.2% |
| /api/stats | 24 | 4.8% |
| /api/languages/macroareas | 18 | 3.6% |
| /api/languages/by_macroarea/Africa | 16 | 3.2% |
| /api/languages/eng | 13 | 2.6% |
| /api/languages/spa/phonemes | 12 | 2.4% |
| /api/families/indo1319 | 12 | 2.4% |
| /api/typology/wals/parameters | 12 | 2.4% |
| /api/languages/NOPE | 11 | 2.2% |
| /api/languages/eng/features | 11 | 2.2% |
| /api/languages/cmn | 10 | 2.0% |
| /api/languages/eng/phonemes | 10 | 2.0% |
| /api/languages/NOPE/phonemes | 10 | 2.0% |
| /api/languages/NOPE/features | 10 | 2.0% |
| /api/families/NOPE | 10 | 2.0% |
| /api/typology/wals/map/81A | 8 | 1.6% |
| /api/typology/phonology/summary | 7 | 1.4% |
| /api/typology/phonology/inventory/eng | 7 | 1.4% |
| /api/typology/types/by_family | 6 | 1.2% |
user_agent: value counts
| user_agent | count | share |
|---|---|---|
| Werkzeug/3.1.4 | 215 | 42.8% |
| Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; GoogleOther) | 196 | 39.0% |
| Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/) | 26 | 5.2% |
| curl/7.81.0 | 24 | 4.8% |
| Mozilla/5.0 (iPhone; CPU iPhone OS 26_5_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/147.0.7727.99 Mobile/15E148 Safari/604.1 | 14 | 2.8% |
| Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot) | 14 | 2.8% |
| Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.7680.177 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | 7 | 1.4% |
| Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/136.0.0.0 Safari/537.36 | 2 | 0.4% |
| Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:146.0) Gecko/20100101 Firefox/146.0 | 1 | 0.2% |
| Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot) | 1 | 0.2% |
| Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.7727.116 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | 1 | 0.2% |
| Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.7727.137 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | 1 | 0.2% |
api_name: value counts
| api_name | count | share |
|---|---|---|
| linguistic-api | 500 | 99.6% |
| blissAPI | 2 | 0.4% |
Schema
11 columns
| column | type | null rate | unique | alerts |
|---|---|---|---|---|
| usage_id | numeric | 0.0% | 502 | |
| key_id | unknown | 0.0% | — | skipped |
| api_name | categorical | 0.0% | 2 | imbalance |
| endpoint | categorical | 0.0% | 209 | long_tail |
| method | categorical | 0.0% | 1 | imbalance |
| status_code | numeric | 0.0% | 2 | |
| response_time_ms | numeric | 34.5% | 83 | null_rate, outliers |
| cache_hit | numeric | 0.0% | 1 | constant |
| ip_address | categorical | 0.0% | 1 | imbalance |
| user_agent | categorical | 0.0% | 12 | |
| timestamp | categorical | 0.0% | 299 | long_tail |
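The unique counts in the schema already show which columns carry no variation. The sketch below turns those counts into a rough triage, assuming the counts are copied by hand from the schema. The `unique_counts` dict and `split_columns` helper are illustrative names, not part of the profiler.

```python
N_ROWS = 502

# Unique counts copied from the schema; None marks the skipped column.
unique_counts = {
    "usage_id": 502, "key_id": None, "api_name": 2, "endpoint": 209,
    "method": 1, "status_code": 2, "response_time_ms": 83, "cache_hit": 1,
    "ip_address": 1, "user_agent": 12, "timestamp": 299,
}


def split_columns(counts, n_rows):
    """Sort columns into constant, identifier-like, unknown, and remaining."""
    groups = {"constant": [], "identifier": [], "unknown": [], "candidate": []}
    for name, n_unique in counts.items():
        if n_unique is None:
            groups["unknown"].append(name)
        elif n_unique == 1:
            groups["constant"].append(name)
        elif n_unique == n_rows:
            groups["identifier"].append(name)
        else:
            groups["candidate"].append(name)
    return groups


groups = split_columns(unique_counts, N_ROWS)
print(groups["constant"])
```

This only separates columns by cardinality. Near-constant columns such as api_name still land in the candidate group and need the per-column judgment given in the profiles.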
usage_id
numeric · identifier
A monotonic surrogate key: 502 unique values across 502 rows with no nulls, ranging from 1 to 502, with mean and median both equal to 251.5. Skew is 0 and there are no outliers, consistent with a sequential row identifier rather than a measured quantity.
Treatment: Drop from modelling; retain as a join key.
- n: 502
- nulls: 0 (0.0%)
- unique: 502
- min: 1
- max: 502
- mean: 251.5
- median: 251.5
- std: 145.1
- q1: 126.2
- q3: 376.8
- iqr: 250.5
- skew: 0
- kurtosis: -1.2
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
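The statistics above are what a gap-free sequence 1, 2, ..., 502 would produce. As a quick sanity check, the sketch below rebuilds such a sequence and recomputes the mean, median, and sample standard deviation with the standard library. It assumes the profiler reports the sample (n - 1) standard deviation, which the profile does not state; `is_sequential` is a hypothetical helper.

```python
import statistics

# A gap-free key running from 1 to 502, as the profile suggests.
ids = list(range(1, 503))

mean = statistics.mean(ids)
median = statistics.median(ids)
std = statistics.stdev(ids)  # sample standard deviation (n - 1 denominator)


def is_sequential(values):
    """True when the sorted values increase by exactly 1 with no gaps or repeats."""
    ordered = sorted(values)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


print(mean, median, round(std, 1), is_sequential(ids))
```

Running `is_sequential` on the real column would confirm or refute the surrogate-key reading directly.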
key_id
unknown · identifier · skipped
The profiler skipped 'key_id', so no type, uniqueness, or distributional statistics are available beyond a row count of 502 and a null rate of 0.0%. The name suggests an identifier, but without a unique count or sample values the evidence cannot confirm this.
Treatment: Re-profile with type inference enabled before deciding; if confirmed unique, use as a join key and exclude from modelling.
- n: 502
- nulls: 0 (0.0%)
- unique: —
api_name
categorical · metadata · imbalance
A categorical API identifier with only 2 distinct values across 502 rows. 'linguistic-api' accounts for 500 rows (99.6%) and 'blissAPI' appears just twice, giving near-zero entropy (0.0375). The column is effectively constant apart from two anomalous records.
Treatment: Drop as a near-constant feature, or isolate the 2 'blissAPI' rows for inspection.
- n: 502
- nulls: 0 (0.0%)
- unique: 2
- top_value: linguistic-api
- top_rate: 0.996
- cardinality: 2
- entropy: 0.0375
- entropy_ratio: 0.0375
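The entropy and entropy_ratio figures can be reproduced from the value counts (500 and 2). The sketch below assumes the profiler uses Shannon entropy in bits (base 2) and defines the ratio as entropy divided by the log of the cardinality. That assumption fits the identical entropy and entropy_ratio reported here for a two-valued column, but it is not confirmed by the report. The function names are illustrative.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_ratio(counts):
    """Entropy divided by its maximum, log2 of the number of observed values."""
    k = sum(1 for c in counts if c > 0)
    if k < 2:
        return 0.0  # a constant column has no uncertainty
    return shannon_entropy(counts) / math.log2(k)


api_name_counts = [500, 2]  # linguistic-api, blissAPI
print(round(shannon_entropy(api_name_counts), 4))
print(round(entropy_ratio(api_name_counts), 4))
```

With two categories the maximum entropy is 1 bit, so the entropy and the ratio coincide.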
endpoint
categorical · feature · long_tail
This column records API endpoint paths, with 209 unique routes across 502 requests and no nulls. Traffic is spread fairly evenly (entropy ratio 0.828): /api/languages leads with 58 hits (11.6%), followed by /api/search (36) and /api/stats (24), and a long tail of rarely hit routes triggers the alert. Notably, /api/languages/NOPE appears 11 times, and three more routes containing NOPE (/api/languages/NOPE/phonemes, /api/languages/NOPE/features, and /api/families/NOPE) appear 10 times each, for 41 hits among the top 20 endpoints. This suggests either a probing client or a broken reference. Since status_code only takes the values 200 and 429, none of these requests was logged as a 404.
Treatment: Group rare endpoints into an 'other' bucket before encoding, and inspect the /NOPE hits separately.
- n: 502
- nulls: 0 (0.0%)
- unique: 209
- top_value: /api/languages
- top_rate: 0.1155
- cardinality: 209
- entropy: 6.383
- entropy_ratio: 0.8282
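The suggested treatment (bucket rare endpoints, inspect the NOPE routes separately) might be sketched as follows. The `min_count` threshold, the `other` label, and both helper names are assumptions chosen for illustration, and the sample list is toy data rather than the real log.

```python
from collections import Counter


def bucket_rare(values, min_count, other_label="other"):
    """Replace values seen fewer than min_count times with a single label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def has_nope_segment(path):
    """True when one path segment is exactly the placeholder 'NOPE'."""
    return "NOPE" in path.split("/")


# Toy sample for illustration only.
sample = (
    ["/api/languages"] * 5
    + ["/api/search"] * 3
    + ["/api/stats", "/api/languages/NOPE"]
)
bucketed = bucket_rare(sample, min_count=3)
nope_hits = [p for p in sample if has_nope_segment(p)]
print(Counter(bucketed), nope_hits)
```

Flagging the NOPE routes before bucketing keeps them visible even when their individual counts fall below the threshold.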
method
categorical · metadata · imbalance
This column records the HTTP method, but every one of the 502 rows is "GET": cardinality is 1 and entropy is 0. It carries no information for any downstream model or segmentation.
Treatment: Drop; constant column with a single value.
- n: 502
- nulls: 0 (0.0%)
- unique: 1
- top_value: GET
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
status_code
numeric · feature
This column holds HTTP status codes and takes only 2 distinct values across 502 rows: 200 (success) and 429 (rate-limited), with 200 as the median and 429 as Q3. The mean of 278.9 corresponds to 173 of 502 requests (34.5%) being throttled, a failure rate worth investigating. There are no nulls or outliers, and the two-point distribution is reflected in the negative kurtosis (-1.57).
Treatment: Recode as a binary success/throttled flag before modelling.
- n: 502
- nulls: 0 (0.0%)
- unique: 2
- min: 200
- max: 429
- mean: 278.9
- median: 200
- std: 108.9
- q1: 200
- q3: 429
- iqr: 229
- skew: 0.6539
- kurtosis: -1.572
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 0
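Because the column takes only two values, the share of throttled requests follows directly from the mean: share = (mean - 200) / (429 - 200). The sketch below works through that arithmetic and shows the binary recoding suggested in the treatment. `share_of_upper` and `is_throttled` are hypothetical helper names.

```python
def share_of_upper(mean, lower, upper):
    """Share of rows equal to `upper` in a column holding only lower/upper."""
    return (mean - lower) / (upper - lower)


def is_throttled(status_code):
    """Binary flag: 1 for a 429 (rate-limited) response, 0 otherwise."""
    return int(status_code == 429)


# Mean, min, and max taken from the profile.
throttled_share = share_of_upper(278.9, 200, 429)
print(round(throttled_share, 3))
print([is_throttled(code) for code in (200, 429, 200)])
```

The result agrees, up to rounding of the reported mean, with the 173 of 502 rows in the 429 bin of the status code histogram.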
response_time_ms
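numeric · feature · null_rate · outliers
This column captures response times in milliseconds. Of 502 rows, 173 (34.46%) are null, leaving 329 measured values. The distribution is severely right-skewed (skew 1.90, kurtosis 1.74): the median is just 3 ms and Q3 is 21 ms, yet the mean is 162.84 ms and the max reaches 1238 ms, with a standard deviation of 345.34 ms. 78 values (23.71% of the non-null values) fall outside the IQR fence. The histogram also shows a second cluster of 50 values between roughly 895 ms and 1238 ms, so the tail is not smooth.
Treatment: Log-transform before modelling. Before imputing the nulls, check whether they coincide with the 173 status-429 rows, since the two counts match; if they do, the nulls are structural rather than randomly missing.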
- n: 502
- nulls: 173 (34.5%)
- unique: 83
- min: 2
- max: 1,238
- mean: 162.8
- median: 3
- std: 345.3
- q1: 3
- q3: 21
- iqr: 18
- skew: 1.903
- kurtosis: 1.744
- n_outliers: 78
- outlier_rate: 0.2371
- zero_rate: 0
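The outlier count depends on where the IQR fence sits. The sketch below computes the fences from the reported quartiles and applies a null-preserving log transform. It assumes the conventional Tukey multiplier of 1.5, which the profile does not state, and uses `log1p` as one reasonable choice of log transform; both function names are illustrative.

```python
import math

Q1, Q3 = 3, 21  # quartiles reported in the profile, in ms


def tukey_fences(q1, q3, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def log_transform(values):
    """log1p of each value; None (null) is passed through unchanged."""
    return [None if v is None else math.log1p(v) for v in values]


lower, upper = tukey_fences(Q1, Q3)
print(lower, upper)
print(log_transform([2, None, 1238]))
```

Under that assumption the upper fence is 48 ms, so anything slower than 48 ms counts as an outlier. The histogram shows at least 76 values above 70.67 ms, which is compatible with the reported 78. Passing nulls through unchanged keeps the imputation decision separate from the transform.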
cache_hit
numeric · feature · constant
A numeric flag named cache_hit, presumably a 0/1 indicator of whether a cache lookup succeeded. It is constant at 0 across all 502 rows (zero_rate 1.0, 1 unique value, std 0), meaning no cache hit was ever recorded. That points to either a broken instrumentation path or a workload where caching is disabled.
Treatment: Drop; provides no signal while constant.
- n: 502
- nulls: 0 (0.0%)
- unique: 1
- min: 0
- max: 0
- mean: 0
- median: 0
- std: 0
- q1: 0
- q3: 0
- iqr: 0
- skew: 0
- kurtosis: 0
- n_outliers: 0
- outlier_rate: 0
- zero_rate: 1
ip_address
categorical · metadata · imbalance
This column records an IP address but holds the loopback value 127.0.0.1 for all 502 rows, with zero nulls and a cardinality of 1. Entropy is 0, so the field carries no information and looks like a placeholder or a logging artefact rather than a real client IP.
Treatment: Drop; constant column with no signal.
- n: 502
- nulls: 0 (0.0%)
- unique: 1
- top_value: 127.0.0.1
- top_rate: 1
- cardinality: 1
- entropy: 0
- entropy_ratio: 0
user_agent
categorical · metadata
HTTP User-Agent strings from request logs, with only 12 distinct values across 502 rows and no nulls. Traffic is dominated by non-browser clients: Werkzeug/3.1.4 (a Werkzeug-generated User-Agent, most likely from the Werkzeug/Flask test client rather than a browser) at 215 hits and GoogleOther at 196, together covering about 82% of requests, with curl and assorted bots (Bytespider, Applebot, Googlebot, bingbot, GPTBot) making up most of the rest. Genuine human browser traffic appears negligible: a single Firefox hit and 14 hits from one iPhone Chrome (CriOS) string.
Treatment: Parse into client-family/bot-flag features rather than using the raw string.
- n: 502
- nulls: 0 (0.0%)
- unique: 12
- top_value: Werkzeug/3.1.4
- top_rate: 0.4283
- cardinality: 12
- entropy: 1.962
- entropy_ratio: 0.5473
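The treatment of parsing raw strings into client-family features might look like the sketch below. The token lists cover only the clients seen in this dataset, the three-way crawler/tool/browser split is an assumption made for illustration, and `classify_user_agent` is a hypothetical helper. Simple substring matching like this is not a general-purpose User-Agent parser.

```python
# Tokens observed in this dataset's user_agent values.
CRAWLER_TOKENS = ("GoogleOther", "Googlebot", "Bytespider", "Applebot", "bingbot", "GPTBot")
TOOL_PREFIXES = ("Werkzeug/", "curl/")


def classify_user_agent(ua):
    """Return (family, kind) where kind is 'crawler', 'tool', or 'browser'."""
    for token in CRAWLER_TOKENS:
        if token in ua:
            return token, "crawler"
    for prefix in TOOL_PREFIXES:
        if ua.startswith(prefix):
            return prefix.rstrip("/"), "tool"
    # Anything else is treated as a browser; this is a coarse fallback.
    return "browser", "browser"


print(classify_user_agent("curl/7.81.0"))
print(classify_user_agent(
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.3; +https://openai.com/gptbot)"
))
```

Collapsing the three Googlebot strings, which differ only in Chrome version, into one family also reduces the 12 raw values to a smaller feature.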
timestamp
categorical · timestamp · long_tail
A timestamp column stored as strings, with 299 unique values across 502 rows and no nulls. The distribution is unusually clumpy for a timestamp: six values on 2026-04-17 each repeat 14 to 38 times, accounting for roughly 189 rows, while most other timestamps appear only a handful of times. That burst pattern suggests batched events or a logging artifact rather than free-flowing event time.
Treatment: Parse to datetime and derive features (hour, day, gap-to-previous); investigate the 2026-04-17 burst before using as an index.
- n: 502
- nulls: 0 (0.0%)
- unique: 299
- top_value: 2026-04-17 10:06:57
- top_rate: 0.0757
- cardinality: 299
- entropy: 6.98
- entropy_ratio: 0.8488
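The parsing and feature derivation named in the treatment might be sketched as below. The format string is inferred from the single top value shown above and may not hold for every row; `timestamp_features` is a hypothetical helper. Only the first sample string comes from the profile; the other two are made up for illustration.

```python
from datetime import datetime

# Format assumed from the top value "2026-04-17 10:06:57".
FMT = "%Y-%m-%d %H:%M:%S"


def timestamp_features(raw_values):
    """Parse, sort, and derive hour, weekday, and gap to the previous event."""
    parsed = sorted(datetime.strptime(v, FMT) for v in raw_values)
    features = []
    previous = None
    for ts in parsed:
        gap = None if previous is None else (ts - previous).total_seconds()
        features.append(
            {"timestamp": ts, "hour": ts.hour, "weekday": ts.weekday(), "gap_s": gap}
        )
        previous = ts
    return features


rows = timestamp_features(
    ["2026-04-17 10:07:02", "2026-04-17 10:06:57", "2026-04-17 10:06:57"]
)
print([(r["hour"], r["gap_s"]) for r in rows])
```

A gap of 0 seconds marks events sharing the same second, so counting zero gaps is a quick way to locate bursts like the one on 2026-04-17.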