saturn·

api auth

source /home/coolhand/data/api_auth.db · 502 rows · 11 columns · profiled 2026-05-01

Reading

dataset summary · high confidence anthropic:claude-opus-4-7

This dataset contains 502 API request logs across 11 columns, capturing usage telemetry such as response time, status code, endpoint, and user agent. Traffic is dominated by a single API ('linguistic-api' at 99.6%) and a single method (GET), and every request comes from one IP (127.0.0.1), so the interesting variation lives in endpoint, response_time_ms, status_code, and user_agent. Response times are heavily right-skewed: the median is just 3 ms but the mean is 163 ms, with a max of 1238 ms and 78 outliers (~24% of the 329 non-null values), plus a 34% null rate worth investigating. Status codes split between 200 and 429, with roughly a third of requests rate-limited. The endpoint column has a long tail of 209 distinct paths, led by /api/languages and /api/search.

citing: response_time_ms · api_name · endpoint · status_code · user_agent · method · ip_address · timestamp

Schema

11 columns
Per-column summary.

column             type          nulls   unique   alerts
usage_id           numeric       0.0%    502      -
key_id             unknown       0.0%    -        skipped
api_name           categorical   0.0%    2        imbalance
endpoint           categorical   0.0%    209      long_tail
method             categorical   0.0%    1        imbalance
status_code        numeric       0.0%    2        -
response_time_ms   numeric       34.5%   83       null_rate, outliers
cache_hit          numeric       0.0%    1        constant
ip_address         categorical   0.0%    1        imbalance
user_agent         categorical   0.0%    12       -
timestamp          categorical   0.0%    299      long_tail

usage_id

numeric identifier
A monotonic surrogate key: 502 unique values across 502 rows with no nulls, ranging from 1 to 502, with mean and median both equal to 251.5. Skew is 0.0 and there are no outliers, consistent with a sequential row identifier rather than a measured quantity. Treatment: Drop from modelling; retain as a join key. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 502
min: 1
max: 502
mean: 251.5
median: 251.5
std: 145.1
q1: 126.2
q3: 376.8
iqr: 250.5
skew: 0
kurtosis: -1.2
n_outliers: 0
outlier_rate: 0
zero_rate: 0

key_id

unknown identifier skipped
The column 'key_id' was skipped by the profiler, so no type, uniqueness, or distributional statistics are available beyond a row count of 502 and a null rate of 0.0. The name suggests an identifier, but without n_unique or sample values this cannot be confirmed from the evidence. Treatment: Re-profile with type inference enabled before deciding; if confirmed unique, use as a join key and exclude from modelling. low · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: not available (column skipped)
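The re-profiling step suggested above could start with a check like the following. This is a minimal sketch using only the standard library; the helper name `profile_identifier` and the sample values are made up for illustration and are not taken from the dataset.

```python
def profile_identifier(values):
    """Summarise a candidate identifier column: nulls, uniqueness, Python types."""
    non_null = [v for v in values if v is not None]
    n_unique = len(set(non_null))
    return {
        "n": len(values),
        "n_null": len(values) - len(non_null),
        "n_unique": n_unique,
        "types": sorted({type(v).__name__ for v in non_null}),
        # Only a fully populated, fully distinct column behaves like a row key.
        "is_unique": n_unique == len(values),
    }

# Made-up sample values, purely illustrative.
sample = ["k1", "k1", "k2", None]
report = profile_identifier(sample)
print(report)
```

If `is_unique` comes back false on the real column, key_id is more likely a foreign key (many requests per key) than a row identifier, which changes how it should be used in joins.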

api_name

categorical metadata imbalance
This is a categorical API identifier with only 2 distinct values across 502 rows. It is overwhelmingly dominated by 'linguistic-api' (500 rows, 99.6%), with 'blissAPI' appearing just twice, yielding near-zero entropy (0.0375). The column is effectively constant apart from two anomalous records. Treatment: Drop as a near-constant feature, or isolate the 2 'blissAPI' rows for inspection. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 2
top_value: linguistic-api
top_rate: 0.996
cardinality: 2
entropy: 0.0375
entropy_ratio: 0.0375
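The entropy figures can be reproduced from the value counts alone. The sketch below assumes the profiler reports Shannon entropy in bits and defines entropy_ratio as entropy divided by log2(cardinality). Neither definition is stated in the report, but both are consistent with the numbers shown for this and the other categorical columns. The helper name `shannon_entropy_bits` is illustrative.

```python
import math

def shannon_entropy_bits(counts):
    """Shannon entropy (base 2) of a categorical distribution given raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Counts from the report: 500 'linguistic-api' rows and 2 'blissAPI' rows.
counts = [500, 2]
entropy = shannon_entropy_bits(counts)

# Assumed definition: normalise by the maximum entropy for this cardinality.
entropy_ratio = entropy / math.log2(len(counts))

print(round(entropy, 4), round(entropy_ratio, 4))  # 0.0375 0.0375
```

With cardinality 2 the maximum entropy is exactly 1 bit, which is why entropy and entropy_ratio coincide for this column.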

endpoint

categorical feature long_tail
This column records API endpoint paths, with 209 unique routes across 502 requests and no nulls. Traffic is spread fairly evenly (entropy ratio 0.828), though /api/languages leads with 58 hits (11.6%), followed by /api/search (36) and /api/stats (24); a long tail of rarely hit routes triggers the alert. Notably, /api/languages/NOPE appears 11 times, suggesting either a probing client or a broken reference worth investigating. Treatment: Group rare endpoints into an 'other' bucket before encoding, and inspect the /NOPE hits separately. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 209
top_value: /api/languages
top_rate: 0.1155
cardinality: 209
entropy: 6.383
entropy_ratio: 0.8282
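The rare-endpoint grouping recommended above might look like the sketch below. The function name `bucket_rare`, the `min_count` threshold of 5, and the `keep` parameter are assumptions chosen for illustration; the sample list is synthetic, and `/api/rare-route` is a made-up path.

```python
from collections import Counter

def bucket_rare(values, min_count=5, keep=(), other_label="other"):
    """Replace values seen fewer than min_count times with other_label.

    Values listed in `keep` are never bucketed, so they stay inspectable.
    """
    counts = Counter(values)
    keep = set(keep)
    return [
        v if counts[v] >= min_count or v in keep else other_label
        for v in values
    ]

# Synthetic sample, not the real traffic.
sample = (
    ["/api/languages"] * 6
    + ["/api/search"] * 5
    + ["/api/languages/NOPE", "/api/rare-route"]
)
bucketed = bucket_rare(sample, min_count=5, keep=["/api/languages/NOPE"])
print(Counter(bucketed))
```

Passing the /NOPE path through `keep` leaves those hits under their own label, so they can be inspected separately instead of disappearing into the 'other' bucket.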

method

categorical metadata imbalance
This column records the HTTP method, but every one of the 502 rows is "GET" — cardinality is 1 and entropy is 0. It carries no information for any downstream model or segmentation. Treatment: Drop; constant column with a single value. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 1
top_value: GET
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0
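Constant columns such as this one can be found mechanically before modelling. The following is a minimal sketch that assumes rows are available as a list of dicts; the helper name `constant_columns` is illustrative, and the two sample rows only mimic values mentioned in this report.

```python
def constant_columns(rows):
    """Return the names of columns that hold a single distinct value."""
    if not rows:
        return []
    return [col for col in rows[0] if len({r[col] for r in rows}) == 1]

# Two illustrative rows using values that appear in the report.
rows = [
    {"method": "GET", "cache_hit": 0, "ip_address": "127.0.0.1", "status_code": 200},
    {"method": "GET", "cache_hit": 0, "ip_address": "127.0.0.1", "status_code": 429},
]
print(constant_columns(rows))  # ['method', 'cache_hit', 'ip_address']
```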

status_code

numeric feature
This column holds HTTP status codes, taking only 2 distinct values across 502 rows: 200 (success) and 429 (rate-limited), with 200 as the median and 429 as Q3. The mean of 278.9 implies that about 34% of requests (roughly 173 of 502) were throttled, a failure rate worth investigating. There are no nulls or outliers, and the two-valued shape is reflected in the negative kurtosis (-1.57). Treatment: Recode as a binary success/throttled flag before modelling. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 2
min: 200
max: 429
mean: 278.9
median: 200
std: 108.9
q1: 200
q3: 429
iqr: 229
skew: 0.6539
kurtosis: -1.572
n_outliers: 0
outlier_rate: 0
zero_rate: 0
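Because the column takes only two values, the throttled share can be recovered from the mean alone. The sketch below works that arithmetic through and shows one possible form of the binary recoding; the function name `is_throttled` is an assumption for illustration.

```python
# Figures taken from the report.
mean_status = 278.9
ok, throttled = 200, 429
n_rows = 502

# With two values, mean = ok + p * (throttled - ok); solve for p.
p_throttled = (mean_status - ok) / (throttled - ok)
est_throttled_rows = round(p_throttled * n_rows)
print(f"{p_throttled:.3f}", est_throttled_rows)  # 0.345 173

def is_throttled(status_code):
    """Binary flag: 1 for 429 (rate-limited), 0 otherwise."""
    return int(status_code == 429)
```

The estimate of 173 rows matches the 173 nulls reported for response_time_ms. The report does not show whether these are the same rows, so confirming it would need a row-level check.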

response_time_ms

numeric feature null_rate outliers
This column captures response times in milliseconds for 502 records, of which 329 are non-null. The distribution is severely right-skewed (skew 1.90, kurtosis 1.74): the median is just 3 ms and Q3 is 21 ms, yet the mean is 162.84 ms and the max reaches 1238 ms, with std at 345.34. Two analyst-relevant flags: 34.46% of rows (173) are null, and 23.71% of the non-null values (78) fall outside the IQR fences. Treatment: Log-transform and impute the 34% nulls before modelling. high · anthropic:claude-opus-4-7
n: 502
nulls: 173 (34.5%)
unique: 83
min: 2
max: 1,238
mean: 162.8
median: 3
std: 345.3
q1: 3
q3: 21
iqr: 18
skew: 1.903
kurtosis: 1.744
n_outliers: 78
outlier_rate: 0.2371
zero_rate: 0
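The outlier fences and the suggested treatment can be sketched as follows. Two assumptions are made here: the profiler uses the conventional 1.5 × IQR rule (the report does not state the multiplier), and nulls are filled with the median alongside a missing-value flag, which is only one of several reasonable imputation choices. The function names and the sample values are illustrative.

```python
import math
import statistics

def iqr_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences; k=1.5 is the assumed convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles from the report: q1 = 3 ms, q3 = 21 ms.
low, high = iqr_fences(3, 21)
print(low, high)  # -24.0 48.0

def prepare(values):
    """Median-impute None values, keep a missing flag, then apply log1p."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    flags = [int(v is None) for v in values]
    logged = [math.log1p(fill if v is None else v) for v in values]
    return logged, flags

# Illustrative values within the reported range, not real rows.
sample = [3, 3, None, 21, 1238]
logged, flags = prepare(sample)
```

Under these assumptions any response slower than 48 ms counts as an outlier, and no value can fall below the lower fence because response times are positive. Keeping the missing flag preserves the information that a value was absent, which matters if the nulls turn out to be systematic rather than random.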

cache_hit

numeric feature constant
This is a numeric flag named cache_hit, presumably a 0/1 indicator of whether a cache lookup succeeded. Across all 502 rows it is constant at 0 (zero_rate 1.0, n_unique 1, std 0.0), meaning no cache hit was ever recorded. That is either a broken instrumentation path or a workload where caching is disabled. Treatment: Drop; provides no signal while constant. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 1
min: 0
max: 0
mean: 0
median: 0
std: 0
q1: 0
q3: 0
iqr: 0
skew: 0
kurtosis: 0
n_outliers: 0
outlier_rate: 0
zero_rate: 1

ip_address

categorical metadata imbalance
This column records an IP address but holds the loopback value 127.0.0.1 for all 502 rows, with zero nulls and a cardinality of 1. Entropy is 0.0, so the field carries no information and looks like a placeholder or a logging artefact rather than a real client IP. Treatment: Drop; constant column with no signal. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 1
top_value: 127.0.0.1
top_rate: 1
cardinality: 1
entropy: 0
entropy_ratio: 0

user_agent

categorical metadata
HTTP User-Agent strings from request logs, with only 12 distinct values across 502 rows and no nulls. Traffic is dominated by non-browser clients: Werkzeug/3.1.4 (most likely the default User-Agent of the Flask/Werkzeug test client, which would point to automated tests rather than external traffic) at 215 hits and GoogleOther at 196, together covering ~82% of requests, with curl and assorted bots (Bytespider, Applebot, Googlebot, bingbot, GPTBot) making up most of the rest. Genuine human browser traffic appears negligible: only a single Firefox hit and a handful of mobile Safari/Chrome entries. Treatment: Parse into client-family/bot-flag features rather than using the raw string. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 12
top_value: Werkzeug/3.1.4
top_rate: 0.4283
cardinality: 12
entropy: 1.962
entropy_ratio: 0.5473
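One way to derive client-family and automation features is simple case-insensitive substring matching, sketched below. The marker lists, the family labels, and the function name `classify_user_agent` are assumptions made for illustration. The test strings are the client names mentioned in this report, and real User-Agent strings are usually longer.

```python
# Assumed marker lists, matched case-insensitively.
CRAWLER_MARKERS = ("bot", "spider", "googleother")
SCRIPT_MARKERS = ("curl", "werkzeug")

def classify_user_agent(ua):
    """Map a raw User-Agent string to a coarse family and an automation flag."""
    s = ua.lower()
    if any(m in s for m in CRAWLER_MARKERS):
        family = "crawler"
    elif any(m in s for m in SCRIPT_MARKERS):
        family = "script"
    else:
        family = "browser"
    return {"family": family, "is_automated": family != "browser"}

for ua in ["Werkzeug/3.1.4", "GoogleOther", "Bytespider", "GPTBot", "Firefox"]:
    print(ua, classify_user_agent(ua))
```

Substring rules are easy to audit but brittle. A maintained User-Agent parsing library would be a sturdier choice if the set of clients grows.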

timestamp

categorical timestamp long_tail
This is a timestamp column stored as strings, with 299 unique values across 502 rows and no nulls. The distribution is unusually clumpy for a timestamp: six values on 2026-04-17 each repeat 14 to 38 times, accounting for roughly 189 rows, while most other timestamps appear only a handful of times. That burst pattern suggests batched events or a logging artefact rather than free-flowing event time. Treatment: Parse to datetime and derive features (hour, day, gap-to-previous); investigate the 2026-04-17 burst before using as an index. high · anthropic:claude-opus-4-7
n: 502
nulls: 0 (0.0%)
unique: 299
top_value: 2026-04-17 10:06:57
top_rate: 0.0757
cardinality: 299
entropy: 6.98
entropy_ratio: 0.8488
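The parsing and feature derivation could be sketched as below. The format string is inferred from the reported top_value ('2026-04-17 10:06:57') and is assumed to hold for every row. The function name `timestamp_features` is illustrative, and the third sample value is made up.

```python
from datetime import datetime

# Format inferred from the top_value shown in the report.
FMT = "%Y-%m-%d %H:%M:%S"

def timestamp_features(raw):
    """Parse strings, sort chronologically, and derive hour, weekday, and gap."""
    parsed = sorted(datetime.strptime(s, FMT) for s in raw)
    rows = []
    prev = None
    for ts in parsed:
        # Gap to the previous event in seconds; None for the first event.
        gap = None if prev is None else (ts - prev).total_seconds()
        rows.append({"ts": ts, "hour": ts.hour, "weekday": ts.weekday(), "gap_s": gap})
        prev = ts
    return rows

# The repeated value comes from the report; the last one is made up.
sample = ["2026-04-17 10:06:57", "2026-04-17 10:06:57", "2026-04-17 10:07:02"]
features = timestamp_features(sample)
print([f["gap_s"] for f in features])  # [None, 0.0, 5.0]
```

A gap of 0.0 seconds marks rows that share a timestamp, so counting zero gaps is a quick way to measure the burst pattern described above.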