Summary confidence: high
This dataset is a 2,000,072-row movie tag log from MovieLens (ml-32m/tags.csv) with four columns: a free-text tag, a timestamp, a userId, and a movieId. The tag column is the most interesting feature: it has only 140,981 unique values across roughly 2M rows (a 92.95% duplicate rate), and 52.47% of tags are a single word, with 'sci-fi', 'atmospheric', and 'action' leading the list. The timestamp column is left-skewed (skew −1.22), meaning most values cluster at later dates with a long tail of older activity, which suggests tagging picked up in later years. The userId column has only 15,848 distinct values for roughly 2M rows, an average of about 126 tags per user, so tagging comes from a relatively small group of users. Start by looking at the top tags and the timestamp distribution to understand what users tag and when.
citing: row_count · column_count · columns.tag.n_unique · columns.tag.stats.duplicate_rate · columns.tag.stats.one_word_rate · columns.tag.top_values · columns.timestamp.stats.skew · columns.timestamp.stats.median · columns.userId.n_unique · columns.movieId.n_unique
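
As a rough illustration of how the tag figures above could be reproduced, the following sketch computes the unique count, duplicate rate, one-word rate, and top values using only the Python standard library. The function name `profile_tags`, the in-memory sample rows, and the exact definitions of "duplicate rate" (1 minus unique values over rows) and "one word" (no internal whitespace) are assumptions made for illustration; the profiler that produced this summary may define them differently.

```python
import csv
import io
from collections import Counter


def profile_tags(rows):
    """Compute simple tag statistics from an iterable of dict rows."""
    tags = [row["tag"] for row in rows]
    if not tags:
        raise ValueError("no rows to profile")
    counts = Counter(tags)
    n_rows = len(tags)
    return {
        "row_count": n_rows,
        "n_unique": len(counts),
        # Assumed definition: share of rows that repeat an already-seen value.
        "duplicate_rate": 1 - len(counts) / n_rows,
        # Assumed definition: the tag splits into exactly one whitespace-free token.
        "one_word_rate": sum(len(t.split()) == 1 for t in tags) / n_rows,
        "top_values": counts.most_common(3),
    }


# Tiny made-up sample standing in for ml-32m/tags.csv.
SAMPLE = """userId,movieId,tag,timestamp
1,10,sci-fi,1500000000
1,11,atmospheric,1500000100
2,10,sci-fi,1600000000
3,12,based on a book,1600000500
"""

stats = profile_tags(csv.DictReader(io.StringIO(SAMPLE)))
print(stats)

# Cross-check the reported duplicate rate from the reported counts.
reported_dup = 1 - 140_981 / 2_000_072
print(f"{reported_dup:.2%}")  # prints 92.95%
```

To run this against the real file, the `io.StringIO(SAMPLE)` argument could be swapped for an open file handle on tags.csv; the assumed duplicate-rate definition agrees with the 92.95% figure cited in the summary.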