3103. Find Trending Hashtags II 🔒

Problem Description

Table: Tweets

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| user_id     | int     |
| tweet_id    | int     |
| tweet_date  | date    |
| tweet       | varchar |
+-------------+---------+
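tweet_id is the primary key (a column with unique values) for this table.
Each row of this table contains user_id, tweet_id, tweet_date and tweet.

Write a solution to find the top 3 trending hashtags in February 2024. A tweet may contain multiple hashtags.

Return the result table ordered by hashtag count and then by hashtag name, both in descending order.

The result format is shown in the following example.

Example 1:

Input:

Tweets table: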

+---------+----------+------------------------------------------------------------+------------+
| user_id | tweet_id | tweet                                                      | tweet_date |
+---------+----------+------------------------------------------------------------+------------+
| 135     | 13       | Enjoying a great start to the day. #HappyDay #MorningVibes | 2024-02-01 |
| 136     | 14       | Another #HappyDay with good vibes! #FeelGood               | 2024-02-03 |
| 137     | 15       | Productivity peaks! #WorkLife #ProductiveDay               | 2024-02-04 |
| 138     | 16       | Exploring new tech frontiers. #TechLife #Innovation        | 2024-02-04 |
| 139     | 17       | Gratitude for today's moments. #HappyDay #Thankful         | 2024-02-05 |
| 140     | 18       | Innovation drives us. #TechLife #FutureTech                | 2024-02-07 |
| 141     | 19       | Connecting with nature's serenity. #Nature #Peaceful       | 2024-02-09 |
+---------+----------+------------------------------------------------------------+------------+

Output:

+-----------+-------+
| hashtag   | count |
+-----------+-------+
| #HappyDay | 3     |
| #TechLife | 2     |
| #WorkLife | 1     |
+-----------+-------+

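Explanation:

#HappyDay: appears in the tweets with IDs 13, 14, and 17, for a total of 3 mentions.
#TechLife: appears in the tweets with IDs 16 and 18, for a total of 2 mentions.
#WorkLife: appears in the tweet with ID 15, for a total of 1 mention.

Note: the output table is sorted in descending order by count and then by hashtag.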

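Solutions

Solution 1: Regular Expression Matching

We can use a regular expression to match all hashtags in each tweet and then count the occurrences of each hashtag. Finally, we sort the hashtags by occurrence count in descending order, break ties by hashtag name in descending order, and return the top three.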
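To illustrate the matching step in isolation, here is a minimal sketch that applies the pattern `#\w+` to one tweet from the example input using Python's standard `re` module. It assumes a hashtag is a `#` followed by one or more word characters, which holds for the sample data but is not spelled out in the problem statement.

```python
import re

# Sample tweet taken from the example input (tweet_id 13).
tweet = "Enjoying a great start to the day. #HappyDay #MorningVibes"

# "#\w+" matches a "#" followed by one or more word characters
# (letters, digits, underscore), so each hashtag is captured whole.
hashtags = re.findall(r"#\w+", tweet)
print(hashtags)  # ['#HappyDay', '#MorningVibes']
```

The pandas method `Series.str.findall` applies the same kind of match to every row and returns one list of hashtags per tweet.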

Python3

import pandas as pd


def find_trending_hashtags(tweets: pd.DataFrame) -> pd.DataFrame:
    # Filter tweets for February 2024
    tweets_feb_2024 = tweets[tweets["tweet_date"].between("2024-02-01", "2024-02-29")]

    # Extract hashtags from tweets
    hashtags = tweets_feb_2024["tweet"].str.findall(r"#\w+")

    # Flatten list of hashtags
    all_hashtags = [tag for sublist in hashtags for tag in sublist]

    # Count occurrences of each hashtag
    hashtag_counts = pd.Series(all_hashtags).value_counts().reset_index()
    hashtag_counts.columns = ["hashtag", "count"]

    # Sort by count, then by hashtag name, both in descending order
    hashtag_counts = hashtag_counts.sort_values(
        by=["count", "hashtag"], ascending=[False, False]
    )

    # Get top 3 trending hashtags
    top_3_hashtags = hashtag_counts.head(3)

    return top_3_hashtags
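As a quick check, the same approach can be run against the example input. The sketch below is a self-contained variant, not the solution above verbatim: the helper name `top_hashtags` and its parameter `n` are hypothetical, it flattens the per-tweet lists with `Series.explode` instead of a list comprehension, and it assumes `tweet_date` can be parsed by `pd.to_datetime`.

```python
import pandas as pd


def top_hashtags(tweets: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: top-n hashtags among tweets from February 2024."""
    # Parse dates so the filter works for both string and datetime columns.
    dates = pd.to_datetime(tweets["tweet_date"])
    in_feb_2024 = (dates.dt.year == 2024) & (dates.dt.month == 2)

    # One list of hashtags per tweet, flattened to one hashtag per row.
    # Tweets without hashtags become NaN after explode and are dropped.
    tags = tweets.loc[in_feb_2024, "tweet"].str.findall(r"#\w+").explode().dropna()

    # Count each hashtag and name the two output columns.
    counts = tags.value_counts().rename_axis("hashtag").reset_index(name="count")

    # Sort by count, then by hashtag name, both descending; keep the top n.
    counts = counts.sort_values(by=["count", "hashtag"], ascending=[False, False])
    return counts.head(n).reset_index(drop=True)


# Example input from the problem statement.
tweets = pd.DataFrame(
    {
        "user_id": [135, 136, 137, 138, 139, 140, 141],
        "tweet_id": [13, 14, 15, 16, 17, 18, 19],
        "tweet": [
            "Enjoying a great start to the day. #HappyDay #MorningVibes",
            "Another #HappyDay with good vibes! #FeelGood",
            "Productivity peaks! #WorkLife #ProductiveDay",
            "Exploring new tech frontiers. #TechLife #Innovation",
            "Gratitude for today's moments. #HappyDay #Thankful",
            "Innovation drives us. #TechLife #FutureTech",
            "Connecting with nature's serenity. #Nature #Peaceful",
        ],
        "tweet_date": [
            "2024-02-01", "2024-02-03", "2024-02-04", "2024-02-04",
            "2024-02-05", "2024-02-07", "2024-02-09",
        ],
    }
)

result = top_hashtags(tweets)
print(result)
```

On this input the result rows are #HappyDay (3), #TechLife (2) and #WorkLife (1). Nine hashtags are tied at a count of 1, and #WorkLife is selected because it sorts last alphabetically, which is first in descending order.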
