[Hash Table, Binary Search] Number of Matching Subsequences - ガヴのサイト

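Problem

Given a string s and an array of strings words, return the number of words[i] that are subsequences of s.

A subsequence of a string is a new string generated from the original string by deleting some characters (possibly none) without changing the relative order of the remaining characters.

For example, "ace" is a subsequence of "abcde".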

Example 1:

Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: Three of the words are subsequences of s: "a", "acd", "ace".

Example 2:

Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
Output: 2

Constraints:

1 <= s.length <= 5 * 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 50
words[i] and s consist only of lowercase letters.

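Solution

Approach 1: Hash Table + Binary Search

Idea

A naive approach is to enumerate every string in words and, for each one, match it as a subsequence of s starting from the beginning of s. This takes $O(NM)$ time (where $N$ is the length of words and $M$ is the length of s), which exceeds the time limit.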
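The naive approach described above can be sketched as a two-pointer scan. This is only an illustration: the names `NaiveSubsequenceDemo`, `isSubsequence`, and `countMatching` are hypothetical and do not come from the original post.

```java
// Illustrative sketch of the naive approach; all names here are hypothetical.
class NaiveSubsequenceDemo {
    // Two-pointer scan: walk s once, advancing through word on every match.
    static boolean isSubsequence(String s, String word) {
        int j = 0; // next character of word that still has to be matched
        for (int i = 0; i < s.length() && j < word.length(); ++i) {
            if (s.charAt(i) == word.charAt(j)) ++j;
        }
        return j == word.length();
    }

    // Rescans s from the start for every word, hence the O(NM) cost.
    static int countMatching(String s, String[] words) {
        int cnt = 0;
        for (String word : words) {
            if (isSubsequence(s, word)) ++cnt;
        }
        return cnt;
    }

    public static void main(String[] args) {
        String[] words = {"a", "bb", "acd", "ace"};
        System.out.println(countMatching("abcde", words)); // prints 3
    }
}
```

Every call to `isSubsequence` may touch all of s, even when the word is only a couple of characters long.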

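Counting the strings in words that are subsequences of s inevitably requires iterating over words, so the only part left to optimize is the subsequence matching itself. A linear scan wastes most of its time on irrelevant characters: with s="axxxxxxxxb" and words[i]="ab", the check walks through all $8$ 'x' characters in the middle. To avoid this, we maintain a hash table (mp) and preprocess s, recording for each character the indices at which it occurs in s. When matching a word, for each character we binary search its index list for the first index greater than the index in s matched by the previous character. If no such index exists, the match fails and we move straight on to the next word.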
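To make the lookup step concrete, here is a minimal sketch run on the s="axxxxxxxxb" example. The names `IndexLookupDemo`, `buildIndex`, and `nextPosition` are hypothetical, and the sketch uses the standard library's `Collections.binarySearch` rather than a hand-written binary search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the index lookup; all names here are hypothetical.
class IndexLookupDemo {
    // Maps each character of s to the ascending list of positions where it occurs.
    static Map<Character, List<Integer>> buildIndex(String s) {
        Map<Character, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < s.length(); ++i) {
            index.computeIfAbsent(s.charAt(i), k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // Smallest position of ch in s that is strictly greater than prev, or -1 if none.
    static int nextPosition(Map<Character, List<Integer>> index, char ch, int prev) {
        List<Integer> positions = index.get(ch);
        if (positions == null) return -1; // ch never occurs in s
        // Positions are distinct and sorted, so binarySearch returns either the
        // exact slot of prev + 1 or -(insertionPoint) - 1.
        int k = Collections.binarySearch(positions, prev + 1);
        if (k < 0) k = -k - 1;
        return k < positions.size() ? positions.get(k) : -1;
    }

    public static void main(String[] args) {
        Map<Character, List<Integer>> index = buildIndex("axxxxxxxxb");
        int a = nextPosition(index, 'a', -1); // 0
        int b = nextPosition(index, 'b', a);  // 9, found without visiting any 'x'
        System.out.println(a + " " + b);      // prints "0 9"
    }
}
```

Matching "ab" takes two lookups here, however many 'x' characters sit between the 'a' and the 'b'.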

Code

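import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public int numMatchingSubseq(String s, String[] words) {
        // mp maps each character to the ascending list of indices where it occurs in s.
        // It is a local variable so repeated calls on one instance do not share state.
        Map<Character, List<Integer>> mp = new HashMap<>();
        for (int i = 0; i < s.length(); ++i) {
            mp.computeIfAbsent(s.charAt(i), k -> new ArrayList<>()).add(i);
        }
        int cnt = 0;
        outer: for (String word : words) {
            int prev = -1; // index in s matched by the previous character of word
            for (char ch : word.toCharArray()) {
                List<Integer> lst = mp.get(ch);
                if (lst == null) continue outer; // ch never occurs in s
                int idx = geq(lst, prev + 1);
                if (idx == -1) continue outer; // no occurrence of ch after prev
                prev = idx;
            }
            ++cnt;
        }
        return cnt;
    }

    // Returns the smallest element of lst that is >= x, or -1 if there is none.
    // lst must be non-empty and sorted in ascending order.
    int geq(List<Integer> lst, int x) {
        int l = 0, r = lst.size() - 1;
        while (l < r) {
            int mid = (l + r) >> 1;
            if (lst.get(mid) >= x) r = mid;
            else l = mid + 1;
        }
        return lst.get(l) >= x ? lst.get(l) : -1;
    }
}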
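
As a rough estimate of what the lookup buys (a back-of-the-envelope bound under the stated constraints, not a measurement): let $L$ be the maximum word length, a symbol introduced here for the estimate. Preprocessing s costs $O(M)$, and each character of each word costs one binary search over at most $M$ indices, so the total is

$$
O(M + N L \log M), \qquad N L \log_2 M \le 5000 \cdot 50 \cdot \log_2(5 \times 10^4) \approx 2.5 \times 10^5 \cdot 15.6 \approx 3.9 \times 10^6,
$$

compared with up to $N M = 5000 \cdot 5 \times 10^4 = 2.5 \times 10^8$ character comparisons for the naive scan.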
