[LeetCode/LintCode] Top K Frequent Words

0x584a / 3128 reads

LeetCode version Problem

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.

Example 1:
Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words.
Note that "i" comes before "love" because it comes first alphabetically.

Example 2:
Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words, occurring 4, 3, 2 and 1 times respectively.

Note:
You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
Input words contain only lowercase letters.
Follow up:
Try to solve it in O(n log k) time and O(n) extra space.
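
A rough accounting of how a heap capped at k entries can meet this bound. The symbols T and u (the number of distinct words) are introduced here for illustration only, and the sketch assumes that hashing and comparing a word take constant time and that the result list is built in O(k log k) or better (for example by appending and reversing once):

```latex
T(n, k) \;=\; \underbrace{O(n)}_{\text{count words}}
        \;+\; \underbrace{u \cdot O(\log k)}_{\text{one offer/poll per distinct word}}
        \;+\; \underbrace{O(k \log k)}_{\text{drain the heap}}
        \;\subseteq\; O(n \log k),
\qquad k \le u \le n .
```

The extra space is O(u) for the count map plus O(k) for the heap, which stays within O(n).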

Solution
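import java.util.*;

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        List<String> res = new ArrayList<>();
        if (words.length < k) return res;
        // Count how often each word occurs.
        Map<String, Integer> map = new HashMap<>();
        for (String word: words) {
            map.put(word, map.getOrDefault(word, 0) + 1);
        }
        // Min-heap of at most k entries. Its head is the weakest candidate: the lowest
        // count, or among equal counts the alphabetically later word.
        // Counts are boxed Integers, so compare them with equals(), not ==.
        PriorityQueue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(
            (a, b) -> a.getValue().equals(b.getValue()) ? b.getKey().compareTo(a.getKey()) : a.getValue() - b.getValue()
        );
        for (Map.Entry<String, Integer> entry: map.entrySet()) {
            queue.offer(entry);
            if (queue.size() > k) queue.poll();
        }
        // Entries leave the heap weakest first, so append them and reverse once.
        while (!queue.isEmpty()) {
            res.add(queue.poll().getKey());
        }
        Collections.reverse(res);
        return res;
    }
}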
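
For cross-checking a heap-based answer, a simpler baseline can sort every distinct word by count and then alphabetically. This is only a sketch: the class name `TopKBySorting` is hypothetical, and the approach runs in O(n log n) rather than the O(n log k) asked for in the follow-up.

```java
import java.util.*;

class TopKBySorting {
    // Hypothetical baseline: sort all distinct words, then keep the first k.
    static List<String> topKFrequent(String[] words, int k) {
        // Count how often each word occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        // Higher count first; on equal counts, alphabetically earlier word first.
        List<String> unique = new ArrayList<>(counts.keySet());
        unique.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(unique.subList(0, Math.min(k, unique.size())));
    }

    public static void main(String[] args) {
        String[] words = {"i", "love", "leetcode", "i", "love", "coding"};
        System.out.println(topKFrequent(words, 2)); // prints [i, love]
    }
}
```

Running both versions on the same inputs and comparing the lists is a cheap way to catch comparator mistakes, since the ordering rule is written in its natural direction here rather than inverted for a min-heap.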
LintCode version Problem

Find the top k frequent words using the MapReduce framework.

The mapper's key is the document id and its value is the content of the document; words in a document are separated by spaces.

For the reducer, the output should be at most k key-value pairs, which are the top k words and their frequencies in this reducer. The judge will take care of merging the different reducers' results to get the global top k frequent words, so you don't need to handle that part.

The value k is given in the constructor of the TopK class.

Notice

Words with the same frequency are ranked alphabetically.

/**
 * Definition of OutputCollector:
 * class OutputCollector<K, V> {
 *     public void collect(K key, V value);
 *         // Adds a key/value pair to the output buffer
 * }
 * Definition of Document:
 * class Document {
 *     public int id;
 *     public String content;
 * }
 */
Example

Given document A =

lintcode is the best online judge
I love lintcode
and document B =

lintcode is an online judge for coding interview
you can test your code online at lintcode
The top 2 words and their frequencies should be

lintcode, 4
online, 3

Tags

Map Reduce

Solution
import java.util.*;

// Use Pair to store a key-value pair (word, count)
class Pair {
    String key;
    int value;

    Pair(String k, int v) {
        this.key = k;
        this.value = v;
    }
}

public class TopKFrequentWords {

    public static class Map {
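        // The document id is unused; it is named `key` because `_` is not a legal identifier from Java 9 on.
        public void map(String key, Document value,
                        OutputCollector<String, Integer> output) {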
            // Output the results into output buffer.
            // Ps. output.collect(String key, int value);
            
            String content = value.content;
            String[] words = content.split(" ");
            for (String word : words) {
                if (word.length() > 0) {
                    output.collect(word, 1);
                }
            }
        }
    }

    public static class Reduce {
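        private PriorityQueue<Pair> Q = null;
        private int k;

        // Orders the weakest pair first: lower count, or on equal counts the alphabetically later key.
        private Comparator<Pair> pairComparator = new Comparator<Pair>() {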
            public int compare(Pair o1, Pair o2) {
                if (o1.value != o2.value) {
                    return o1.value - o2.value;
                }
                //if the values are equal, compare keys
                return o2.key.compareTo(o1.key);
            }
        };

        public void setup(int k) {
            // initialize your data structure here
            this.k = k;
            Q = new PriorityQueue<Pair>(k, pairComparator);
        }

        public void reduce(String key, Iterator<Integer> values) {
            // Sum the partial counts emitted for this word.
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }

            Pair pair = new Pair(key, sum);
            if (Q.size() < k) {
                Q.add(pair);
            } else {
                Pair peak = Q.peek();
                if (pairComparator.compare(pair, peak) > 0) {
                    Q.poll();
                    Q.add(pair);
                }
            }
        }

        public void cleanup(OutputCollector<String, Integer> output) {
            // Output the top k pairs into output buffer.
            // Ps. output.collect(String key, Integer value);
            List<Pair> pairs = new ArrayList<Pair>();
            while (!Q.isEmpty()) {
                pairs.add(Q.poll());
            }

            // reverse result
            int n = pairs.size();
            for (int i = n - 1; i >= 0; --i) {
                Pair pair = pairs.get(i);
                output.collect(pair.key, pair.value);
            }
            
            // while (!Q.isEmpty()) {
            //     Pair pair = Q.poll();
            //     output.collect(pair.key, pair.value);
            // }
        }
    }
}
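
To sanity-check the example with documents A and B outside the judge, the map and reduce phases can be imitated in a single method. This is a minimal sketch, not LintCode's framework: the class `TopKLocalCheck` and its `topK` method are hypothetical, a plain `HashMap` stands in for the shuffle step, and the text is split on any whitespace (an assumption, since the example documents span two lines each).

```java
import java.util.*;

class TopKLocalCheck {
    // Hypothetical local stand-in for the map and reduce phases.
    static List<Map.Entry<String, Integer>> topK(List<String> documents, int k) {
        // "Map" phase: emit (word, 1) for every non-empty token and sum per word.
        Map<String, Integer> counts = new HashMap<>();
        for (String content : documents) {
            for (String word : content.split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        // "Reduce" phase: min-heap capped at k entries, weakest candidate at the head
        // (lower count, or on equal counts the alphabetically later word).
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int byCount = Integer.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        // "Cleanup" phase: the heap drains weakest first, so reverse for output order.
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "lintcode is the best online judge\nI love lintcode",
            "lintcode is an online judge for coding interview\nyou can test your code online at lintcode");
        for (Map.Entry<String, Integer> e : topK(docs, 2)) {
            System.out.println(e.getKey() + ", " + e.getValue());
        }
        // prints:
        // lintcode, 4
        // online, 3
    }
}
```

The comparator is the same "weakest first" ordering as `pairComparator`, written with `Integer.compare` so that the count comparison cannot overflow.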

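Copyright belongs to the author. Do not reproduce without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reproducing, please cite this article's address: https://www.ucloud.cn/yun/68159.html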
