资讯专栏INFORMATION COLUMN

1093-大样本统计

3fuyu / 3644人阅读

摘要:前言的大样本统计我们对到之间的整数进行采样,并将结果存储在数组中就是整数的采样个数。我们以浮点数数组的形式,分别返回样本的最小值最大值平均值中位数和众数。

前言

Weekly Contest 142的 大样本统计:

我们对 0255 之间的整数进行采样,并将结果存储在数组 count 中:count[k] 就是整数 k 的采样个数。

我们以 浮点数 数组的形式,分别返回样本的最小值、最大值、平均值、中位数和众数。其中,众数是保证唯一的。

我们先来回顾一下中位数的知识:

如果样本中的元素有序,并且元素数量为奇数时,中位数为最中间的那个元素;

如果样本中的元素有序,并且元素数量为偶数时,中位数为中间的两个元素的平均值。

示例1:

输入:count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
输出:[1.00000,3.00000,2.37500,2.50000,3.00000]

示例2:

输入:count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
输出:[1.00000,4.00000,2.18182,2.00000,1.00000]

提示:

count.length == 256

1 <= sum(count) <= 10^9

计数表示的众数是唯一的

答案与真实值误差在 10^-5 以内就会被视为正确答案

解题思路

本地难度为中等,首先需要读懂题目意思,本题的入参数组count其实算是一个压缩数据后的数组。

我们对 0255 之间的整数进行采样,并将结果存储在数组 count 中:count[k] 就是整数 k 的采样个数。

简单来说就是,数组count的第k个元素就是k在压缩前的数组中出现count[k]个。以示例1count为例

[0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]

解压的过程如下:

第0个元素为0,则解压后的数组为[]
第1个元素为4,则解压后的数组为[1,1,1,1]
第2个元素为3,则解压后的数组为[1,1,1,1,2,2,2]
第3个元素为2,则解压后的数组为[1,1,1,1,2,2,2,3,3]
第4个元素为2,则解压后的数组为[1,1,1,1,2,2,2,3,3,4,4]
......
省略后续步骤

搞清楚count的数据特征后,选择使用TreeMapcount进行处理,将有效数字及其出现个数存储起来(有效数字指的是count[k]不为0的元素)。根据就是根据题目要求分别处理以下指标:

最小值:TreeMap中第一个key

最大值:TreeMap中最后一个key

平均值:TreeMapkey之和除以value之和

中位数:

计算出数组实际的元素个数(即value之和)

根据元素个数的奇偶性,获取对应的值

众数:出现次数最多的数字,即TreeMapvalue最大的键值对的key

实现代码
    /**
     * 1093. 大样本统计
     *
     * @param count
     * @return
     */
    public double[] sampleStats(int[] count) {
        // 使用TreeMap有序存储数字及其出现次数
        TreeMap countMap = new TreeMap<>();
        double[] result = new double[5];
        // 总和
        double sum = 0L;
        // 数字出现总次数
        double total = 0L;
        // 最大出现次数
        long maxTimes = 0;
        // 最小值
        double min;
        // 最大值
        double max;
        // 平均值
        double average;
        // 中位数
        double middle = 0;
        // 众数,出现次数最多的数字
        double mode = 0;
        for (int i = 0; i < count.length; i++) {
            if (count[i] != 0) {
                countMap.put(i, count[i]);
                sum = sum + i * count[i];
                total += count[i];
                if (count[i] > maxTimes) {
                    maxTimes = count[i];
                    mode = i;
                }
            }
        }
        min = countMap.firstKey().doubleValue();
        max = countMap.lastKey().doubleValue();
        average = sum / total;
        // 是否为奇数
        boolean odd = total % 2 != 0;
        // 中位数索引
        int middleIndex = (int) ((total - 1) / 2);
        int index = -1;
        Iterator> it = countMap.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry entry = it.next();
            int num = entry.getKey();
            int times = entry.getValue();
            index += times;
            if (index > middleIndex) {
                middle = num;
                break;
            } else if (index == middleIndex) {
                if (odd) {
                    middle = num;
                    break;
                } else {
                    middle = (num + it.next().getKey()) / 2.0;
                    break;
                }
            }
        }
        result[0] = min;
        result[1] = max;
        result[2] = average;
        result[3] = middle;
        result[4] = mode;
        return result;
    }

文章版权归作者所有,未经允许请勿转载,若此文章存在违规行为,您可以联系管理员删除。

转载请注明本文地址:https://www.ucloud.cn/yun/77875.html

相关文章

  • 理论到实践,A/B测试不得不直面的4个统计学问题

    摘要:样本均值的方差是总体方差的为样本容量,这个结论是针对有放回抽样的。某些情况下配对样本比较难实现,比如药物双盲试验,患者不能既服用安慰剂又服用药物。样本方差和总体方差的比值,符合分布。 有放回?无放回? 从总体中随机抽取一个容量为n的样本,当样本容量 n足够大(通常要求n ≥30)时,无论总体是否符合正态分布,样本均值都会趋于正态分布。期望和总体相同,方差为总体的1/n。这即是中心极限定...

    snifes 评论0 收藏0
  • 模型评价(一) AUC

    摘要:问题是什么能拿来干什么如何求解深入理解是什么混淆矩阵混淆矩阵是理解大多数评价指标的基础,毫无疑问也是理解的基础。内容的召回往往是根据的排序而决定的。 问题: AUC是什么 AUC能拿来干什么 AUC如何求解(深入理解AUC) AUC是什么 混淆矩阵(Confusion matrix) 混淆矩阵是理解大多数评价指标的基础,毫无疑问也是理解AUC的基础。丰富的资料介绍着混淆矩阵的概念,...

    SoapEye 评论0 收藏0
  • 分享一个超详细的数据分析案例【Python】附ABTest详细介绍

    摘要:确定分流方案使用各类平台分配流量。备择假设与零假设相反,即实验者希望证实的假设。虽然该数据集的统计结果与支付宝的实际规模有偏差,但不影响解决方案的适用性。选定统计方法由于样本较大,故采用检验。 ...

    3fuyu 评论0 收藏0

发表评论

0条评论

3fuyu

|高级讲师

TA的文章

阅读更多
最新活动
阅读需要支付1元查看
<