How can we eliminate bias from AI algorithms? The pen-testing manifesto

2023-10-28 08:50

Gender bias in credit scoring AI?

A few months ago, a number of Apple Card users in the US reported that they and their partners had been allocated vastly different credit limits on the branded credit card, despite having the same income and credit score (see BBC article). Steve Wozniak, a co-founder of Apple, tweeted that his credit limit on the card was ten times higher than his wife’s, even though the couple had the same credit limit on all their other cards.

The Department of Financial Services in New York, a financial services regulator, is investigating allegations that the users’ gender may be the basis of the disparity. Apple is keen to point out that Goldman Sachs is responsible for the algorithm, which seems at odds with Apple’s marketing slogan ‘Created by Apple, not a bank’.

Since the regulator’s investigation is ongoing and no bias has yet been proven, I am writing only in hypotheticals in this article.

Bias in AI used in the justice system

The Apple Card story isn’t the only recent example of algorithmic bias hitting the headlines. In July last year, the NAACP (National Association for the Advancement of Colored People) in the US signed a statement requesting a moratorium on the use of automated decision-making tools, since some of them have been shown to have a racial bias when used to predict recidivism — in other words, how likely an offender is to re-offend.

In 2013, Eric Loomis was sentenced to six years in prison after the state of Wisconsin used a program called COMPAS to calculate his odds of committing another crime. COMPAS is a proprietary algorithm whose inner workings are known only to its vendor, Equivant. Loomis challenged the use of the algorithm before the Wisconsin Supreme Court, but his challenge was ultimately denied.

Unfortunately, incidents such as these only reinforce the widely held perception of AI as a dangerous tool: opaque, under-regulated and capable of encoding the worst of society’s prejudices.

How can an AI be prejudiced, racist, or biased? What went wrong?

I will focus here on the example of a loan application, since it is a simpler problem to frame and analyse, but the points I make are generalisable to any kind of bias and protected category.

I would like to point out first that I strongly doubt that anybody at Apple or Goldman Sachs has sat down and created an explicit set of rules that take gender into account for loan decisions.

Let us imagine that we are creating a machine learning model that predicts the probability of a person defaulting on a loan. There are a number of ‘protected categories’, such as gender, on which we are not allowed to discriminate.

Developing and training a loan decision AI is the kind of ‘vanilla’ data science problem that routinely pops up on Kaggle (a website that lets you participate in data science competitions) and that aspiring data scientists can expect to be asked about in job interviews. The recipe for making a robot loan officer is as follows:

Imagine you have a large table of 10,000 rows, each describing a loan applicant your bank has seen in the past:

[Figure: An example table of data about potential loan applicants.]

The final column, whether the applicant defaulted, is what we want to predict.
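
Because the original table survives only as a caption, here is a minimal sketch of what such a table might look like, built from synthetic data. The column names (`salary`, `years_in_job`, `job_title`, `gender`, `defaulted`) and the way values are generated are assumptions for illustration only; the one deliberate design choice is that `job_title` is skewed by gender, which matters later in the article.

```python
import random
from dataclasses import dataclass

# Hypothetical column layout: the real table's columns are unknown,
# so these names are illustrative assumptions.
@dataclass
class Applicant:
    salary: float        # annual salary
    years_in_job: float  # tenure in current job, in years
    job_title: str       # categorical; may act as a proxy for gender
    gender: str          # protected category ("F" or "M")
    defaulted: int       # target column: 1 if the applicant defaulted, else 0

JOB_TITLES = ["engineer", "nurse", "manager", "teacher"]

def make_applicants(n, seed=0):
    """Generate n synthetic applicants in which job_title correlates with gender."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = rng.choice(["F", "M"])
        # A skewed job-title distribution per gender is what makes job_title a proxy.
        weights = [1, 4, 2, 3] if gender == "F" else [4, 1, 3, 2]
        title = rng.choices(JOB_TITLES, weights=weights)[0]
        salary = rng.gauss(45_000, 12_000)
        years = rng.uniform(0, 20)
        # In this toy data, default risk depends only on salary and tenure, not gender.
        risk = 0.3 - 0.000004 * (salary - 45_000) - 0.01 * years
        defaulted = int(rng.random() < min(max(risk, 0.01), 0.99))
        rows.append(Applicant(salary, years, title, gender, defaulted))
    return rows

table = make_applicants(10_000)
print(len(table))
print(table[0])
```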

You would take this data and split the rows into three groups, called the training set, the validation set, and the test set.
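
A minimal sketch of the split, assuming a 70/15/15 ratio (the article does not specify one) and rows held in a plain Python list. Shuffling first matters: if the table were sorted by date or branch, an unshuffled split would give sets that are not representative of each other.

```python
import random

def three_way_split(rows, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation and test sets.

    The 70/15/15 default ratio is an illustrative assumption.
    """
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # shuffle so each split is representative
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains goes to the test set
    return train, val, test

train, val, test = three_way_split(range(10_000))
print(len(train), len(val), len(test))  # 7000 1500 1500
```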

You then pick a machine learning algorithm, such as Logistic Regression, Random Forest or a Neural Network, and let it ‘learn’ from the training rows without letting it see the validation rows. You then evaluate it on the validation set. You rinse and repeat for different algorithms, tweaking their settings each time, and the model you eventually deploy is the one that scored highest on your validation rows.

When you have finished, and only then, you are allowed to evaluate your chosen model on the test set and check its performance on data that played no part in training or model selection.
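
The train, select, then test-once discipline described above can be sketched as follows. To stay self-contained, the sketch uses a tiny hand-written logistic regression trained by gradient descent on synthetic one-feature data instead of a real library; the data generator and the candidate learning rates and epoch counts are assumptions for illustration. The point is the order of operations: candidates are compared only on the validation rows, and the test rows are scored once at the end.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_data(n, seed):
    """Synthetic (x, defaulted) rows; true log-odds of default are 1 - 2x (illustrative)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # e.g. a standardised income score
        rows.append((x, int(rng.random() < sigmoid(1.0 - 2.0 * x))))
    return rows

def train_logistic(rows, lr, epochs):
    """Fit P(default | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in rows:
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / len(rows)
        b -= lr * gb / len(rows)
    return lambda x: sigmoid(w * x + b)

def accuracy(model, rows, threshold=0.5):
    return sum((model(x) >= threshold) == bool(y) for x, y in rows) / len(rows)

train, val, test = make_data(2000, 1), make_data(500, 2), make_data(500, 3)

# The "tweaks": each candidate uses different (assumed) learning-rate and epoch settings.
candidates = {
    (lr, epochs): train_logistic(train, lr, epochs)
    for lr in (0.01, 0.5)
    for epochs in (5, 100)
}
# Model selection looks only at the validation rows...
best_key = max(candidates, key=lambda k: accuracy(candidates[k], val))
best = candidates[best_key]
print("chosen settings:", best_key)
print("validation accuracy:", round(accuracy(best, val), 3))
# ...and the test rows are used once, at the very end.
print("test accuracy:", round(accuracy(best, test), 3))
```

If you find yourself going back to tweak the model after looking at the test score, the test set has effectively become a second validation set and no longer gives an honest estimate.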

The fallacy of removing a column and expecting bias to disappear from the AI

Now obviously if the ‘gender’ column was present in the training data, then there is a risk of building a biased model.

However, the Apple/Goldman data scientists probably removed that column from their dataset at the outset.

So how can the digital money lender still be gender-biased? Surely there’s no way for our algorithm to be sexist, right? After all, it doesn’t even know an applicant’s gender!

Unfortunately and counter-intuitively, it is still possible for bias to creep in!

There might be information in our dataset that is a proxy for gender. For example: tenure in current job, salary and especially job title could all correlate with our applicant being male or female.

If it’s possible to train a machine learning model on your sanitised dataset to predict gender with better-than-chance accuracy, then you run the risk of your model accidentally being gender-biased. Your loan prediction model could learn to use the implicit hints about gender in the dataset, even if it can’t see the gender itself.
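
One cheap first check, before a full pen-test, is to measure how much each remaining column on its own reveals about the removed attribute. The sketch below does this for categorical columns: it guesses the majority gender within each category and compares the resulting accuracy with the trivial baseline of always guessing the overall majority. The tiny dataset and the column names are hypothetical. On a real table you would compute this on held-out rows, because in-sample accuracy is optimistic for columns with many categories.

```python
from collections import Counter, defaultdict

def proxy_accuracy(rows, feature, protected="gender"):
    """Accuracy of guessing `protected` from the majority value within each category of `feature`."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
    # Within each category, the best single guess is its most common protected value.
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)

def baseline_accuracy(rows, protected="gender"):
    """Accuracy of always guessing the overall most common protected value."""
    counts = Counter(row[protected] for row in rows)
    return counts.most_common(1)[0][1] / len(rows)

# Hypothetical sanitised rows: the loan model would never see "gender",
# but we keep it aside here so we can test the other columns for proxies.
rows = [
    {"job_title": "nurse", "region": "north", "gender": "F"},
    {"job_title": "nurse", "region": "south", "gender": "F"},
    {"job_title": "nurse", "region": "north", "gender": "M"},
    {"job_title": "engineer", "region": "south", "gender": "M"},
    {"job_title": "engineer", "region": "north", "gender": "M"},
    {"job_title": "engineer", "region": "south", "gender": "F"},
    {"job_title": "teacher", "region": "north", "gender": "F"},
    {"job_title": "teacher", "region": "south", "gender": "M"},
]
print("baseline:", baseline_accuracy(rows))   # 0.5
for feature in ("job_title", "region"):
    print(feature, proxy_accuracy(rows, feature))
# job_title scores 0.625, above the baseline, so it leaks gender; region scores 0.5.
```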

A manifesto for unbiased AI

I would like to propose an addition to the workflow of AI development: we should attack our AI from different angles, attempting to discover any possible bias, before deploying it.

It’s not enough just to remove the protected categories from your dataset, dust off your hands and think ‘job done’.

AI bias pen-test

We also need to play devil’s advocate when we develop an AI, and instead of just attempting to remove causes of bias, we should attempt to prove the presence of bias.

If you are familiar with the field of cybersecurity, you will have heard of the concept of a pen-test, or penetration test: a person who was not involved in developing your system, perhaps an external consultant, attempts to hack it to discover vulnerabilities.

I propose that we introduce AI pen-tests, an analogue of the security pen-test aimed at uncovering and eliminating AI bias.

What an AI pen-test would involve

To pen-test an AI for bias, either an external person, or an internal data scientist who was not involved in the algorithm development, would attempt to build a predictive model to reconstruct the removed protected categories.

So, returning to the loan example, if you have scrubbed gender from your dataset, the pen-tester would try their hardest to build a predictive model that puts it back. Perhaps you should pay them a bonus if they manage to reconstruct gender with better-than-chance accuracy, reflecting the money you would otherwise have spent on damage control had you unwittingly shipped a sexist loan prediction model.
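
A minimal pen-test harness might look like the sketch below. The pen-tester receives the sanitised table plus the protected column that was removed, trains a reconstruction model on one half of the rows and reports its accuracy on the other half, next to a majority-class baseline; a clear gap above the baseline signals a leak. The reconstruction model here is a hand-written categorical naive Bayes classifier, chosen only to keep the example dependency-free; a real pen-tester would throw their strongest models at the problem. The data generator, the column names and the 0.02 margin are all assumptions, and in practice a significance test would be more principled than a fixed margin.

```python
import math
import random
from collections import Counter, defaultdict

def make_rows(n, seed=0):
    """Synthetic sanitised rows plus the removed gender label (illustrative assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        gender = rng.choice("FM")
        # job_title is skewed by gender, so it leaks gender information.
        weights = [3, 1, 2] if gender == "F" else [1, 3, 2]
        title = rng.choices(["nurse", "engineer", "teacher"], weights=weights)[0]
        band = rng.choice(["low", "mid", "high"])  # independent of gender in this toy data
        rows.append(({"job_title": title, "salary_band": band}, gender))
    return rows

def fit_naive_bayes(train):
    """Categorical naive Bayes with add-one (Laplace) smoothing; returns a predict function."""
    class_counts = Counter(label for _, label in train)
    value_counts = defaultdict(Counter)  # (label, feature) -> counts of feature values
    seen_values = defaultdict(set)       # feature -> distinct values seen in training
    for feats, label in train:
        for feature, value in feats.items():
            value_counts[(label, feature)][value] += 1
            seen_values[feature].add(value)

    def log_score(feats, label):
        score = math.log(class_counts[label] / len(train))
        for feature, value in feats.items():
            num = value_counts[(label, feature)][value] + 1
            den = class_counts[label] + len(seen_values[feature])
            score += math.log(num / den)
        return score

    return lambda feats: max(class_counts, key=lambda label: log_score(feats, label))

def pen_test(rows, margin=0.02, seed=1):
    """Train on one half, score on the other; compare with a majority-class baseline."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    train, held_out = rows[:half], rows[half:]
    predict = fit_naive_bayes(train)
    acc = sum(predict(feats) == g for feats, g in held_out) / len(held_out)
    majority = Counter(g for _, g in train).most_common(1)[0][0]
    base = sum(g == majority for _, g in held_out) / len(held_out)
    return acc, base, acc > base + margin  # True means the protected attribute leaks

acc, base, leak = pen_test(make_rows(4000))
print(f"reconstruction accuracy {acc:.3f} vs baseline {base:.3f}; leak found: {leak}")
```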

Further AI bias stress tests

In addition to the pen-test above, I suggest the following further checks:

  • Segment the data into genders.
  • Evaluate the accuracy of the model for each gender.
  • Identify any tendency to over- or underestimate the probability of default for either gender.
  • Identify any difference in model accuracy by gender.
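
The checks in this list can be sketched in a few lines. The function below segments held-out predictions by group, then reports per-group accuracy and a simple calibration gap (mean predicted probability of default minus the observed default rate): a positive gap means the model overestimates default risk for that group, a negative gap means it underestimates it. The records are hypothetical model outputs invented for illustration, and the 0.5 decision threshold is an assumption.

```python
from collections import defaultdict

def stress_test_by_group(records, threshold=0.5):
    """Per-group accuracy and calibration gap for (group, predicted_probability, defaulted) records.

    calibration_gap > 0: the model overestimates default risk for that group;
    calibration_gap < 0: it underestimates it.
    """
    groups = defaultdict(list)
    for group, p, y in records:
        groups[group].append((p, y))  # segment the data by group
    report = {}
    for group, rows in groups.items():
        n = len(rows)
        accuracy = sum((p >= threshold) == bool(y) for p, y in rows) / n
        mean_predicted = sum(p for p, _ in rows) / n
        observed_rate = sum(y for _, y in rows) / n
        report[group] = {
            "n": n,
            "accuracy": accuracy,
            "calibration_gap": mean_predicted - observed_rate,
        }
    return report

# Hypothetical held-out outputs: (gender, predicted P(default), actual default).
records = [
    ("F", 0.8, 0), ("F", 0.6, 1), ("F", 0.7, 0), ("F", 0.2, 0),
    ("M", 0.1, 0), ("M", 0.3, 1), ("M", 0.6, 1), ("M", 0.2, 0),
]
report = stress_test_by_group(records)
for group, stats in sorted(report.items()):
    print(group, stats)
accuracies = [stats["accuracy"] for stats in report.values()]
print("accuracy gap between groups:", max(accuracies) - min(accuracies))
```

In this toy example the model is less accurate for women and overestimates their default risk, while underestimating it for men: exactly the pattern the checklist is designed to surface.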

Further measures

I have not covered some of the more obvious causes of AI bias. For example, the training data itself may be biased; this is highly likely for some of the algorithms used in the criminal justice system.

What to do if you have discovered a bias?

Let’s assume you have discovered that the algorithm you trained does indeed exhibit a bias with respect to a protected category such as gender. Your options to mitigate this are:

  • If the pen-test showed that another input parameter, such as job title, is serving as a proxy for gender, you can remove it, attempt to obfuscate its gender-related aspects, or sanitise the data further until the pen-tester is unable to reconstruct the gender.
  • You can reverse engineer the result of the pen-test to artificially morph your training data until the gender is no longer discoverable.
  • You can manually correct the inner workings of your model to compensate for the bias.
  • You can check your training table for bias. If your AI is learning from biased data, then we cannot expect it to be unbiased.
  • If your predictions are less accurate for women than for men, a likely cause is that you have more training data for men than for women. In these cases you can use data augmentation: duplicate female entries in your training data until the training set is balanced. Do this in the training split only, so that validation and test scores are not distorted.
  • You can also go out of your way to collect extra training data for underrepresented categories.
  • You can try to make your model explainable and identify where the bias is creeping in. If you are interested in going into more detail about machine learning explainability, I invite you to also read my earlier post about explainable AI.
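
The duplication idea from the list above, more precisely called random oversampling, might be sketched as follows: rows from smaller groups are duplicated at random (sampling with replacement) until every group is as large as the largest one. The row format, the 700/300 split and the `group_of` accessor are hypothetical. Note that duplication adds no new information, so it can make a model overfit the duplicated rows; collecting genuinely new data for the underrepresented group remains the better option where possible.

```python
import random
from collections import Counter

def oversample_by_duplication(train_rows, group_of, seed=0):
    """Duplicate randomly chosen rows of smaller groups until every group matches the largest.

    Apply this to the training split only; duplicated rows in the validation
    or test sets would distort the evaluation.
    """
    rng = random.Random(seed)
    by_group = {}
    for row in train_rows:
        by_group.setdefault(group_of(row), []).append(row)
    target = max(len(rows) for rows in by_group.values())
    balanced = []
    for rows in by_group.values():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]  # sample with replacement
        balanced.extend(rows + extra)
    rng.shuffle(balanced)  # avoid long runs of duplicated rows
    return balanced

# Hypothetical training rows: 700 from men, 300 from women.
train = [{"gender": "M", "id": i} for i in range(700)] + [{"gender": "F", "id": i} for i in range(300)]
balanced = oversample_by_duplication(train, group_of=lambda r: r["gender"])
print(Counter(r["gender"] for r in balanced))
```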

An aside… bias in recruitment?

One application of this approach that I would be interested in investigating further is how to eliminate bias when machine learning is used for recruitment. Imagine you have an algorithm matching CVs to jobs. If it inadvertently picks up on gaps in people’s CVs that correspond to maternity leave, and therefore to gender, we run the risk of a discriminatory AI. I imagine this could be compensated for by some of the suggestions above, such as tweaking the training data to artificially remove this kind of signal. I think that the pen-test would be a powerful tool for this challenge.

How can companies avoid bias re-appearing?

Today, large companies are very much aware of how quickly bad PR can go viral. So if the Apple Card algorithm is indeed biased, I am surprised that nobody checked it more thoroughly before shipping it.

A credit limit differing by a factor of 10 depending on gender would be an egregious error.

Had the data scientists behind the credit limit algorithm, or indeed behind the recidivism prediction algorithm used by the state of Wisconsin, followed my checklist above for pen-testing and stress-testing their algorithms, I imagine they would have spotted the PR disaster before it had a chance to make headlines.

Of course, it is easy to point fingers after the fact, and the field of data science in big industry is as yet in its infancy. Some would call it a Wild West of under-regulation.

I think we can also be glad that some conservative industries such as healthcare have not yet adopted AI for important decisions. Imagine the fallout if a melanoma-analysing algorithm, or an amniocentesis decision-making model, turned out to have a racial bias.

For this reason, I would strongly recommend that large companies releasing algorithms into the wild to make important decisions set up a separate team of data scientists whose job is not to develop algorithms but to pen-test and stress-test them.

The data scientists developing the models are under too much time pressure to be able to do this themselves, and as the cybersecurity industry has discovered through years of experience, sometimes it is best to have an external person play devil’s advocate and try to break your system.

Translated from: https://towardsdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto-4b09974e8378
