Deciding what to test in an AI prototype

Deciding what to test is the first, and most important, step in defining an AI prototype. This decision shapes all other decisions in designing the prototype.

Defining the hypothesis under test is important because prototypes are messy, and messy experiments give muddled results that hide the relevant amongst the incidental.

Prototypes are broad brush-stroked approximations of the final product. The learnings from a prototype can be game-changing, intriguing, and wholly surprising. But to learn from a prototype with confidence, the effect or insight will need to be large.

It is very easy to take a finding from a prototype and generalise it, only to find later that the learning was tied directly to some imperfection in the prototype itself. Minor differences between the prototype and the end product can and do affect the learnings. Details such as how fast an element loads, or being constrained to a few user journeys, have a very real effect on how the user responds.

With prototypes, we’re looking for big effects: things that are obvious once our attention is drawn to them, not optimisations. Leave optimisation until later in the design process, and consider A/B or multivariate testing on large user groups for it.

With many elements under test, the feedback will be noisy. It is difficult to untangle the causes and effects of what our users tell and show us.

The types of things we might want to test include:

The technical details

The performance of the model.
The speed of delivering the model results.
The rate of feedback from a model and whether a user can visibly ‘teach’ the system.

The interface

How interactive the AI feature is.
Whether there are separate elements for the AI feature, and how these are delineated from the rest of the system.

The messaging

Explaining the AI algorithm: what it does and how it learns.
Teaching the user how to make the product learn.
How numeric the model results are, and how numerate the user is expected to be.
Whether and how we communicate error messages.

Error correction

How to put fail-safes in place in case of error.
How to determine if the model has broken down.
What we do when the model breaks down.
How to recover from catastrophic error.
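
The error-correction questions above can be made concrete in a prototype with a thin wrapper around the model. The following is a minimal sketch, not a prescribed design: it assumes a hypothetical `model` callable that returns a `(label, confidence)` pair, and the names `predict_with_failsafe`, `has_broken_down`, and both thresholds are illustrative assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Prediction:
    label: Optional[str]
    confidence: float
    source: str  # "model" or "fallback"


def predict_with_failsafe(
    model: Callable[[dict], Tuple[str, float]],  # hypothetical model interface
    features: dict,
    fallback_label: Optional[str] = None,
    min_confidence: float = 0.6,  # assumed threshold, tune per prototype
) -> Prediction:
    """Return the model's answer, or a safe default if it errors or is unsure."""
    try:
        label, confidence = model(features)
    except Exception:
        # Fail-safe: the user sees the fallback rather than a crash.
        return Prediction(fallback_label, 0.0, "fallback")
    # Out-of-range (or NaN) confidence is treated the same as low confidence.
    if not (0.0 <= confidence <= 1.0) or confidence < min_confidence:
        return Prediction(fallback_label, confidence, "fallback")
    return Prediction(label, confidence, "model")


def has_broken_down(recent: List[Prediction], max_fallback_rate: float = 0.5) -> bool:
    """Flag a breakdown when too many recent predictions needed the fallback."""
    if not recent:
        return False
    fallbacks = sum(p.source == "fallback" for p in recent)
    return fallbacks / len(recent) > max_fallback_rate
```

In a test session, the `source` field makes it possible to note afterwards which user reactions were responses to the model and which were responses to the fallback.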

Figure: Testing AI prototypes (simonoregan.com)

Separating these tests is important. For testing the user impact of technical details it is best to have arrived at a finalised design for the interface, messaging and error communication.

Messaging is closely tied to the interface and error-handling and often won’t be tested alone. Instead, the interface and messaging or the error-handling and messaging will be tested in pairs.

The important thing to bear in mind is that we don’t want to be rapidly swapping these permutations in the hope that we’ll observe fine differences in user responses to help us determine the optimal combination. With small user groups the results will certainly not be statistically significant, nor usually generalisable and relevant.

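To see why small groups cannot separate fine differences, here is a small worked example using a two-sided Fisher's exact test written with only the standard library. The counts are invented purely for illustration (they are not results from any study), and the function name `fisher_exact_two_sided` is an assumption of this sketch.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two prototype variants; columns are success / failure.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x successes in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_observed = prob(a)
    # Sum every table at least as unlikely as the observed one.
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


# Made-up counts: 4 of 5 users succeed with variant A, 2 of 5 with variant B.
p_small = fisher_exact_two_sided(4, 1, 2, 3)
# The same proportions with 500 users per variant.
p_large = fisher_exact_two_sided(400, 100, 200, 300)

print(f"5 users per variant:   p = {p_small:.2f}")  # p = 0.52
print(f"500 users per variant: p = {p_large:.3g}")
```

With five users per variant, an 80% versus 40% success rate gives a p-value of about 0.52, so even a gap that large cannot be distinguished from chance. The same proportions at 500 users per variant give a p-value far below any conventional threshold, which is why fine comparisons belong in later, large-scale testing.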

Instead, choose a configuration with clearly defined upfront assumptions and observe whether the user behaves as expected, and if not, why not.

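One lightweight way to hold ourselves to this is to write the assumption down before the session and record each participant against it. The sketch below is only an illustration of that habit: the class names, fields, and the example hypothesis are all hypothetical, not part of any established tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Observation:
    participant: str
    behaved_as_expected: bool
    notes: str = ""  # the "why not", captured in the session


@dataclass
class PrototypeTest:
    hypothesis: str          # stated upfront, before any user sees the prototype
    expected_behaviour: str  # what we would observe if the hypothesis holds
    observations: List[Observation] = field(default_factory=list)

    def record(self, participant: str, behaved_as_expected: bool, notes: str = "") -> None:
        self.observations.append(Observation(participant, behaved_as_expected, notes))

    def surprises(self) -> List[Observation]:
        """Sessions where the user did not behave as expected."""
        return [o for o in self.observations if not o.behaved_as_expected]


# Hypothetical example of one configuration under test.
test = PrototypeTest(
    hypothesis="Users understand that correcting a suggestion teaches the system",
    expected_behaviour="User corrects at least one suggestion without prompting",
)
test.record("P1", True)
test.record("P2", False, notes="Did not notice the suggestion was editable")

for o in test.surprises():
    print(f"{o.participant}: {o.notes}")
```

The point of the structure is that the surprises, with their notes, are the output of the session, rather than a score to compare across configurations.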

Thank you for reading 🙏🏻

This was originally published on simonoregan.com.

If you enjoyed this, you might like The Deployment Age — a weekly update of tools and musings that shine some light on the emerging technologies and trends of the 2020s.
