Artificial Intelligence in Evaluation, Validation, Testing and Certification


Everybody seems to jump on the AI bandwagon these days, “enhancing” their products and services with “AI.” It sounds, however, a bit like the IoT hype from the last decade when your coffee machine desperately needed Internet access. This time, though, there’s also some Armageddon undertone, claiming that AI would make our jobs obsolete and completely transform all sorts of businesses, including ours.

So, it comes as no surprise that atsec gets asked by customers, government agencies, and almost everybody communicating with us how we position ourselves on the use of AI in our work and how we deal with AI being used in the IT security environment of our customers and in all sorts of other areas as well.


First answer: Unfortunately, we don’t yet use it for authoring blog entries, so musing about the benefits and drawbacks of AI in our work still can ruin your weekend. 🙁

Second answer: For an excellent overview of how we deal with AI and what we expect from this technology, there is a brilliant interview with Rasma Araby, Managing Director of atsec AB Sweden:

https://www.atsec.cn/downloads/blog/Does_AI_pass_the_EU_cybersecurity_test.mp4

Of course, AI is discussed within atsec frequently, as we are a tech company by nature. We analyze IT technologies for impacts on IT security and are eager to deploy new technologies for ourselves or introduce them to our customers if we believe they will be beneficial.

atsec’s AI policy foundation

Recently, we defined some basic policies on the use of AI within atsec. Those policies have two cornerstones:

First and foremost, we are committed to protecting all sensitive information we deal with, especially any information entrusted to us by our customers. We will not share such information and data with third parties and thus will not supply any such information to publicly available AI tools.


There are several reasons for this: Obviously, we would violate our NDAs with our customers if we sent their information to a public server. Also, there is currently no robust way to establish trust in these tools, and nobody could tell you how such information would be dealt with. So, we must assume that we would push that information directly into the public domain. Even if we tried to “sanitize” some of the information, I would be skeptical that an AI engine would not be able to determine which customer and product our chat was about. The only way to find out would be to risk disaster, and we’re not up for that. Furthermore, sanitizing the information would probably require more effort than writing up the information ourselves.

The second cornerstone is not different from our use of any other technology: any technology is only a tool supporting our work. It won’t take any responsibility for the results.


We are using many tools to help our work, for example, to help us author our evaluation reports and to keep track of our work results, evidence database, etc. Such tools could be marketed easily as AI, but as the saying goes: “A fool with a tool is still a fool.” Our evaluators take responsibility for their work products, and our quality assurance will not accept errors being blamed on a tool. Tools are always treated with a good dose of mistrust. We always have humans to verify that our reports are correct and to assume responsibility for their contents. This will not be different with an AI tool. At atsec, our evaluators and testers will always be in ultimate control of our work.


With this framework, we are in a good position to embrace AI tools where they make sense and do not violate our policies. We are aware that we cannot completely avoid AI anyway, for example, when it “creeps” into standard software tools like word processors. AI-based tools helping our techies to re-phrase their texts for readability and better understanding might sometimes be an improvement cherished by our customers. 😀


We expect AI tools to help, for example, with code reviews and defining meaningful penetration tests in the foreseeable future. However, we have not yet encountered such tools that can be run in controlled, isolated environments and thus fulfill our AI policy requirements.

Correctness of AI, trust in AI engines

As already stated, we do not treat current AI engines as trusted tools we can blindly rely upon. This is based on the fact that the “intelligence” displayed in the communication by these engines comes mostly from their vast input, which is absorbed into a massive network with billions, even trillions of nodes. Most of the large language models used in the popular AI engines are fed by the Common Crawl database of Internet contents (refined into Google’s Colossal Clean Crawled Corpus), which increases by about 20 terabytes per month. This implies that input for the training of the engines cannot be fully curated (i.e., fact-checked) by humans, and it leaves lots of loopholes to inject disinformation into the models. I guess that every troll farm on the planet is busy doing exactly that.

The developers of these AI engines try to fight this, but filtering out documents containing “dirty naughty obscene and otherwise bad words” won’t do the trick. If your favorite AI engine doesn’t have quotes from Leslie Nielsen’s “The Naked Gun” handy, that’s probably why. Checking the AI’s “Ground Truths” against Wikipedia has its shortcomings, too.


Therefore, the AI engine companies use different benchmarks to test the AI engine output, with many of those outputs checked by humans. However, the work conditions of those “clickworkers” are often at a sweatshop level, which does not help to establish our trust in the accuracy and truthfulness of the results.

Therefore, if atsec were to use such engines in its core business of assessing IT products and technology, we would not be able to put a reasonable amount of trust in the output obtained from these engines, and we would have to fact-check each statement made by the AI. This might easily result in more effort than writing the reports ourselves and trusting our own judgment.


Note that several of these issues are topics in the EU and US efforts to regulate and possibly certify AI engines: the accuracy of AI answers, which lies between 60 and 80 percent depending on the subject tested in the benchmarks; the problem of poisoned input; how to establish the “truthfulness” of the AI; and the ethical and philosophical questions about which information to provide.

Unfortunately, while the problems are well known, their solutions are mostly not. AI researchers across the globe are busily working on those subjects, but my guess is that those issues may be intrinsic to today’s large language models and cannot be solved in the near future.

Offensive AI

A common Armageddon scenario pushed by AI skeptics is that big AI engines like the ones from OpenAI, Microsoft, Google, Meta, and others will help the evil guys find vulnerabilities and mount attacks against IT infrastructures much more easily than ever. After almost 40 years in IT security, that doesn’t scare me anymore. IT security has been an arms race between the good and bad guys from the very beginning, with the bad guys having an advantage as they only need to find one hole in a product, while the good guys have the task of plugging all holes.

As history teaches us, the tools used by the bad guys can and will be used by the good guys too. Tools searching for flaws have been used by hackers and developers alike, although developers were at times more reluctant to adopt them. AI will be no different, and maybe it will help developers to write more robust code, for example, by taking on the tedious tasks of thorough input and error checking, which are still among the most prominent causes of software flaws. Will atsec deploy those tools as well for their evaluations and testing? While we will certainly familiarize ourselves with those tools and might add them to our arsenal, it will be much more beneficial for developers to integrate those tools in their development and test processes, subjecting all of their code to that scrutiny as soon as the code is written or modified, rather than having a lab like atsec deploying those tools when the product may already be in use by customers.

We have always advocated, in standards bodies and other organizations creating security criteria, that the search for flaws should be conducted within the developer’s processes and that the lab should verify that these searches for flaws and vulnerabilities are performed effectively in the development environment. This is also true for AI tools.

Summary

The hype about AI tools that started with the public availability of ChatGPT less than a year ago has already reached its “Peak of Inflated Expectations” (according to Gartner’s “hype cycle” model) and is on its way to the “Trough of Disillusionment.” The yet-to-come “Slope of Enlightenment” will lead to the “Plateau of Productivity,” when we finally have robust AI tools at our disposal, hopefully combined with a certification that provides sufficient trust for their efficient deployment. In any case, atsec will monitor the development closely and offer to participate in the standardization and certification efforts. AI will become an integral part of our lives, and atsec is committed to helping make this experience as secure as possible.
