How effective is AI content detection, really?

濺龜 · posted yesterday at 21:42

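I ran several articles I had prepared through both CNKI and GPTZero, and the results were disappointing: 1. articles written entirely by GPT scored 70% to 90% AI; 2. mixed human and AI content scored 49%; 3. articles written entirely by a human also scored 49%. Is AI content detection reliable at all?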

_空鬼_ · posted yesterday at 21:42

Use AI to Detect AI-Generated Text (9) Results (Testbed5)

Special Thanks: While reading the paper, we had a few small questions, so we consulted the author. We are extremely grateful for the author's patient and detailed explanations, which taught us a great deal.

To request the editable source files of the slides (free) for this or any other article, send a private message containing "获取" ("get") or "Slides" to the WeChat official account.

WeChat official account version of this article:

Using AI to Detect AI-Generated Text (9): How Well Does Detection Actually Work? (Testbed5)

Results: How Well Does Detection Actually Work?

In the previous articles, we looked at how the Detector performs in four scenarios, testbed1 to testbed4. Readers who followed along may have come away with the impression that testbed1 to testbed4 went a little easy on the Detector.
Why this impression? Because in those first four scenarios, we really did go easy. During testing, the Detector saw texts from domains it had already encountered in training, or texts produced by AI models (the models that supplied the AI-generated texts) it had already encountered in training. So when it faced texts from familiar domains, or texts generated by familiar models, it performed well.

So what happens if we test the Detector more rigorously? That is what the next few posts explore.
Testbed5

Quick recap: in both training and testing, the Detector sees texts from Domains 1 to 10, but at test time it faces texts generated by AI models it has never encountered.
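To make the Testbed5 condition concrete, here is a minimal Python sketch of an unseen-model split. The `Sample` record, the domain and model names, and the choice to alternate human texts between the two sets are illustrative assumptions, not the paper's actual data pipeline. The only property it tries to reproduce is the one stated above: domains are shared between training and testing, while the AI-generated test texts come only from models the Detector never saw during training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    text: str
    domain: str   # hypothetical domain tag, e.g. "domain3"
    source: str   # "human" or the (hypothetical) name of the generating model
    label: int    # 1 = AI-generated, 0 = human-written


def unseen_model_split(samples, held_out_models):
    """Testbed5-style split (sketch): AI texts from held-out models go only to
    the test set; human texts are alternated between train and test, so
    domains can overlap while generating models do not."""
    held_out = set(held_out_models)
    train, test = [], []
    human_count = 0
    for s in samples:
        if s.label == 1:
            # AI text: unseen model -> test only, familiar model -> train only.
            (test if s.source in held_out else train).append(s)
        else:
            # Human text: alternate so both sets receive some.
            (train if human_count % 2 == 0 else test).append(s)
            human_count += 1
    return train, test


def ai_models(split):
    """Set of generating models that contributed AI texts to a split."""
    return {s.source for s in split if s.label == 1}


if __name__ == "__main__":
    data = [
        Sample("h1", "domain1", "human", 0),
        Sample("h2", "domain1", "human", 0),
        Sample("a1", "domain1", "model_A", 1),
        Sample("a2", "domain2", "model_B", 1),
        Sample("a3", "domain2", "model_C", 1),
    ]
    train, test = unseen_model_split(data, held_out_models=["model_C"])
    print(ai_models(train), ai_models(test))
```

Checking `ai_models(train).isdisjoint(ai_models(test))` is a cheap sanity test that a split really is "unseen-model". A split where this check fails lets the Detector meet familiar models at test time, which is the kind of "going easy" described earlier.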

As expected, the Detector's performance dropped again; this time it really was facing something new, namely texts generated by unfamiliar AI models.

Notice, however, that even under these conditions the Longformer-based Detector still performs quite well. This is encouraging for real-world deployment, because new AI models keep appearing: the result suggests that even when the Detector has never encountered a new AI model, it can still recognize text that model generates.

Of course, optimism should be tempered with a cool head. Although the Detector scores well, it has its strengths and its weaknesses. As the figure above shows, the texts it is worst at catching are those generated by OpenAI(s), and its detection rate on them is not high.
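A per-model comparison like the one in the figure can be computed from raw predictions roughly as follows. This Python sketch assumes that a model's "detection rate" means the fraction of its AI-generated texts that the Detector labels as AI (recall on the AI class); the paper may define or aggregate the metric differently. The model names and predictions are made-up toy data, not results from the paper.

```python
from collections import defaultdict


def detection_rate_by_source(records):
    """records: iterable of (source, true_label, predicted_label), 1 = AI.
    Returns, for each AI source, the fraction of its AI-generated texts that
    the detector flagged as AI (per-source recall on the AI class)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for source, y_true, y_pred in records:
        if y_true != 1:
            continue  # human texts do not count toward any model's rate
        totals[source] += 1
        hits[source] += int(y_pred == 1)
    return {src: hits[src] / totals[src] for src in totals}


def weakest_source(rates):
    """Source with the lowest detection rate, i.e. the hardest to catch."""
    return min(rates, key=rates.get)


if __name__ == "__main__":
    # Toy predictions with hypothetical model names (not data from the paper).
    preds = [
        ("model_A", 1, 1), ("model_A", 1, 1), ("model_A", 1, 0),
        ("model_B(s)", 1, 0), ("model_B(s)", 1, 1), ("model_B(s)", 1, 0),
        ("human", 0, 0),
    ]
    rates = detection_rate_by_source(preds)
    print(rates)                  # model_A: 2/3, model_B(s): 1/3
    print(weakest_source(rates))  # model_B(s)
```

Reporting the rate per source, rather than a single pooled number, is what exposes a weak spot such as a particular model and prompt combination.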

What does the (s) mean? It indicates that the text was generated with an OpenAI model using a specially designed (specified) prompt, one that explicitly sets the style of the text (as shown in the figure). For more on the kinds of prompts used to prepare the AI-generated texts, see the paper.

So what do these results mean? If someone generates text with an OpenAI model combined with a specified prompt, there is a good chance it will fool the Detector. This remains a significant challenge.
In the following posts, we will see how the Detector performs in even harder scenarios.
(To be continued)

Reminder: in the WeChat official account menu, select "All Articles" to see the full list of the latest articles, or "Copyright Notice" to see how the content of this article may be used elsewhere. If you would like the slides for this series or any other article, follow my WeChat official account and send me the message "Slides". If you don't have a WeChat account, leaving a message via GitHub also works. For the complete list of published articles (in English), visit https://createmomo.github.io/

西門冠希 · posted yesterday at 21:42

You're welcome to come and try it, ping me.

零落枫伤 · posted yesterday at 21:43

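Generative AI is already very capable at producing text, so telling AI-generated content apart from human-written content is quite challenging, and it becomes even harder when humans and AI write together.
In just the past year, AI writing technology such as OpenAI's ChatGPT (especially when backed by GPT-4) has made significant progress, producing text that is increasingly natural and fluent, and sometimes hard to distinguish from human writing. AI text generation is used not only for automated content creation, such as news reports, social media posts, and even academic articles, but also to assist human authors with their writing.
In academia, distinguishing AI-generated content from original research is essential for keeping research authentic and reliable. In publishing and media, distinguishing human-created from AI-created content helps protect authors' copyright and authorship. Authenticity of information matters especially in news and social media, where accurate detection of AI-generated content can help curb the spread of fake news and misleading information.
For now, however, AI content detection remains very difficult. Although many detection tools exist, such as GPTZero, they still struggle with accuracy and reliability. Different tools use different methods and criteria, which can lead to different results for the same text. The type, style, and domain of a text can also affect accuracy: technical or scientific writing, for example, may be misclassified as AI-generated because of its particular register. And when a text is co-written by a human and an AI, separating the two contributions and deciding how to classify the result is harder still.
How can such a thorny problem be solved? One hopes that, as AI advances, new detection tools will be able to spot the telltale weaknesses of AI writing: formulaic, boilerplate prose, lack of emotion, lack of specificity, and examples that are superficial rather than grounded. Under current conditions, I recommend cross-checking with multiple methods and tools to improve accuracy, and of course combining this with review by human experts.
Although AI content detection tools show promise in some respects, they still face challenges in accuracy and reliability. Understanding their limitations, using several tools together, and combining them with human judgment and review is the current best practice.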
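As one way to act on the advice to cross-check with several tools, here is a hedged Python sketch that combines AI-probability scores from multiple detectors and routes uncertain or conflicting cases to a human reviewer. The detector names, the 0.3/0.7 thresholds, and the disagreement limit are hypothetical placeholders rather than calibrated values, and the sketch does not call any real detection service. Note how a score hovering around 0.5, like the 49% reported in the original question, lands in the "uncertain" band instead of being treated as evidence either way.

```python
from statistics import mean


def combine_detector_scores(scores, low=0.3, high=0.7, max_spread=0.3):
    """scores: dict mapping a (hypothetical) detector name to its
    AI-probability in [0, 1]. Returns a verdict string.
    Thresholds are illustrative, not calibrated."""
    if not scores:
        raise ValueError("need at least one detector score")
    values = list(scores.values())
    avg = mean(values)
    spread = max(values) - min(values)
    if spread > max_spread:
        # Tools contradict each other: do not trust any single number.
        return "needs human review (tools disagree)"
    if avg >= high:
        return "likely AI"
    if avg <= low:
        return "likely human"
    # Scores near the middle carry little information.
    return "needs human review (uncertain)"


if __name__ == "__main__":
    print(combine_detector_scores({"tool_x": 0.85, "tool_y": 0.80}))  # likely AI
    print(combine_detector_scores({"tool_x": 0.49, "tool_y": 0.49}))  # needs human review (uncertain)
    print(combine_detector_scores({"tool_x": 0.90, "tool_y": 0.20}))  # needs human review (tools disagree)
```

The design choice here is to treat disagreement between tools as a signal in itself, which matches the advice above to keep a human expert in the loop.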

徐韵汉风 · posted yesterday at 21:43

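I think AI content detection ultimately comes down to a set of rules (keywords, sentence patterns) plus automated AI analysis. Purely AI-written articles are relatively easy to identify: whether it's the old trick of asking for "an article about xxx in the voice of Lu Xun or someone else" or letting the AI write freely, the language shares a lot of common patterns. Mixed human and AI content and purely human content differ mainly in the human-written part, so there is a good chance the detector gives them the same score.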
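The "rules" half of that description can be illustrated with a toy Python heuristic that counts stock phrases and measures how uniform sentence lengths are. The phrase list, weights, and scoring formula are invented for illustration; this is not how CNKI, GPTZero, or any other real detector works. It also shows the weakness raised in the previous answer: formal human writing that uses the same stock phrases in evenly sized sentences would be scored as "AI-like" too.

```python
import re
from statistics import mean, pstdev

# Hypothetical list of "stock" phrases, invented for this toy example.
STOCK_PHRASES = ["in conclusion", "it is important to note", "overall", "furthermore"]


def rule_features(text):
    """Extract two toy rule-based features from a text."""
    # Split on English and Chinese sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    # Length in characters, so it also works for Chinese text without spaces.
    lengths = [len(s) for s in sentences]
    lower = text.lower()
    return {
        "phrase_hits": sum(lower.count(p) for p in STOCK_PHRASES),
        "mean_len": mean(lengths) if lengths else 0.0,
        "len_stdev": pstdev(lengths) if lengths else 0.0,
    }


def rule_score(text, phrase_weight=0.2, uniformity_weight=0.5):
    """Toy 'AI-likeness' score in [0, 1]: more stock phrases and more uniform
    sentence lengths push the score up. Weights are arbitrary."""
    f = rule_features(text)
    score = phrase_weight * f["phrase_hits"]
    if f["mean_len"] > 0:
        # Coefficient of variation: low values mean evenly sized sentences.
        cv = f["len_stdev"] / f["mean_len"]
        score += uniformity_weight * max(0.0, 1.0 - cv)
    return min(1.0, score)


if __name__ == "__main__":
    formulaic = ("Furthermore, it is important to note this. "
                 "Overall, the result is clear. In conclusion, we agree.")
    varied = ("Rain. The old dog slept under the porch all afternoon "
              "while we argued about nothing in particular.")
    print(rule_score(formulaic), rule_score(varied))
```

Real detectors typically rely on learned models rather than hand-written rules like these; the sketch only makes the commenter's intuition testable.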

瓮中瓮 · posted yesterday at 21:44

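To put it plainly: GPT-4's writing is already no worse than that of an average undergraduate, and if you can write better prompts, its output can reach PhD level.
What matters more is that AI technology is still advancing at a frantic pace; you can already imagine the changes AI will bring to our lives in 2024.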

Just my personal opinion, for reference only.

In the digital age, content is king; AI keeps producing high-quality, engaging copy.
