|
Use AI to Detect AI-Generated Text (9) Results (Testbed5)
Special Thanks: While reading the paper, we had a few small questions, so we consulted the author. We are extremely grateful for the author's patient and detailed answers, which were very enlightening.
To request the free, editable slide source files for this series or other articles, send a private message with "获取" (or "Slides") to the WeChat official account.
WeChat official account version of this post: Use AI to Detect AI-Generated Text (9): How Good Is the Detection, Really? (Testbed5)
How Good Is the Detection, Really? (Results)
In the previous articles, we looked at how the Detector performs in four scenarios, testbed1 to testbed4. Readers of those articles may have come away with the impression that "testbed1 to testbed4 seem intentionally easy."
Why this impression? Because in the first four scenarios we did indeed "go easy." The domains of the texts the Detector faced at test time, or some of the AI models that generated those texts, had already been seen during training. So when the Detector met texts from familiar domains, or texts generated by familiar AI models, it was unlikely to perform badly.
What happens if we test the Detector more strictly? That is what this post and the next few explore.
Testbed5
Quick review: in both the training and testing phases, the Detector sees texts from Domains 1 to 10. In the testing phase, however, it faces texts generated by AI models it has never seen.
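The Testbed5 setup can be illustrated with a small data-splitting sketch. Everything here is hypothetical and chosen only for illustration: the record fields (`domain`, `model`), the model names, and the function `split_unseen_models`. It is not the paper's actual data pipeline, and the way human-written text is shared between the two splits is an assumption of this sketch.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]  # hypothetical schema: {"text", "domain", "model"}


def split_unseen_models(
    records: List[Record], held_out_models: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Send AI text from held-out models to the test split; the rest to train.

    Human-written text (model == "human") is alternated between the two
    splits so both contain human examples (an assumption of this sketch).
    """
    train: List[Record] = []
    test: List[Record] = []
    human_seen = 0
    for rec in records:
        if rec["model"] == "human":
            (train if human_seen % 2 == 0 else test).append(rec)
            human_seen += 1
        elif rec["model"] in held_out_models:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Toy data: two domains, two hypothetical AI models plus human text.
records = [
    {"text": "a", "domain": "news", "model": "human"},
    {"text": "b", "domain": "news", "model": "model_A"},
    {"text": "c", "domain": "news", "model": "model_B"},
    {"text": "d", "domain": "reviews", "model": "human"},
    {"text": "e", "domain": "reviews", "model": "model_A"},
    {"text": "f", "domain": "reviews", "model": "model_B"},
]

train, test = split_unseen_models(records, held_out_models={"model_B"})

# The AI models in train and test are disjoint, while the domains overlap.
train_models = {r["model"] for r in train} - {"human"}
test_models = {r["model"] for r in test} - {"human"}
train_domains = {r["domain"] for r in train}
test_domains = {r["domain"] for r in test}
```

In this toy split, the same domains appear on both sides, but the AI model that produced the test texts never appears in training, which is the property that makes Testbed5 harder than the earlier testbeds.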
As expected, the Detector's performance dropped again. This time it really did run into something it had not seen before: texts generated by unfamiliar AI models.
Notice, though, that even under these conditions the Longformer-based Detector still performs very well. This is encouraging for real-world use, because new AI models keep appearing in practice. The result suggests that the Detector can still recognize AI-generated text from models it has never encountered.
Of course, optimism should come with a cool head. The Detector's score is indeed good, but it has weaknesses as well as strengths. As the figure above shows, the texts it is worst at catching are those generated by OpenAI(s), and its detection rate on them is not very high.
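A per-source breakdown like the one described here can be computed as a detection rate (recall on AI-generated text) for each generating source. The following is a minimal sketch; the source names and predictions are made up for illustration and are not results from the paper, and `detection_rate_by_source` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def detection_rate_by_source(
    predictions: Iterable[Tuple[str, bool]]
) -> Dict[str, float]:
    """Fraction of AI-generated texts flagged as AI, per source.

    Each item is (source_name, flagged_as_ai) for one AI-generated text.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for source, is_flagged in predictions:
        total[source] += 1
        if is_flagged:
            flagged[source] += 1
    return {source: flagged[source] / total[source] for source in total}


# Made-up detector outputs, for illustration only.
toy_predictions = [
    ("source_A", True), ("source_A", True),
    ("source_A", True), ("source_A", False),
    ("source_B", True), ("source_B", False),
    ("source_B", False), ("source_B", False),
]

rates = detection_rate_by_source(toy_predictions)
weakest = min(rates, key=rates.get)  # source with the lowest detection rate
print(rates)
print("Hardest source to detect:", weakest)
```

Reporting the rate per source rather than a single average is what exposes a weak spot: a strong overall score can hide one source that slips past the Detector most of the time.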
What does the (s) mean here? The "s" indicates that a specially designed (specified) prompt was used when generating the text with OpenAI's model. The prompt explicitly specifies the style of the text (as shown in the figure below). To learn more about the types of prompts used to prepare the AI-generated texts, refer to the paper.
So what do these results mean? If someone uses an OpenAI model together with a specific prompt to generate text, there is a good chance it will fool the Detector. This remains a significant challenge.
In later posts, we will see how the Detector performs in even harder scenarios.
(To be continued)
Tip: In the WeChat official account menu, choose "所有文章" (All Articles) to see the latest list of all articles, and "版权声明" (Copyright Notice) to see how this article's content may be used elsewhere. If you would like the slides for this series or any other article, please follow my WeChat public account and leave the message "Slides". If you do not have a WeChat account, leaving a message via GitHub also works. For the complete list of published articles (in English), please visit https://createmomo.github.io/
|
|