Not a Serious Word in Sight: Gemini and Seedream as an Oogiri Double Act

Screenshot from the variety show IPPON

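Oogiri (大喜利) is an improvised comedy format built on odd questions and odder answers. It is a staple of Japanese variety shows, where it is usually staged as a competition.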

Although the core technique is playing the fool (Boke), on Weibo it usually goes by the name "Japanese-style deadpan roast" (日式冷吐槽).

In fact, Oogiri on social media and Oogiri in variety show competitions differ in style:

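The former is shorter and snappier, and can latch onto an off-center detail. The latter is subtly shaped by the order of the answers and the mood of the audience, so a performer has to weigh all of that and land on an answer that is unexpected yet makes perfect sense.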

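The shows also offer a visibly richer toolkit: performers can set the scene through verbal rhythm, hand-drawn cartoons, and so on, which makes it closer to a performance than a mere one-liner.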

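What I wanted the AI to do is specifically "image Oogiri": give it an image, get a quip back. It started when I saw that the Jimeng 4.0 (即梦 4.0) model can add Simplified Chinese text to an image, and I wondered whether a single prompt could do the whole job.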

I tried. It can't.

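So I went with two steps. First, a multimodal language model writes a quip (or rather, a "Boke" line) for the image. Second, an image editing model adds that line to the original image in the style of variety show subtitles, producing an Oogiri-style meme.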
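The two-step flow can be sketched roughly as follows. This is only an illustration under stated assumptions, not the setup actually used here: `make_meme`, `caption_fn`, and `overlay_fn` are hypothetical names, and the two callables stand in for whichever multimodal language model and image editing model you call.

```python
from typing import Callable

# Hypothetical type aliases: both steps are injected as plain callables,
# so no particular model SDK is assumed.
CaptionFn = Callable[[bytes], str]         # image bytes -> one "Boke" line
OverlayFn = Callable[[bytes, str], bytes]  # image bytes + line -> edited image bytes


def make_meme(image: bytes, caption_fn: CaptionFn, overlay_fn: OverlayFn) -> bytes:
    """Run the two-step flow: write a line, then render it onto the image."""
    # Step 1: the multimodal language model writes the line.
    line = caption_fn(image).strip()
    if not line:
        raise ValueError("the language model returned an empty line")
    # Step 2: the image editing model adds the line as a subtitle.
    return overlay_fn(image, line)


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without any model access.
    def fake_caption(img: bytes) -> str:
        return "  Wi-Fi is strongest at this angle.  "

    def fake_overlay(img: bytes, text: str) -> bytes:
        return img + b"|" + text.encode("utf-8")

    print(make_meme(b"raw-image", fake_caption, fake_overlay))
```

Keeping the two steps behind separate callables means either model can be swapped out, or the second step replaced by manual editing, without touching the rest.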

Part 1: For the multimodal language model

This prompt focuses on sparking the model's creative thinking, in the hope that it will write comedy in the "Boke" style.

I still recommend Gemini 2.5 Pro in Google AI Studio, with the temperature turned up so that it reaches for gags from different angles.

You are a comedic genius of Oogiri (大喜利), a master of 'Boke' (ボケ) - the art of the absurd setup.

Your entire goal is to perform a "Comedic Leap" (発想の飛躍). When you see an image, you do not describe or explain it. Instead, you invent a completely new, hilariously absurd context for it.

Follow these creative principles:
1.  **Invent a New Reality:** Don't comment on what you see. Your line should be the subtitle from the most bizarre movie this image could possibly be in.
2.  **Embody a Character:** Speak from the point of view of someone or something within the image. What strange thought are they having?
3.  **Find the Unexpected Connection:** Your line should feel completely out of left field, yet strangely perfect, making the audience see the image in a way they never imagined.

**Your Task:**
Look at the input image and provide the ultimate "Boke" line.

**Output Requirements:**
-   **Content:** ONE single line of comedic text.
-   **Language:** Chinese
-   **Length:** Maximum 10 words.
-   **Format:** Plain text only. Do not add any extra explanation or commentary.
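Even with the output requirements above, a reply can arrive wrapped in quotes or followed by extra lines, so a small cleanup step before handing the line to the second model may help. The sketch below is an assumption-laden example: `clean_boke_line` and its `max_chars` default are hypothetical, and because "10 words" is ambiguous for Chinese text, the sketch counts characters against an arbitrary limit instead.

```python
# Quote characters that models sometimes wrap around a single-line answer.
QUOTES = "\"'“”「」"


def clean_boke_line(raw: str, max_chars: int = 20) -> str:
    """Return the first non-empty line of a reply, without wrapping quotes.

    max_chars is an illustrative limit counted in characters, not the
    prompt's "10 words", which has no clear meaning for Chinese text.
    """
    for candidate in raw.splitlines():
        line = candidate.strip().strip(QUOTES)
        if line:
            break
    else:
        # No line survived stripping (also covers an empty reply).
        raise ValueError("reply contains no text")
    if len(line) > max_chars:
        raise ValueError(f"line has {len(line)} characters, limit is {max_chars}")
    return line
```

Raising an error instead of silently truncating keeps the punchline intact: an over-long line is better regenerated than cut off mid-joke.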

Part 2: For the image editing model

In hindsight this part is not really necessary. You could even do it in PowerPoint; all you need is to be able to type.

But the whole idea started with wanting to use the Jimeng 4.0 model, so I wrote this prompt anyway:

You are an image processing engine specializing in creating high-impact comedy MEMEs.

**Task:**
Apply a text overlay to the provided base image according to the strict visual specifications below.

**Overlay Text:** [Insert the text generated in Part 1 here]

**Visual Styling Specifications:**

-   **Placement:** The text must be perfectly centered horizontally and positioned in the bottom area of the image.
-   **Font-Family:** Use a bold, clean, sans-serif typeface. The style must emulate Japanese variety show subtitles (e.g., Hiragino Kaku Gothic, Noto Sans CJK Bold).
-   **Color & Effect:** The text color is pure white. It must have a thick black outline or a heavy black drop-shadow to ensure maximum readability and "pop" against any background.
-   **Size:** The font size should be prominent and immediately readable, but not so large that it overwhelms the visual elements of the base image.
-   **Final Output:** A single image file with the text flawlessly rendered onto the base image, creating a professional-looking comedy MEME.
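If the second prompt is sent programmatically rather than pasted by hand, the bracketed slot on the Overlay Text line has to be filled with the line from the first model. A minimal sketch, assuming the slot is rewritten as a fixed marker: `OVERLAY_SLOT` and `build_overlay_prompt` are hypothetical names, and the template in the usage example is abbreviated.

```python
OVERLAY_SLOT = "[OVERLAY_TEXT]"  # hypothetical marker standing in for the bracketed slot


def build_overlay_prompt(template: str, line: str) -> str:
    """Fill the single overlay slot in the image-editing prompt."""
    text = line.strip()
    if not text or "\n" in text:
        raise ValueError("overlay text must be one non-empty line")
    if template.count(OVERLAY_SLOT) != 1:
        raise ValueError("template must contain exactly one overlay slot")
    return template.replace(OVERLAY_SLOT, text)


if __name__ == "__main__":
    # Abbreviated template for illustration; the full prompt is the one above.
    template = "**Overlay Text:** [OVERLAY_TEXT]\n**Placement:** bottom, centered"
    print(build_overlay_prompt(template, "我在等山来找我"))
```

Checking for exactly one slot guards against sending a prompt that still contains the unfilled marker or that repeats the text by accident.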

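In 2023 I came across a paper¹ (in truth I only watched a short video about it) in which researchers tried to get GPT to do Oogiri and found that its humor fell flat. Trying it myself now, I find that Gemini 2.5 Pro actually does quite well. Could "humor" be something that simply falls into place once reasoning ability is strong enough?

But the language models behind Jimeng 4.0 and Lovart both lack a sense of humor, so one-stop improv comedy is off the table, and the two models have to perform as a double act instead.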

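Here is an example made with the "two-step prompt" (also known as the "double-act prompt"):

First a Chinese one, which is a fairly good answer:

Then an English one. This result has the authentic flavor, and it is genuinely moving:

For comparison, here is one I came up with myself, of the "off-center focus" kind mentioned earlier:

Below is a comparison of the two language models, with ten results from each: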

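Lovart	Gemini 2.5 Pro
Leopard print is my real skateboard	The Wi-Fi signal is strongest at this angle.
I'm waiting for the mountain to come to me	Live from the final of the 17th Streetlamp Impersonation Contest.
The pose is spot on, but where's the skateboard?	The GPS says: at the next slope, lean right.
What the fishing net caught was gravity	They say this is the latest UFO-summoning pose.
I'm practicing the leopard's pounce	The new PE teacher can't shake the feeling that something is off.
This is the leopard's skateboard territory	The zoo was just renovated; this is the new cheetah enclosure.
Mountain: Is she imitating me?	Ah, my waist has twisted into a Picasso.
My leopard print scared the skateboard away	A large-scale live-action lesson on parabolic functions.
I'm giving the tree a yoga demonstration	I bet my shadow can touch that tree.
A dialogue between leopard print and concrete	I'm serving as a ruler for my friend over there.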