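BeautifulPrompt: PAI Releases a Self-Developed Prompt Beautifier That Lets AIGC Produce Beautiful Images in One Click

Alibaba Cloud Big Data and AI Technology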

2024-07-13

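Background

Stable Diffusion (SD) is a popular AI Generated Content (AIGC) model that generates images in a wide variety of styles from text input. It is currently the most popular model in the open-source AIGC community. However, whether SD produces attractive images depends heavily on the prompt the user provides. Without a good prompt, SD often fails to generate the image the user expects, which severely hurts the user experience. In earlier work, the Alibaba Cloud Machine Learning PAI team explored AIGC extensively, including open-sourcing the PAI-Diffusion Chinese models and Blade-based inference optimization, and released a series of industry solutions. To make SD-family models easier to use, lower the barrier to entry, and unlock the creative potential of AI models, we designed and trained an automatic prompt beautifier for SD. Given only a very simple prompt, it returns a set of detailed prompts refined by a language model, making it easier to generate attractive images. Below, we describe the features of the PAI automatic prompt generation model and the techniques behind it.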

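Try automatic prompt generation in one click

Before describing the PAI automatic prompt generation model in detail, we first show what the generated prompts look like. In the examples below, we compare the images generated on the Stable Diffusion v1.5 base model from the original prompt and from our generated prompt. For each prompt, we randomly generate two images for comparison.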

Original prompt, with images generated by Stable Diffusion	Prompt from the PAI prompt generation model, with images generated by Stable Diffusion
Example 1 a painting of a native indian on horseback	Example 1 A beautiful portrait of an indigenous female riding a horse in the jungle, detailed, soft lighting, realistic oil painting by Seb McKinnon

Example 2 A large cabin on top of a sunny mountain in the style of Dreamworks, artstation	Example 2 A stunning digital painting by Dreamworks depicting a massive cabin situated atop a picturesque mountain range. The artwork is highly detailed, with each rock and crevice in the terrain rendered with breathtaking realism. Paying attention to the gentle curves of the mountains and the folds of the forest canopy, the artist has successfully captured the mystique and serenity of the mountain landscape. This artwork is trending on Artstation and is a must-see for anyone who appreciates the art of digital painting.

Example 3 A majestic sailing ship	Example 3 a massive sailing ship, epic, cinematic, artstation, greg rutkowski, james gurney, sparth

Example 4 Digital painting of a girl with candy hat.	Example 4 Portrait of a girl wearing a Candy Halet hat, face, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by Krenz Cushart and Artem Demura and alphonse mucha

Example 5 An ornate antique mirror with a gilded frame	Example 5 a beautiful ornate antique mirror by philippe starck and victoria hagan, cinematic lighting, soft lighting, soft details, painting oil on canvas by edmund blair leighton and charlie bowater octane render, hdr, trending on artstation, 4 k, 8 k, hd

Example 6 a cute girl	Example 6 a beautiful very cute girl sitting in a trenchcoat with golden collar photo by Annie Leibovitz, highly detailled, dramatic lighting, trending on artstation

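We also built a demo on ModelScope for users to try (link). Given a simple prompt, our model expands it into a richly detailed prompt, which is then passed to Stable Diffusion for text-to-image generation.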


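The technology behind the prompt beautifier

In this section, we describe how to train a good prompt beautifier. The overall architecture is shown in the figure below: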

[Figure: overall architecture of the prompt beautifier]

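Base model

The PAI prompt generation model is built on BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), which was trained and open-sourced by BigScience. BLOOM uses a decoder-only architecture very similar to GPT-3, and its largest version has 176 billion parameters. For our training, we chose the 1.1-billion-parameter BLOOM model for continued training: its parameter count is modest, so inference is fast in online deployment, and both training and inference costs stay manageable, which makes it highly practical.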
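
To give a rough sense of why the smaller model is cheaper to serve, the sketch below estimates the memory needed just to hold the weights of each model. The parameter counts come from the text above; storing weights in fp16 (2 bytes per parameter) is an assumption made for illustration, and activations, KV cache, and optimizer state are ignored.

```python
# Rough weight-memory estimate: number of parameters x bytes per parameter.
# Parameter counts come from the article; fp16 storage (2 bytes) is an assumption.
BYTES_PER_PARAM_FP16 = 2


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Return the approximate size of the model weights in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


small = weight_memory_gb(1.1e9)  # the 1.1B model chosen for continued training
large = weight_memory_gb(176e9)  # the largest BLOOM model

print(f"1.1B model: ~{small:.1f} GB of weights")
print(f"176B model: ~{large:.1f} GB of weights")
print(f"ratio: ~{large / small:.0f}x")
```

Under these assumptions the small model fits comfortably on a single consumer GPU, while the largest model does not.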

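SFT without data annotation

Training the model requires pairs of high-quality and low-quality prompts, and such data is generally hard to annotate directly. We therefore collect training data automatically using the following methods.

Summary generation: first, we collect open-source datasets of high-quality prompts to serve as the targets for language model generation. Because low-quality prompts are missing in this setting, we use a large model such as ChatGPT to generate a summary that serves as the low-quality prompt. The following is an example of a summary prompt: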

Instruction: Summarize this image description in 10 words or less and ignore words like archdaily, wallpaper, highly detailed, 8k, [r/earthporn]. Check English. Ignore modifiers 'by xxx', 'with xxx' or 'in xxx'. Ignore adjective.
Input: a beautiful very detailed illustration of abandoned urbex unfinished building city nature industrial architecture architecture building spaceport by caspar david friedrich, scumm bar meadow nature synthwave, archdaily, wallpaper, highly detailed, trending on artstation.
Output: of abandoned urban building in nature.

Input: portrait painting of a lost boy by cedric peyravernay and greg ruthkowski, in the style of dishonored concept art, concept design, trending on artstation \n
Output:

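Prompt expansion: starting from low-quality prompts, we use ChatGPT to generate higher-quality prompts. The following is an example of a prompt used for expansion: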

Instruction: create a detailed and creative description of the 'input'. Your response should include specific details about the colors, textures, and overall composition of the painting, as well as any unique features or elements that make it stand out.
Please provide a clear and concise response that captures the essence of the painting while also encouraging creativity and originality in your description. You may consider describing the setting or environment depicted in the painting.
Input: Digital painting of a girl with candy hat.

Image caption generation: we collected high-quality image-text pairs and ran image captioning on the images to produce more prompts for training the model.
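
The summary generation step can be sketched as a small helper that assembles the few-shot text sent to the large model. The instruction and the worked example are copied from the summary example above; the function name `build_summary_prompt` is hypothetical, and the actual call to ChatGPT or another model is deliberately left out.

```python
# Instruction and few-shot example copied from the summary example in the article.
SUMMARY_INSTRUCTION = (
    "Instruction: Summarize this image description in 10 words or less and ignore "
    "words like archdaily, wallpaper, highly detailed, 8k, [r/earthporn]. Check English. "
    "Ignore modifiers 'by xxx', 'with xxx' or 'in xxx'. Ignore adjective."
)
FEW_SHOT_INPUT = (
    "a beautiful very detailed illustration of abandoned urbex unfinished building "
    "city nature industrial architecture architecture building spaceport by caspar "
    "david friedrich, scumm bar meadow nature synthwave, archdaily, wallpaper, "
    "highly detailed, trending on artstation."
)
FEW_SHOT_OUTPUT = "of abandoned urban building in nature."


def build_summary_prompt(description: str) -> str:
    """Assemble the few-shot summarization prompt for one high-quality prompt.

    Hypothetical helper: the returned text ends with an open 'Output:' slot
    that the large model is expected to complete with a short summary.
    """
    lines = [
        SUMMARY_INSTRUCTION,
        f"Input: {FEW_SHOT_INPUT}",
        f"Output: {FEW_SHOT_OUTPUT}",
        "",
        f"Input: {description.strip()}",
        "Output:",
    ]
    return "\n".join(lines)


summary_prompt = build_summary_prompt(
    "portrait painting of a lost boy by cedric peyravernay and greg ruthkowski, "
    "in the style of dishonored concept art, concept design, trending on artstation"
)
print(summary_prompt)
```

The completion returned by the model would then be paired with the original high-quality prompt as one (low-quality, high-quality) training example.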

Finally, the resulting data is filtered by aesthetic score and consistency, and we keep the higher-quality data for SFT.
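
A minimal sketch of this filtering step is shown below. The thresholds, the score values, and the names `PromptPair`, `filter_pairs`, and `to_sft_text` are all hypothetical; the article does not state the actual cut-offs. Reusing the instruction template from the inference code as the SFT text format is also an assumption.

```python
from dataclasses import dataclass
from typing import List

# Template taken from the inference example; its use for SFT is an assumption.
INSTRUCTION = "Instruction: Give a simple description of the image to generate a drawing prompt."


@dataclass
class PromptPair:
    raw_prompt: str     # simple, low-quality prompt
    prompt: str         # detailed, high-quality prompt
    aesthetic: float    # hypothetical aesthetic score of the generated image
    consistency: float  # hypothetical image-text consistency score


def filter_pairs(pairs: List[PromptPair],
                 min_aesthetic: float = 6.0,
                 min_consistency: float = 0.25) -> List[PromptPair]:
    """Keep only pairs that pass both (hypothetical) quality thresholds."""
    return [p for p in pairs
            if p.aesthetic >= min_aesthetic and p.consistency >= min_consistency]


def to_sft_text(pair: PromptPair) -> str:
    """Format one pair as a single SFT training string."""
    return f"{INSTRUCTION}\nInput: {pair.raw_prompt}\nOutput: {pair.prompt}"


# Illustrative data: the first pair is from the article, the scores are made up.
pairs = [
    PromptPair("A majestic sailing ship",
               "a massive sailing ship, epic, cinematic, artstation, "
               "greg rutkowski, james gurney, sparth", 6.8, 0.31),
    PromptPair("a cute girl", "a girl", 4.9, 0.35),                              # low aesthetic score
    PromptPair("a cute girl", "a castle at night, highly detailed", 7.1, 0.08),  # low consistency
]

sft_texts = [to_sft_text(p) for p in filter_pairs(pairs)]
print(len(sft_texts))
print(sft_texts[0])
```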

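Reinforcement learning optimization for SD

RLHF (Reinforcement Learning from Human Feedback) plays an important role in improving large models such as ChatGPT. In our application, we designed a reinforcement learning algorithm targeted at Stable Diffusion to optimize the prompt generation model.

For the reward model, starting from the image-text pairs, we score each image with an aesthetic scoring model and train a language model to fit the mapping prompt -> aesthetic score; this serves as our scoring model. We then further optimize the model with the PPO reinforcement learning algorithm, using a reward function that is a weighted sum of the scoring model and a consistency score: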

reward = a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)
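
The weighted reward can be sketched as follows. The two scoring functions here are toy stand-ins so that the example runs on its own; they are not the trained scoring model or consistency model from the article, and the weights `a` and `b` are hypothetical values.

```python
from typing import Callable


def make_reward_fn(score_model: Callable[[str], float],
                   consistency_model: Callable[[str, str], float],
                   a: float = 0.5,
                   b: float = 0.5) -> Callable[[str, str], float]:
    """Build reward(raw_prompt, prompt) = a * score + b * consistency."""
    def reward(raw_prompt: str, prompt: str) -> float:
        return a * score_model(prompt) + b * consistency_model(raw_prompt, prompt)
    return reward


def toy_score_model(prompt: str) -> float:
    """Toy stand-in for the aesthetic scoring model: longer prompts score higher, capped at 1."""
    return min(len(prompt.split()), 20) / 20


def toy_consistency_model(raw_prompt: str, prompt: str) -> float:
    """Toy stand-in for the consistency model: fraction of raw-prompt words kept in the new prompt."""
    raw_words = set(raw_prompt.lower().split())
    new_words = set(prompt.lower().split())
    return len(raw_words & new_words) / len(raw_words) if raw_words else 0.0


reward_fn = make_reward_fn(toy_score_model, toy_consistency_model, a=0.5, b=0.5)

on_topic = reward_fn("a cute girl", "a beautiful very cute girl sitting in a trenchcoat")
off_topic = reward_fn("a cute girl", "epic castle at night with dramatic cinematic lighting")
print(on_topic, off_topic)
```

The consistency term is what keeps the policy from drifting toward prompts that score well aesthetically but no longer describe what the user asked for.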

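This further improves both the aesthetic quality of the generated prompts and image-text consistency. After these three training stages, our model, despite its small size (1.1B parameters), produces prompts comparable to those generated by ChatGPT. Examples follow: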

Example 1: Sci-fi style
Original prompt	Iron Man in the rain
ChatGPT	Capture the highly detailed and dramatic image of Iron Man in vivid colors, highlighting his sleek armor and technological superhero persona in the rain. Use a fast shutter speed to freeze the raindrops and a polarizing filter to reduce glare, creating a captivating and memorable image. Isolate the character with a telephoto lens and experiment with the composition to create a powerful image.
PAI	A stunning photograph of Iron Man standing under heavy rain, with intricate shading and contrast between the dark shadows and bright sun light. This image captures the fearless hero's intense focus, as well as the resilience of the cityscape. Shot with a professional camera and lens, the image highlights every detail of Iron Man's suit and equipment, while also showcasing the rain's cascading down behind him. Perfect for fans of action movies, this image is sure to impress with its HDR and UHD features.

Example 2: Natural scenery
Original prompt	A vast desert landscape, with sand dunes and a blazing sun.
ChatGPT	Witness the awe-inspiring beauty of a vast desert landscape with towering sand dunes and a blazing sun in the background in a highly detailed, professional photograph. The use of vivid colors and HDR can enhance the mood and make this image realistic and breathtakingly beautiful. This photograph could be an art form trending on ArtStation, and the Unreal Engine could be utilized to create a realistic landscape, transporting viewers to another world. The addition of shadows and a high-resolution scan can capture every detail of this stunning desert landscape, making it a captivating work of art.
PAI	A stunning photograph capturing a unique and breathtaking view of a vast desert landscape with thick shimmering sand dunes stretching towards the horizon. The warm glow of the burning sun adds depth and drama to this surreal scene, making it a stunning addition to any photography collection. Shot with a high-resolution camera and perfectly focused on the details of the sand dunes, this image is a perfect example of vivid colors, realistic textures and breathtaking lighting.

Example 3: Artistic painting
Original prompt	A painting depicting a fierce medieval war, featuring soldiers and some horses from both sides of the conflict. The style is realistic.
ChatGPT	As a painter, your skills and techniques will be put to the test as you create this masterpiece. You must carefully use shading and highlights to add depth and texture to the painting, which will bring the scene to life. The horses should be depicted in motion, their muscles and mane appearing in quite vivid detail.
PAI	A stunning painting depicting a fierce medieval war with soldiers charging across battlefield, beautiful horses mounted on horseback between the tanks and explosions, trending on artstation, oil on canvas, hd, 4k, digital art

Example 4: Cultural landscape
Original prompt	A bunch of flying balloons against the blue sky with white clouds
ChatGPT	A bunch of colorful balloons soaring high against the blue sky with white clouds. The vivid and bright colors of the balloons stand out against the blue sky, and the shallow depth of field creates bokeh in the background, bringing out the sharp and highly detailed features of the balloons. It's a breathtaking and joyful image that captures the essence of freedom and joy.
PAI	A stunning image of a group of brightly colored balloons soaring high in the sky, against a beautiful blue sky dotted with wispy white clouds. The attention to detail and vibrant colors make this image incredibly vivid and awe-inspiring. Captured with professional studio lighting and highly detailed 64-percent HDR resolution, this image captures the beauty and freedom of flying.

Example 5: Animal world
Original prompt	A little pig running under the setting sun
ChatGPT	The image was captured with a professional camera and lens, using studio lighting to accentuate the warm tones of the sunset. The pig is captured in motion, with a shallow depth of field creating bokeh in the background, and the warm glow of the sunset illuminating its face. The result is a highly detailed and vivid image of a little pig running under the sunset.
PAI	a little pig is running like a lion under the bright sun, digital art, intricate, realistic, cinematic lighting, hyper detailed, trending on artstation

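Using the model

To try the model quickly, visit our Studio page in the ModelScope community (link). We have also published the model on open-source communities such as Hugging Face; it can be called as follows: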

from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = 'alibaba-pai/pai-bloom-1b1-text2prompt-sd'

# Load the tokenizer and model; eval() disables dropout, cuda() moves the model to the GPU.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval().cuda()

# Wrap the simple user prompt in the instruction template the model expects.
raw_prompt = '1 girl'
input_text = f'Instruction: Give a simple description of the image to generate a drawing prompt.\nInput: {raw_prompt}\nOutput:'
input_ids = tokenizer.encode(input_text, return_tensors='pt').cuda()

# Sample five candidate prompts for the same input.
outputs = model.generate(
    input_ids,
    max_length=384,
    do_sample=True,
    temperature=1.0,
    top_k=50,
    top_p=0.95,
    repetition_penalty=1.2,
    num_return_sequences=5)

# Drop the input tokens from each sequence and decode only the generated continuation.
prompts = tokenizer.batch_decode(outputs[:, input_ids.size(1):], skip_special_tokens=True)
prompts = [p.strip() for p in prompts]
print(prompts)
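
Because the candidates are sampled, the returned list can contain repeated or empty strings. The helper below is one possible post-processing step before passing the prompts to Stable Diffusion; `clean_prompts` is a hypothetical name and is not part of the released model or of transformers.

```python
from typing import List


def clean_prompts(prompts: List[str]) -> List[str]:
    """Normalize whitespace, drop empty strings, and remove case-insensitive duplicates.

    The order of first occurrence is preserved.
    """
    seen = set()
    cleaned = []
    for p in prompts:
        text = " ".join(p.split())  # collapse runs of whitespace and trim both ends
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned


# Illustrative candidates, standing in for the `prompts` list produced by sampling.
candidates = ["a cute girl", " a cute  girl ", "", "A majestic sailing ship"]
print(clean_prompts(candidates))
```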

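Future outlook

In this work, we designed and trained an automatic prompt beautifier for SD. Given only a very simple prompt, it returns a set of prompts refined by a language model, making it easier to generate attractive images. In the future, we plan to adapt this type of model to more SD-like models and to enrich the algorithms and product capabilities of PAI-AIGC.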

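Alibaba Lingjie recap

Claim a free 5000 CU*H compute resource package for interactive modeling (PAI-DSW) and model training (PAI-DLC), plus a PAI-EAS online model service credit package worth 500 yuan.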
