Stable Diffusion is among the most capable free AI image generators available. It also has the steepest learning curve of the major tools, because unlike DALL-E or Midjourney, it rewards technical understanding rather than just clever phrasing.
The good news: you do not need to understand the mathematics behind it. You need to understand a handful of practical concepts — negative prompts, weighting, model selection, and prompt structure — that most guides either skip or bury in technical jargon.
This guide covers everything from your first prompt to intermediate techniques, with 50+ copy-paste prompts sorted by style and use case.
How Stable Diffusion Differs from DALL-E and Midjourney
Understanding the differences saves significant frustration:
| Feature | Stable Diffusion | DALL-E 3 | Midjourney v6 |
|---|---|---|---|
| Cost | Free (self-hosted) or low-cost cloud | ChatGPT Plus ($20/mo) | Subscription from $10/mo |
| Prompt style | Keyword-focused, comma-separated | Natural language sentences | Both work; keywords preferred |
| Negative prompts | Explicit separate field | Embedded in main prompt | --no parameter |
| Customisation | Maximum — models, LoRAs, embeddings | Limited | Moderate — style parameters |
| Technical knowledge needed | Medium-high | Low | Low-medium |
| Commercial use | Depends on model licence | Yes (per OpenAI ToS) | Yes (paid tiers) |
Stable Diffusion rewards those who invest time learning it with a level of control and customisation that the other tools simply cannot match.
Stable Diffusion Prompt Structure
Unlike DALL-E, which handles flowing prose, Stable Diffusion responds best to a structured keyword approach. The general format:
[Subject description], [style/medium], [lighting], [composition],
[quality modifiers], [artist reference if applicable]
Words at the beginning of the prompt carry more weight. Put your most important elements first.
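The ordering rule is easy to enforce with a small helper. What follows is a minimal sketch, not part of any Stable Diffusion tool: the function name `build_prompt` and its parameter names are hypothetical, chosen to mirror the slots in the format above. It joins the components in priority order and skips any slot left empty.

```python
def build_prompt(subject, style="", lighting="", composition="",
                 quality=(), artist=""):
    """Join prompt components in priority order, skipping empty ones.

    The subject goes first because earlier terms tend to carry more weight.
    """
    parts = [subject, style, lighting, composition, ", ".join(quality), artist]
    # Strip stray whitespace and trailing commas, then drop empty slots.
    cleaned = [p.strip().strip(",").strip() for p in parts]
    return ", ".join(p for p in cleaned if p)


prompt = build_prompt(
    subject="portrait of a weathered sea captain, silver beard",
    style="oil painting",
    lighting="Rembrandt lighting",
    composition="three-quarter view",
    quality=("masterpiece", "best quality"),
    artist="in the style of John Singer Sargent",
)
print(prompt)
```

Keeping the slots as named arguments makes it harder to bury the subject behind quality keywords by accident.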
Standard quality booster keywords
These terms reliably improve output quality across most SD checkpoints. Add them at the end of your positive prompt:
masterpiece, best quality, highly detailed, sharp focus, 8k uhd,
professional photography, award winning, intricate details
Full structure example
Portrait of a weathered sea captain, 60s, silver beard,
intense blue eyes, wearing an oilskin coat,
oil painting, Rembrandt lighting, dramatic shadows,
three-quarter view, dark nautical background,
masterpiece, best quality, highly detailed,
in the style of John Singer Sargent
Negative Prompts: The Most Underused Tool
Negative prompts tell Stable Diffusion what to avoid generating. They are as important as the positive prompt, yet most beginners ignore them entirely. That omission is a common cause of distorted hands, blurry faces, and overexposed outputs.
Universal negative prompt (use as a starting baseline for everything)
ugly, bad anatomy, bad proportions, blurry, cloned face, cropped,
deformed, dehydrated, disfigured, duplicate, error, extra arms,
extra fingers, extra legs, extra limbs, fused fingers,
gross proportions, jpeg artifacts, long neck, low quality,
lowres, malformed limbs, missing arms, missing legs,
morbid, mutated hands, mutation, mutilated,
out of frame, poorly drawn face, poorly drawn hands,
signature, text, too many fingers, username,
watermark, worst quality
For photorealistic outputs, add:
cartoon, anime, illustration, painting, drawing, sketch,
unreal engine, render, 3d, cgi
For artistic/painted outputs, add:
photorealistic, photograph, photo, realistic, hyperrealistic
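Combining a baseline with style-specific additions can be sketched in a few lines. This is an illustrative helper only: `BASELINE` is a shortened subset of the universal list above, and the names `STYLE_EXTRAS` and `negative_prompt` are hypothetical. It merges the lists and drops duplicate terms so the same word is not sent twice.

```python
# Shortened baseline for illustration; paste the full universal list in practice.
BASELINE = ["ugly", "bad anatomy", "blurry", "extra fingers",
            "low quality", "watermark", "text", "worst quality"]

# Style-specific additions, taken from the two lists above.
STYLE_EXTRAS = {
    "photo": ["cartoon", "anime", "illustration", "painting", "drawing",
              "sketch", "unreal engine", "render", "3d", "cgi"],
    "painted": ["photorealistic", "photograph", "photo", "realistic",
                "hyperrealistic"],
}


def negative_prompt(target, extra=()):
    """Build a negative prompt for a 'photo' or 'painted' target style."""
    terms = BASELINE + STYLE_EXTRAS[target] + list(extra)
    seen = set()
    unique = []
    for term in terms:
        key = term.strip().lower()
        # Keep the first occurrence of each term, preserving order.
        if key and key not in seen:
            seen.add(key)
            unique.append(term.strip())
    return ", ".join(unique)


print(negative_prompt("painted", extra=["ugly", "modern elements"]))
```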
Prompt Weighting and Emphasis
Stable Diffusion front ends (particularly A1111 and, for the parenthesis forms, ComfyUI) support syntax for emphasising or de-emphasising specific terms:
| Syntax | Effect | Example |
|---|---|---|
| (word) | Increase emphasis by 1.1x | (golden light) |
| ((word)) | Increase emphasis by 1.21x | ((sharp focus)) |
| (word:1.5) | Increase emphasis by 1.5x (custom weight) | (dramatic lighting:1.4) |
| [word] | Decrease emphasis by 0.9x | [background details] |
| (word:0.5) | Decrease emphasis to 0.5x | (smile:0.5) |
Use weighting sparingly. Heavily weighted terms can produce unnatural-looking results. A weight of 1.2-1.5 is usually enough to meaningfully shift the output.
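The arithmetic behind the table is simple enough to check by hand. The sketch below assumes the A1111 convention, where each pair of parentheses multiplies attention by 1.1 and each pair of square brackets divides it by 1.1 (roughly the 0.9x shown in the table). The function names are hypothetical helpers, not part of any UI.

```python
def emphasis_multiplier(parens=0, brackets=0):
    """Effective weight for a term wrapped in nested () and [] pairs."""
    return (1.1 ** parens) / (1.1 ** brackets)


def weighted(term, weight):
    """Format a term with an explicit weight, e.g. (dramatic lighting:1.4)."""
    if weight == 1:
        return term  # A weight of 1 is the default, so no wrapper is needed.
    return f"({term}:{weight:g})"


print(round(emphasis_multiplier(parens=2), 2))    # 1.21
print(round(emphasis_multiplier(brackets=1), 3))  # 0.909
print(weighted("dramatic lighting", 1.4))         # (dramatic lighting:1.4)
```

Explicit weights such as `(term:1.3)` are usually easier to read and adjust than stacks of nested parentheses.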
Which Model to Use
Stable Diffusion is a family of models, not a single tool. The checkpoint (base model) you use determines the fundamental aesthetic range of your outputs.
| Model / Checkpoint | Best For | Notes |
|---|---|---|
| SDXL (Stable Diffusion XL) | General use, highest base quality | Current recommended starting point |
| Realistic Vision | Photorealistic portraits and scenes | Excellent skin tones and realistic faces |
| DreamShaper | Versatile — art and realism both | Good default for general use |
| Deliberate | Portraits, artistic styles | Strong aesthetic sensibility |
| AbsoluteReality | Photographic realism | Very strong for environmental shots |
| Anything v5 | Anime and manga style | Widely used for illustration work |
Photorealistic Prompts
Environmental portrait
Positive:
portrait of a middle-aged woman, botanist, surrounded by tropical plants,
natural light from skylights above, wearing a linen shirt,
warm expression, slightly off-camera gaze,
shallow depth of field, 85mm lens, Fujifilm GFX aesthetic,
soft bokeh background, muted greens and earth tones,
masterpiece, best quality, highly detailed, 8k
Negative:
ugly, bad anatomy, blurry, watermark, text, extra limbs,
deformed hands, low quality, cartoon, illustration
Urban architecture at night
Positive:
Tokyo street at night, neon reflections on wet asphalt,
steam rising from manhole covers, lone figure with umbrella,
blue hour atmosphere, anamorphic lens flare,
film photography, Kodak Vision3 500T,
cinematic composition, rule of thirds,
masterpiece, photorealistic, award winning photography
Negative:
blurry, low quality, cartoon, anime, 3d render,
overexposed, oversaturated, watermark, ugly
Golden hour landscape
Positive:
Scottish Highlands at golden hour, ancient stone wall in foreground,
rolling moorland stretching to misty mountains,
a single gnarled tree silhouetted against an amber sky,
(volumetric light rays:1.3), dramatic cloud formations,
shot on Sony A7R IV 24mm, landscape photography,
Peter Lik style, ultra wide, best quality, 8k uhd
Negative:
oversaturated, ugly, blurry, watermark, text,
low quality, people, buildings, modern elements
Artistic and Painterly Prompts
Oil painting portrait
Positive:
oil painting portrait, young nobleman in 17th century attire,
white lace collar, pensive expression, dark background,
(Rembrandt lighting:1.3), visible brushwork,
impasto technique, craquelure texture,
museum quality, Old Masters style,
Vermeer and Caravaggio influence, best quality, masterpiece
Negative:
photorealistic, photograph, digital art, smooth,
low quality, deformed, watermark, text
Impressionist landscape
Positive:
impressionist painting, a garden in full summer bloom,
dappled sunlight through chestnut trees,
a woman in a white dress reading under a parasol,
loose confident brushwork, warm palette,
(Monet and Renoir influence:1.3),
oil on canvas texture, gallery quality
Negative:
photorealistic, sharp edges, digital, low quality,
ugly, watermark, text, modern elements
Dark fantasy concept art
Positive:
dark fantasy concept art, ancient ruined temple swallowed by jungle,
(bioluminescent flora illuminating the scene:1.4),
crumbling stone columns, mysterious altar at center,
dramatic contrast, deep shadows,
atmospheric perspective, otherworldly,
Greg Rutkowski and Craig Mullins style,
masterpiece, highly detailed, 4k
Negative:
low quality, blurry, ugly, watermark, text,
modern elements, people
Remove "people" from the negative prompt if the scene should include figures.
Character Prompts
Fantasy character full body
Positive:
full body character concept, female elven ranger,
leather armour with silver filigree, auburn hair in a loose braid,
green eyes, longbow across back,
forest background, dappled morning light,
(rim lighting from behind:1.2),
character design sheet, game art,
Artgerm and Wlop influence, best quality, masterpiece
Negative:
bad anatomy, extra limbs, deformed hands, ugly proportions,
blurry, low quality, watermark, text,
multiple characters
Remove "multiple characters" from the negative prompt if you want a group shot.
Cyberpunk character portrait
Positive:
cyberpunk portrait, male, late 30s,
chrome cybernetic left arm, facial tattoos,
wearing a worn leather jacket with neon accents,
(neon pink and cyan lighting:1.3),
rain-soaked background, Tokyo megacity,
close-up, dramatic angle, cinematic,
Blade Runner 2049 aesthetic, best quality, 8k
Negative:
blurry, bad anatomy, low quality, watermark, cartoon, anime,
extra limbs, deformed, ugly
Environment and Landscape Prompts
Cosy interior
Positive:
cosy cottage interior, winter evening,
stone fireplace crackling with warm fire,
overstuffed armchair with throw blanket,
bookshelves covering every wall, a tabby cat sleeping,
warm amber light, snow falling outside window,
photorealistic, 35mm, shallow depth of field,
best quality, masterpiece, highly detailed
Negative:
people, ugly, distorted, low quality, blurry,
watermark, text, modern technology
Alien world
Positive:
alien planet landscape, twin moons rising over crystal spires,
(bioluminescent vegetation covering the ground:1.4),
rust-red sky with green auroras,
ancient ruins half-buried in glowing soil,
science fiction concept art, epic scale,
John Harris and Chesley Bonestell influence,
best quality, highly detailed, 4k
Negative:
Earth-like vegetation, humans,
blurry, low quality, watermark, text, overexposed
Remove "humans" from the negative prompt if you want small figures for scale.
Product and Commercial Prompts
Luxury product photography
Positive:
luxury product photography, [product: glass perfume bottle /
leather wallet / ceramic coffee cup],
on a dark polished marble surface,
single key light from upper left,
(sharp specular highlight:1.3),
soft shadow below,
pure black background,
commercial photography, studio lighting,
best quality, photorealistic, 8k
Negative:
blurry, low quality, ugly, watermark, text,
multiple products,
harsh shadows, overexposed, amateur lighting
Remove "multiple products" from the negative prompt if you want a group shot.
Sampler and Settings Notes
Beyond the prompt, these settings have the most impact on output quality:
| Setting | Recommended Value | Notes |
|---|---|---|
| Sampler | DPM++ 2M Karras or Euler a | Most reliable for quality and speed balance |
| Steps | 20-30 | More steps = more detail up to a point; diminishing returns after 40 |
| CFG Scale | 7-9 | Higher = more prompt adherence but less creativity; lower = looser but more natural |
| Resolution | 512×512 or 512×768 (SD 1.5), 1024×1024 (SDXL) | Stay at or near the model's training resolution to avoid artifacts |
| Seed | -1 (random) | Fix a seed once you find a composition you like, then iterate the prompt |
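When generation is scripted rather than run from the UI, the settings in the table become fields in a request. The sketch below only builds the request body and sends nothing. It assumes the A1111 web UI started with its `--api` option, and the field names follow that UI's `/sdapi/v1/txt2img` endpoint as I understand it, so check them against the API docs page of your own install. The `sd15` and `sdxl` keys and the function name are hypothetical labels.

```python
import json

# Resolutions from the table above, keyed by a hypothetical family label.
NATIVE_SIZE = {"sd15": (512, 768), "sdxl": (1024, 1024)}


def txt2img_payload(prompt, negative="", family="sdxl", steps=25,
                    cfg_scale=7.5, seed=-1, sampler="Euler a"):
    """Build a txt2img request body using the recommended defaults."""
    width, height = NATIVE_SIZE[family]
    return {
        "prompt": prompt,
        "negative_prompt": negative,
        "sampler_name": sampler,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "seed": seed,  # -1 asks for a random seed; fix it to iterate on a composition
    }


payload = txt2img_payload(
    "Tokyo street at night, neon reflections on wet asphalt",
    negative="blurry, low quality, watermark",
    seed=1234,
)
print(json.dumps(payload, indent=2))
```

Fixing `seed` while changing only the prompt text is what makes before-and-after comparisons meaningful. Sampler naming varies between UI versions, so treat the `sampler` string as something to confirm locally.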
Common Mistakes
Not using a negative prompt at all
This is the single most impactful omission. Without a negative prompt, Stable Diffusion has no constraint on what to avoid. The universal negative prompt above fixes the majority of common quality issues immediately.
Using the wrong model for the task
A photorealism model will fight against anime-style prompts. An anime model will struggle with photorealistic scenes. Match the checkpoint to the output you want before spending time on prompt refinement.
Too many conflicting style references
Stacking five artist names into one prompt averages them together into an incoherent style. Pick one or two references that work in the same direction. Two artists with similar aesthetics reinforce each other; five artists with different aesthetics cancel each other out.
CFG scale too high
A CFG scale above 12 tends to produce oversaturated, high-contrast, almost cartoon-like results even with realistic prompts. Keep it between 7 and 9 for most use cases and only push higher for very specific stylisation.
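Three of the mistakes above are mechanical enough to catch before generating. The following is a rough pre-flight check under stated assumptions: the function name and warning strings are hypothetical, the thresholds (CFG above 12, more than two artist references) come straight from this section, and the model-mismatch mistake is left out because it cannot be judged from the settings alone.

```python
def lint_generation(negative, cfg_scale, artists):
    """Return a list of warnings for common setup mistakes."""
    warnings = []
    if not negative.strip():
        warnings.append("no negative prompt")
    if cfg_scale > 12:
        warnings.append("CFG scale above 12")
    if len(artists) > 2:
        warnings.append("more than two artist references")
    return warnings


print(lint_generation("", 14, ["Monet", "Renoir", "Greg Rutkowski"]))
print(lint_generation("blurry, low quality", 8, ["Monet"]))
```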
Frequently Asked Questions
Is Stable Diffusion free to use?
The model weights are free to download under open licences, though the exact terms vary by model. Running it locally requires a GPU: an NVIDIA card with at least 6GB VRAM for SD 1.5, or 8-12GB for SDXL. Cloud-based interfaces like Google Colab (free tier with limitations), RunDiffusion, and Stable Diffusion Web are alternatives if running locally is not viable.
What is the best interface for Stable Diffusion?
Automatic1111 (A1111) is the most widely used and supported, with the largest ecosystem of extensions. ComfyUI is preferred for advanced users building complex workflows. For beginners who want to run Stable Diffusion without technical setup, Invoke AI has one of the cleanest interfaces.
How do LoRAs work and are they worth using?
LoRAs (Low-Rank Adaptation) are small fine-tuned model add-ons that specialise in generating specific styles, characters, or concepts that the base model handles inconsistently. For example, a LoRA trained on a specific art style will produce that style far more reliably than a style prompt alone. They are absolutely worth using once you are comfortable with the basics — Civitai.com hosts thousands of free ones.
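In A1111, a LoRA is activated from inside the prompt with a `<lora:name:weight>` tag, where the name is the LoRA's filename without its extension. ComfyUI loads LoRAs through nodes instead, so the sketch below applies to A1111-style prompts only. The helper name, the default weight of 0.8, and the `watercolour_style` filename are all hypothetical.

```python
def with_lora(prompt, name, weight=0.8):
    """Append an A1111-style LoRA tag to a prompt."""
    return f"{prompt}, <lora:{name}:{weight:g}>"


print(with_lora("a garden in full summer bloom", "watercolour_style", 0.7))
```

Lowering the weight is a common first step when a LoRA overpowers the rest of the prompt.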
More Image Prompt Resources
For a broader guide to image prompts across all AI tools — including lighting modifiers, style references, and photography prompts — see our complete AI image prompts guide. For ChatGPT-specific image prompts using DALL-E, the ChatGPT image prompts guide covers that use case specifically. And for the prompting principles that underpin all of this, our guide to prompt engineering techniques covers the foundational methods.
More AI image resources and prompt libraries at Promptorix.





