AI video, but actually steerable

An agent that turns stories into films. Directed by you.

  • Any length
  • Local first
  • Open core

Left: One autonomous pipeline run.

Right: The same film after the human directed it shot by shot, without leaving the chat.

The agent does the work. You direct it.

Before: Autonomous, unattended run
After: 9 directions later

An agentic workspace for video projects.

Case study · Better Image

A 60-second film, directed shot by shot — through the chat.

We gave Dhee a story. About thirty minutes later, the pipeline produced a first cut on its own. We directed the result across the rest of the day, in short bursts — checking in between other things, not babysitting. Below: the unattended cut, the conversation that reshaped it, and the directed cut.

Cut 01

First cut

What the pipeline produced unattended — fifteen shots, one full run, plot → frames → video assembly. Everything below is the result of directing this.

The session

  1. 11:03 IST — the pipeline finishes its first end-to-end run. The film exists. Now the direction begins.

  2. Direction · 11:28

    In shot 4, we are observing that even though the dialogue is for Sera, Malachor is lip-synching for it. We should fix the motion directive for Malachor's lip to be still while the dialogue is being delivered in the background.

  3. Agent · 11:29 · excerpt

    I can see the problem clearly. The motion directive says "Sera says … the camera continues its gradual push in to a close-up of Malachor's face as she speaks" — which tells the generator to animate Malachor's face during dialogue, so it's syncing his lips to her line.

  4. Direction · 11:30

    Apply it.

  5. Agent · 11:31

    Motion directive updated — Malachor's lips are now explicitly still during Sera's off-screen dialogue. Now I'll invalidate just the shot video (consumer, so my edit survives) and re-render.

  6. Scene 1 · Shot 4
    Before
    After

    Frames unchanged. Only the motion directive node was edited.

  7. Direction · 11:55

    Shot 1 in scene 1 is too short for the narration. Increase its duration to 5 seconds.

  8. Agent · 12:10 · excerpt

    Found it — the timeline still has shot 1 at duration 3. The video was re-rendered at 5 seconds, but the timeline that controls final assembly never got updated. I also need to reflow all downstream segments' start/end times. Delta = +2s across shots 2–15.

  9. The agent edits three separate config files, recomputes every downstream timestamp, and re-stitches the assembly. No prompt is regenerated.

  10. Direction · 13:50

    For shot 8 FF, can we use the same image as shot 7 LF?

  11. Direction · 13:51

    Don't regenerate the image, just copy everything over.

  12. Agent · 13:54

    Done. Shot 8's first frame now contains shot 7's last frame image. The project.json reference still points to the same filename, so everything links up without needing regeneration.

  13. Scene 1 · Shot 8
    Before
    After

    No new generation. An existing frame was copied across the production graph.

  14. Don't regenerate the image. Just copy everything over.

    Direction, 13:51 IST
  15. Direction · 14:08

    In shot 10, the question "How long since…" is being asked by Sera. This is out of character. The commander is the guy. So the question should be asked by him. We can keep the same FF, LF. Only change the motion directive.

  16. Agent · 14:10 · excerpt

    The dialogue assignment is wrong for the character. The story has Sera asking, but you want Malachor to be the one asking. Since you want to keep the same first/last frames (Sera with datapad), the motion directive needs to reframe it as Malachor's off-screen voice while Sera reacts.

  17. Scene 1 · Shot 10
    Before · Sera asks
    After · Malachor asks

    Same frames, different speaker. The agent edited the dialogue assignment and motion directive only.

  18. Direction · 14:48

    For shot 11, we can reuse the LF from shot 10. And yeah Sera saying "Four hours" now fits with the LF of shot 10 and it can be an i2v. Matches perfectly.

  19. Agent · 14:54 · excerpt

    Image: copied shot 10's last frame → shot 11's first frame (Sera with the cyan datapad glow on her face). Audio: MALACHOR: Four hours. → SERA: Four hours. Strategy: i2v. Invalidating only the video — keeping the copied image untouched.

  20. Scene 1 · Shot 11
    Before · independent shot
    After · continuous from shot 10

    Reused asset, reassigned speaker, switched generation technique to i2v. Three graph edits in one instruction.

  21. The same conversation, in the app. Shot strip below, chat on the right, the production graph behind it all.
  22. Direction · 22:30

    OK let's edit shot 12. In shot 11 ending the woman has gotten up from the chair. So we need to reframe the FF of shot 12 where she is not in her chair.

  23. Direction · 22:31

    Also, the empty chair image is perfect with Sera off-screen. Change the dialogue to "Wake me in three and a half." A small pause. Then "By the way, nice coat."

  24. Agent · 22:36

    Love that — the casual "By the way, nice coat" after the order is a great character beat for Malachor.

  25. Scene 1 · Shot 12
    Before · Sera in chair
    After · empty chair, off-screen line

    Continuity reasoning across shots, applied to a single first-frame and the dialogue track.

  26. Direction · 20:45

    Shot 13 should be Sera walking off from the deck of the spaceship. We don't have a setting file for this. Can we generate one, based on the current setting — the other side of the deck, maybe a door which leads to crew quarters?

  27. Agent · 20:54 · excerpt

    Registered the exit corridor as a new sub-setting and generated its image with keepPrompt so the producer LLM doesn't rewrite the prompt file. Referenced the new setting from shot 13's image prompt; only the image and downstream video nodes regenerated.

  28. Scene 1 · Shot 13
    Before · no exit setting
    After · Sera walks off the deck

    A new sub-setting is added as a graph node. Shot 13 picks it up. The rest of the production graph is untouched.

  29. Nine iterations later, we reach the cut you saw at the top of this page. No re-runs. No prompt rewrites. Every change above produced one of the differences between the first cut and the final cut.
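
The duration change at 11:55 is a good example of an edit that touches no prompt at all: shot 1 grows from 3 to 5 seconds, and every later segment shifts by the same delta. The sketch below shows one way such a reflow could work. It is not Dhee's actual code; the `Segment` type, the `set_duration` helper, and the uniform 3-second starting timeline are all hypothetical, and it assumes the timeline is stored in playback order.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One shot's slot in the assembly timeline (hypothetical structure)."""
    shot: int
    start: float  # seconds from the start of the film
    end: float


def set_duration(timeline: list[Segment], shot: int, new_duration: float) -> float:
    """Resize one shot and shift every later segment by the same delta.

    Assumes `timeline` is sorted in playback order. Returns the delta applied.
    """
    delta = 0.0
    found = False
    for seg in timeline:
        if found:
            # Downstream segment: keep its length, move it by the delta.
            seg.start += delta
            seg.end += delta
        elif seg.shot == shot:
            delta = new_duration - (seg.end - seg.start)
            seg.end = seg.start + new_duration
            found = True
    if not found:
        raise ValueError(f"shot {shot} is not in the timeline")
    return delta


# Illustrative data only: fifteen shots of 3 seconds each.
timeline = [Segment(shot=i + 1, start=3.0 * i, end=3.0 * (i + 1)) for i in range(15)]
delta = set_duration(timeline, shot=1, new_duration=5.0)
```

With this illustrative data, shot 1 ends at 5 seconds and shots 2 to 15 each start 2 seconds later than before, which matches the "+2s across shots 2–15" the agent reported.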

Cut 14

Final cut

The same story after nine directed iterations. One shot dropped, frames reused across shots, dialogue reassigned, a new setting added — and the agent never started over.
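
"Never started over" comes down to invalidating only what sits downstream of an edit. The sketch below illustrates that idea on a toy production graph. It is an assumption-laden illustration, not Dhee's implementation: the node names, the `CONSUMERS` mapping, and the `invalidate` function are hypothetical.

```python
from collections import deque

# Hypothetical production graph for one shot. Each node maps to its consumers,
# that is, the nodes built from it. All names are illustrative only.
CONSUMERS = {
    "shot4/image_prompt": ["shot4/first_frame", "shot4/last_frame"],
    "shot4/first_frame": ["shot4/video"],
    "shot4/last_frame": ["shot4/video"],
    "shot4/motion_directive": ["shot4/video"],
    "shot4/video": ["assembly"],
    "assembly": [],
}


def invalidate(graph, start):
    """Return `start` plus every node downstream of it, in breadth-first order."""
    stale = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        stale.append(node)
        for consumer in graph.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return stale


# After hand-editing the motion directive, invalidate its consumer (the video)
# rather than the directive itself, so the edit is not regenerated away.
stale_after_directive_edit = invalidate(CONSUMERS, "shot4/video")
```

In this toy graph, invalidating the shot video marks only the video and the final assembly as stale. The frames and the edited motion directive are left alone, which mirrors the shot 4 fix in the session.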

Different project, different style, same agent. Mid-run interrupt, local pipeline. LLM · COMFY · VLM all local.

Direct your own film.

Dhee Desktop runs this same agent locally. Start with a story and step in whenever you want — interrupt the pipeline mid-run, edit any node directly, or let it produce a draft and direct from there. The conversation is the work, whenever you choose to have it.

Download Dhee Desktop