Screen Vision

Give Command Mode visual context by capturing a screenshot of your active window, so your voice commands understand what's on screen.

What Is Screen Vision?

Screen Vision is an optional enhancement to Command Mode. When enabled, LotusQ captures a screenshot of your focused window the moment you activate Command Mode. This screenshot is sent alongside your voice command to an AI vision model, giving it visual context about what you're looking at.

This means you can say things like “fix the typo in the second paragraph” or “move the title above the image” and the AI can see exactly what you mean.

💡 Screen Vision is a Pro feature and is disabled by default. You can enable it in Settings.

How It Works

1. You hold the Command Mode key (default F6)
2. LotusQ captures the active window as a compressed screenshot
3. You speak your command (e.g. “summarize the table on screen”)
4. Your voice command and the screenshot are sent to the vision AI
5. The AI result is pasted back into your application

💡 If the vision API is unavailable, LotusQ automatically falls back to text-only Command Mode so your command still works.
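The fallback behavior described above can be sketched roughly as follows. This is an illustration only, not LotusQ's actual code: `run_command`, `CommandResult`, `VisionUnavailable`, and the two callables standing in for the vision and text-only backends are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class VisionUnavailable(Exception):
    """Hypothetical error raised when the vision API cannot be reached."""


@dataclass
class CommandResult:
    text: str          # text to paste back into the application
    used_vision: bool  # False when the text-only fallback handled the command


def run_command(
    transcript: str,
    screenshot: Optional[bytes],
    vision_call: Callable[[str, bytes], str],
    text_call: Callable[[str], str],
) -> CommandResult:
    """Try the vision path first; fall back to text-only if it is unavailable."""
    if screenshot is not None:
        try:
            return CommandResult(vision_call(transcript, screenshot), True)
        except VisionUnavailable:
            pass  # fall through to the text-only path below
    return CommandResult(text_call(transcript), False)


# Example: a vision backend that is down, so the command still runs as text-only.
def offline_vision(transcript: str, screenshot: bytes) -> str:
    raise VisionUnavailable("vision API unreachable")


result = run_command(
    "summarize the table on screen",
    b"jpeg-bytes",
    offline_vision,
    lambda t: f"text-only: {t}",
)
print(result.used_vision)  # False
```

Passing the two backends in as callables keeps the fallback logic separate from any particular API client, which also makes it easy to exercise without network access.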

Example Commands with Vision

"Fix the typo I can see": The AI reads the screen to find and correct the error

"Describe this chart": Generates a text description of a visible chart or graph

"Rewrite the highlighted section": Uses visual context to identify highlighted text

"What does this error mean?": Reads an error dialog or stack trace on screen

"Translate the text on screen to French": Identifies visible text and translates it

"Convert this table to markdown": Reads a visible table and outputs it in markdown format

Enabling Screen Vision

1. Open Settings in LotusQ
2. Scroll to the Command Mode section
3. Toggle Screen Vision on

Once enabled, every Command Mode activation will include a screenshot automatically. No extra steps are required during use.

Screenshot Details

Window-Only Capture

Only the active window is captured, not your entire screen. Other windows, the desktop, and the taskbar are not included.

Compressed & Lightweight

Screenshots are resized to a maximum width of 1280 px and compressed as JPEG to under 250 KB. This keeps processing fast and bandwidth use low.
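As a rough illustration of those limits, the sketch below computes the resized dimensions and steps the JPEG quality down until the output fits. It is not LotusQ's implementation: `fit_width`, `compress_under`, the quality values, and the reading of "250 KB" as 250 * 1024 bytes are all assumptions, and `encode` stands in for whatever JPEG encoder is actually used.

```python
from typing import Callable, Tuple

MAX_WIDTH = 1280        # maximum width stated above
MAX_BYTES = 250 * 1024  # assumption: "250 KB" read as 250 * 1024 bytes


def fit_width(width: int, height: int, max_width: int = MAX_WIDTH) -> Tuple[int, int]:
    """Scale (width, height) down so width <= max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # never upscale a small window
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def compress_under(
    encode: Callable[[int], bytes],
    max_bytes: int = MAX_BYTES,
    start_quality: int = 85,  # assumed starting point
    min_quality: int = 30,    # assumed floor
    step: int = 10,
) -> bytes:
    """Call encode(quality) with decreasing quality until the result fits."""
    quality = start_quality
    data = encode(quality)
    while len(data) > max_bytes and quality - step >= min_quality:
        quality -= step
        data = encode(quality)
    return data  # may still exceed max_bytes if the quality floor was reached


print(fit_width(2560, 1440))  # (1280, 720)
```

With an image library such as Pillow, `encode` could wrap a save-to-memory call at the given quality; that wiring is left out here so the sketch stays dependency-free.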

Not Stored

Screenshots are sent to the vision API for processing and are not saved to disk or retained after the command completes.

Platform Support

Windows: Captures the foreground window using native Win32 APIs.

macOS: Uses Quartz window capture targeted at the frontmost application.

Linux: Uses xdotool for window geometry with PIL capture. Wayland is supported via gnome-screenshot and grim.
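One way to picture the per-platform split is a small dispatcher like the one below. This is a hedged sketch, not LotusQ's code: `pick_capture_backend` and the backend labels it returns are hypothetical, and detecting Wayland through the `XDG_SESSION_TYPE` and `WAYLAND_DISPLAY` environment variables is an assumption about how such a check might be done.

```python
from typing import Mapping


def pick_capture_backend(platform: str, env: Mapping[str, str]) -> str:
    """Return a label for the capture path matching the platform list above.

    `platform` is expected to look like Python's sys.platform
    ("win32", "darwin", "linux"); the returned strings are labels only.
    """
    if platform.startswith("win"):
        return "win32"    # native Win32 foreground-window capture
    if platform == "darwin":
        return "quartz"   # Quartz capture of the frontmost application
    if platform.startswith("linux"):
        # Assumed Wayland check; X11 sessions use xdotool geometry plus PIL.
        on_wayland = env.get("XDG_SESSION_TYPE") == "wayland" or "WAYLAND_DISPLAY" in env
        return "wayland-cli" if on_wayland else "xdotool+pil"
    raise RuntimeError(f"unsupported platform: {platform}")


print(pick_capture_backend("linux", {"XDG_SESSION_TYPE": "wayland"}))  # wayland-cli
print(pick_capture_backend("linux", {"XDG_SESSION_TYPE": "x11"}))      # xdotool+pil
```

In a real application the arguments would typically be `sys.platform` and `os.environ`; passing them in explicitly keeps the function easy to test.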

Tips

• Screen Vision works best when the relevant content is visible in the active window
• You can still select text before activating Command Mode; the AI uses both the selection and the screenshot
• If you don't need visual context, you can leave Screen Vision off for faster command processing
• Vision commands may take slightly longer than text-only commands (a few extra seconds)
