Screen Vision
Give Command Mode visual context by capturing a screenshot of your active window, so your voice commands understand what's on screen.
What Is Screen Vision?
Screen Vision is an optional enhancement to Command Mode. When enabled, LotusQ captures a screenshot of your focused window the moment you activate Command Mode. This screenshot is sent alongside your voice command to an AI vision model, giving it visual context about what you're looking at.
This means you can say things like “fix the typo in the second paragraph” or “move the title above the image” and the AI can see exactly what you mean.
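As a rough illustration of "sent alongside your voice command", the sketch below bundles a transcribed command and a JPEG screenshot into one request body. This is not LotusQ's actual wire format: the function name `build_command_request` and the field names `command`, `selection`, and `screenshot` are assumptions made up for this example.

```python
import base64
import json
from typing import Optional


def build_command_request(transcript: str, screenshot_jpeg: bytes,
                          selected_text: Optional[str] = None) -> str:
    """Bundle a voice command and a screenshot into one JSON body.

    All field names here are hypothetical, chosen for illustration only.
    """
    payload = {
        "command": transcript,        # the transcribed voice command
        "selection": selected_text,   # optional selected text, None if absent
        "screenshot": {
            "media_type": "image/jpeg",
            # Binary image data is base64-encoded so it can travel inside JSON.
            "data": base64.b64encode(screenshot_jpeg).decode("ascii"),
        },
    }
    return json.dumps(payload)
```

The point of the sketch is only that the command and the image travel together, so the model can resolve references such as "the second paragraph" against what is visible on screen.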
How It Works
Example Commands with Vision
Enabling Screen Vision
1. Open Settings in LotusQ
2. Scroll to the Command Mode section
3. Toggle Screen Vision on
Once enabled, every Command Mode activation will include a screenshot automatically. No extra steps are required during use.
Screenshot Details
Only the active window is captured, not your entire screen. Other windows, the desktop, and the taskbar are not included.
Screenshots are resized to a maximum width of 1280 px and compressed to JPEG (under 250 KB). This keeps processing fast and bandwidth usage low.
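The size limits above can be sketched as two small helpers. This is a minimal illustration under stated assumptions, not LotusQ's implementation: the names `fit_width` and `compress_under` are hypothetical, `encode` stands in for whatever JPEG encoder is actually used, the quality-stepping strategy is one common approach rather than a documented one, and 250 KB is taken to mean 250 × 1024 bytes.

```python
from typing import Callable, Tuple

MAX_WIDTH = 1280          # maximum width in pixels, from the text above
MAX_BYTES = 250 * 1024    # "under 250 KB", assuming 1 KB = 1024 bytes


def fit_width(width: int, height: int,
              max_width: int = MAX_WIDTH) -> Tuple[int, int]:
    """Scale (width, height) down to max_width, preserving aspect ratio.

    Images that are already narrow enough are returned unchanged.
    """
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def compress_under(encode: Callable[[int], bytes],
                   max_bytes: int = MAX_BYTES,
                   start_quality: int = 85,
                   min_quality: int = 30,
                   step: int = 5) -> bytes:
    """Lower the JPEG quality until the output is smaller than max_bytes.

    `encode(quality)` is a hypothetical callback wrapping a real encoder.
    If min_quality is reached first, the last attempt is returned as-is.
    """
    quality = start_quality
    data = encode(quality)
    while len(data) >= max_bytes and quality > min_quality:
        quality = max(min_quality, quality - step)
        data = encode(quality)
    return data
```

For example, `fit_width(2560, 1440)` returns `(1280, 720)`, while an 800 × 600 window is left at its original size.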
Screenshots are sent to the vision API for processing and are not saved to disk or retained after the command completes.
Platform Support
Tips
- Screen Vision works best when the relevant content is visible in the active window.
- You can still select text before activating Command Mode; the AI uses both the selection and the screenshot.
- If you don't need visual context, you can leave Screen Vision off for faster command processing.
- Vision commands may take a few seconds longer than text-only commands.