Pixel-perfect desktop automation with mouse, keyboard, screen capture, window management, and clipboard control.
npx clawhub@latest install desktop-control
Desktop Control is a comprehensive desktop automation skill that gives AI agents precise, programmatic control over your entire desktop environment. It covers mouse movements (including smooth bezier-curve paths), keyboard input at configurable speeds, screen capture and image recognition, window management, and clipboard operations. Built on PyAutoGUI and OpenCV, it includes safety mechanisms like failsafe corners and approval mode to keep automation under your control.
Move the cursor to absolute screen coordinates or relative offsets, with optional smooth bezier-curve paths that mimic natural human movement. Supports left, right, and middle clicks, double/triple clicks, drag-and-drop, and both vertical and horizontal scrolling.
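As a rough illustration of the bezier-curve movement described above, the sketch below computes points along a cubic Bezier curve and steps the cursor through them. It is not the skill's own implementation: bezier_path, move_along, and the spread parameter are hypothetical names, and only pyautogui.moveTo is a real PyAutoGUI call.

```python
import random
import time


def bezier_path(start, end, steps=30, spread=0.15, rng=None):
    """Return integer points along a cubic Bezier curve from start to end.

    Hypothetical helper: the two control points are pushed sideways from
    the straight line by a random fraction (at most `spread`) of the
    travel distance, so the path bows slightly.
    """
    if rng is None:
        rng = random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    k1 = rng.uniform(-spread, spread)
    k2 = rng.uniform(-spread, spread)
    # (-dy, dx) is perpendicular to the direction of travel.
    x1, y1 = x0 + dx / 3 - k1 * dy, y0 + dy / 3 + k1 * dx
    x2, y2 = x0 + 2 * dx / 3 - k2 * dy, y0 + 2 * dy / 3 + k2 * dx
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1.0 - t
        x = u**3 * x0 + 3 * u * u * t * x1 + 3 * u * t * t * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t**3 * y3
        points.append((round(x), round(y)))
    return points


def move_along(points, duration=0.5):
    """Step the real cursor through the points (needs a display)."""
    import pyautogui  # imported lazily so bezier_path works headless

    pause = duration / max(len(points), 1)
    for x, y in points:
        # pyautogui.PAUSE adds its own delay after every call.
        pyautogui.moveTo(x, y)
        time.sleep(pause)
```

Calling move_along(bezier_path((100, 100), (800, 450))) would trace one such curve. Keeping the path computation separate from the cursor calls makes the geometry easy to check without moving the mouse.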
Type text at any speed from instant to human-like WPM, execute multi-key hotkeys (Ctrl+C, Win+R, etc.), press special and function keys, and manually hold or release modifier keys for complex interactions like multi-file selection.
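The keyboard features above map fairly directly onto PyAutoGUI calls. The sketch below assumes the common convention of five characters per word to turn a WPM figure into a per-key delay. The helper names (interval_for_wpm, type_text, copy_selection, ctrl_click) are illustrative, not the skill's API.

```python
def interval_for_wpm(wpm, chars_per_word=5):
    """Seconds between keystrokes for a given words-per-minute rate.

    Assumes the conventional 5 characters per word.
    """
    if wpm <= 0:
        raise ValueError("wpm must be positive")
    return 60.0 / (wpm * chars_per_word)


def type_text(text, wpm=None):
    """Type text instantly (wpm=None) or paced at roughly `wpm`."""
    import pyautogui  # lazy import: needs a display at call time

    interval = 0.0 if wpm is None else interval_for_wpm(wpm)
    pyautogui.write(text, interval=interval)


def copy_selection():
    """Press Ctrl+C as a single hotkey chord."""
    import pyautogui

    pyautogui.hotkey("ctrl", "c")


def ctrl_click(points):
    """Hold Ctrl while clicking each (x, y), e.g. multi-file selection."""
    import pyautogui

    pyautogui.keyDown("ctrl")
    try:
        for x, y in points:
            pyautogui.click(x, y)
    finally:
        # Always release the modifier, even if a click raises.
        pyautogui.keyUp("ctrl")
```

The try/finally around the held modifier is a deliberate choice: a stuck Ctrl key after a failed script can cause confusing behavior in whatever application has focus.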
Capture the full screen or any rectangular region and save the result as a PNG. Optionally use OpenCV to locate a template image anywhere on screen with a configurable confidence threshold, enabling element detection without hardcoded coordinates.
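A minimal sketch of region capture and template matching, written against PyAutoGUI directly rather than the skill's own find_on_screen (whose signature is not shown here). The function names are hypothetical. The confidence argument to locateOnScreen requires opencv-python, and depending on the PyAutoGUI version a miss either returns None or raises ImageNotFoundException, so the sketch handles both.

```python
def region_from_corners(x1, y1, x2, y2):
    """Convert two opposite corners into (left, top, width, height)."""
    left, top = min(x1, x2), min(y1, y2)
    return (left, top, abs(x2 - x1), abs(y2 - y1))


def capture_region(path, region=None):
    """Save the full screen, or a (left, top, width, height) region, as PNG."""
    import pyautogui  # lazy import: needs a display at call time

    image = pyautogui.screenshot(region=region)
    image.save(path)
    return path


def find_template(template_path, confidence=0.9):
    """Return the center (x, y) of the template on screen, or None."""
    import pyautogui

    try:
        box = pyautogui.locateOnScreen(template_path, confidence=confidence)
    except pyautogui.ImageNotFoundException:
        return None
    if box is None:
        return None
    return (int(box.left + box.width // 2), int(box.top + box.height // 2))
```

For example, capture_region("dialog.png", region_from_corners(200, 150, 600, 400)) would save a 400 by 250 pixel crop, and find_template("ok_button.png") could then supply click coordinates without hardcoding them.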
List all open windows, activate any window by partial title match, retrieve the currently focused window, and read window position, size, and title — making it straightforward to orchestrate multi-application workflows.
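The window operations above could be sketched with pygetwindow as follows. This assumes a Windows host, where pygetwindow's support is most complete. match_titles, activate_first, and focused_title are hypothetical helpers, not the skill's API.

```python
def match_titles(titles, fragment):
    """Return non-empty titles containing fragment, ignoring case."""
    needle = fragment.lower()
    return [t for t in titles if t and needle in t.lower()]


def activate_first(fragment):
    """Activate the first window whose title contains fragment.

    Returns its title and geometry, or None if nothing matched.
    """
    import pygetwindow as gw  # lazy import: needs a desktop session

    for title in match_titles(gw.getAllTitles(), fragment):
        windows = gw.getWindowsWithTitle(title)
        if windows:
            win = windows[0]
            win.activate()
            return {
                "title": win.title,
                "left": win.left,
                "top": win.top,
                "width": win.width,
                "height": win.height,
            }
    return None


def focused_title():
    """Title of the currently focused window, or None."""
    import pygetwindow as gw

    win = gw.getActiveWindow()
    return win.title if win else None
```

A multi-application workflow might call activate_first("Notepad"), act on that window, then call activate_first("Excel") before pasting.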
A failsafe mode aborts automation when the mouse reaches any screen corner. An approval mode prompts for user confirmation before each action. Bounds checking prevents out-of-screen operations, and all actions are logged for auditing.
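One way the bounds check, approval prompt, and audit log could fit together is sketched below. This is an assumption about structure, not the skill's actual code: in_bounds, guarded_click, and the logger name are hypothetical. The click function and the approval callback are passed in so the logic can be exercised without a display.

```python
import logging

log = logging.getLogger("desktop_control_sketch")  # hypothetical logger name


def in_bounds(x, y, screen_width, screen_height):
    """True if (x, y) lies on a screen of the given size."""
    return 0 <= x < screen_width and 0 <= y < screen_height


def guarded_click(x, y, screen_size, click, approve=None):
    """Click at (x, y) only if it is on screen and, when an approve
    callback is given, the callback returns True. Every outcome is logged.
    """
    width, height = screen_size
    if not in_bounds(x, y, width, height):
        log.warning("rejected click at (%s, %s): outside %sx%s", x, y, width, height)
        return False
    if approve is not None and not approve(f"Click at ({x}, {y})?"):
        log.info("declined click at (%s, %s)", x, y)
        return False
    click(x, y)
    log.info("clicked at (%s, %s)", x, y)
    return True
```

With PyAutoGUI, this could be called as guarded_click(x, y, pyautogui.size(), pyautogui.click, approve=lambda msg: input(msg + " [y/N] ").strip().lower() == "y"). PyAutoGUI's own failsafe is controlled by the pyautogui.FAILSAFE flag and raises pyautogui.FailSafeException when triggered.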
Programmatically write text to the clipboard or read its current contents, enabling seamless data transfer between applications without simulating keyboard shortcuts.
An agent activates a target application window, clicks into each form field in sequence, types the appropriate values at human-like speed, and submits — replicating a user filling out a complex web or desktop form without any app API access.
After triggering an application action, capture a specific screen region to save as a timestamped PNG, then use image recognition to confirm that the expected button or dialog is visible before proceeding.
Hold Ctrl and click multiple files in a file manager to select them, then drag the selection to a destination folder — all scripted as a single reproducible automation sequence.
Activate a source application, select and copy data using keyboard shortcuts, switch to a destination application via window activation, and paste — automating a workflow that would otherwise require manual copy-paste across programs.
pyautogui, pillow, and pygetwindow are required. opencv-python is optional but needed for image recognition (find_on_screen).