To help you understand the trends surrounding AI and other new technologies and what we expect to happen in the future, our highly experienced Kiplinger Letter team will keep you abreast of the latest developments and forecasts. (Get a free issue of The Kiplinger Letter or subscribe). You’ll get all the latest news first by subscribing, but we will publish many (but not all) of the forecasts a few days afterward online. Here’s the latest…
The tech industry is buzzing about AI agents. Part of the excitement stems from the distinctive, and almost human, way some of them work: They “see” your screen.
Microsoft recently unveiled a way to build an AI agent that “follows a loop of seeing, thinking, and doing,” says Sangya Singh, VP of Power Platform Intelligent Automations at Microsoft. The process goes something like this: An agent uses AI vision to look at the screen, capturing screenshots and interpreting the pixels, so it can navigate a computer desktop or web browser, including buttons, forms and web pages, says Singh.
It’s powered by large language models, the systems at the core of generative AI. Microsoft uses an OpenAI model, which the Microsoft-backed start-up says is “a universal interface for AI to interact with the digital world.”
The AI agent controls a virtual mouse and keyboard, using AI reasoning to navigate a computer just like a human would. “It acts, by clicking, typing, or scrolling, until the task is done,” says Singh.
How does it know exactly what to do? A human gives it instructions in plain English.
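For readers who want to picture the "seeing, thinking, and doing" loop in code, here is a minimal Python sketch. It is not Microsoft's or OpenAI's implementation. The `FakeScreen` class, the `think` function and the `run_agent` function are hypothetical stand-ins: a real agent would capture actual screenshots and ask a large language model to pick the next step, while this toy version uses a simulated form and a few hard-coded rules.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type" or "done"
    target: str = ""   # hypothetical label of the on-screen element
    text: str = ""     # text to type, if any


@dataclass
class FakeScreen:
    """Stand-in for a real desktop: a form with one field and a submit button."""
    field_value: str = ""
    submitted: bool = False
    log: list = field(default_factory=list)

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this sketch returns a summary.
        return {"field_value": self.field_value, "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Simulates the virtual mouse and keyboard.
        self.log.append(action.kind)
        if action.kind == "type":
            self.field_value = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def think(instruction: str, observation: dict) -> Action:
    """Placeholder for the model call (assumption: hard-coded rules, no AI)."""
    wanted = instruction.split('"')[1]  # the quoted text in the instruction
    if observation["submitted"]:
        return Action("done")
    if observation["field_value"] != wanted:
        return Action("type", target="name_field", text=wanted)
    return Action("click", target="submit")


def run_agent(instruction: str, screen: FakeScreen, max_steps: int = 10) -> bool:
    for _ in range(max_steps):
        observation = screen.screenshot()          # see
        action = think(instruction, observation)   # think
        if action.kind == "done":
            return True
        screen.apply(action)                       # do
    return False  # gave up before the task was finished


screen = FakeScreen()
finished = run_agent('Type "Ada Lovelace" into the form and submit it', screen)
print(finished, screen.log)
```

The `max_steps` cap is a design choice of this sketch, added so that a confused agent stops instead of looping forever.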
This new AI tech offers a promising way to automate any computer task, without needing preprogrammed software or special protocols between apps. Current popular automation methods need to be preprogrammed and work best on tasks with rote steps. In contrast, these AI agents don’t need such programming and can work through various hurdles, such as a screen that looks different than normal or an intrusive pop-up message.
Using vision, this type of AI can take on incredibly complex tasks and navigate any apps or websites that it comes across. For consumers, AI agents could book a hotel, rent a car or buy a product online. At work, uses include data entry, market research and invoice processing, to save time and reduce human error. These examples are just the start of a seemingly endless list of digital tasks, many of which are already emerging.
Microsoft says “irreversible” decisions and “high-risk actions,” such as large financial transactions, will have an alert for human approval. Privacy will be a big concern as AI tools use credit card data and other personal information. There’s also the chance that AI could make mistakes. These issues are well known and safeguards are being put in place.
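The human-approval alert can be pictured as a simple gate that sits in front of every action. The Python sketch below is illustrative only: the article does not describe how Microsoft decides what counts as "high-risk," so the dollar threshold, the `irreversible` flag and the function names are all assumptions.

```python
from typing import Callable

# Hypothetical cutoff; the real criteria are not described in the source.
APPROVAL_THRESHOLD_USD = 500.0


def needs_approval(action: dict) -> bool:
    """Flag irreversible actions and large financial transactions."""
    is_irreversible = bool(action.get("irreversible", False))
    is_large_payment = action.get("amount_usd", 0.0) >= APPROVAL_THRESHOLD_USD
    return is_irreversible or is_large_payment


def execute(action: dict, approve: Callable[[dict], bool]) -> str:
    """Run the action only if it is low-risk or a human signs off."""
    if needs_approval(action) and not approve(action):
        return "blocked"
    return "executed"


# Example: the human declines everything they are asked about.
always_decline = lambda action: False
print(execute({"name": "scroll"}, always_decline))
print(execute({"name": "pay invoice", "amount_usd": 2500.0}, always_decline))
```

In this sketch, low-risk steps such as scrolling run without interruption, while the large payment waits on the human's answer.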
Note that using this type of generative AI can sometimes be an inefficient way to automate tasks, as it uses a lot of computing power, though it will get more efficient over time. Look for an explosion of AI agents in the coming year or so, using vision to navigate digital chores.