"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."
I just published a new version of the @browsermcp/mcp library (version 0.1.1) that handles the error better until I can investigate further so it should hopefully work now if you're using @browsermcp/mcp@latest.
It's working now with the 0.1.0 for me. But I will let you know if I experience any issues once I get updated to 0.1.1.
Thanks, great job! I like it overall, but I noticed it has some issues entering text in forms, even on google.com. It's able to find a workaround and insert the searched text in the URL, but it would be nice if the entry into forms worked well for UI testing.
The Puppeteer MCP server doesn't work well because it requires CSS selectors to interact with elements. It makes up CSS selectors rather than reading the page and generating working selectors.
The Playwright MCP server is great! Currently Browser MCP is largely an adaptation of the Playwright MCP server to use with your actual browser rather than creating a new one each time. This allows you to reuse your existing Chrome profile so that you don't need to log in to each service all over again and avoids bot detection which often triggers when using the fresh browser instances created by Playwright.
I also plan to add other useful tools (e.g. Browser MCP currently supports a tool to get the console logs which is useful for automated debugging) which will likely diverge from the Playwright MCP server features.
By the way, you can indeed access your personal context with Playwright. Just call `launchPersistentContext()` and set the userDataDir to that of your existing Chrome install.
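The snippet that comment pointed at isn't in the thread, so here is a minimal sketch of the idea using Playwright's Python API (`launch_persistent_context`). The helper names are hypothetical, and the profile paths are only the usual defaults, which is an assumption since installs can differ. Chrome generally needs to be closed first, because a profile directory can only be used by one browser instance at a time.

```python
import os
import sys
from pathlib import Path


def default_chrome_user_data_dir(platform: str = sys.platform, home: Path = Path.home()) -> Path:
    """Return Chrome's usual user data directory (an assumption; installs can differ)."""
    if platform.startswith("win"):
        local = os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local"))
        return Path(local) / "Google" / "Chrome" / "User Data"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "Google" / "Chrome"
    return home / ".config" / "google-chrome"


def open_with_existing_profile(url: str) -> str:
    """Open `url` in Chrome with the existing profile and return the page title."""
    # Imported lazily so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(
            user_data_dir=str(default_chrome_user_data_dir()),
            channel="chrome",  # use the installed Chrome rather than bundled Chromium
            headless=False,
        )
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(url)
        title = page.title()
        context.close()
        return title
```

Calling `open_with_existing_profile("https://example.com")` should open a window that carries your existing cookies and logins, which is the point the comment was making.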
Browser MCP uses the Chrome DevTools Protocol (CDP) to automate the browser so it currently only works for Chromium-based browsers.
Unfortunately, Firefox doesn't expose WebDriver BiDi (the standardized version of CDP) to browser extensions AFAIK (someone please correct me if I'm mistaken!), so I don't think I can support it even if I tried.
In the Task Automation demo, how does it know all of the attributes of the motorcycle he is trying to sell? Is it relying on the underlying LLM's embedded knowledge? But then how would it know the price and mileage? Is there some underlying document not referenced in the demo? Because that information is not in the prompt.
I just ran into a bunch of errors on my Windows machine + Chrome when connected over remote-ssh. Extension installed, tab enabled, npx updated/installed, etc.
2025-04-07 10:57:11.606 [info] rmcp: Starting new stdio process with command: npx @browsermcp/mcp@latest
2025-04-07 10:57:11.606 [error] rmcp: No server info found
---
EDIT: Ended up fixing it by patching index.js. killProcessOnPort() was the problem. Can hit me up if you have questions, I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.
> I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.
Not that HN supports much in the way of markup, but code blocks are actually the same as Markdown: indent (by 2 spaces or more, in HN's syntax; Markdown calls for 4 or more, so they're compatible).
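For anyone who wants to script that, a tiny sketch of the indent rule described above. The function name is made up; the two-space prefix is the only part that comes from the comment.

```python
def to_hn_code_block(code: str, indent: int = 2) -> str:
    """Prefix every line with `indent` spaces so HN renders the text verbatim."""
    prefix = " " * indent
    return "\n".join(prefix + line for line in code.splitlines())
```

Paste the result after a blank line in the comment box.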
Thanks for the report and the update! I'd love to hear about what you changed — how can I get in touch? I didn't see anything in your HN profile. Feel free to email me at admin@browsermcp.io
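The thread doesn't say what the patch to killProcessOnPort() actually changed, so the following is only one possible approach, sketched in Python rather than the project's Node.js. The stack trace posted here shows `execSync` throwing when the netstat/taskkill command fails, so the sketch checks whether anything is listening first and treats a failing cleanup command as non-fatal. `port_in_use` and `free_port_best_effort` are hypothetical names.

```python
import socket
import subprocess


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


def free_port_best_effort(port: int, kill_command: list[str]) -> bool:
    """Run kill_command only if the port is busy; never raise on failure.

    Returns True if the port is free afterwards.
    """
    if not port_in_use(port):
        return True  # nothing to kill, so skip the command entirely
    try:
        subprocess.run(kill_command, check=False, capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        pass  # cleanup is best-effort; the caller decides what to do next
    return not port_in_use(port)
```

The design choice is simply that a failed cleanup should not stop the server from starting; the caller can still decide to bail out if the function returns False.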
Setting this up for Claude Desktop and Cursor was alright.
Works well out of the box with little setup, and I like that it attaches to my active browser tab. Keep up the good work.
I literally started working on the same exact idea last night haha. Great work OP. I'm curious, how are you feeding the web data to the LLM? Are you just passing the entire page contents to it and then having it interact with the page based on CSS selectors/xpath? Also, what are your thoughts on letting it do its own scripting to automate certain tasks?
I wonder if it's possible to add such plugins to Electron apps (e.g.: Slack).
It would be such a nice experience if I could just connect my AI of choice to a local app.
An extension is more user-friendly! I leave Chrome open basically 24/7 and having to create a new Chrome instance via the command line just to use Browser MCP just felt like too high of a barrier.
Exposing Chrome CDP is a terrible idea from a security and privacy perspective. You get the keys to the whole kingdom (and expose them on a standard port with a well documented API). All security features of the web can be bypassed, and then some, as CDP exposes even more capabilities than chrome extensions and without any form of supervision.
In the local context as well. Unlike say the docker socket which is protected by default using unix permissions, the CDP protocol has no authorization, authentication or permission mechanism.
Anything on your machine (such as a rogue browser extension or a malicious npm/pypi package) could scan for this and just get all your cookies - and that's only the beginning of your problems.
CDP can access any origin, any data stored (localStorage, indexedDB ...), any javascript heap, cross iframe and origin boundaries, run almost undetectable code that uses your sessions without you knowing, and the list is very long. CDP was never meant to expose a real browser in an untrusted context.
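To check whether your own machine is exposing this, here is a minimal sketch that probes the HTTP side of the DevTools endpoint. It assumes the conventional remote debugging port 9222 and the `/json/version` path; the function name is hypothetical. Note that the request carries no credentials at all, which is the whole point of the comment above.

```python
import http.client
import json
import urllib.error
import urllib.request


def cdp_endpoint_info(port: int = 9222, host: str = "127.0.0.1", timeout: float = 1.0):
    """Return the parsed /json/version response if a CDP endpoint answers, else None."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, non-HTTP service, or non-JSON body.
        return None
```

If this returns a dictionary, any local process can make the same request and then attach to the browser over the WebSocket URL it advertises.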
We work on something similar and aim to be the huggingface hub for automations you can run in your browser[0], with built-in support for MCP SSE.
Use the pre-built Trails[1][2] as MCP servers or create and publish your own with a familiar Puppeteer-like API, powered by your or your friends' browsers.
> that's a great use case! the aria snapshot that browser mcp generates is enough to write tests for playwright using its role-based locators, but i may add a get_page_html tool in the same way that they're considering: https://github.com/microsoft/playwright-mcp/issues/103
Of course, you're sending data to the AI model, but the "private" aspect is contrasting automating using a local browser vs. automating using a remote browser.
When you automate using a remote browser, another service (not the AI model) gets all of the browsing activity and any information you send (e.g. usernames and passwords) that's required for the automation.
With Browser MCP, since you're automating locally, your sensitive data and browser activity (apart from the results of MCP tool calls that are sent to the AI model) stay on your device.
I think we need to be very careful & intentional about the language we use with these kinds of tools, especially now that the MCP floodgates have been opened. You aren't just exposing the user's browsing data to whichever model they are using, you are also exposing it to any tools they may be allowing as well.
A lot of non-technical people are using these tools to "vibe" their way to productivity. I would explicitly tell them that potentially "all" of their browsing data is going to be exposed to their LLM client and they need to use this at their own risk.
What I don't like about LLMs is that people keep re-inventing the wheel over and over. For example, we've been able to control browsers using GPT for about 2 years now:
Cursor is currently stuck using an outdated snapshot of the VSCode Marketplace, meaning several extensions within Cursor remain affected by high-severity CVEs that have already been patched upstream in VSCode. As a result, Cursor users unknowingly remain vulnerable to known security issues.
This issue has been acknowledged but remains unresolved: https://github.com/getcursor/cursor/issues/1602#issuecomment...
Given Cursor's rising popularity, users should be aware of this gap in security updates. Until the Cursor team resolves the marketplace sync issue, caution is advised when using certain extensions.
I am surprised that the VSCode team hasn't gone after them for mirroring the marketplace, as the Visual Studio team made it very clear that they don't want anybody to do that -- it is their marketplace.
Not, human tired of creating content to put online and being consumed not by people but by bots or any other form of mechanical consumption that I don't like. As the owner of the content I think I have the right to set that preference, don't you think?
What if you are using domain names for your local environment or a cloud environment like IDX or you want to automate the testing of the UAT environment?
This may be obvious to most here, but you need Node.js installed for the MCP server to run. This critical detail is not in the setup instructions.
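A quick way to check that prerequisite from a script, as a small sketch (the function name is made up):

```python
import shutil
import subprocess


def node_version():
    """Return the output of `node --version`, or None if Node.js is not on PATH."""
    exe = shutil.which("node")
    if exe is None:
        return None
    try:
        out = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return out.stdout.strip() or None
```

If this returns None, `npx @browsermcp/mcp@latest` has nothing to run with.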
So the website claims:
"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."
Yeah, not really.
I used a similar system a few weeks back (one I wrote myself), having AI control my browser using my logged-in session, and I started to get CAPTCHAs during my human sessions in the browser, and eventually I got blocked from a bunch of websites. Now that I've stopped using my browser session in that way, the blocks eventually went away, but be warned: you'll lose access yourself to websites doing this. It isn't a silver bullet.
The caveat with these things is usually "when used with high quality proxies".
Also I assume this extension is pretty obvious so it won't take long for CF bot detection to see it the same as Playwright or whatever else.
What do you think they might be looking for that could be detected pretty quickly? I'm wondering if it is something like tracking mouse movement and calculating when a mouse is moving too cleanly, so adding some more human-like noise to the mouse movement could better bypass the system. Others have mentioned doing too many actions too fast, but what about the timing between actions? Even if every click isn't that fast, a very consistent delay between them would be another non-human sign.
Modern CAPTCHAs use a number of tools, including many of the approaches you mentioned. This is why you might sometimes see a Cloudflare "I am not a robot" checkbox that checks itself and moves along before you have much time to even react. It's looking at a number of signals to determine that you're probably human before you've even checked the box.
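To make the "very consistent delay" idea concrete, here is a sketch of how regularity in event timing could be measured. This is purely illustrative: nobody in the thread knows what any vendor actually computes, and the 0.1 threshold is made up.

```python
import statistics


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation (stdev / mean) of gaps between consecutive events."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    mean = statistics.mean(gaps)
    if mean <= 0:
        raise ValueError("timestamps must be increasing")
    return statistics.stdev(gaps) / mean


def looks_too_regular(timestamps: list[float], min_cv: float = 0.1) -> bool:
    """Flag event streams whose spacing varies less than `min_cv` (threshold is made up)."""
    return interval_cv(timestamps) < min_cv
```

Clicks at exactly one-second intervals give a variation of zero and get flagged, while irregular spacing like `[0.0, 0.4, 1.9, 2.3, 4.0]` does not, even though neither stream is fast.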
When I am using keyboard navigation, shortcuts and autofills, I seem to get mistaken for a bot a lot. These Captchas are really bad at detecting bots and really good at falsely labelling humans as bots.
It might depend on the speed with which you click on the elements on the website.
it does, CF bans my own honest to God clicks if I do them too fast.
About five years ago, maybe more, Google started sending me captchas if I ran too many repetitive searches. I could be wrong, but it feels like most large platforms have fairly sophisticated anti-bot/scraping stuff in place.
I use Vimium (Chrome extension for using keyboard control of the browser) and this happens to me as well since the behavior looks "unnatural".
Must suck for people with assistive software. I get blocked on CF for no damn reason.
Same here. And I am also using vimium.
SSLy the speed clicker
When I go to a shopping website I want to be able to tell my browser "hey please go through all the sideboards on this list and filter out the ones that are larger than 155cm or smaller than 100cm, prioritise the ones with dark wood and space for vinyl records which are 31.43cm tall" for example.
Is there any browser that can do this yet? It seems extremely useful to be able to extract details from the page!
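Once the details have been extracted from the page, the filtering and ranking part of that request is small. A sketch, assuming the extraction step already produced structured records; the `Sideboard` fields are hypothetical, and only the 100cm, 155cm and 31.43cm figures come from the comment.

```python
from dataclasses import dataclass


@dataclass
class Sideboard:
    name: str
    width_cm: float
    wood: str
    tallest_shelf_cm: float


RECORD_HEIGHT_CM = 31.43  # figure quoted in the comment


def shortlist(items: list[Sideboard], min_width: float = 100, max_width: float = 155) -> list[Sideboard]:
    """Keep items within the width range, ranked by dark wood and record-sized shelf space."""
    kept = [s for s in items if min_width <= s.width_cm <= max_width]

    def score(s: Sideboard) -> int:
        # One point for dark wood, one point for a shelf tall enough for records.
        return ("dark" in s.wood.lower()) + (s.tallest_shelf_cm >= RECORD_HEIGHT_CM)

    # sorted() is stable, so equally scored items keep their original order.
    return sorted(kept, key=score, reverse=True)
```

The hard part is the extraction, since product pages rarely expose shelf heights in a consistent place, which is where an LLM reading the page would come in.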
Hey, we’re working on MatterRank which is pretty similar to this but currently works on web search. (e.g. I want to prioritize results that talk about X and have Y bias and I want to deprioritize those that are trying to sell me something). Feel free to try it out at https://matterrank.ai
Would also be interested in hearing more about what you’re envisioning for your use case. Are you thinking a browser extension that acts on sites you’re already on, or some sort of shopping aggregator that lets you do this, or something else entirely?
When doing interior decoration, I am definitely interested in finding objects that fit very specific prompts.
I feel like I slept for a day and now MCPs are everywhere... I don't know what MCPs are and at this point I'm too afraid to ask.
And the worst part is that it opens a Pandora's box of potential exploits: https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands...
Most of these are not a real concern with remote servers with OAuth. If you install the PayPal MCP server from im-deffo-not-hacking-you.com rather than https://mcp.paypal.com/sse, it's the same security model as anything else online...
The article also reeks of LLM ironically
At the risk of sounding like I support theft: the automobile, you know, enabled the likes of Bonnie and Clyde and that whole era of lawlessness, until the FBI and crossing county lines became a thing.
So I'm not sure I'd give up the sum total progress of the automobile just because the first decade was a bad one.
I know what you mean. I think MCP is being widely adopted, but it's not grassroots. It's a quick entry to this market by an established AI company trying to dominate the mind/market share of developers before developers can reach a consensus.
It's just a way to provide a "library of methods" / API that LLMs can "call": basically giving them method names, their parameters, the type of the output, and what they are for.
The LLM will then ask the MCP server to call the functions, check the result, call the next function if needed, etc.
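A toy version of that "library of methods" idea, to show the shape of the exchange. This is not the official SDK, just a hand-rolled sketch: the `add` tool and the `handle` function are made up, while the `tools/list` and `tools/call` method names and the JSON-RPC framing follow the MCP spec as I understand it.

```python
import json

# Hypothetical tool catalogue: name, description, and a JSON Schema for the arguments.
TOOLS = {
    "add": {
        "description": "Add two numbers.",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: args["a"] + args["b"],
    },
}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC request shaped like MCP's tools/list and tools/call."""
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        # The model reads this listing to learn what it is allowed to call.
        result = {"tools": [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        value = TOOLS[params["name"]]["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(value)}]}
    else:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

The client first sends `tools/list`, hands the descriptions to the model, and then relays whatever `tools/call` requests the model decides to make, feeding each result back in.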
Right now if you go to ChatGPT you can't really tell it "open Google maps with my account, search for bike shops near NYC, and grab their phone numbers", because all it can do is reply in text or make images
with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc
You actually can. It's called Operator and it's a complete waste of time, just like 99% of agents/MCPs.
> with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc
It seems strange to me to focus on this sort of standard well in advance of models being reliable enough to, ya know, actually be able to perform these operations on behalf of the user with the sort of reliability that you would need for widespread adoption to be successful.
Cryptocurrency "if you build it they'll come" vibes.
The speed with which every major LLM foundation model provider has jumped on this bandwagon feels VERY artificial and astroturfy...
Maybe because the LLM improvements haven't been that good in the last year, they needed some new thing to hype it/market it.
EDIT: Don't get me wrong, the benchmark scores are indeed higher, but in my personal experience, LLMs make as many mistakes as they did before, still too unreliable to use for cases where you actually need a factually correct answer.
This is in my opinion exactly what it is. A bunch of people throwing stuff at the wall trying to show "impact."
I think MCPs compensate for the unreliability issue by providing a minimal and well-defined interface to a controlled set of actions. That way, the LLM doesn't have to be as reliable in thinking about what it needs to do and in acting, just in choosing what to do from a short list.
Crazy, in looking up some info on the web and creating a Spreadsheet on Google Sheets to insert the results, it worked almost perfectly the first time and completely failed subsequently on 8-10 different tries.
Is there an issue with the lag between what is happening in the browser and the MCP app (in my case Claude Desktop)?
I have a feeling the first time I tried it, I was fast enough clicking the "Allow for this chat" permissions, whereas by the time I clicked the permission on subsequent chats, the LLM just reports "It seems we had an issue with the click. Let me try again with a different reference.".
Actions which worked flawlessly the first time (rename a Google spreadsheet by clicking on the title and inputting the name) fail 100% of subsequent attempts.
Same with identifying cells A1, B1, etc. and inserting into the rows.
Almost perfect on 1st try, not reproducible in 100% of attempts afterwards.
Kudos to how smooth this experience is though, very nice setup & execution!
EDIT 2: The lag & speed to click the allow action make it seemingly unusable in Claude Desktop. :(
Such a rich UI like Google Sheets seems like a bad use case for such a general "browser automation" MCP server. Would be cool to see an MCP server like this, but with specific tools that let the LLM read and write to Google Sheets cells. I'm sure it would knock these tasks out of the park if it had a more specific abstraction instead of generally interacting with a webpage.
Agreed, I'd been working on a Google Sheets specific MCP last week – just got it published here: https://github.com/mkummer225/google-sheets-mcp
This is cool. You should submit this as a 'Show HN'.
Also consider publishing it so people can use it without having to use git.
What you're experiencing is commonly referred to as "luck". It's the same reason people consistently think newer versions of ChatGPT are nerfed in some way. In reality, people just got lucky originally and have unrealistic expectations based on this originally positive outcome.
There's no bug or glitch happening. It's just statistically unlikely to perform the action you wanted and you landed a good dice roll on your first turn.
Well done, just tested on Claude Desktop and it worked smoothly and a lot less clunky than playwright. This is the right direction to go in.
I don't know if you've done it already, but it would be great to pause automation when you detect a captcha on the page and then notify the user that the automation needs attention. Playwright keeps trying to plough through captchas.
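A rough sketch of what that pause-and-notify behaviour could look like. Everything here is hypothetical: the marker phrases are illustrative guesses rather than a vendor-confirmed list, and `get_page_text` and `notify` stand in for whatever the automation tool provides.

```python
CAPTCHA_MARKERS = (  # illustrative phrases only, not an exhaustive list
    "i'm not a robot",
    "i am not a robot",
    "verify you are human",
    "captcha",
)


def needs_human(page_text: str) -> bool:
    """Return True if the page text contains any of the marker phrases."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def run_steps(steps, get_page_text, notify):
    """Run callables in order; stop and notify as soon as the page looks like a CAPTCHA.

    Returns the number of steps that completed.
    """
    done = 0
    for step in steps:
        if needs_human(get_page_text()):
            notify(f"Automation paused after {done} step(s): page appears to show a CAPTCHA")
            break
        step()
        done += 1
    return done
```

Checking before each step, rather than after a failure, is what keeps the tool from ploughing through the challenge.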
Would be nice if it could use the Accessibility Tree from chrome dev tools to navigate the page instead of relying on screenshots (https://developer.chrome.com/blog/full-accessibility-tree)
In fact you have it backwards. It has no screenshots at the moment, only the accessibility tree
I mean no disrespect, but this looks like an outdated clone of https://github.com/microsoft/playwright-mcp
https://github.com/microsoft/playwright-mcp/blob/main/src/to... https://github.com/BrowserMCP/mcp/blob/main/src/tools/tool.t...
From the Browser MCP README.md:
> Credits: Browser MCP was adapted from the Playwright MCP server
Stuff like this makes me giddy for manual tasks like reimbursement requests. It's such a chore (and it doesn't help that our process isn't great).
Every month, go to service providers, log in, find and download statement, create google doc with details filled in, download it, write new email and upload all the files. Maybe double-check the attachments are right (but that requires downloading them again instead of being able to view them in the email).
Automating this is already possible (and a real expense tracking app can eliminate about half of this work) but I think AI tools have the potential to eliminate a lot of the nittier-grittier specification of it. This is especially important because these sorts of workflows are often subject to little changes.
So is MCP the new RPA (Robotic Process Automation)? Like generic Yahoo Pipes?
I just view it as a relatively minor convenience, but it's not some game-changer IMO.
The tool use / function calling thing far predates Anthropic releasing the MCP specification and it really wasn't that onerous to do before either. You could provide a JSON schema spec and tell the model to generate compliant JSON to pass to the API in question. MCP doesn't inherently solve any of the problems that come up in that sort of workflow, but it does provide an idiomatic approach for it (so there's a non-zero value there, but not much).
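For reference, the pre-MCP workflow described there is roughly this: give the model a schema, then parse and check whatever JSON it emits before passing it on. A minimal sketch with a hand-rolled check that covers only `required` and per-property `type`, a small subset of JSON Schema; the function name and the example schema are made up.

```python
import json

_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def parse_tool_args(raw: str, schema: dict) -> dict:
    """Parse model output as JSON and check it against a flat object schema."""
    args = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(args, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            raise ValueError(f"unexpected field: {key}")
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and spec["type"] != "boolean":
            raise ValueError(f"{key} should be {spec['type']}")
        if not isinstance(value, _TYPES[spec["type"]]):
            raise ValueError(f"{key} should be {spec['type']}")
    return args
```

With a schema that requires a string `query` and allows an integer `limit`, `'{"query": "bike shops", "limit": 5}'` passes and `'{"limit": "5"}'` is rejected. The validation problems stay the same with or without MCP, which is the comment's point.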
The interesting thing about MCP as a tool use protocol is the traction that it has garnered in terms of clients and servers supporting it.
It seems the benefit of MCP is for Anthropic to enlist the community in building integrations for Claude Desktop, no?
And if other vendors sign on to support MCP, then it becomes a self reinforcing cycle of adoption.
Yea it certainly does benefit Claude Desktop to some degree, but most MCP servers are a few hundred SLOC and the protocol schema itself is only ~400 SLOC. If that was the only major obstacle standing in the way of adoption, I'd be very surprised.
Coupled with the fact that any LLM trained for tool use can utilize the protocol, it doesn't feel like much of a moat that uniquely positions Claude Desktop in a meaningful way.
> And if other vendors sign on to support MCP, then it becomes a self reinforcing cycle of adoption
This is exactly what's happening now. A good portion of applications, frameworks and actors are starting to support it.
I've been reluctant to adopt MCP in applications until there was enough adoption.
However, depending on your use case, it may also be too complex.
MCP is useful because Anthropic has a disproportionate share of API traffic relative to its valuation and a tiny share of first-party client traffic. The best way around this is to shift as much traffic to API as possible.
First-party client, meaning browser? User agent or … Electron app, or any mobile app?
First-party client as in what a Claude subscription gives you access to (mostly app + web).
No, since MCP is just an interface layer it is to AI what REST API is to DPA and COM/App DLLs are to RPA.
APA (Agentic Process Automation) is the new RPA, and this is definitely one example of it.
But AI already supported function calling, and you could describe them in various ways. Isn't this just a different way to define function calling?
I did something similar, but it controls a hardware synth, allowing me to do sound design without touching the physical knobs: https://github.com/zerubeus/elektron-mcp
Oh good idea.
Imagine it controlling plugins remotely, having an LLM do mastering and sound shaping with existing tools. The complex, overly graphical UIs of VSTs might be a barrier to performance there, but you could hook into those labeled MIDI mapping interfaces to control the knobs and levels.
Still slightly confused on what MCPs are but looking at this it does look useful
Can you add a license to your code along with open sourcing the chrome extension?
Doesn't work on Windows:
2025-04-07T18:43:26.537Z [browsermcp] [info] Initializing server...
2025-04-07T18:43:26.603Z [browsermcp] [info] Server started and connected successfully
2025-04-07T18:43:26.610Z [browsermcp] [info] Message from client: {"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"claude-ai","version":"0.1.0"}},"jsonrpc":"2.0","id":0}
node:internal/errors:983
  const err = new Error(message);
  ^
Error: Command failed: FOR /F "tokens=5" %a in ('netstat -ano ^| findstr :9009') do taskkill /F /PID %a
  at genericNodeError (node:internal/errors:983:15)
  at wrappedFn (node:internal/errors:537:14)
  at checkExecSyncError (node:child_process:882:11)
  at execSync (node:child_process:954:15)
Can you try again?
There was another comment that mentioned that there's an issue with port killing code on Windows: https://news.ycombinator.com/item?id=43614145
I just published a new version of the @browsermcp/mcp library (version 0.1.1) that handles the error better until I can investigate further so it should hopefully work now if you're using @browsermcp/mcp@latest.
FWIW, Claude Desktop currently has a bug where it tries to start the server twice, which is why the MCP server tries to kill the process from a previous invocation: https://github.com/modelcontextprotocol/servers/issues/812
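The thread doesn't say exactly why the one-liner fails, so the following is only a sketch of one way to make this kind of cleanup step more tolerant, not the fix that shipped in 0.1.1. It assumes the usual five-column TCP rows of `netstat -ano` on Windows, parses them in code, and returns an empty list instead of failing when nothing is bound to the port. The function name `pids_on_port` is made up:

```python
def pids_on_port(netstat_output: str, port: int) -> list[int]:
    """Return PIDs whose *local* address ends in :port in `netstat -ano` text."""
    pids: list[int] = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected TCP columns: Proto, Local Address, Foreign Address, State, PID.
        # Header rows and UDP rows (no State column) do not match and are skipped.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        local_address, pid = fields[1], fields[4]
        if not local_address.endswith(f":{port}") or not pid.isdigit():
            continue
        # PID 0 is the System Idle Process on Windows and cannot be killed.
        if int(pid) > 0 and int(pid) not in pids:
            pids.append(int(pid))
    return pids
```

The caller can then run `taskkill` only for the PIDs that come back, and treat an empty list as "nothing to clean up" rather than as an error.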
It's working now with the 0.1.0 for me. But I will let you know if I experience any issues once I get updated to 0.1.1.
Thanks, great job! I like it overall, but I noticed it has some issues entering text in forms, even on google.com. It's able to find a workaround and insert the searched text in the URL, but it would be nice if the entry into forms worked well for UI testing.
I was able to make it work like this:
1. Kill your Claude Desktop app
2. Click "Connect" in the browser extension.
3. Quickly start your Claude Desktop app.
It will work 50% of the time - I guess the timing must be just right for it to work. Hopefully, the developers can improve this.
Now on to testing :)
Why use this over Puppeteer or Playwright extensions?
The Puppeteer MCP server doesn't work well because it requires CSS selectors to interact with elements. It makes up CSS selectors rather than reading the page and generating working selectors.
The Playwright MCP server is great! Currently Browser MCP is largely an adaptation of the Playwright MCP server to use with your actual browser rather than creating a new one each time. This allows you to reuse your existing Chrome profile so that you don't need to log in to each service all over again and avoids bot detection which often triggers when using the fresh browser instances created by Playwright.
I also plan to add other useful tools (e.g. Browser MCP currently supports a tool to get the console logs which is useful for automated debugging) which will likely diverge from the Playwright MCP server features.
by the way, you can indeed access your personal context with Playwright. just `launchPersistentContext()` and set the userDataDir to that of your existing Chrome install:
https://playwright.dev/docs/api/class-browsertype#browser-ty...
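A rough sketch of what that looks like with Playwright's Python bindings, where the method is `launch_persistent_context`. The profile paths below are the usual default locations for Chrome, which is an assumption about your install, and the helper names are made up. Chrome normally has to be closed first because a running instance locks the profile, and newer Chrome versions may refuse to be automated against the default profile at all, so treat this as a starting point rather than something guaranteed to work:

```python
import os
import sys
from pathlib import Path


def default_chrome_user_data_dir(platform: str = sys.platform) -> Path:
    """Best-guess default Chrome profile directory (assumed standard install)."""
    home = Path.home()
    if platform.startswith("win"):
        local = Path(os.environ.get("LOCALAPPDATA", home / "AppData" / "Local"))
        return local / "Google" / "Chrome" / "User Data"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "Google" / "Chrome"
    return home / ".config" / "google-chrome"


def open_with_existing_profile(url: str) -> str:
    """Open `url` using the existing Chrome profile and return the page title."""
    # Imported lazily so the path helper works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(
            str(default_chrome_user_data_dir()),
            channel="chrome",   # use the installed Chrome, not bundled Chromium
            headless=False,
        )
        page = context.new_page()
        page.goto(url)
        title = page.title()
        context.close()
        return title
```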
Ooo, i like that. one of the most annoying points has been 'not sharing' the browser context. i'll def check it out
Any plans to make a Firefox version?
Browser MCP uses the Chrome DevTools Protocol (CDP) to automate the browser so it currently only works for Chromium-based browsers.
Unfortunately, Firefox doesn't expose WebDriver BiDi (the standardized version of CDP) to browser extensions AFAIK (someone please correct me if I'm mistaken!), so I don't think I can support it even if I tried.
Just found this[0] implementation roadmap on Mozilla's wiki, recently updated too! At least it's actively being worked on.
Not going to lie, this makes me happy.
[0]: https://wiki.mozilla.org/WebDriver/RemoteProtocol/WebDriver_...
In the Task Automation demo, how does it know all of the attributes of the motorcycle he is trying to sell? Is it relying on the underlying LLM's embedded knowledge? But then how would it know the price and mileage? Is there some underlying document not referenced in the demo? Because that information is not in the prompt.
I just run into a bunch of errors on my Windows machine + Chrome when connected over remote-ssh. Extension installed, tab enabled, npx updated/installed, etc.
2025-04-07 10:57:11.606 [info] rmcp: Starting new stdio process with command: npx @browsermcp/mcp@latest
2025-04-07 10:57:11.606 [error] rmcp: Client error for command spawn npx ENOENT
2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: spawn npx ENOENT
2025-04-07 10:57:11.606 [info] rmcp: Client closed for command
2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: Client closed
2025-04-07 10:57:11.606 [info] rmcp: Handling ListOfferings action
2025-04-07 10:57:11.606 [error] rmcp: No server info found
---
EDIT: Ended up fixing it by patching index.js. killProcessOnPort() was the problem. Can hit me up if you have questions, I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.
> I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.
Not that HN supports much in the way of markup, but code blocks are actually the same as Markdown: indent (by 2 spaces or more, in HN's syntax; Markdown calls for 4 or more, so they're compatible).
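If you paste code often, a tiny helper can do the indenting for you. This is just a sketch of the rule described above (a blank line, then every line indented by two spaces); the function name is made up:

```python
def to_hn_code_block(code: str, indent: int = 2) -> str:
    """Indent every line so HN renders the text as a code block."""
    prefix = " " * indent
    lines = code.strip("\n").splitlines()
    body = "\n".join(prefix + line for line in lines)
    # The leading newline gives the blank line that separates the block
    # from the paragraph before it.
    return "\n" + body + "\n"
```

Passing `indent=4` produces output that also renders as a code block in Markdown.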
Thanks for the report and the update! I'd love to hear about what you changed — how can I get in touch? I didn't see anything in your HN profile. Feel free to email me at admin@browsermcp.io
This one also uses aria snapshots formatted as yaml. This will quickly exceed context limits.
Setting this up for claude desktop and cursor was alright. Works well out of the box with little setup, and I like that it attached to my active browser tab. Keep up the good work.
I literally started working on the same exact idea last night haha. Great work OP. I'm curious, how are you feeding the web data to the LLM? Are you just passing the entire page contents to it and then having it interact with the page based on CSS selectors/xpath? Also, what are your thoughts on letting it do its own scripting to automate certain tasks?
This is really well done! Very cool.
I wonder if it's possible to add such plugins to election apps (e.g.: Slack). It would be such a nice experience if I could just connect my AI of choice to a local app.
Good idea! I'm sure this is possible since it looks like playwright can control electron apps. https://playwright.dev/docs/api/class-electronapplication
election -> Electron
MCP seems to be JavaScript's trojan horse into AI.
"Trojan horse"? 95% of people currently access AI via web or mobile app; those are pretty JS-dominated, no?
This is cool. I'm curious why you chose to use an extension, rather than getting the user to run Chrome with remote debugging turned on?
An extension is more user-friendly! I leave Chrome open basically 24/7 and having to create a new Chrome instance via the command line just to use Browser MCP just felt like too high of a barrier.
Not OP but I suspect it is because of this (mentioned on their page):
'Avoids bot detection and CAPTCHAs by using your real browser fingerprint.'
I don't think remote debugging by itself on a normal chrome profile is detectable
Exposing Chrome CDP is a terrible idea from a security and privacy perspective. You get the keys to the whole kingdom (and expose them on a standard port with a well documented API). All security features of the web can be bypassed, and then some, as CDP exposes even more capabilities than chrome extensions and without any form of supervision.
You're talking about exposing Chrome CDP to the wider internet, right? Or are you highlighting these dangers in the local context?
In the local context as well. Unlike say the docker socket which is protected by default using unix permissions, the CDP protocol has no authorization, authentication or permission mechanism.
Anything on your machine (such as a rogue browser extension or a malicious npm/pypi package) could scan for this and just get all your cookies - and that's only the beginning of your problems.
CDP can access any origin, any data stored (localStorage, indexedDB ...), any javascript heap, cross iframe and origin boundaries, run almost undetectable code that uses your sessions without you knowing, and the list is very long. CDP was never meant to expose a real browser in an untrusted context.
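To make the "anything on your machine could scan for this" point concrete, here is a minimal sketch you can run against your own machine to see whether a debugging port is reachable. Port 9222 is only the value commonly passed to `--remote-debugging-port`, so it is an assumption here, and the function name is made up. Note that the check needs no credentials at all, which is the problem being described:

```python
import socket


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9222 is a commonly used debugging port, assumed here for illustration.
    print("debugging port reachable:", is_port_open(9222))
```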
I'm sure it's about the cookies/sessions, but I do recall you can load cookies from another browser?
I like this. It would be interesting to use it for when I need to use authenticated browser sessions.
Pretty cool, do you know of a version of this that supports the new remote MCP protocol?
We work on something similar and aim to be the huggingface hub for automations you can run in your browser[0], with built-in support for MCP SSE.
Use the pre-built Trails[1][2] as MCP servers or create and publish your own with a familiar Puppeteer-like API, powered by your or your friends' browsers.
0: https://herd.garden
1: https://herd.garden/trails/@herd/browser
2: https://herd.garden/trails/@omneity/serp
Is anyone successfully running MCPs / Claude Desktop on Linux?
So why do I need an editor (Cursor)? How does a non-coder use it?
If you're a non-coder, use it with Claude Desktop.
Do you respect robots.txt so administrators can block this tool?
Should I be blocked if I ask Claude Desktop to lower the prices in all of my Craigslist ads by 10%?
Do user agents doing work for users need to respect robots.txt? If yes, does chrome?
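For context on how robots.txt rules are evaluated, here is a small sketch using Python's standard `urllib.robotparser`. The `ExampleAgentBot` name and the rules are invented. Rules are matched against a user agent token, so a tool that drives your own browser (and presumably sends the browser's normal user agent) would fall under the `*` rules rather than a rule aimed at a named bot:

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt: one named bot is blocked entirely,
# everyone else is only kept out of /private/.
ROBOTS = """\
User-agent: ExampleAgentBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt body for the given user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Whether a user-driven agent should be treated like a crawler is the policy question being argued here; the code only shows the mechanism.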
Can these things automatically solve recaptcha? That's the only AI browser feature that I have a real use for.
https://github.com/dessant/buster
How does this compare to Anthropic's Computer Use?
Can you expose the SDK as a React component to be used inside an app?
awesome! For the Cursor / React / Click to Add 2 example, can we also have it write a unit/e2e regression test?
author replied on Twitter:
> that's a great use case! the aria snapshot that browser mcp generates is enough to write tests for playwright using its role-based locators, but i may add a get_page_html tool in the same way that they're considering: https://github.com/microsoft/playwright-mcp/issues/103
https://x.com/roadtoramen/status/1909356255866733044
works better than puppet mcp for me but having issues with keyboard events and actions on some websites.
> Private
> Since automation happens locally, your browser activity stays on your device and isn't sent to remote servers.
I think this is bullshit. Isn't the DOM or whatever sent to the model API?
Of course, you're sending data to the AI model, but the "private" aspect is contrasting automating using a local browser vs. automating using a remote browser.
When you automate using a remote browser, another service (not the AI model) gets all of the browsing activity and any information you send (e.g. usernames and passwords) that's required for the automation.
With Browser MCP, since you're automating locally, your sensitive data and browser activity (apart from the results of MCP tool calls that are sent to the AI model) stay on your device.
I think we need to be very careful & intentional about the language we use with these kinds of tools, especially now that the MCP floodgates have been opened. You aren't just exposing the user's browsing data to whichever model they are using, you are also exposing it to any tools they may be allowing as well.
A lot of non-technical people are using these tools to "vibe" their way to productivity. I would explicitly tell them that potentially "all" of their browsing data is going to be exposed to their LLM client and they need to use this at their own risk.
neat, but instead of asking me to install a browser extension, can you just bundle a browser in the MCP server?
Thanks. The idea is OK, but it is not working smoothly.
this is the way
What I don't like about LLMs is that people keep re-inventing the wheel over and over. For example, we've been able to control browsers using GPT for about 2 years now:
- https://github.com/mayt/BrowserGPT
- https://github.com/TaxyAI/browser-extension
- https://github.com/browser-use/browser-use
- https://github.com/Skyvern-AI/skyvern
- https://github.com/m1guelpf/browser-agent
- https://github.com/richardyc/Chrome-GPT
- https://github.com/handrew/browserpilot
- https://github.com/ishan0102/vimGPT
- https://github.com/Jiayi-Pan/GPT-V-on-Web
I think this is noteworthy in that it is using what is increasingly becoming the dominant API protocol for LLMs.
Just because the wheel exists doesn't mean we shouldn't strive to make it better by applying new knowledge and technologies to it.
WARNING for Cursor users:
Cursor is currently stuck using an outdated snapshot of the VSCode Marketplace, meaning several extensions within Cursor remain affected by high-severity CVEs that have already been patched upstream in VSCode. As a result, Cursor users unknowingly remain vulnerable to known security issues. This issue has been acknowledged but remains unresolved: https://github.com/getcursor/cursor/issues/1602#issuecomment...
Given Cursor's rising popularity, users should be aware of this gap in security updates. Until the Cursor team resolves the marketplace sync issue, caution is advised when using certain extensions.
I've flagged it here, apologies for the repost: https://news.ycombinator.com/item?id=43609572
I am surprised that the VSCode team hasn't gone after them for mirroring the marketplace, as the Visual Studio team made it very clear that they don't want anybody to do that -- it is their marketplace.
It seems that there is one sane PM left at VS Code who knows that such a move would only lead to MSFT losing more PR. And anti-trust scrutiny?
Why? This seems fine.
Good, just what we needed. More bots browsing the internet. Somedays I think I am not 100% against of every website having a captcha...
Not out of the realm of possibility that this very comment was written by a bot prompted to write a negative response to a given piece of content.
Not, human tired of creating content to put online and being consumed not by people but by bots or any other form of mechanical consumption that I don't like. As the owner of the content I think I have the right to set that preference, don't you think?
Yeah this is definitely a bad English bot
It's a developer tool
Then it should be limited to localhost or something similar.
What if you are using domain names for your local environment or a cloud environment like IDX or you want to automate the testing of the UAT environment?
It can be, just do that when you install it
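The thread doesn't say how such a restriction would be configured, so this is purely a hypothetical sketch of a host allowlist, not a Browser MCP feature. It covers the cases raised above by letting you list local domain names or a UAT host alongside localhost. The function name and the example hosts are made up:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: localhost plus whatever dev/UAT hosts you trust.
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "myapp.test", "uat.example.com"}


def is_allowed_host(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only if the URL's hostname is explicitly allowlisted."""
    host = urlsplit(url).hostname  # already lowercased, port stripped
    return host is not None and host in allowed_hosts
```

An automation layer could call this before every navigation and refuse anything that returns False.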