The following is a repost from the PayPal Developer Blog by Prakhar Mehrotra, SVP of Artificial Intelligence, PayPal
At PayPal, we strive to make it easier for developers to access our services. Today, we are taking the first step to allow developers to embrace the new paradigm of agentic commerce by adopting the Model Context Protocol (MCP) and placing our services on an MCP server. This puts the power of generative AI at our merchants’ fingertips.
MCP is a standard put forward by Anthropic that is being adopted by the world’s leading AI companies to help standardize the way agents access data sources and third-party services, enabling the seamless integration that is paramount in AI-native, multi-agent systems. Starting today, developers can interact with PayPal’s official MCP server to begin enabling next-generation, AI-driven capabilities for merchants. This includes remote MCP servers hosted in the cloud with built-in auth integration. With remote MCP support, users can seamlessly continue their tasks across devices with a simple login.
The availability of PayPal’s MCP server will enable a range of conversational AI capabilities for merchants, which we will roll out in the coming months. To illustrate the power of this new technology, we’re starting with PayPal’s invoicing feature, which is available today to eligible PayPal merchants.
Our First MCP Feature: Invoicing
Developers can now enable merchants to use their preferred AI tools, including LLMs, to automatically generate invoices and shareable invoice links and send them to clients from within their MCP host. This removes the need for merchants to visit the PayPal website or integrate directly with PayPal APIs for manual invoice creation, making invoicing faster, more intuitive, and easier to build into existing MCP clients.
Let’s say a PayPal merchant needs to create an invoice for a customer. Instead of creating one manually, they may decide to use an AI system integrated with PayPal’s MCP endpoint to create the invoice conversationally. The merchant simply prompts the AI system in plain language: “Create a PayPal invoice link for painting a house with a cost of $450. Add 8% tax and apply 5% discount. Make sure it expires in 10 days.” The AI system, working through PayPal’s MCP server, then creates the invoice from this prompt.*
How to Get Started
There are two ways of connecting to PayPal’s MCP server:
Local PayPal MCP Server Using the Agentic Toolkit: This option enables developers to download, install, and run the PayPal MCP server locally on their own machines. It supports a wide range of MCP clients, including Claude Desktop and Cursor AI.
Remote PayPal MCP Server: This option opens the door to a broader audience — particularly users who prefer not to install local MCP servers. With remote PayPal MCP support, users have an endpoint that any MCP client can connect to seamlessly, allowing continuity of work across clients with a simple PayPal login.
Visit https://mcp.paypal.com, which includes our MCP developer toolkit on GitHub, to set up your local PayPal MCP server or connect to the remote one.
Where We Are Going
With the introduction of our MCP server, PayPal is laying the foundation for a more intelligent and responsive digital commerce ecosystem. MCP represents our commitment to continuous improvement and innovation, ensuring our developers and merchants are well equipped to meet the evolving demands of the digital marketplace. And we’re just getting started. We’ll share more soon as we add additional products.
*Disclaimer: PayPal’s MCP server provides access to AI-generated content that may be inaccurate or incomplete. Users are responsible for independently verifying any information before relying on it. PayPal makes no guarantees regarding output accuracy and is not liable for any decisions, actions, or consequences resulting from its use.
How we measure the impact of user actions and product adoptions at PayPal
In today’s competitive digital landscape, understanding user interactions with your products is essential for driving revenue and building lasting customer relationships. At PayPal, our Data Science teams use causal inference to evaluate the impact of key customer actions, such as adopting a new product or adding a credit card to their wallet, on engagement (measured by Transactions per Account, or TPA), revenue, and margin to help make data-driven strategic decisions.
The direct profit from a product adoption or a user action in the app could be $0 if viewed in isolation. However, that does not necessarily mean these events are not driving engagement and monetization across the PayPal ecosystem; they can change how the user engages with other PayPal products in a way that makes the user more profitable.
To measure the overall impact of user actions or product adoption, we introduced Delta CV (delta in Customer Value), defined as a customer’s incremental profit margin in the first year after adopting a new product or completing an action. For example, if the average Delta CV for adopting Crypto is $20, we expect customers who adopt Crypto for the first time to bring in an additional $20 of margin, on average, over the 12 months following adoption. We define Delta Revenue (and Delta TPA) in the same manner, except we calculate the incremental lift in revenue (or TPA) instead of profit margin.
The concept of Delta CV is very different from customer lifetime value (CLV), which estimates the total profit generated by a customer over the course of their relationship with PayPal. Delta CV gives us a holistic view of how new adoptions affect the engagement and value of an existing PayPal user.
Adoption of a new product or completing an action can increase the customer’s value in a few ways:
The product itself may generate profit (Direct effect). For example, paying with PayPal when checking out on a merchant’s online website drives direct margin for PayPal.
Actions do not necessarily lead to profit directly, but they facilitate the usage of other products that generate profit (Halo effect). For instance, a customer adding a credit card to PayPal’s digital wallet can reduce future friction or result in a higher conversion rate when that user later pays with PayPal.
Adoption of a product can increase user engagement with PayPal, leading to adoption or increased usage of other products that generate profit (Halo effect). With Delta CV, we measure the cumulative lift in customer value across all of a user’s PayPal transactions.
Today we estimate Delta CV for 40+ products (or actions) across multiple regions. Having Delta CV for different products helps us in a variety of areas, such as strategic decision making, calculating the return on investment (ROI) of campaigns, opportunity sizing for new campaign efforts, in-app product ranking and placement, making trade-offs in resource allocation, making ramp decisions on product launches, and so on.
Methodology
We measure Delta CV using causal inference and synthetic control. For each product, our treatment group is the set of users who adopt the product for the first time in a given quarter. To create a synthetic control group, we focus on users who never adopted the product of interest. We then find matches for the treatment users inside the control group based on a set of transactional features calculated over the 12 months before adoption. Since we are building a synthetic control and our target variable is CV, we always match on pre-period CV. The remaining matching features are important covariates of CV: they capture user characteristics and are our best predictors of users’ CV response to external and internal changes.
Measurement of treatment effect (lift) using synthetic control for a given cohort
The synthetic control group acts as a counterfactual: we assume that, in the absence of the intervention, the treatment and control groups would change similarly over time. Therefore, if we introduce a change to the treatment group but not to the control group, the difference in the profit margin of the two groups measures the impact of the intervention.
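In symbols (our notation, not taken from the original analysis), the cohort-level lift is the average post-period gap between treated users and their matched synthetic controls:

```latex
\Delta CV \;=\; \frac{1}{N}\sum_{i=1}^{N}\left( CV_i^{\text{post}} \;-\; \widehat{CV}_i^{\text{post}} \right)
```

where \(CV_i^{\text{post}}\) is the observed 12-month post-adoption margin of treated user \(i\) and \(\widehat{CV}_i^{\text{post}}\) is the margin of that user’s synthetic control; Delta Revenue and Delta TPA are defined analogously.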
We select the synthetic control group by matching on our set of features using the KNN (k-nearest neighbors) algorithm. Every user in treatment gets a synthetic control that is the average of up to 10 users from control. We define a threshold for the Euclidean distance between treatment and control units, and we remove matches that exceed this threshold to ensure high matching quality. The validity of the synthetic control group selection can be checked with a bias analysis.
Creating synthetic control using KNN algorithm
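The matching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PayPal’s production code: X_treat and X_ctrl are (standardized) feature matrices containing pre-period CV and the other covariates, cv_ctrl_post is an array of post-period CV for control users, and the 10-neighbor cap and distance threshold mirror the description above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def synthetic_control_cv(X_treat, X_ctrl, cv_ctrl_post, k=10, max_dist=None):
    """Return a counterfactual post-period CV for each treated user."""
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean").fit(X_ctrl)
    dist, idx = nn.kneighbors(X_treat)            # k nearest control users per treated user

    synthetic = np.full(len(X_treat), np.nan)
    for i in range(len(X_treat)):
        # Keep only matches within the distance threshold; treated users with
        # no acceptable match are left as NaN rather than forced onto a bad match.
        keep = idx[i] if max_dist is None else idx[i][dist[i] <= max_dist]
        if len(keep):
            synthetic[i] = cv_ctrl_post[keep].mean()
    return synthetic

# Cohort-level Delta CV: average gap between observed and counterfactual post-period CV.
# delta_cv = np.nanmean(cv_treat_post - synthetic_control_cv(X_treat, X_ctrl, cv_ctrl_post))
```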
Interpretation of Delta CV and Caveats
The incremental lift from adopting a new product or completing an action can be a highly skewed metric: some users bring in significantly more value than others after adopting new products. Median Delta CV is more in line with the expected incremental CV from a typical user, while mean Delta CV reflects the financial lift expected at scale, on average per user.
Delta CV estimations are subject to variance and bias. It is important to consider the accuracy of estimations when making trade-off decisions based on the model’s output.
The Delta CV model estimates the historical impact of adoptions, so there is always a lag. We generally measure the impact over the 12 months after adoption, so there is a 12-month lag between our estimate and the quarter in which adoption occurred. Sometimes we use Delta CV to understand past behavior, and other times we use it as our best estimate of what will happen in the future. We can reduce the measurement period from 12 months to, say, three months for quick readouts, but we know that the early lift can be skewed by novelty or customers’ immediate use cases, and that the CV gap between the treatment and control groups narrows over time.
In an ideal causal inference scenario, the treatment group undergoes an intervention or change while the control group does not. In our setup, however, both treatment and control groups may adopt other products during the post-period. It is tempting to assume that these other adoptions occur randomly in both groups and that their impacts cancel out. However, our data shows a phenomenon of chain adoptions in the treatment group, where certain products are adopted together more frequently than in the control group. Not every user in the treatment or control group adopts multiple products, but certain product adoptions are more prevalent in the treatment group. Therefore, Delta CV measures the cumulative effect of a product adoption along with its frequent preceding and subsequent adoptions at scale within the same quarter. Note that excluding users who adopt other products during the adoption period or the post-period would leave very specific, unusual treatment and control groups that do not adopt any product or complete any action within 12 months and do not represent our user base. We have no interest in limiting the scope of Delta CV estimation to these users.
Delta CV is not an additive metric: adopting two products during the same period does not produce a total Delta CV equal to the sum of the two individual Delta CVs. As mentioned earlier, Delta CV is not a perfectly isolated metric; it also captures the lift from other frequent product adoptions by a portion of the treatment users during the same period. More importantly, user engagement does not increase linearly with each new product adoption, so Delta CV cannot be treated as additive.
Sometimes we cannot find high-quality matches for the treatment group. The control condition of “never adopted the product” is very limiting for some of our products, especially in markets with many highly engaged users. This results in a small synthetic control group and low matching quality. We flag the reliability of Delta CV when the average difference between the pre-period CV of treatment and synthetic control is larger than $1. This is an important piece of information provided along with Delta CV for every product.
End Note
PayPal products are rapidly evolving to provide the best value and experience to customers. While checkout was PayPal’s first product, we now offer an extensive variety of financial products, including peer-to-peer payments, debit and credit cards, rewarding shopping experiences with cashback, and much more, all within the PayPal App. Delta CV has been an integral part of strategic decision making at PayPal. Adding new products to the scope of Delta CV, as well as continuously adjusting the matching methodology, is an ongoing effort. Reducing estimation biases by improving the selection of matching features is another area for improvement.
At PayPal, hundreds of thousands of Apache Spark jobs run on an hourly basis, processing petabytes of data and requiring a high volume of resources. To handle the growth of machine learning solutions, PayPal requires scalable environments, cost awareness, and constant innovation. This blog explains how Apache Spark 3 and GPUs can help enterprises potentially reduce the cloud costs of Apache Spark jobs by up to 70% for big data processing and AI applications.
Our journey begins with a brief introduction to Spark RAPIDS, an accelerator for Apache Spark that leverages GPUs via the RAPIDS libraries. We then review PayPal’s CPU-based Spark 2 application, our upgrade to Spark 3 and its new capabilities, the migration of our Apache Spark application to a GPU cluster, and how we tuned the Spark RAPIDS parameters. Finally, we discuss some challenges we encountered and the benefits of the updates.
Libra scales in the cloud, generated by AI
Background
GPUs are everywhere, and their parallelism characteristics are perfect for processing AI and graphics applications, among other things. For those unfamiliar: what makes GPUs different from CPUs, computation-wise, is that CPUs have a limited number of very powerful cores, whereas GPUs have thousands, or even tens of thousands, of relatively weak cores that work very well together. PayPal has been leveraging GPUs to train models for some time, so we decided to evaluate whether the GPU’s parallelism can help with processing big data applications based on Apache Spark.
In our research, we encountered NVIDIA’s Spark RAPIDS open-source project. It serves many purposes, but we focused on its cost-reduction potential, because enterprises like PayPal spend a great deal on running Spark jobs in the cloud. Using Spark with GPUs isn’t common in the industry yet, but based on the findings described in this blog, the potential benefits could be enormous.
What is Spark RAPIDS?
Spark RAPIDS is a project that enables the use of GPUs in a Spark application. NVIDIA’s team adapted Apache Spark’s design to harness the power of GPUs. It is beneficial for large joins, group-bys, sorts, and similar operations. Spark RAPIDS can boost the performance of certain workloads; we will discuss how to identify them later in the blog. You can review the documentation here for more details.
There are a few reasons Spark RAPIDS is needed to accelerate big data processing with GPUs. GPUs have their own execution environment and programming languages, so Python/Scala/Java/SQL code cannot run on them directly; the code must be translated to a GPU programming language, and Spark RAPIDS performs this translation transparently. Another notable design change in Spark RAPIDS is how the tasks in each stage of the job’s Spark plan are handled. In pure Spark, every task of a stage is sent to a single CPU core in the cluster, so parallelism is at the task level. In Spark RAPIDS, parallelism is also intra-task: the tasks are parallelized, and so is the processing of the data within each task. The GPU is a strong compute processor, which gives us an incentive to make our job more compute-bound and therefore work with large partitions.
Task level parallelism vs data level parallelism, provided by NVIDIA
For more information and thorough explanations, we recommend reading NVIDIA’s book, Accelerating Apache Spark 3.
Getting Started
Our initial experiment with Spark RAPIDS was successful in the PayPal research environment, which is an open environment with access to the web but with limited resources and without production data. The next step was to take the accelerator to production in order to measure it on real production applications.
According to the Spark RAPIDS documentation, not all jobs are a good fit for this accelerator, so we worked on finding the most relevant ones. We started with a Spark 2 (CPU cluster) job that handles large amounts of data (multiple inputs of ~10TB each), executes SQL operations on exceptionally large datasets, uses intensive shuffles, and requires a fair number of machines. The job was predicted to have a high success rate based on NVIDIA’s Qualification Tool, which analyzes Spark events from CPU-based Spark applications to help quantify the expected acceleration of migrating a Spark application to a GPU cluster.
As explained above, we understood that for the GPU to be well leveraged, we had to manipulate our Spark job to work with large partitions. The objective is to make our queries and operations more computation-bound, rather than I/O- or network-bound, so the GPU is used effectively.
To make our job work with large partitions, we changed two settings. The first is AQE (Adaptive Query Execution), a new optimization technique in Spark 3 that, among other things, adjusts the number of partitions in a shuffle stage so that each partition is of a target size. The second is spark.sql.files.maxPartitionBytes, which controls the input partition size. The number of partitions in those shuffle/input stages affects many downstream stages as well.
For the baseline run, we did not set the spark.sql.files.maxPartitionBytes parameter, so the Spark plan used the default value of 128MB. Let us see how the original stage that reads the large input looks in the Spark UI:
As you can see, with 9.5TB of input data, Spark divided it into ~185,000 partitions(!), which means each partition is around 9.5TB/185,000 ≈ 50MB. The input files are around 1GB each, so it makes little sense to split each file into roughly 20 partitions in the Spark cluster. This fragmentation causes a lot of network communication overhead and lengthens the latency of this stage.
Now, after setting the spark.sql.files.maxPartitionBytes parameter to 2GB (so Spark reads larger input partitions and therefore works with larger partitions in the following stages), let us see how the stage was affected:
Our 9.5TB was distributed across ~10,000 partitions, nearly 20 times fewer than in the baseline run, and the stage’s total time dropped to 40 minutes, a 30% reduction in runtime.
Now, let us look at all the heaviest input stages of our baseline run, where spark.sql.files.maxPartitionBytes is set to default:
After setting spark.sql.files.maxPartitionBytes to 2GB:
As we can see, the change lowered the number of tasks in the input-processing stages; this simple parameter change reduced the runtime of these stages by more than 20 minutes.
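For reference, this change amounts to a single property. Here is a PySpark sketch (the same property can equally be set through spark-submit or cluster defaults; 2GB is the value used in this walkthrough, not a general recommendation):

```python
from pyspark.sql import SparkSession

# Larger input splits -> fewer, bigger partitions when reading the ~1 GB input files.
spark = (
    SparkSession.builder
    .config("spark.sql.files.maxPartitionBytes", "2g")  # default is 128 MB
    .getOrCreate()
)
```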
Spark 3 and AQE
To migrate our job to Apache Spark 3, a fair number of steps had to be taken. We had to update some syntax in our code, and each jar of our infrastructure and applications had to be compiled with an updated Scala version. You can review the official migration guide.
Spark 3 added the ability to use GPUs and introduced the AQE optimization technique. As mentioned above, the goal is to make Spark work with large partitions, which means setting the AQE advisory partition size to at least 1GB (or reducing spark.sql.shuffle.partitions). For a Spark application to work with 1GB partitions, the following properties need to be configured:
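A sketch of those properties in PySpark follows; the 1GB advisory size is the target discussed in this post, and the exact behavior of the coalescing settings varies slightly across Spark 3.x minor versions:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.adaptive.enabled", "true")                      # turn on AQE
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")   # let AQE merge small shuffle partitions
    .config("spark.sql.adaptive.advisoryPartitionSizeInBytes", "1g")   # aim for ~1 GB partitions after shuffles
    # Spark 3.2+: respect the advisory size instead of maximizing parallelism.
    .config("spark.sql.adaptive.coalescePartitions.parallelismFirst", "false")
    .getOrCreate()
)
```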
As we can see below, in our use case, this kind of practice is beneficial in runtime terms:
A shuffle stage in our baseline run (no AQE):
A shuffle stage with AQE:
After tuning the candidate job to work with large partitions, we checked the cluster‘s utilization and saw it was not fully utilized, so we could try to reduce the number of machines the application consumes. The baseline job ran with 140 machines; after tuning Spark and the cluster nodes, we ended up with 100 machines that were well utilized. This change only slightly affected the runtime of the job, but dramatically reduced the cost!
The intermediate result: we cut ~20% of our application runtime and ~30% of our resources. Since cloud cost scales roughly with runtime times machine count, that leaves about 0.8 × 0.7 ≈ 0.56 of the original cost, a ~45% cost reduction!
As an example, if the initial cloud usage cost was 1,000 PYUSD, we would now potentially stand at around 550 PYUSD! Chart of CPU runs:
Overall, our initial intention was to work with large partitions solely to benefit from the GPUs, but as we can see, there is a significant performance boost even before using Spark RAPIDS, which is exciting! (Disclaimer: this practice does not bring the same results for all jobs; it depends on the data and the operations you perform on it.)
So far, we had only prepared our job to be suitable for Spark RAPIDS and GPUs. Now the new challenges began: migrating to a GPU cluster, learning new tuning concepts, troubleshooting, and optimizing GPU usage.
Migration to GPU Cluster
The GPU migration included enabling the Spark RAPIDS init scripts, copying all their dependencies into PayPal’s production environment, supporting GPU parameters in our internal infrastructure, learning the GPU cluster features of our cloud vendor, and more. (Disclaimer: These days, cloud vendors release custom images with a built-in Spark RAPIDS instance, so this work can be avoided.)
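For context, at the Spark configuration level, enabling the accelerator mainly means loading NVIDIA’s SQL plugin and assigning a GPU to each executor. A minimal sketch, assuming the RAPIDS jars are already on the cluster classpath and GPU discovery is handled by the cloud image:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # load the RAPIDS accelerator
    .config("spark.rapids.sql.enabled", "true")              # allow SQL operators to run on the GPU
    .config("spark.executor.resource.gpu.amount", "1")       # one GPU per executor
    .getOrCreate()
)
# Supported operators (joins, group-bys, sorts, ...) are planned onto the GPU;
# anything unsupported automatically falls back to the CPU.
```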
After running some simple jobs, making sure we created a stable and reliable infrastructure where the GPU clusters run Spark RAPIDS as expected, we deep-dived into running our candidate production application with it. Thanks to the Spark RAPIDS documentation, we triaged the few runtime errors we encountered while tuning it for our needs. Let us quickly cover two issues that helped us understand the Spark RAPIDS tuning better:
Could not allocate native memory: std::bad_alloc: RMM failure at: arena.hpp:382: Maximum pool size exceeded
This error means that the GPU memory pool was exhausted; to resolve it, some pressure needs to be released from the GPU’s memory. After reviewing the literature, it was clear that some configurations are critical for each job, for example spark.rapids.sql.concurrentGpuTasks, the number of tasks that the GPU handles concurrently.
Intending to maximize performance, we aimed to run as many tasks in parallel as possible. We were over-ambitious at first, set this parameter too high, and immediately got the above error. This happened because we use Tesla T4 GPUs, which have only 16GB of memory. As a check, we set spark.rapids.sql.concurrentGpuTasks to 1 and saw no memory errors. To utilize our resources properly, we had to find the sweet spot for the GPU concurrency parameter. To do so, we looked at the GPU utilization metrics, which we explain later in the blog, and aimed for utilization of around 50%, as advised by NVIDIA’s team, to keep a fair balance between GPU computation and its communication/data transfer with main memory. In our case, after some trial and error, we settled on running 2 tasks at a time, i.e., spark.rapids.sql.concurrentGpuTasks = 2.
Another interesting issue we encountered concerned runtime performance and stability. After reducing the number of machines in our cluster from 140 to 30, our Spark job was slower than expected and occasionally failed with the following error:
java.io.IOException: No space left on device
We looked deeper into our nodes and noticed that adding GPUs to the machines solved the computation bottleneck, but the “pressure” moved to the local SSDs. This is because our GPUs, with their low memory capacity, tend to spill memory onto local disks, and the fact that our Spark plan uses large partitions adds to the disk spill. Originally, when each node had 4 SSDs (375GB each), our job was slower than expected and sometimes even failed. To overcome this, we doubled the number of SSDs per node to 8, which gave us stable results and better performance. Adding local SSDs is relatively cheap with cloud vendors, so this solution did not meaningfully affect our overall cost.
All interactions with local SSDs are much slower than main memory access. A critical parameter for this case is: spark.rapids.memory.host.spillStorageSize — the amount of off-heap host memory to use for buffering spilled GPU data before spilling to local disks.
Increasing the spill storage parameter to 32GB decreased our job’s runtime.
Spark RAPIDS Optimizations: Tips and Insights
Choosing NVIDIA’s Tesla T4 GPU: Among NVIDIA’s GPUs, we found that the Tesla T4 generally has the best performance/price ratio for this kind of computation, recommended to us by NVIDIA’s team for the purpose of cost reduction. (Disclaimer: The new L4 GPU may give better results.)
Considering memory overhead: Keep in mind that the GPU does not work with the executor’s heap memory but with off-heap memory, so we have to guarantee enough memory overhead for each executor. We set the memory overhead to 16GB.
Tuning spark.task.resource.gpu.amount: This parameter limits the number of tasks that are allowed to run concurrently on an executor, whether those tasks use the GPU or not. At first we were greedy and tried to assign many tasks to each executor, which slowed the stage’s runtime because of excessive I/O and spilling. In our case, we found that 0.0625 (1/16) was a good value (see the consolidated configuration sketch after this list).
Using spark.rapids.memory.pinnedPool.size: Pinned memory refers to memory pages that the OS keeps in system RAM and will not relocate or swap to disk. Using pinned memory significantly improves the performance of data transfers between the GPU and host memory. We set this parameter to 8GB.
Configuring NVMe local SSDs: The disks in the Spark RAPIDS cluster were configured to use the NVMe protocol, resulting in a 10% speedup.
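Putting the parameters above together, here is a consolidated sketch of the RAPIDS-related settings we converged on. The values reflect this specific job and the 16GB Tesla T4s, so treat them as a starting point rather than defaults:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.rapids.sql.concurrentGpuTasks", "2")           # tasks sharing each GPU at a time
    .config("spark.executor.memoryOverhead", "16g")               # off-heap room for the GPU plugin
    .config("spark.task.resource.gpu.amount", "0.0625")           # 1/16 GPU per task, i.e. up to 16 concurrent tasks per executor
    .config("spark.rapids.memory.pinnedPool.size", "8g")          # pinned host memory for fast GPU/host transfers
    .config("spark.rapids.memory.host.spillStorageSize", "32g")   # host buffer before spilling to local NVMe SSDs
    .getOrCreate()
)
```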
With stronger compute power, we allowed ourselves to challenge the cluster and reduce the number of machines. After some trial and error, we settled on a GPU cluster of 30 machines, each with 32 cores, 120GB RAM, 8 SSDs, and 2 Tesla T4 GPUs; the job ran for 1.3 hours.
Spark RAPIDS Final Tuning
GPU Utilization
Our cloud vendor provided a tool/agent that extracts metrics such as GPU utilization and GPU memory from GPU VM instances. This allowed us to monitor the usage of our GPUs, which is crucial to identify underutilized GPUs and optimize our workloads.
GPU utilization graph
Final Cost Comparison
Below is a summary of our research findings. As an example, consider a job that costs 1,000 PYUSD: Spark 3 with GPUs reduces that cost to roughly 300 PYUSD. Depending on the configuration, you can enjoy potential cost savings of up to 70% for processing large amounts of data on GPU clusters.
The price of the machine’s hardware is factored into the cost calculation
Key Learnings
GPUs can be effectively leveraged not only for training AI models, but for big data processing as well.
Spark jobs that consume large amounts of data to perform certain SQL operations on large datasets are good candidates to be accelerated with Spark RAPIDS. Their eligibility can be validated with NVIDIA’s Qualification Tool.
Certain workloads benefit from being compute-bound, which can be achieved by manipulating the Spark job to work with large partitions, via spark.sql.files.maxPartitionBytes and the AQE parameters.
Leveraging Spark 3 with GPUs and Spark RAPIDS can significantly reduce your cloud costs for eligible workloads.
Thoughts for the Future
We see great potential in running Spark RAPIDS on an autoscaling GPU cluster. This practice may significantly reduce the cost of the GPU machines, thanks to their lower spot prices compared to permanent instances.
Acknowledgments
Thanks to the significant contributions of Lena Polyak, Neta Golan, Roee Bashary, and Tomer Pinchasi for the project’s success. Thanks so much to NVIDIA’s Spark RAPIDS team for supporting us.
Building on the release of PayPal’s MCP servers, PayPal is excited to introduce the PayPal Agentic Toolkit*. This toolkit empowers developers to seamlessly integrate PayPal’s comprehensive suite of APIs, including those for managing orders, invoices, disputes, shipment tracking, transaction search and subscriptions, into various AI frameworks. With the PayPal Agentic Toolkit, developers can now build sophisticated agentic workflows that handle financial operations with intelligence and efficiency.
Overview of the PayPal Agentic Toolkit
The PayPal Agentic Toolkit is a library designed to simplify the integration of PayPal’s core commerce functionalities into AI agent workflows. By providing a structured and intuitive interface, the toolkit bridges the gap between PayPal’s powerful APIs and modern AI agent frameworks, allowing agents to perform tasks such as creating and managing orders, generating and sending invoices, and handling subscription lifecycles. It eliminates the need for developers to manually handle API calls and data formatting by offering pre-built tools and abstractions that streamline these interactions.
Key Features of the Toolkit
Easy integration with PayPal services, including functions that correspond to common actions within the orders, invoices, disputes, shipment tracking, transaction search and subscriptions APIs, eliminating the need to delve deep into the specifics of each API endpoint.
Compatibility with popular frameworks and languages, including leading AI agent frameworks such as Model Context Protocol (MCP) servers and Vercel’s AI SDK, ensuring broad applicability. The toolkit currently supports the TypeScript programming language, with Python support coming soon.
Extensibility, allowing developers to build upon core PayPal functionalities and enabling use with third-party toolkits, which allows for complex, multi-step operations tailored to their specific agentic use cases.
Merchant Use Cases
The integration of PayPal APIs with AI frameworks through the Agentic Toolkit enables developers to empower businesses with their own agents, facilitating seamless connections to PayPal services for workflow processing and task generation. Use cases include:
Order Management and Shipment tracking: AI agents can be designed to create orders based on user requests, handle payments and track their shipment status. Developers could design a customer service agent to leverage the PayPal toolkit to process an order and complete a payment via PayPal with buyer authentication. For example, an agent could create an order in PayPal when a user clearly confirms their purchase through a conversational interface, and the user authenticates the payment by logging in to their PayPal account.
Intelligent Invoice Handling: Agents can generate invoices based on predefined templates or dynamically created parameters, send them to customers, track their payment status, and send reminders for overdue payments. Imagine an AI assistant that generates invoices upon completion of a service, using natural language instructions to define the invoice details. For example, an AI assistant could generate a PayPal invoice based on the details of a service provided and email it to the client.
Streamlined Subscription Management: AI agents can manage the entire subscription lifecycle, from creating new products and subscription plans to processing recurring payments via PayPal-supported payment methods. A membership management agent could use the toolkit to enroll new members (with their consent) in a PayPal subscription plan. For example, a membership agent could use the toolkit to set up a recurring PayPal payment when a new user signs up for a service and approves the payment.
Get Started
Go to our public GitHub repo to try out all the use cases offered by the PayPal toolkit; details on installation and usage can be found there.
*Disclaimer: PayPal Agentic Toolkit provides access to AI-generated content that may be inaccurate or incomplete. Users are responsible for independently verifying any information before relying on it. PayPal makes no guarantees regarding output accuracy and is not liable for any decisions, actions, or consequences resulting from its use.