

Blockchain address risk scoring helps assign a numerical risk value (0–100) to wallets, transactions, or smart contracts, identifying potential ties to suspicious activities like scams or sanctioned entities. This is critical for businesses handling stablecoins like USDC or USDT to ensure compliance and avoid financial penalties.
Here’s how it works:
- Data Sources: Combines on-chain activity, compliance lists (e.g., OFAC), and behavioral patterns.
- Key Features:
  - Graph Analysis: Tracks address connections and transaction patterns using methods like Node2Vec.
  - Behavioral Metrics: Flags anomalies like dormant wallets reactivating or high-risk interactions.
  - Policy Rules: Screens for sanctions, address poisoning, and other predefined risks.
- Model Training: Uses machine learning (e.g., XGBoost) to combine graph and behavioral data for risk predictions.
- Outputs: Generates scores (0–100) with categories for low, medium, and high risk, alongside detailed risk dossiers.
- Deployment: Integrated into payment workflows with real-time checks, audit logging, and compliance reporting.
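Companies that implement robust risk scoring have reported a 30% reduction in fraud-related losses, and Coinbase credits its system with protecting millions of dollars in user funds. The same scores and dossiers give compliance teams actionable insights.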

Blockchain Address Risk Scoring Model: 5-Step Implementation Framework
Data Sources for Risk Scoring Models
Building accurate risk scoring models depends on integrating three key data layers: on-chain activity, external compliance information, and behavioral signals. Together, these layers provide a comprehensive view that blends blockchain activity, regulatory data, and user-specific behaviors. This combination is essential for pre-sign risk evaluations of stablecoin payments. Let’s dive into how on-chain data forms the backbone of these assessments.
On-Chain Transaction Data
At the heart of any blockchain analysis lies the ledger itself. Think of addresses as nodes and transactions as directed edges, creating a network that reveals patterns like "peeling chains", often used to obscure the origins of funds.
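As a rough illustration of the nodes-and-edges view, the sketch below builds a directed graph from a list of transfers and follows the largest outgoing edge at each hop, which is one crude way to trace a peeling chain. The function names, the tuple layout, and the "largest outflow" heuristic are assumptions made for this example, not a method described in the sources cited here.

```python
from collections import defaultdict

def build_graph(transfers):
    """Return an adjacency map: sender -> {recipient: total amount sent}."""
    graph = defaultdict(lambda: defaultdict(float))
    for sender, recipient, amount in transfers:
        graph[sender][recipient] += amount
    return graph

def follow_largest_outflow(graph, start, max_hops=10):
    """Follow the largest outgoing edge from each address (hypothetical heuristic)."""
    path, seen = [start], {start}
    current = start
    for _ in range(max_hops):
        outgoing = graph.get(current)  # .get avoids creating empty entries
        if not outgoing:
            break
        nxt = max(outgoing, key=outgoing.get)
        if nxt in seen:  # stop on cycles
            break
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path

# Toy peeling chain: each hop "peels" 5 units off to a side address.
transfers = [
    ("A", "B", 95.0), ("A", "x1", 5.0),
    ("B", "C", 90.0), ("B", "x2", 5.0),
    ("C", "D", 85.0), ("C", "x3", 5.0),
]
graph = build_graph(transfers)
chain = follow_largest_outflow(graph, "A")
```

In this toy data the chain runs A, B, C, D, while the small side payments to x1, x2, and x3 are the "peels".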
For Ethereum and other EVM-compatible blockchains, raw transaction data can be overwhelming, and the top-level sender of a transaction is not always the party that moved the funds. Focusing on "Transfer" event logs emitted by token contracts is more effective: the logs identify the actual economic participants even when a transaction is submitted by an intermediary such as a relayer, as happens with meta-transactions. For instance, in December 2023, Coinbase's RiskSEA system analyzed 270 million Ethereum addresses by combining event logs with dynamic graph embeddings. This approach successfully blocked high-risk transactions and saved millions.
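To make the "Transfer" log idea concrete, here is a minimal sketch of decoding an ERC-20 `Transfer(address,address,uint256)` event from a raw log. It assumes the log is a dictionary shaped like an `eth_getLogs` JSON-RPC result (hex strings for `address`, `topics`, and `data`); the helper name `decode_transfer` and the sample addresses are made up for illustration.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer event signature
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log):
    """Return token, sender, recipient, and raw value, or None if not an ERC-20 Transfer."""
    topics = log["topics"]
    # ERC-20 Transfer has exactly 3 topics: signature, indexed from, indexed to.
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "token": log["address"],
        "from": "0x" + topics[1][-40:],   # last 20 bytes of the 32-byte topic
        "to": "0x" + topics[2][-40:],
        "value": int(log["data"], 16),    # raw integer units, not adjusted for decimals
    }

sample_log = {
    "address": "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48",  # USDC contract on Ethereum
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "0" * 24 + "11" * 20,  # placeholder sender
        "0x" + "0" * 24 + "22" * 20,  # placeholder recipient
    ],
    "data": "0x" + format(1_000_000, "064x"),  # 1 USDC at 6 decimals
}
decoded = decode_transfer(sample_log)
```

The sender and recipient recovered this way are the token holders themselves, regardless of which account submitted the transaction.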
"By leveraging the node2vec embeddings, we can identify and analyze these dense clusters of addresses that exhibit similar transaction patterns. This enables us to detect and classify potential malicious addresses based on their association within these tightly knit groups."
– Ayush Agarwal, Machine Learning Team, Coinbase
For stablecoins, it's critical to track "freeze" events and interactions with blacklisted addresses. In 2025, stablecoins were linked to 84% of crypto fraud volumes, with freeze actions immobilizing billions in USDT and USDC.
External Compliance Data
External compliance data acts as a reality check for risk scoring models, offering "ground truth" labels that help define risky behavior. This includes global watchlists from entities like OFAC, the EU, and the UN, as well as databases covering over 1 million entities, such as exchanges, mixers, darknet platforms, and private users.
These compliance sources serve two main purposes. First, they flag addresses directly tied to sanctioned entities. Second, they help calculate taint metrics, which measure the proportion of funds linked to flagged sources, even across multiple transaction hops. In 2024 alone, approximately $51 billion was laundered through cryptocurrencies, accounting for 0.14% of all on-chain transactions.
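A taint metric can be sketched in a few lines. The example below uses a proportional ("haircut"-style) convention, in which an address's taint is the amount-weighted average taint of its incoming funds, propagated for a fixed number of hops. This convention is an assumption chosen for illustration; the sources above do not say which taint definition any particular vendor uses.

```python
def taint_scores(transfers, flagged, hops=3):
    """transfers: list of (sender, recipient, amount). Returns address -> taint in [0, 1]."""
    taint = {addr: 1.0 for addr in flagged}  # flagged sources are fully tainted
    for _ in range(hops):
        inflow = {}  # recipient -> (tainted amount received, total amount received)
        for sender, recipient, amount in transfers:
            tainted, total = inflow.get(recipient, (0.0, 0.0))
            inflow[recipient] = (tainted + amount * taint.get(sender, 0.0), total + amount)
        updated = dict(taint)
        for addr, (tainted, total) in inflow.items():
            if addr not in flagged and total > 0:
                updated[addr] = tainted / total
        taint = updated  # each round pushes taint one hop further
    return taint

transfers = [
    ("mixer", "A", 50.0),  # flagged source
    ("clean", "A", 50.0),  # unflagged source
    ("A", "B", 100.0),     # B is two hops from the mixer
]
taint = taint_scores(transfers, flagged={"mixer"})
```

Here half of A's incoming funds come from the flagged source, so A scores 0.5, and B inherits the same proportion one hop later.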
Additionally, IP screening helps identify transactions originating from sanctioned jurisdictions.
Behavioral Features
Behavioral features add an extra layer of insight by analyzing the specific activity patterns of each blockchain address. These features are typically divided into two categories: "Individual risk" (based on the address's own actions) and "Interaction risk" (focused on its connections with other high-risk entities).
For example, wallets that reactivate after 180 days of dormancy or new wallets with fewer than three transactions are often flagged as high risk. Behavioral anomalies, such as sudden spikes in activity or the reactivation of long-dormant accounts, can highlight risks that structural graph data might overlook. Similarly, a first-time interaction between two addresses carries more risk than a relationship with an established transaction history of three or more interactions.
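The rules in this section translate almost directly into code. The sketch below uses the thresholds mentioned above (180 days of dormancy, fewer than three transactions, no prior interaction between the pair); the flag names and the function signature are hypothetical.

```python
from datetime import datetime

DORMANCY_DAYS = 180  # dormancy threshold from the text above
MIN_TX_COUNT = 3     # wallets with fewer transactions are treated as new

def behavioral_flags(tx_times, prior_pair_count, now):
    """tx_times: datetimes of the address's past transactions.
    prior_pair_count: earlier transfers between this sender and recipient.
    now: time of the pending payment."""
    flags = []
    if len(tx_times) < MIN_TX_COUNT:
        flags.append("new_wallet")
    if tx_times and (now - max(tx_times)).days >= DORMANCY_DAYS:
        flags.append("dormant_reactivation")
    if prior_pair_count == 0:
        flags.append("first_time_counterparty")
    return flags

now = datetime(2025, 1, 1)
risky = behavioral_flags([datetime(2024, 1, 1), datetime(2024, 2, 1)], 0, now)
routine = behavioral_flags([datetime(2024, 12, d) for d in (1, 5, 10, 20)], 4, now)
```

The first wallet trips all three rules; the second has recent activity and an established relationship, so it returns no flags.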
Platforms like Stablerail (https://stablerail.com) use these behavioral signals for pre-sign checks. These checks detect anomalies, such as irregular payout patterns, unusual transaction timing, or amounts that deviate from expected behavior, helping finance teams identify suspicious payments before execution.
Feature Engineering for Risk Scoring
Once you've gathered on-chain data, compliance lists, and behavioral signals, the next step is to transform this raw information into numerical features for machine learning. This process, known as feature engineering, helps uncover structural relationships and patterns of suspicious activity. By applying graph and statistical methods, raw data can be shaped into meaningful inputs.
Graph Embedding Techniques
Graph embeddings are a way to represent blockchain networks as numerical vectors, capturing an address's position and relationships within the ecosystem. A popular method here is node2vec, which treats blockchain addresses as nodes and transactions as edges, generating embeddings that reflect how closely connected these addresses are.
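For readers unfamiliar with node2vec, its core is a biased second-order random walk controlled by a return parameter `p` and an in-out parameter `q`; the resulting walks are then fed to a skip-gram model to learn the vectors. The sketch below covers only the walk step on a small undirected graph, using the standard library. It is a simplified illustration, not the distributed implementation described in this article.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Generate one biased walk. adj maps each node to a set of neighbors."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj.get(cur, ()))
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in adj.get(prev, ()):
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move outward
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
walk = node2vec_walk(adj, "a", length=6, p=0.5, q=2.0, rng=random.Random(0))
```

With `q` above 1 the walk tends to stay inside a local cluster, which is the kind of neighborhood structure that makes tightly connected groups of addresses end up with similar embeddings.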
In December 2023, Coinbase implemented a risk scoring system using node2vec embeddings to analyze Ethereum's 270 million addresses. This system, developed by Ayush Agarwal and the Coinbase ML team, processed embeddings through MapReduce and successfully flagged fraudulent addresses that had never interacted with Coinbase before. This effort protected millions of dollars in user funds. The breakthrough came from identifying how malicious actors often operate within tightly connected clusters of supporting addresses - patterns that graph embeddings reveal.
For massive networks, dynamic node2vec provides a more efficient approach. Instead of recalculating embeddings for the entire graph after every new transaction, it updates only the affected parts. In October 2024, researchers from Coinbase and Duke University showcased how dynamic node2vec handled a graph of 266 million nodes, assigning normalized risk scores between 0 and 1 to Ethereum addresses. To address the challenge of new addresses not included in the initial training data, embedding propagation extends embeddings from known nodes to their immediate neighbors, solving the cold start issue.
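One simple way to picture embedding propagation is to give a new address the average of its known neighbors' embeddings. The sketch below assumes that mean aggregation; the aggregation actually used in the research mentioned above is not described here, so treat this as an illustration of the idea only.

```python
def propagate_embedding(new_address, adj, embeddings):
    """Average the embeddings of neighbors that already have one (assumed aggregation)."""
    known = [embeddings[n] for n in adj.get(new_address, ()) if n in embeddings]
    if not known:
        return None  # no embedded neighbor yet, so the cold start remains
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

embeddings = {"a": [1.0, 0.0], "b": [3.0, 2.0]}  # trained nodes
adj = {"new": {"a", "b", "z"}, "lonely": {"z"}}  # "z" has no embedding
estimate = propagate_embedding("new", adj, embeddings)
```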
Transaction-Based Features
Graph embeddings focus on structural relationships, but transaction-based features dive into activity patterns. These include metrics like transaction volume, frequency, amounts, and timestamps for both incoming and outgoing transfers. Temporal trends are particularly telling - wallets with limited or dormant histories often raise red flags. Similarly, analyzing the interaction history between specific sender-recipient pairs can help differentiate risky first-time transactions from established relationships with multiple prior interactions.
Another important metric is distance-to-malicious, which measures how many transaction hops separate an address from known fraudulent entities. Addresses within 0–2 hops of a malicious actor are flagged as high risk, while those 4 or more hops away (or unconnected) are considered lower risk.
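Distance-to-malicious is a multi-source breadth-first search. The sketch below computes hop counts from a set of known malicious addresses over an undirected graph and maps them to the bands described above. The text does not say how exactly 3 hops is treated, so the "medium" band here is an assumption, as are the function names.

```python
from collections import deque

def distance_to_malicious(adj, malicious, max_hops=4):
    """Return address -> hop count to the nearest malicious address (up to max_hops)."""
    dist = {addr: 0 for addr in malicious}
    queue = deque(malicious)
    while queue:
        cur = queue.popleft()
        if dist[cur] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

def hop_risk(distance):
    """Map a hop count (or None when unconnected) to a risk band."""
    if distance is None or distance >= 4:
        return "low"
    if distance <= 2:
        return "high"
    return "medium"  # exactly 3 hops: assumed, not specified in the text

# Toy chain: m - a - b - c - d - e, with m known to be malicious.
adj = {"m": {"a"}, "a": {"m", "b"}, "b": {"a", "c"},
       "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
dist = distance_to_malicious(adj, {"m"})
```

Addresses the search never reaches are simply absent from the result, so `dist.get(address)` returns `None` and `hop_risk` treats them as lower risk.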
"Combining both behavioral and node2vec features boosts the classification performance significantly."
– Ayush Agarwal, Machine Learning Team, Coinbase
Policy-Driven Features
Policy-driven features rely on predefined rules to generate binary risk flags. Unlike probabilistic outputs from machine learning models, these features provide clear, immediate signals based on compliance rules and known attack patterns. For example, address poisoning detection uses 4-character prefix/suffix matching to identify spoofing attempts where attackers create addresses that visually mimic legitimate ones. Sanctions screening checks addresses against OFAC, EU, and UN watchlists, automatically assigning the highest risk score for any matches. Additionally, IP screening flags transactions originating from sanctioned jurisdictions.
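The 4-character prefix/suffix check for address poisoning is easy to sketch. The helper below compares a candidate recipient against a list of addresses the sender already trusts; the function names and the idea of a "known addresses" list are assumptions for this example.

```python
def _body(address):
    """Lowercase an address and drop the 0x prefix, if present."""
    a = address.lower()
    return a[2:] if a.startswith("0x") else a

def find_poisoning_match(candidate, known_addresses, n=4):
    """Return the known address the candidate imitates, or None."""
    c = _body(candidate)
    for known in known_addresses:
        k = _body(known)
        # Same first and last n characters, but not the same address.
        if c != k and c[:n] == k[:n] and c[-n:] == k[-n:]:
            return known
    return None

trusted = "0x1234" + "a" * 32 + "abcd"
lookalike = "0x1234" + "b" * 32 + "ABCD"   # same ends, different middle
unrelated = "0x9999" + "a" * 32 + "abcd"
match = find_poisoning_match(lookalike, [trusted])
```

Because wallet interfaces often show only the first and last few characters, a lookalike that passes this comparison is exactly what a user would fail to notice by eye.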
These policy features are especially helpful when dealing with new addresses that lack sufficient transaction history for behavioral analysis. They also offer understandable context for compliance teams. For instance, Stablerail (https://stablerail.com) uses policy-driven features to generate plain-English risk alerts before payments are signed, highlighting situations like "Recipient is a new wallet" or "Weekend transfer over $10,000 requires additional approval." In live systems, a single high-risk policy flag often acts as a maximum risk override, ensuring that critical risks, such as sanctions violations, trigger immediate action regardless of other indicators.
Building and Training the Model
Once features are engineered, the next step is to integrate them into a machine learning model designed to predict risk scores. This process involves combining different types of data, selecting a suitable algorithm, and training the system to identify patterns of suspicious activity. By doing so, the model supports pre-transaction decision-making for stablecoin payments, enabling automated and reliable risk detection.
Combining Graph and Behavioral Features
An effective risk scoring model relies on combining graph embeddings with behavioral metrics into a single input vector. Graph embeddings analyze the blockchain network's structure, identifying how addresses group together, while behavioral metrics focus on individual transaction patterns, such as frequency and volume, to provide deeper insights into activity.
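In practice, "a single input vector" usually means concatenating the embedding with the behavioral metrics in a fixed column order. The sketch below shows that step only; the feature names are hypothetical, and the resulting rows are the kind of input a tree ensemble would train on.

```python
# Hypothetical behavioral feature names, kept in a fixed order so columns line up.
FEATURE_ORDER = ["tx_count", "total_volume", "days_since_last_tx"]

def build_feature_vector(embedding, behavioral, feature_order=FEATURE_ORDER):
    """Concatenate a graph embedding with behavioral metrics into one flat row."""
    return [float(x) for x in embedding] + [float(behavioral[name]) for name in feature_order]

row = build_feature_vector(
    [0.12, -0.40],  # toy 2-dimensional embedding
    {"tx_count": 17, "total_volume": 2500.0, "days_since_last_tx": 3},
)
```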
A great example of this approach is Coinbase's RiskSEA system. Using a distributed MapReduce framework, the team generated Node2Vec embeddings for Ethereum's 270 million addresses. These embeddings were then combined with behavioral metrics like transaction frequency, volume, and timestamps. The result? A classifier capable of identifying fraudulent addresses with improved accuracy.
When it comes to algorithms, tree ensemble models like XGBoost or Random Forest often outperform Graph Neural Networks (GNNs) in the context of stablecoins. Privacy protocols and transaction mixers fragment networks, making it difficult for GNNs to detect meaningful patterns. Research from Duke University in 2024 demonstrated that tree ensembles achieved higher Macro-F1 scores than GNNs when analyzing USDT and USDC flows on Ethereum.
For systems managing billions of addresses, incremental training is crucial. By updating only the affected graph regions instead of retraining the entire model, this method keeps the system up-to-date without incurring excessive computational costs. This streamlined approach ensures that the model’s outputs integrate smoothly into decision-making processes.
Risk Scoring Outputs
Once trained, the model generates a numerical score that reflects the likelihood of illicit activity. These scores are typically normalized to a 0–1 or 0–100 scale, making it easy to interpret and categorize risk levels.
| Risk Level | Score Range (0–100) | Meaning/Action |
|---|---|---|
| Low | 0–30 | Minimal risk of malicious activity; generally safe |
| Medium | 31–70 | Moderate risk; further investigation recommended |
| High | 71–100 | High likelihood of illicit behavior; proceed with extreme caution |
Automated systems often adopt a maximum risk approach. If a single risk factor - such as a link to a sanctioned entity - triggers a high-risk flag, the entire transaction is classified as high risk, regardless of other low-risk indicators. This ensures a conservative approach to risk assessment.
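The bands and the maximum risk override can be expressed as a short sketch. The cut-offs follow the 0–100 ranges given in this section; the function names and the shape of the `policy_flags` dictionary are assumptions for illustration.

```python
def risk_level(score):
    """Map a 0-100 score to a band using the ranges from this section."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "low"
    if score <= 70:
        return "medium"
    return "high"

def final_level(model_score, policy_flags):
    """Maximum risk approach: any active policy flag overrides the model score."""
    if any(policy_flags.values()):
        return "high"
    return risk_level(model_score)

clean = final_level(12, {"sanctions_match": False, "address_poisoning": False})
overridden = final_level(12, {"sanctions_match": True, "address_poisoning": False})
```

The second call returns "high" even though the model score is low, which is the conservative behavior described above.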
After generating risk scores, the system compiles detailed, actionable insights into risk dossiers.
Automated Risk Dossiers
Modern risk scoring systems go beyond producing raw scores - they create detailed risk dossiers to explain flagged addresses. These dossiers highlight specific risk factors, provide graph-based insights, and include plain-English explanations that compliance teams can easily interpret and auditors can verify.
For instance, the Range Risk API generates dossiers with detailed flags such as "new_wallet_recipient" or "malicious_connection_sender_high." These are paired with explanations like "Sender is 2 hops away from known malicious addresses". Similarly, Scorechain AI enhances these dossiers with expert-level analysis, offering compliance recommendations aligned with FATF guidelines.
Platforms like Stablerail (https://stablerail.com) take it a step further by delivering dossiers with clear verdicts - PASS, FLAG, or BLOCK - accompanied by plain-English justifications referencing specific policy clauses and timestamps. This transforms raw model outputs into actionable intelligence that fits seamlessly into existing workflows. CFOs and compliance teams can confidently use these dossiers to justify decisions to auditors and regulators.
A critical aspect of these dossiers is their automatic inclusion in transaction logs. This creates an audit trail that simplifies regulatory compliance and streamlines internal reviews.
Deploying the Model in Production
Taking a trained model and making it part of a live system demands thoughtful planning. You need to focus on data pipelines, workflow integration, and ensuring compliance. The aim is to incorporate risk scoring into the transaction approval process without slowing down legitimate payments or leaving gaps in your audit trail.
Data Aggregation and Training
Handling blockchain-scale data requires distributed processing. To keep up with production demands, incremental training runs on top of distributed frameworks and updates only the graph regions affected by new transactions, so the pipeline scales without reprocessing the full chain.
For corporate treasury systems that use MPC-based wallets, data aggregation must carefully respect custody boundaries. The risk model combines on-chain transaction data with off-chain signals, such as sanctions lists and threat intelligence feeds. Distributed frameworks like MapReduce process this combined data efficiently. Incremental training ensures the model stays current without the need to reprocess the entire blockchain, saving on computational resources.
Workflow Integration
Once the data is aggregated and the model is updated, the next step is embedding these insights into real-time workflows. Risk assessment APIs typically respond within 1,000 to 3,000 milliseconds (1 to 3 seconds) on supported networks, which is short enough to sit inside a transaction approval step without holding up legitimate payments.
Platforms like Stablerail (https://stablerail.com) integrate risk scoring directly into the transaction intent process. Before a payment is signed, the system queries the risk model and generates a Risk Dossier. This dossier provides a verdict - PASS, FLAG, or BLOCK - along with plain-English explanations tied to specific policies and timestamps. For transactions with high risk (scores above 70), the system can trigger multi-factor authentication, delay execution for manual review, or block the payment altogether.
The system operates on a maximum risk principle: if any factor flags as high risk, the transaction is escalated immediately for review. This cautious approach helps prevent approval of potentially dangerous payments while maintaining a seamless integration into workflows.
Audit Trails and Compliance
Every risk assessment is logged in detail, creating a comprehensive audit trail. These logs include risk scores, contributing factors, and final decisions, along with original request parameters like sender/recipient addresses, amounts, network details, and timestamps. This level of detail supports data verification and ensures compliance with regulatory standards.
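A log entry with these fields might look like the sketch below, which serializes one assessment as a JSON line. The field names and the `audit_record` helper are hypothetical; a real system would follow its own schema and storage guarantees.

```python
import json
from datetime import datetime, timezone

def audit_record(request, score, factors, decision, now=None):
    """Serialize one risk assessment as a JSON line (hypothetical schema)."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "logged_at": now.isoformat(),
        "request": request,               # original request parameters
        "risk_score": score,
        "contributing_factors": factors,
        "decision": decision,
    }, sort_keys=True)

line = audit_record(
    request={"sender": "0xaaa", "recipient": "0xbbb", "amount": "2500.00",
             "network": "ethereum", "asset": "USDC"},
    score=82,
    factors=["first_time_counterparty", "dormant_reactivation"],
    decision="BLOCK",
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
record = json.loads(line)
```

Writing one self-describing line per assessment keeps the log easy to append to and easy to replay during an audit.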
Such records bridge the gap between real-time risk scoring and regulatory requirements, ensuring every decision is fully traceable. Stablerail, for example, logs every step - from intent creation and risk checks to flags, approvals, overrides, and signing - into a permanent record. These logs provide CFOs with a defensible history of payment decisions.
"Improve audit trails by including risk scores in transaction logs"
– Veritas Protocol
Validating and Monitoring Model Performance
Deploying a model is just the beginning. To stay effective against evolving blockchain fraud patterns, it's critical to validate and update your model regularly. Without ongoing monitoring, even the most advanced models risk becoming obsolete as new fraud tactics emerge.
Performance Metrics
To measure how well your model is performing, focus on key metrics:
- AUC-ROC (Area Under the Receiver Operating Characteristic curve): This shows how well the model differentiates between legitimate and illicit addresses across various thresholds.
- Precision: This tells you the percentage of flagged addresses that are genuinely risky.
- Recall: This measures how many risky addresses your model successfully identifies.
- F1-score: A balanced metric that combines precision and recall into a single number.
For supervised learning models, accuracy is validated by comparing predictions against verified ground truth data. Historical data benchmarking is another essential step - testing your model against known scam data ensures it can catch past threats effectively. Some systems use a "maximum risk" approach, where the highest individual risk factor (like a sanctions list match) determines the overall risk level for a transaction.
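These metrics are simple enough to compute by hand, which helps when sanity-checking a library's output. The sketch below uses only the standard library; AUC-ROC is computed as the share of (illicit, legitimate) pairs in which the illicit address receives the higher score, with ties counted as half. The labels, scores, and the 0.5 threshold are made-up values for illustration.

```python
def precision_recall_f1(y_true, y_pred):
    """Labels and predictions are 0 (legitimate) or 1 (illicit)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def auc_roc(y_true, scores):
    """Probability that a random illicit address outscores a random legitimate one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # example threshold
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

The pairwise AUC is quadratic in the number of examples, so it suits spot checks on small validation sets rather than full-scale evaluation.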
Once your metrics are in place, the next step is to ensure your model stays adaptable to the fast-changing blockchain landscape.
Dynamic Updates for Evolving Data
Retraining your model from scratch every time new transactions occur is inefficient. Instead, incremental training allows you to update the model by incorporating only the latest transactional data, saving time and computational resources.
Using tools like dynamic graph embeddings (e.g., Node2Vec), you can monitor real-time changes in transaction patterns. This is especially useful as privacy tools like mixers and zero-knowledge proofs make traditional graph analysis less effective. Shifting focus to behavioral features - such as transaction frequency, gas usage, and timing anomalies - can help you detect risks even when connectivity in the graph is obscured.
"The system performs an initial training of the Node2Vec model, then incrementally adds addresses which have transacted since the last training run. This helps in addressing the challenges associated with the large scale and dynamically evolving nature of blockchain transaction graphs."
– Ayush Agarwal, Engineering, Coinbase
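The selection step implied by that description, finding the addresses that have transacted since the last training run, can be sketched as follows. The tuple layout and the use of a block number as the checkpoint are assumptions for this example.

```python
def addresses_to_update(transfers, last_trained_block):
    """transfers: iterable of (block_number, sender, recipient).
    Returns addresses seen in blocks after the last training checkpoint."""
    touched = set()
    for block, sender, recipient in transfers:
        if block > last_trained_block:
            touched.update((sender, recipient))
    return touched

transfers = [
    (100, "a", "b"),  # already covered by the last run
    (101, "b", "c"),
    (102, "d", "a"),
]
pending = addresses_to_update(transfers, last_trained_block=100)
```

Only the addresses in `pending` need new walks and embedding updates; the rest of the graph keeps its existing vectors.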
Incorporating off-chain signals is equally important. News reports, regulatory updates, and threat intelligence feeds can highlight real-world events like sanctions or exchange hacks that impact risk levels. Automated checks, such as API integrations, can flag changes in wallet behavior - an address that was low risk yesterday might become high risk if it starts interacting with flagged entities. Also, pay attention to wallets that reactivate after being dormant for over 180 days, as this is often a red flag for compromised accounts or new illicit activity.
These updates should also align with compliance requirements to ensure operational readiness.
Regulatory and Operational Readiness
Compliance is non-negotiable. Your model must align with FATF (Financial Action Task Force) guidelines and integrate with KYC/AML databases to identify sanctioned entities. Risk reports should summarize key indicators in plain language, making audit reviews straightforward and actionable.
Detailed logging is essential. Every risk assessment should document risk scores, contributing factors, and transaction metadata. This level of detail supports data verification and helps defend decisions during audits or regulatory reviews. Tools like Stablerail automate this process by creating a permanent audit trail that records every step - from initial risk checks to final approvals.
To further reduce false positives, cross-reference multiple data sources. Combining on-chain transaction data with off-chain intelligence, such as KYC/AML databases and threat feeds, improves detection accuracy. Multi-layered heuristics - like identifying new wallets, monitoring dormant wallet reactivations, and spotting address poisoning patterns - add another layer of reliability. For high-risk accounts, additional measures like multi-factor authentication or enhanced due diligence can be triggered before allowing transactions.
"Risk scoring is not a crystal ball. It doesn't guarantee that a transaction is safe or unsafe. It's simply a tool that provides an indication of the level of risk involved."
– Veritas Protocol
Companies that implement robust blockchain risk scoring systems have seen a 30% reduction in fraud-related losses. Coinbase, for example, has saved users millions of dollars by using automated transaction delays and warnings to protect funds. These results stem from a commitment to continuous validation, regular updates, and strict compliance - not from a one-and-done approach to model deployment.
Conclusion
Creating a blockchain address risk scoring model is a continuous process that requires regular updates and strict adherence to compliance standards. The main steps are clear: gather data from both on-chain transactions and off-chain sources, develop features using graph embeddings and behavioral analysis, train supervised models with reliable ground truth labels, assign risk scores with clear thresholds, and integrate the model into treasury workflows with full audit trails. This framework lays the groundwork for automating and securing stablecoin treasury management.
For stablecoin treasury teams, these models streamline compliance checks, flag tainted funds before they can infiltrate your system, and reduce the risk of financial blacklisting linked to illicit activities. As noted earlier, such systems have cut fraud-related losses by 30% and saved users millions of dollars.
The true impact lies in workflow integration. Risk scores can automatically trigger actions like manual reviews, multi-factor authentication, or enhanced due diligence, ensuring every transaction complies with regulations before execution. Solutions such as Stablerail enhance these processes by performing pre-sign checks and maintaining comprehensive audit trails. Every decision is documented, offering CFO-level evidence for audits.
As blockchain networks grow - Ethereum alone had around 270 million addresses as of October 2023 - scalable and automated risk management becomes even more vital. By combining real-time monitoring, adaptable model updates, and audit-ready documentation, you can safeguard your treasury against emerging threats while preserving the speed and efficiency that make blockchain payments so appealing.
FAQs
How do you get reliable labels for training a risk scoring model?
To build a reliable risk scoring model, it's crucial to rely on credible data sources that accurately represent blockchain address risks. These labels typically originate from sources like flagged addresses, documented fraud cases, or reports of illicit activities. Blockchain intelligence platforms and machine learning tools play a key role by examining on-chain data, sanctions lists, and behavioral trends to produce verified labels. This approach provides a strong base for supervised learning and ensures precise risk evaluations.
How do you score brand-new or low-activity wallets accurately?
Scoring wallets that are either new or show little activity can be tough because there’s not much historical data to work with. However, there are ways to tackle this. For starters, looking at when the wallet was created, its first transaction patterns, and other contextual signals can provide some insights.
More advanced techniques involve using risk engines and machine learning models. These tools combine data from behavioral analytics, network relationships, and even external sources to paint a clearer picture. By layering these methods together, it becomes possible to spot potential risks, even when wallet activity is minimal.
What score thresholds should trigger PASS, FLAG, or BLOCK in production?
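The article gives example bands rather than a fixed PASS, FLAG, or BLOCK mapping: scores of 0–30 are low risk, 31–70 medium, and 71–100 high, and scores above 70 can trigger multi-factor authentication, a delay for manual review, or an outright block. Any single high-risk policy flag, such as a sanctions match, overrides the score under the maximum risk approach. How those bands map to PASS, FLAG, or BLOCK is left to each team, so thresholds should be tuned to your own risk appetite and validated against labeled historical data.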


