More than a year of cross-team collaboration has resulted in an important achievement: Ripple has been awarded the SOC 2 certification!
- How do you make a computer system maximally secure and reliable? Disconnect it from all networks and never change any of the software or data.
- How do you make a computer system maximally useful? Connect it to networks and make frequent changes to the software and data!
- How can we safely balance these factors to offer our clients security and reliability along with regular product updates? By building a system on principles that enable secure and reliable data processing, and by establishing formal processes to safely manage every aspect of change to that system.
- How do we verify whether we have done a good job in building the system and organization controls for our service organization? By inviting external auditors to assess our compliance with the SOC 2 standard.
What is SOC 2?
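The System and Organization Controls for Service Organizations (SOC 2) standard is an auditing framework from the American Institute of Certified Public Accountants (AICPA). A SOC 2 audit assesses whether a technical system is designed and operated according to principles that support data privacy and security.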
Why does Ripple want to pass SOC 2?
Banks and large payment providers often require SOC 2 certification as a condition of using a given SaaS product. Without it, the bank’s internal auditors and security team would need to conduct their own audit of the product and the company, which is an expensive process.
For smaller payment providers, SOC 2 certification offers assurance that Ripple has been thoroughly audited and has strong processes in place to protect their users’ data. A smaller payment provider wouldn’t be able to conduct an audit like that on its own.
In short: the SOC 2 audit is an independent “seal of approval” for Ripple’s processes and its security posture.
How did Ripple pass SOC 2?
A reliable system starts with a reliable design built on fundamental principles. As we have developed the system infrastructure that supports RippleNet, core principles such as code/data separation, fault isolation, and role-based access control have helped us manage risk by segmenting ownership, control, and resources.
Two design decisions have especially benefited us in the long run. First, don't store secrets such as passwords or SSL keys on disk; instead, retrieve them from a secret storage service at runtime. Second, don't allow password-based authentication to our infrastructure; instead, use certificates.
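The first of these decisions is easier to see in code. Below is a minimal Python sketch of fetching a secret at runtime so that it lives only in memory. `SecretStore`, `InMemorySecretStore`, `load_db_config`, and the `db/<host>/<user>/password` path layout are all hypothetical. A real deployment would call its own secret storage service's client library instead of the in-memory stand-in used here.

```python
from dataclasses import dataclass, field
from typing import Protocol


class SecretStore(Protocol):
    """Hypothetical interface to a secret storage service."""

    def get_secret(self, path: str) -> str: ...


@dataclass
class InMemorySecretStore:
    """Illustrative stand-in for a real secret service (not for production)."""

    secrets: dict[str, str] = field(default_factory=dict)

    def get_secret(self, path: str) -> str:
        if path not in self.secrets:
            raise KeyError(f"no secret at {path!r}")
        return self.secrets[path]


@dataclass
class DatabaseConfig:
    host: str
    user: str
    password: str = field(repr=False)  # keep the secret out of logs and reprs


def load_db_config(store: SecretStore, host: str, user: str) -> DatabaseConfig:
    # The password is fetched at runtime and held only in memory;
    # nothing secret is read from or written to local disk.
    password = store.get_secret(f"db/{host}/{user}/password")
    return DatabaseConfig(host=host, user=user, password=password)


store = InMemorySecretStore({"db/payments-db/app/password": "s3cr3t"})
cfg = load_db_config(store, "payments-db", "app")
print(cfg)  # DatabaseConfig(host='payments-db', user='app')
```

Excluding the password from the dataclass `repr` is a small extra safeguard. It keeps the secret out of log lines that print the configuration object.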
A reliable system design must be repeatable, so we need a method for describing and deploying our designs: "infrastructure as code". Infrastructure as code extends the boundaries of what can be defined in software. System components such as servers, databases, and networks are described in configuration files and spun up by automated tools. This provides two important capabilities: repeatability and auditability.
A reliable system must also be automated. With infrastructure as code and configuration management tools, adding, changing, or re-creating resources becomes a routine operation, which decreases risk.
The Git repository containing the infrastructure configs, plus logs from the tools which apply the configs, become a complete record of all changes to the system infrastructure. This gives us a way to audit or rebuild the running system.
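Repeatability and auditability both rest on a reconciliation step. The tool compares the desired state recorded in Git with the actual state, applies only the differences, and logs each one against the commit that requested it. Below is a minimal Python sketch of that idea. The `plan` and `apply` functions, the resource names, and the commit hash are hypothetical. Real infrastructure-as-code tools also handle dependencies, providers, and partial failures, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    action: str  # "create", "update" or "delete"
    name: str    # hypothetical resource name, e.g. "web" or "db"


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[Change]:
    """Compute the changes needed to make `actual` match `desired`."""
    changes = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            changes.append(Change("create", name))
        elif name not in desired:
            changes.append(Change("delete", name))
        elif desired[name] != actual[name]:
            changes.append(Change("update", name))
    return changes


def apply(desired: dict[str, dict], actual: dict[str, dict],
          commit: str, log: list[str]) -> None:
    """Bring `actual` in line with `desired`, logging each change with its commit."""
    for change in plan(desired, actual):
        if change.action == "delete":
            del actual[change.name]
        else:
            actual[change.name] = dict(desired[change.name])
        log.append(f"{commit}: {change.action} {change.name}")


# Desired state as it might be read from a config file in Git (hypothetical values).
desired = {"web": {"instances": 3}, "db": {"size": "large"}}
actual = {"web": {"instances": 2}, "cache": {"size": "small"}}
audit_log: list[str] = []
apply(desired, actual, "a1b2c3d", audit_log)
print(audit_log)
# ['a1b2c3d: delete cache', 'a1b2c3d: create db', 'a1b2c3d: update web']
print(plan(desired, actual))  # [] because nothing is left to change
```

Running `plan` again after `apply` returns an empty list. That property makes re-creating or re-checking an environment a routine, repeatable operation, and the log ties every change back to a commit.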
The expression of design principles in code, the automated deployment of infrastructure and configuration, and the auditability of our system infrastructure are the basis on which we deliver the data security and privacy assurances our customers require.
Arriving at a system infrastructure defined in code did not happen overnight. The foundations were laid years ago, and it has taken consistent commitment to benefit from the principles of this design: a composable, repeatable, and auditable system infrastructure.
The complement to good system design is good operating procedures: in order to maintain the integrity of the system and deliver its designed capabilities, formal procedures are required for all aspects of change management and system maintenance.
A fundamental tenet of reliable system operations is that all changes must be auditable: at a later date, we must be able to discover what changed, who made the change, what their intention was, and what the result was. For some classes of changes that story is told by a commit log and CI/CD pipeline output; for others a formal Change Approval Board is employed, and the story of the change is told in a ticket filed in an issue tracking system.
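The four audit questions above (what changed, who changed it, why, and with what result) can be treated as required fields of a change record. Below is a minimal Python sketch that assumes a hypothetical `ChangeRecord` type; the example values are invented for illustration. It reports which questions a record cannot yet answer.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    what: str                      # what changed, e.g. a merge request or ticket reference
    who: str                       # who made the change
    intent: str                    # why the change was made
    result: Optional[str] = None   # outcome, filled in after the change is executed
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def unanswered(self) -> list[str]:
        """Return the audit questions this record cannot yet answer."""
        return [name for name in ("what", "who", "intent", "result")
                if not getattr(self, name)]


# Hypothetical change that has been described but not yet executed.
pending = ChangeRecord(what="merge request: raise rollout fraction to 25%",
                       who="alice", intent="expand exchange integration rollout")
print(pending.unanswered())  # ['result']
```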
Shifting our internal culture to value this kind of formalism also took time. However, our change management program has proven its value by giving us a process to manage risk, which has increased system reliability for our customers.
Here's a quick example of how a change may flow through our system:
- You are a Ripple engineer
- You are working on a feature for a cryptocurrency exchange integration
- You want to modify the fraction of transactions which are being processed by your feature
- You develop your change and test it in an appropriate environment
- You open a merge request to make the equivalent change in the production environment
- You open a ticket to request the change: you describe the goal, link to the merge request, and submit it for approval
- The Change Approval Board evaluates the change and, if it is acceptable, approves it for deployment, possibly with conditions such as execution at a specific time. The change is then assigned to the engineer who will carry it out.
- The assignee takes the approved actions to deploy the specified change
- Standard procedures are followed to verify success or respond appropriately to failure
- The result is recorded in the ticket, which becomes a record of the change containing all necessary information for future readers or auditors to understand and verify what happened.
- The ticket is marked as complete by the assignee
- The completed ticket is reviewed by the Change Approval Board
- The ticket is closed
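The steps above amount to a strict sequence of ticket states. Below is a minimal Python sketch of that sequence as a state machine, assuming hypothetical `State`, `NEXT_STATE`, and `ChangeTicket` names; it is not a description of any particular issue tracker. Every transition is recorded with an actor and a note, and skipping a step raises an error.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"  # ticket opened with goal and merge request link
    APPROVED = "approved"    # Change Approval Board approved and assigned it
    EXECUTED = "executed"    # assignee deployed the change and recorded the result
    COMPLETED = "completed"  # assignee marked the ticket complete
    REVIEWED = "reviewed"    # Change Approval Board reviewed the completed ticket
    CLOSED = "closed"


# Each state may advance only to the next one, so no step can be skipped.
NEXT_STATE = {
    State.REQUESTED: State.APPROVED,
    State.APPROVED: State.EXECUTED,
    State.EXECUTED: State.COMPLETED,
    State.COMPLETED: State.REVIEWED,
    State.REVIEWED: State.CLOSED,
}


class ChangeTicket:
    def __init__(self, goal: str, merge_request: str) -> None:
        self.goal = goal
        self.merge_request = merge_request
        self.state = State.REQUESTED
        self.history: list[tuple[State, str, str]] = []  # (state, actor, note)

    def advance(self, to: State, actor: str, note: str) -> None:
        if NEXT_STATE.get(self.state) is not to:
            raise ValueError(f"cannot move from {self.state.value} to {to.value}")
        self.history.append((to, actor, note))
        self.state = to


ticket = ChangeTicket("raise rollout fraction", "link to merge request")
ticket.advance(State.APPROVED, "cab", "approved for the scheduled window")
```

This sketch models only the path described in the example. A real workflow would also need paths for rejected requests and failed deployments.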
That's an example of one change, but in order to maintain the security and privacy capabilities of the system, we must apply this rigor and formalism to every change. In all cases, we must be able to describe what has changed.
The foundations of our system design, policy, and supporting documentation were laid over several years of Ripple's company history. After we decided to seek SOC 2 certification, it took more than a year to perform our own internal audits and to reform and document our processes before we were ready to begin the formal audit.
Our entire organization collaborated to write dozens of policy and procedure documents covering topics such as disaster recovery, business continuity, password rotation, laptop security, and more. For example, we wrote documentation to support the following assertions:
- A process is in place to identify risks arising from changes in the entity’s systems and changes in the technology environment.
- A process is in place to manage system changes throughout the life cycle of the system and its components (infrastructure, data, software, and procedures) to support system availability and processing integrity.
- A process is in place to authorize system changes prior to development.
- A process is in place to design and develop system changes.
- A process is in place to document system changes to support ongoing maintenance of the system and to support system users in performing their responsibilities.
- A process is in place to track system changes prior to implementation.
- A process is in place to select and implement the configuration parameters used to control the functionality of software.
- A process is in place to test system changes prior to implementation.
- A process is in place to approve system changes prior to implementation.
- A process is in place to implement system changes.
- A process is in place to identify objectives affected by system changes, and the ability of the modified system to meet the objectives is evaluated throughout the system development life cycle.
- A process is in place to identify changes in infrastructure, data, software, and procedures required to remediate incidents to continue to meet objectives, and the change process is initiated upon identification.
The SOC 2 audit itself consists of an assessor reviewing spreadsheets of evidence we have collected and questionnaires we have answered. The assessor also interviews process owners, service owners, and members of the compliance and security teams. In the interviews, they ask questions such as, "please demonstrate that you are taking database backups, show us the time when the last backup was taken, show us that the backup is encrypted, and show us that the encryption keys for the backups are safely managed."
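Questions like these are easier to answer when the evidence can be checked mechanically. Below is a minimal Python sketch that assumes three hypothetical details: a `Backup` record, a 24-hour maximum backup age, and a set of key IDs known to a key management service. None of these details come from the audit itself. The function returns the problems an assessor would likely flag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Backup:
    database: str
    taken_at: datetime
    encrypted: bool
    key_id: Optional[str]  # hypothetical ID of the key in a key management service


def check_backups(backups: list[Backup], managed_key_ids: set[str],
                  now: datetime, max_age: timedelta = timedelta(hours=24)) -> list[str]:
    """Return a list of problems with the most recent backup (empty if none)."""
    if not backups:
        return ["no backups found"]
    latest = max(backups, key=lambda b: b.taken_at)
    problems = []
    if now - latest.taken_at > max_age:
        problems.append(f"latest backup is older than {max_age}")
    if not latest.encrypted:
        problems.append("latest backup is not encrypted")
    if latest.key_id not in managed_key_ids:
        problems.append("backup key is not held in the key management service")
    return problems


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
latest = Backup("payments-db", now - timedelta(hours=3), True, "key-1")
print(check_backups([latest], {"key-1"}, now))  # []
```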
Because we had selected appropriate design principles, had already established good patterns, and could demonstrate our adherence to them, SOC 2 compliance was an achievable goal for Ripple. It was therefore a point of pride when the audit finding was "no remediation required": no repairs or amendments were needed as a result of audit findings. Ripple's system and operating controls for RippleNet are SOC 2 compliant!
What this means for our clients and their customers: a third party has certified that we have demonstrated the risk management and operational excellence required in order to be entrusted to secure private financial data.
If you’re interested in joining Ripple’s engineering team, we’re hiring.