Data Architecture & Flows
Last updated
This document provides an overview of data movement and storage within the Gomboc system. Because Gomboc is a SaaS offering, it is especially important for our customers to understand how their information is handled.
The following series of diagrams provides a high-level overview of the various components of the Gomboc system and how data moves across it. The primary function Gomboc provides is to analyze your infrastructure source code and recommend changes that enforce security best practices in that code, delivered as diffs and change requests. As such, Gomboc must integrate with your source code management (SCM) system. Each SCM integration operates slightly differently. For instance, the GitHub integration uses a GitHub Application with the minimum required access scope, while the GitLab, Bitbucket, and Azure DevOps integrations use similarly scoped access tokens to call the respective service APIs.
A few things that Gomboc explicitly does not do:
Store copies of customer source code
Use customer source code to train any shared machine learning models
Send data to external generative AI systems to create code recommendations
The Gomboc application is a software as a service (SaaS) offering hosted in the cloud. System service components are currently spread between the AWS us-east-1 and GCP us-east4 regions.
The environment is deployed on a private subnet and all traffic to and from the Gomboc API is encrypted in transit. All data stored in the Gomboc DB (which includes a relational database and a key-value database) is encrypted at rest using AES-256 encryption.
Note: A transition from AWS to Google Cloud is scheduled for Q4 2024 and will improve several elements of our cloud security posture, including:
All data will be fully encrypted in motion and at rest, including intra-service communication
Encryption keys will be rotated every 90 days
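To make the 90-day rotation policy concrete, here is a minimal sketch of how a rotation deadline could be checked. The function names are hypothetical and chosen for illustration only. This is not Gomboc's key management implementation.

```python
from datetime import date, timedelta

# Rotation interval taken from the note above.
ROTATION_PERIOD = timedelta(days=90)


def next_rotation(last_rotated: date) -> date:
    """Return the date by which a key must be rotated again."""
    return last_rotated + ROTATION_PERIOD


def rotation_due(last_rotated: date, today: date) -> bool:
    """Return True once the key has reached the end of its rotation period."""
    return today >= next_rotation(last_rotated)
```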
In #2 in the high-level architecture diagram above, Gomboc executes the following data flow to deliver remediations on customer code:
On initial configuration of the SCM integration with Gomboc, a customer provides a personal access token and follows the instructions provided by Gomboc here to complete the integration.
After access to the SCM is confirmed, Gomboc pulls a list of repositories accessible with the access token. The user selects which repositories should be linked to Gomboc for scanning. No actions are ever taken on repos that have not been explicitly linked, even if they are within the scope of the access token provided.
The next step is for Gomboc to analyze one or more of the linked code repositories. A user can run an individual repository scan or a bulk scan of multiple repositories.
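The linking rule in the steps above (only explicitly linked repositories are ever acted on, even when the token can reach more) can be sketched as an allow-list check. The class and method names below are hypothetical and do not reflect Gomboc's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class ScmIntegration:
    """Hypothetical model of one SCM integration (illustration only)."""

    accessible_repos: set[str]  # everything the access token can reach
    linked_repos: set[str] = field(default_factory=set)  # user-selected subset

    def link(self, repo: str) -> None:
        # Only repositories within the token's scope can be linked.
        if repo not in self.accessible_repos:
            raise ValueError(f"{repo} is not accessible with this token")
        self.linked_repos.add(repo)

    def scan_targets(self, requested: list[str]) -> list[str]:
        # Individual or bulk scan: refuse anything not explicitly linked.
        unlinked = [r for r in requested if r not in self.linked_repos]
        if unlinked:
            raise PermissionError(f"repositories not linked: {unlinked}")
        return list(requested)
```

In this sketch a bulk scan fails as a whole if any requested repository is unlinked. Skipping unlinked entries silently would be another reasonable design.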
Picking up after the SCM Integration and linking of code repositories is complete, the Remediation Service performs an analysis of source code to recommend contextual fixes to the user.
Gomboc pulls the source code for the repository, transmitting it from the customer SCM to temporary storage on a node in our cloud service infrastructure.
The source code is analyzed to find recognized IaC types, currently Terraform and CloudFormation. Then, the system proceeds to analyze the infrastructure code. The results of this analysis are either:
(a) A set of findings about the code, which include information about the policy statement applied, cloud resource type, cloud resource name, finding type (e.g., remediable), a diff of the changes proposed by Gomboc, and a reference to any code change request, if applicable.
(b) Both (a) and a code change request created in the customer SCM system.
Once the remediation scan performed by Gomboc is complete, any temporary artifacts from the analysis are deleted.
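The lifecycle described in these steps (pull code into temporary storage, detect IaC types, delete temporary artifacts) can be sketched as follows. Everything here is an assumption for illustration: `fetch_source` is a hypothetical callable standing in for the SCM download, and the detection heuristics (file extension plus a marker string) are not Gomboc's actual detection logic.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def detect_iac_type(path: Path) -> Optional[str]:
    """Heuristic (assumed): classify a file as Terraform or CloudFormation."""
    if path.suffix == ".tf":
        return "terraform"
    if path.suffix in {".yaml", ".yml", ".json"}:
        text = path.read_text(errors="ignore")
        if "AWSTemplateFormatVersion" in text or "AWS::" in text:
            return "cloudformation"
    return None


def scan_repository(fetch_source: Callable[[Path], None]) -> dict[str, str]:
    """Fetch code into a temporary directory, classify files, then clean up."""
    detected: dict[str, str] = {}
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        fetch_source(root)  # hypothetical: writes the repository files into root
        for path in sorted(root.rglob("*")):
            if path.is_file():
                kind = detect_iac_type(path)
                if kind is not None:
                    detected[path.relative_to(root).as_posix()] = kind
    # Leaving the with-block removes the temporary directory and its contents.
    return detected
```

Only file paths and detected types leave the function. The source text itself exists only inside the temporary directory.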
If the Remediation Service encounters errors while executing analysis, the system logs include references that let us follow up with the customer to determine how to reproduce the error. We do not log any source code as it is processed.
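One way to log errors with a traceable reference while keeping source code out of the logs is sketched below. The logger name, field names, and the choice to record only the exception type are assumptions for illustration, not Gomboc's actual logging code.

```python
import logging
import uuid

logger = logging.getLogger("remediation")  # hypothetical logger name


def log_analysis_error(scan_id: str, repo: str, exc: Exception) -> str:
    """Log an analysis failure by reference only and return that reference."""
    reference = uuid.uuid4().hex
    # Record the exception type, not its message: parser errors often quote
    # the offending input, which could leak source code into the log.
    logger.error(
        "analysis failed reference=%s scan_id=%s repo=%s error_type=%s",
        reference,
        scan_id,
        repo,
        type(exc).__name__,
    )
    return reference
```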
Users may navigate and access results from Gomboc source code analysis using the web UI client (#3 above). The information available in this UI includes remediation observations, performance data tracking the time from issue detection through to resolution, scan activity, and compliance adherence.
In the case of build system integrations such as in #1 in the architecture diagram above, Gomboc provides a build script which uses a container that holds a command line client application which communicates with the Gomboc APIs. Based on configuration of the pipeline, the build can trigger code analysis by Gomboc on code change request creation. The workflow is as follows:
A user with a git client creates a code change request (i.e., a pull request)
The code change causes the CI build pipeline to execute, triggering the Gomboc client to execute a static analysis scan.
The Gomboc client requests that the Gomboc Remediation Service pull the code under review for analysis and return any findings. Steps 1 and 2 above from the Remediation Service description are executed. Results of this scan show in the Gomboc UI audit trail as well.
If there are findings detected, then merging the changes can be blocked. A new code change request is created and linked to the current changes under review.
The user can address the fixes by approving the new change request, after which the original change request can be merged.
Depending on how the user configures the pipeline, findings can either block the pipeline from completing or simply issue a warning to the user.
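The block-or-warn behavior can be illustrated with a small sketch that maps scan results to a process exit code, which is how most CI systems decide whether a pipeline step fails. The mode names and function below are hypothetical and are not options of the Gomboc client.

```python
from enum import Enum


class Mode(Enum):
    BLOCK = "block"  # findings fail the pipeline step
    WARN = "warn"    # findings are reported but the step passes


def pipeline_exit_code(finding_count: int, mode: Mode) -> int:
    """Return the exit code a CI step would use for the given scan outcome."""
    if finding_count > 0 and mode is Mode.BLOCK:
        return 1  # non-zero exit blocks the pipeline
    return 0  # no findings, or warn-only mode
```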
As a Gomboc administrator you have the option to let Gomboc manage your digital identities or you can supply your own identity provider (IdP) for single sign-on (SSO) authentication. See Authentication - Pwdless & SSO for more details.
We recommend using SAML SSO authentication as user access can be centrally managed and revoked in the case of employee turnover or other security events.
To access SCM integrations, Gomboc does store the configuration for the integration, including the access token. However, this information cannot be revealed by a user once it is entered, so harvesting the token through the application is not possible after entry. As a best practice, rotate this token on a regular basis and update it in the SCM configuration in Gomboc.
As for authentication to Gomboc, a user can authenticate with the authentication methods previously mentioned (magic link or SAML SSO) or a user can generate a personal access token to access the API at api.app.gomboc.ai, accessing the public API as noted in #4 above. See API for more information.
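As a rough sketch of token-based API access, the snippet below builds (but does not send) an authenticated request. Only the host `api.app.gomboc.ai` comes from this document. The `https` scheme, the `Bearer` authorization header format, and any request path are assumptions, so consult the API documentation for the actual conventions.

```python
import urllib.request

# Host from the text above; the https scheme is assumed.
API_BASE = "https://api.app.gomboc.ai"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build a request carrying a personal access token (header format assumed)."""
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {token}"},  # assumed scheme
    )
```

In practice the token would be read from a secret store or environment variable rather than hard-coded in scripts.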