A Crash Course in Google Cloud Platform

Getting cirrus about Cloud Computing

James Collerton
8 min read · Nov 20, 2022

Audience

This article is aimed at engineers looking to get a surface-level introduction to Google Cloud Platform (GCP). It is not designed to give you an introduction to the concept of cloud computing in general. To get the most out of it, it will help if you have used at least one other provider (Microsoft, Amazon…) before.

Personally, I have used AWS, so I will use it as a reference. However, even without that specific background, hopefully you can follow along.

Argument

The first thing we need to do is create a free GCP account. Once you're done you should land on the Cloud Console dashboard, which gives us the freedom to dig into the following sections.

Resource Manager

In cloud computing we provision different resources. These may be virtual machines, storage services, databases etc.

It is useful to categorise resources depending on where they are used. We employ these categorisations to manage their shared properties, such as security or configuration settings. This is the role of Resource Manager.

The first thing you need is an organisation. This is the root of the resource hierarchy. For example, we may have a single organisation for our company. It is here we enter any billing information.

Within an organisation you have projects. All resources must belong to a project, which contains their shared permissions, settings and metadata. Projects are designed so that the services inside them can communicate with ease.

Projects have unique IDs (which cannot be reused, even after deletion). They act as a namespace for their contained resources, meaning resources can share names, but must be in separate projects.

Each project is also associated with a billing account (although multiple projects can share the same billing account).

An example of how resources are organised: the organisation contains the projects.

Finally, we land on folders. These come under an organisation and can contain multiple projects. We may use them to group and assign IAM roles to projects on a per-department basis, or any other useful segmentation.
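
As a small illustration, here is how you might list the projects visible to you with the google-cloud-resource-manager Python client. This is a sketch assuming default credentials are configured; the query string and printed fields are just examples.

```python
from google.cloud import resourcemanager_v3

projects_client = resourcemanager_v3.ProjectsClient()

# Search for all active projects visible to the caller's credentials.
for project in projects_client.search_projects(request={"query": "state:ACTIVE"}):
    # Each project exposes its unique, non-reusable ID and its parent
    # (an organisation or folder) in the resource hierarchy.
    print(project.project_id, project.display_name, project.parent)
```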

Google and Geography

Anyone familiar with cloud computing should understand that the services they provision are running physically somewhere around the globe. Google splits the globe into regions (e.g. asia-east1), grouped across geographies such as Asia, Australia, Europe, North America and South America, with each region split into zones (named using region-zone nomenclature, e.g. asia-east1-a).

Within this, there is the idea of global, regional and zonal resources. Intuitively, this is the scope a resource is limited to.

  • Global Resources: Can be accessed by any other resource, across regions and zones.
  • Regional Resources: Can be accessed only by resources that are located in the same region.
  • Zonal Resources: Can be accessed only by resources that are located in the same zone.

Compute

Now we’ve covered some basic organisational concepts, let’s dig into the resources themselves. The first thing we will cover is the functionality provided for hosting computation.

Compute Engine: For those of you familiar with AWS EC2, Compute Engine is the GCP equivalent. It allows you to host virtual machines, choosing their types, specs and scaling.
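
To give a flavour, here is a minimal sketch listing the virtual machines in one zone using the google-cloud-compute Python client. The project ID and zone below are placeholders for your own.

```python
from google.cloud import compute_v1

client = compute_v1.InstancesClient()

# List every VM instance in a single zone of a project.
for instance in client.list(project="my-project", zone="europe-west2-a"):
    print(instance.name, instance.status, instance.machine_type)
```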

Kubernetes Engine: This is a managed service for deploying containers on Kubernetes. My complete article on how to use it is found here.

Batch: Lets you run batch jobs using Compute Engine instances.

Distributed Cloud: Sometimes we may not be able to move our data into Google’s cloud. For example, there may be regulations around where we can store sensitive financial data. Distributed Cloud allows us to use Google services in our own edge locations and data centres.

Serverless

Serverless in GCP is the same as serverless in other providers. It lets us carry out computations without being concerned about the underlying machines.

Cloud Run: This allows you to run container-based workloads without worrying about infrastructure: for example, a long-lived API written in Spring, or a database migration job written in Python.

Cloud Functions: These are the Google equivalent of AWS Lambda functions. They are small, short-lived pieces of code kicked off by some kind of trigger.
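
A minimal HTTP-triggered function looks something like the sketch below, using the functions-framework library that Cloud Functions builds on (the function name and greeting are arbitrary).

```python
import functions_framework


@functions_framework.http
def hello_http(request):
    # "request" is a Flask request object; the return value becomes
    # the HTTP response body.
    name = request.args.get("name", "world")
    return f"Hello, {name}!"
```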

App Engine: This lets you build monolithic server-side rendered websites (as opposed to more generic services like Cloud Run). It handles deployments, scaling and databases for you.

Storage

It’s important to note we’ll delineate between object storage and databases, which will be covered in the next section.

Cloud Storage: For those of you familiar with AWS, this is Google's S3. It provides highly available object storage.
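
As a quick sketch using the google-cloud-storage Python client, writing and reading an object looks roughly like this (the bucket and object names are placeholders).

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")

# Upload an object (the rough equivalent of an S3 "put object")...
blob = bucket.blob("greetings/hello.txt")
blob.upload_from_string("Hello, Cloud Storage!")

# ...and read it back.
print(blob.download_as_text())
```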

Transfer Appliance: A physical device you can use to take data from your premises and upload it into Google Cloud.

Databases

This section covers the database offerings from Google.

Bigtable: This is one of Google's non-relational offerings. It is a key-value store, similar in nature to AWS' DynamoDB. It maintains a cluster of nodes which scale to offer consistent performance.
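
A hedged sketch of writing a single row with the google-cloud-bigtable Python client; the instance, table, column family and row key are all invented for illustration.

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("user-events")

# Rows are addressed by a single row key; values live in column families.
row = table.direct_row(b"user#1234")
row.set_cell("events", "last_login", b"2022-11-20T10:00:00Z")
row.commit()
```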

Firestore/ Datastore: Firestore is the next generation of Datastore. These are Google's document databases. As with most Google products, they offer scalability and availability.
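
Working with documents is pleasantly simple. Here is a sketch with the google-cloud-firestore Python client; the collection, document ID and fields are placeholders.

```python
from google.cloud import firestore

db = firestore.Client()

# Documents live under collections and are addressed by ID.
db.collection("users").document("alice").set({"name": "Alice", "plan": "free"})

snapshot = db.collection("users").document("alice").get()
print(snapshot.to_dict())
```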

Memorystore: A managed in-memory datastore service for Memcached and Redis. It takes over the work of availability and scaling, and is similar to AWS ElastiCache.
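
Because Memorystore exposes a standard Redis endpoint, you can talk to it with the ordinary redis-py client; the host below is a placeholder for your instance's private IP.

```python
import redis

# Connect to the Memorystore instance's Redis endpoint.
r = redis.Redis(host="10.0.0.3", port=6379)
r.set("session:1234", "cached-value")
print(r.get("session:1234"))
```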

Spanner: Now we move to Google's relational offerings. Spanner is the managed relational DB. It handles replicas, sharding, transactions and maintenance. An example of how we may use it is in my article here.
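
A minimal read against Spanner with the google-cloud-spanner Python client looks roughly like this; the instance, database and table names are placeholders.

```python
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-instance")
database = instance.database("my-database")

# Reads run against a consistent snapshot; Spanner handles the
# sharding and replication behind the scenes.
with database.snapshot() as snapshot:
    for row in snapshot.execute_sql("SELECT SingerId, FirstName FROM Singers"):
        print(row)
```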

Cloud SQL: This is a managed service for MySQL, PostgreSQL and SQL Server. The main selling points are easy management of the underlying machines and quick integration with other Google services.
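
One common way to connect from Python is the cloud-sql-python-connector library, sketched below for a PostgreSQL instance. The connection name, credentials and database are all placeholders.

```python
from google.cloud.sql.connector import Connector

connector = Connector()

# Connect to a Cloud SQL PostgreSQL instance through the pg8000 driver.
conn = connector.connect(
    "my-project:europe-west2:my-instance",
    "pg8000",
    user="postgres",
    password="my-password",
    db="my-database",
)

cursor = conn.cursor()
cursor.execute("SELECT NOW()")
print(cursor.fetchone())
```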

Identity and Access Management

IAM revolves around controlling who has what access to which resource.

We have the following components:

  • Resources: These are any GCP resources such as those for computing, storage or data.
  • Resource Hierarchies: This pertains to the resource hierarchy we defined in the earlier section. By grouping things by project and folder we can also group their permissions.
  • Permissions: This is what we are allowed to do on a resource and is expressed in a service.resource.verb format, for example pubsub.topics.create.
  • Principals/ Members: These are used to identify a user or group of users. They include Google accounts, Google groups and Cloud Identity Domains.
  • Roles: These are groups of permissions. You can’t grant a permission directly to a principal, you instead have to give them a role.
  • Allow Policies: You grant roles to a principal by creating an allow policy, which is attached to the resource and enforced whenever the resource is accessed (there's a sketch of a policy's shape just below).

A quick note on Google identities. One thing that is very useful is the Service Account. These are accounts that represent a workload (one or more resources) running in GCP, rather than a human user.

For example, we may have a cluster of Compute Engine instances and want each node to have the same permissions. We would give the cluster a service account for the nodes to use.
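
To make the pieces concrete, here is the shape of an allow policy binding a role to some principals, sketched as a Python dict. The role, members and service account are invented for illustration.

```python
# An allow policy attached to a resource: each binding grants one role
# to a list of principals. Note a service account appearing as a member.
allow_policy = {
    "bindings": [
        {
            "role": "roles/pubsub.publisher",
            "members": [
                "serviceAccount:my-app@my-project.iam.gserviceaccount.com",
                "group:data-team@example.com",
            ],
        }
    ]
}
```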

Security

Now we move to the security functionality offered by Google.

reCAPTCHA: This service is used to invisibly monitor and detect fraudulent web traffic. It provides a score between 0 and 1, where lower scores indicate the traffic is more likely to be fraudulent.
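
Scoring a token server-side looks roughly like the sketch below, using the google-cloud-recaptcha-enterprise client; the project, site key and token are placeholders.

```python
from google.cloud import recaptchaenterprise_v1

client = recaptchaenterprise_v1.RecaptchaEnterpriseServiceClient()

# The token is produced by the reCAPTCHA JavaScript on your page.
event = recaptchaenterprise_v1.Event(token="token-from-frontend", site_key="my-site-key")
assessment = recaptchaenterprise_v1.Assessment(event=event)

response = client.create_assessment(
    request={"parent": "projects/my-project", "assessment": assessment}
)

# Closer to 1.0 means the interaction is more likely to be genuine.
print(response.risk_analysis.score)
```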

Secret Manager: This allows you to store and access sensitive data including API keys, certificates and passwords. It also eases permissions and lifecycle management.
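
Reading a secret at runtime is only a few lines with the google-cloud-secret-manager client; the project and secret names are placeholders.

```python
from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()

# Secret versions are addressed by a fixed resource path; "latest"
# always points at the newest enabled version.
name = "projects/my-project/secrets/db-password/versions/latest"
response = client.access_secret_version(request={"name": name})
print(response.payload.data.decode("UTF-8"))
```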

Key Management Service: This lets you manage your symmetric and asymmetric keys, as well as control their generation, access and rotation.
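
Encrypting a payload with a symmetric key, sketched with the google-cloud-kms client; all of the IDs in the key path are placeholders.

```python
from google.cloud import kms

client = kms.KeyManagementServiceClient()

# Build the full resource name of a symmetric key.
key_name = client.crypto_key_path("my-project", "global", "my-key-ring", "my-key")

response = client.encrypt(request={"name": key_name, "plaintext": b"sensitive data"})
print(response.ciphertext)
```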

Certificate Authority Service: A highly available, managed CA allowing you to acquire and manage certificates. For more information on CAs you can read my article here.

Integration Services

These services are used to connect either the outside world, or internal Google services.

API Gateway: This is very similar to the AWS API Gateway and is used to expose serverless backends as APIs. It provides monitoring, tracing, logging, scaling and alerting for your API layer.

Workflows: These let you orchestrate several services together in a state-machine-like manner. Imagine you want to pass messages between Cloud Functions, calling different services depending on the content of the messages: Workflows is a perfect fit. It is very similar to AWS Step Functions.

Application Integration: This service lets you integrate with different providers and use them as a data source, triggering different Google services. For example, we may want to ingest Salesforce data into Google Bigtable.

Operations

Monitoring: This can be used out of the box to monitor all of your GCP services. It can also be integrated with Kubernetes, and allows for the definition of service SLOs, all available straight from your GCP dashboard!

Logging: A place to easily store and search the logs for your services. Additionally, you can set alerts on them and perform analysis on the results.
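
Shipping your application logs there from Python can be as simple as attaching the Cloud Logging handler to the standard logging module (a sketch assuming default credentials).

```python
import logging

import google.cloud.logging

# Route the standard library's logging output to Cloud Logging.
client = google.cloud.logging.Client()
client.setup_logging()

logging.info("This line ends up in Cloud Logging.")
```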

Trace: This system allows for distributed tracing of requests. It is primarily used to identify bottlenecks over your entire system. If you have a slow request it can help you track where in its path it is incurring the latency (at the API, database or other level).

CI/ CD

Now we move onto the services involved in building and releasing your code.

Cloud Build: This is used to create pipelines for building and deploying your code to Google services. It takes care of all of the infrastructure, letting you worry only about what's being created!

Cloud Deploy: Whereas Cloud Build is designed to work with most GCP resources, Cloud Deploy focuses on delivery to Google Kubernetes Engine and Anthos.

Container Registry: This lets you manage your container images (similar to a Docker registry). It is highly available and offers extras like easy integration with other Google services and security scanning.

Analytics

Dataproc: This is used to run popular data-centric frameworks like Apache Hadoop, Apache Spark and Apache Flink. It removes the overhead of looking after the underlying infrastructure through its serverless model.

Pub/Sub: This offers managed functionality for services to publish events which other services can subscribe to. The separation of concerns means that publishers and subscribers can function independently.
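
Publishing an event is a few lines with the google-cloud-pubsub client; the project and topic names are placeholders.

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "orders")

# publish() returns a future; result() blocks until the server
# acknowledges the message and returns its ID.
future = publisher.publish(topic_path, data=b"order-created")
print(future.result())
```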

Dataflow: This is Google’s managed streaming functionality, and is similar to AWS Kinesis. It offers extras such as autoscaling your infrastructure and the opportunity to implement real-time AI-based reactions to large-scale data.

Datastream: This is used to synchronise data across databases and data storage solutions. For example, we may be updating a relational DB and want to keep a non-relational DB synchronised; Datastream is a managed solution for exactly this.

Conclusion

In conclusion, we have taken a surface-level tour of the different services GCP offers.


James Collerton

Senior Software Engineer at Spotify, Ex-Principal Engineer at the BBC