FinOps

From Ian Loe Academy Wiki
Revision as of 14:50, 23 November 2022 by Ila admin (talk | contribs) (→‎Operate Phase)

Background

What is FinOps?

FinOps is the practice of bringing together Finance, Technology, and Business to master the unit economics of the cloud for business advantage.

  • The financial operating model for the cloud
  • A way of introducing accountability for cloud spend
  • A cultural practice that delivers financial and operational control

FinOps is shorthand for “Cloud Financial Operations” or “Cloud Financial Management” or “Cloud Cost Management”.

It is the practice of bringing financial accountability to the variable spend model of cloud, enabling distributed teams to make business trade-offs between speed, cost, and quality.

Culture

FinOps is a cultural practice.

It’s the way for teams to manage their cloud costs, where everyone takes ownership of their cloud usage supported by a central best-practices group.

Delivery teams can be made responsible not just for delivering code, operating the code, securing the code, and making sure that the code accomplishes its objectives, but also for managing its costs, both fixed and variable.

Cost Centre of Excellence

The FinOps team is an agile team made up of representatives from the Engineering team, the Business or product teams, the sponsoring executives, and the Finance/Procurement teams. The cost structure should be decided collectively.

FinOps Lifecycle

The FinOps Lifecycle is made up of iterative phases to establish baselines, optimise rates and usage patterns and drive operations to the right balance of cost efficiency and business value.

There are three phases in the lifecycle:

Inform
  • Dashboards & Reporting
  • Direct Chargeback
  • Anomaly Alerts
  • Budgets & Forecasting
Optimise
  • Terminate idle resources
  • Schedule server run times
  • Right-size resources based on utilisation
  • Use serverless technologies
Operate
  • Continuous Improvement
  • Evaluate the value to the business
  • Align with Finance and Line of Business

A visual view of the lifecycle:

[Figure: A Visual View of the Lifecycle]


Inform Phase

The most important thing is to know who owns each workload in the cloud.

Tags/labels, the account hierarchy, and other taxonomy can be used to allocate all costs and give a near-real-time view of cloud usage.

This can be:

  • Across the enterprise
  • For subsets of the enterprise
  • For specific projects or accounts
  • For granular resource usage
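
As a minimal sketch of tag-based allocation, the Python below rolls billing line items up by an owner tag and reports untagged spend separately. The record layout, the `team` tag key, and all figures are hypothetical; real billing exports differ between cloud providers.

```python
from collections import defaultdict

# Hypothetical billing line items; real exports differ by provider.
LINE_ITEMS = [
    {"resource": "vm-1", "cost": 120.0, "tags": {"team": "payments"}},
    {"resource": "vm-2", "cost": 80.0, "tags": {"team": "search"}},
    {"resource": "bucket-1", "cost": 15.5, "tags": {"team": "payments"}},
    {"resource": "vm-3", "cost": 40.0, "tags": {}},  # no owner tag
]


def allocate_by_tag(items, tag_key="team", untagged="UNALLOCATED"):
    """Sum cost per value of tag_key; items without the tag go to `untagged`."""
    totals = defaultdict(float)
    for item in items:
        owner = item.get("tags", {}).get(tag_key, untagged)
        totals[owner] += item["cost"]
    return dict(totals)


totals = allocate_by_tag(LINE_ITEMS)
for owner, cost in sorted(totals.items()):
    print(f"{owner}: {cost:.2f}")
```

Keeping an explicit bucket for untagged spend makes gaps in ownership visible instead of hiding them in a total.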

Purpose

Why is this important? We need to establish trust in the numbers and to report cloud cost data consistently to all stakeholders.

This can help with:

  • Transparency and the feedback loop
  • Anomaly Detection
  • Benchmarking team performance
  • Cost allocation
  • Accounts, Taxonomy, Tags/Labels
  • Forecasting Spend and Budgets
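
One simple way to forecast spend against a budget is a linear run-rate projection. The sketch below assumes that the average daily spend so far continues for the rest of the month; the function name and the figures are illustrative only, and real forecasting tools usually account for seasonality and committed spend.

```python
def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Linear run-rate forecast: average daily spend so far, extended to month end."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be within the month")
    return spend_to_date / day_of_month * days_in_month


# Hypothetical figures: 6,000 spent by day 10 of a 30-day month, budget 15,000.
budget = 15_000.0
forecast = forecast_month_end(6_000.0, day_of_month=10, days_in_month=30)
over_budget = forecast > budget
print(f"forecast={forecast:.2f} budget={budget:.2f} over_budget={over_budget}")
```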

Report

What do we need to do?

  • Give daily (or periodic) usage feedback
  • Ensure clean, accurate, consistent data that is simply presented
  • Use common language and cost metrics
  • Automate reporting where possible
  • Report by stakeholder
    • Groups of people with different needs, focuses, and scopes
  • Report on a variety of views of the data
    • Direct monthly costs (cash basis, supports invoice reconciliation)
    • Prepaid costs (amortised basis)
    • Special project view (report by project, strategic initiative, R&D programme, etc.)
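
The difference between the cash and amortised views can be shown with a short sketch. It assumes a single upfront prepayment spread evenly over its term; the function names and the 1,200 figure are hypothetical.

```python
def cash_view(upfront, months, purchase_month=0):
    """Cash basis: the whole payment lands in the month it was invoiced."""
    return [upfront if m == purchase_month else 0.0 for m in range(months)]


def amortised_view(upfront, months):
    """Amortised basis: spread the prepayment evenly over the term."""
    return [upfront / months] * months


# Hypothetical 12-month commitment paid upfront.
cash = cash_view(1200.0, 12)
amortised = amortised_view(1200.0, 12)
print("cash:     ", cash[:3])
print("amortised:", amortised[:3])
```

Both views sum to the same total; the cash view matches the invoice, while the amortised view shows what each month of usage actually cost.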

Anomaly Detection

Anomaly detection is crucial to any large-scale cloud operation. In addition to security and operational monitoring, cost monitoring can provide crucial early warning signs.

  • Choose tools that meet your needs
  • Consider various alerting schemes
    • Alert on cost thresholds
    • Alert on standard deviation thresholds
    • Alert on specific views/subsets of spending
  • Alert quickly; automate alerting to email, monitoring, or ticketing systems
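
A minimal sketch of the two threshold schemes, using only the Python standard library. The three-sigma default, the function name, and the sample costs are assumptions for illustration, not a recommendation from any particular tool.

```python
from statistics import mean, stdev


def is_anomaly(history, today, max_sigma=3.0, hard_limit=None):
    """Flag today's cost if it breaches a fixed limit, or if it is more than
    max_sigma sample standard deviations above the historical mean."""
    if hard_limit is not None and today > hard_limit:
        return True
    if len(history) < 2:
        return False  # not enough data for a standard deviation
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu  # flat history: any increase stands out
    return (today - mu) / sigma > max_sigma


daily_costs = [100.0, 104.0, 98.0, 101.0, 97.0, 103.0, 99.0]  # hypothetical
print(is_anomaly(daily_costs, 102.0))                    # within normal variation
print(is_anomaly(daily_costs, 150.0))                    # far above the mean
print(is_anomaly(daily_costs, 102.0, hard_limit=101.0))  # breaches the fixed limit
```

The same check can be run per team, project, or account to alert on specific subsets of spending.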


Optimise Phase

In the optimise phase we target, define, and document optimisation opportunities.

  • Define Goals, Metrics, Targets
  • Optimise Candidates
  • Optimise Usage
    • Workload Management
    • Rightsizing
  • Optimise Rates
    • Reservation Purchasing
    • Spot Market
    • Discounting
  • Build Business Cases

Candidates for this phase

  • Avoid 100%
    • Get rid of or turn off things you aren’t using. Remove unused instances, or move backup snapshots to cheaper storage options.
  • Save 50%
    • Buy Reserved Instances (RIs) for things that you are using correctly. If you have a steady, predictable workload, a long-term commitment for the resources is usually the better option.
  • Save 25%
    • Right-size things you’re not using correctly. If sizing is too small, you might need to scale up with too many instances. If sizing is too large, you are paying for unnecessary capacity.
  • Save between 0% and 100%
    • Use different things to do the same job. Explore architecture options that can potentially save cost, e.g. container or serverless technologies, or managed databases.
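
To build a rough business case, the rule-of-thumb percentages above can be applied to a list of findings. The sketch below is illustrative only: the resource names, costs, and finding labels are hypothetical, and the architecture category is left out because its saving cannot be estimated from a fixed rate.

```python
# Rule-of-thumb savings rates from the list above; real savings vary by workload.
SAVINGS_RATE = {
    "unused": 1.00,    # turn off or delete
    "steady": 0.50,    # cover with Reserved Instances
    "missized": 0.25,  # right-size
}

# Hypothetical findings from a review of monthly costs.
RESOURCES = [
    {"name": "old-test-vm", "monthly_cost": 200.0, "finding": "unused"},
    {"name": "api-server", "monthly_cost": 1000.0, "finding": "steady"},
    {"name": "batch-worker", "monthly_cost": 400.0, "finding": "missized"},
    {"name": "web-frontend", "monthly_cost": 300.0, "finding": "ok"},
]


def estimate_savings(resources):
    """Return estimated monthly savings per finding, using SAVINGS_RATE."""
    savings = {}
    for r in resources:
        rate = SAVINGS_RATE.get(r["finding"], 0.0)  # unknown findings save nothing
        savings[r["finding"]] = savings.get(r["finding"], 0.0) + r["monthly_cost"] * rate
    return savings


savings = estimate_savings(RESOURCES)
print(savings, "total:", sum(savings.values()))
```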

Optimise Usage

The measures below fall into three groups: usage commitment, usage consolidation, and architecture optimisation.
  • Enterprise Discount (5~8%)
  • AWS RI/GCP CUD (30~40%)
  • Shared Reserved Instances
  • Sustained use discount (20~30%)
  • Storage Volume discount
  • Usage optimisation (Cleanup unused resources)
  • Autoscaling
  • Upgrade Instance generations
  • Rightsizing Instances
  • Resize EBS
  • Introduce spot instances (~80%)
  • S3 lifecycle
  • Priority to Managed Services

Suggestions for Optimisation

  • Crawl: manually turn off resources when you’re not using them
  • Walk: turn resources off on a schedule (weekends, overnights, etc.)
  • Run: identify and automatically terminate resources that are not in use and have no scheduled turn-off time
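
The "walk" stage can be sketched as a schedule check that an automation job calls before starting or stopping non-production resources. The weekday 08:00 to 19:00 window and the function name are assumptions; the actual start and stop calls depend on the cloud provider and are not shown.

```python
from datetime import datetime


def should_run(now, start_hour=8, stop_hour=19):
    """Return True if a non-production resource should be running.

    Assumed policy: weekdays only, from start_hour (inclusive)
    to stop_hour (exclusive), in the resource's local time.
    """
    is_weekday = now.weekday() < 5  # Monday=0 ... Sunday=6
    return is_weekday and start_hour <= now.hour < stop_hour


print(should_run(datetime(2022, 11, 23, 14, 50)))  # Wednesday afternoon
print(should_run(datetime(2022, 11, 23, 23, 0)))   # Wednesday night
print(should_run(datetime(2022, 11, 26, 14, 0)))   # Saturday afternoon
```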


Operate Phase

The Operate Phase is where we take action to achieve the goals and the company’s internal processes are engaged.

The business may:

  • Choose to perform an optimisation plan
  • Table it for a later time (backlog)
  • Decide not to implement it for a good reason
  • Determine that it is infeasible to action (minimise)

Any of these outcomes is positive if transparently communicated.

The ultimate goal of FinOps is to track costs back to business benefits. It’s not dollars spent on cloud; it’s dollars spent per reservation, per available seat mile, per ticket, per customer transaction, or per million active users.

Never lose sight of that goal.
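
A unit cost is simply cloud spend divided by a business metric. The sketch below uses hypothetical monthly figures to show how total spend can rise while the cost per customer transaction falls.

```python
def unit_cost(cloud_cost, units):
    """Cost per business unit (e.g. per ticket or per customer transaction)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return cloud_cost / units


# Hypothetical monthly figures: (cloud spend, customer transactions)
months = {"Jan": (50_000.0, 1_000_000), "Feb": (60_000.0, 1_500_000)}
costs = {name: unit_cost(spend, tx) for name, (spend, tx) in months.items()}
for name, cost in costs.items():
    print(f"{name}: {cost:.4f} per transaction")
```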

Challenge the business to look at cloud IT costs like this; most likely, they could never do so before.

Think back: Why did you adopt FinOps in the first place?