
Example: NorthWind Retail Online

About This Example

This is a fictional but realistic Solution Architecture Document for NorthWind Retail Ltd’s customer-facing e-commerce platform. It demonstrates the Architecture Description Standard at Recommended documentation depth — the expected level for a Tier 2 High Impact system handling PCI-DSS regulated payment data and peak sales volumes of £30M/day.

Fictional company: NorthWind Retail Ltd — a UK-based B2C retailer with 450 stores and £2.8bn annual turnover. Fictional solution: NorthWind Online — a customer-facing e-commerce platform (web and mobile app) migrating from a legacy .NET monolith to a microservices architecture on AWS.


| Field | Value |
| --- | --- |
| Document Title | Solution Architecture Document — NorthWind Online |
| Application / Solution Name | NorthWind Online |
| Application ID | APP-0821 |
| Author(s) | Priya Doe (Solution Architect) |
| Owner | Priya Doe |
| Version | 1.2 |
| Status | Approved |
| Created Date | 2025-07-14 |
| Last Updated | 2026-03-18 |
| Classification | Internal — Restricted |
| Version | Date | Author / Editor | Description of Change |
| --- | --- | --- | --- |
| 0.1 | 2025-07-14 | Priya Doe | Initial draft covering executive summary and logical view |
| 0.2 | 2025-08-21 | Priya Doe | Added physical, data and security views following architecture workshops |
| 0.3 | 2025-09-30 | Priya Doe, Tom Bloggs | Security review incorporated; PCI-DSS scope narrowed via tokenisation decision |
| 1.0 | 2025-11-10 | Priya Doe | First approved version following Design Authority review |
| 1.1 | 2026-01-22 | Priya Doe | Updated cost model after Black Friday 2025 peak capacity validation |
| 1.2 | 2026-03-18 | Priya Doe | Revised ADRs and risk register following mobile-app launch |
| Name | Role | Contribution Type |
| --- | --- | --- |
| Priya Doe | Solution Architect | Author |
| Fred Bloggs | Head of Digital Engineering | Reviewer |
| Jane Doe | Principal Security Architect | Reviewer |
| Tom Bloggs | Data Protection Officer | Reviewer |
| Sally Doe | SRE Lead | Reviewer |
| Raj Bloggs | Head of Digital Commerce (Business Owner) | Approver |
| Helen Doe | CTO | Approver |
| Design Authority | Governance | Approver |

This SAD describes the architecture of NorthWind Online, the customer-facing e-commerce platform for NorthWind Retail Ltd. It replaces the legacy NW-Commerce .NET monolith with a cloud-native microservices platform hosted on AWS, supporting peak sales of £30M/day during seasonal events.

  • Scope boundary: Customer-facing web storefront (Next.js), mobile application back-end services, microservices domain (catalogue, basket, checkout, order, customer, search), data stores, payment integration, and supporting AWS infrastructure.
  • Out of scope: Warehouse management system (documented in APP-0214), in-store EPOS (APP-0088), marketing cloud platform (SaaS — vendor-managed), and the corporate SAP ERP (APP-0001).
  • Related documents: NorthWind Cloud Landing Zone SAD (APP-0750), PCI-DSS Scope Document (SEC-PCI-2025-03), Data Protection Impact Assessment (DPIA-2025-091), Digital Channels Strategy (STRAT-DGT-2025).

NorthWind Online is the primary digital sales channel for NorthWind Retail Ltd, serving approximately 12 million active customers across the UK via responsive web (www.northwind.co.uk) and native mobile applications (iOS and Android). The new platform replaces the legacy NW-Commerce .NET monolith — which has reached the limits of its scaling capacity and cannot reliably handle Black Friday and Boxing Day peaks — with a cloud-native microservices architecture on AWS.

The platform is built on Amazon EKS running Node.js microservices, fronted by a Next.js storefront (server-side rendered on AWS using a Vercel-style pattern), and backed by Amazon RDS Aurora PostgreSQL. Payments are processed via Stripe (tokenised at the browser via Stripe Elements), email is sent via SendGrid, and customer behaviour events are captured via Segment CDP for downstream marketing analytics.

| Driver | Description | Priority |
| --- | --- | --- |
| Peak capacity failure | Legacy monolith failed twice in Black Friday 2024 peak, losing an estimated £8.2M in sales over 3 hours; board directive to remediate before Black Friday 2026 | Critical |
| PCI-DSS compliance | Current platform is PCI-DSS v3.2.1 scoped at Level 1; v4.0 transition required by 31 March 2026 with tokenised payment flow to reduce scope | Critical |
| Digital growth strategy | Board target of 40% of group revenue online by 2028 (currently 22%); requires platform able to deliver new customer experiences quickly | High |
| Legacy end-of-life | .NET Framework 4.7.2 and Windows Server 2016 reach end of extended support in 2026; Oracle Commerce platform has been unsupported since 2024 | High |
| Mobile channel growth | Mobile traffic has grown from 48% to 71% of sessions in 2 years; current platform has no mobile-specific API surface, relying on scraped web views | High |
| Personalisation & CDP | Marketing team requires real-time customer event stream for personalisation; legacy platform cannot emit structured events | Medium |
| Question | Response |
| --- | --- |
| Which organisational strategy or initiative does this solution support? | Digital Channels Strategy 2025-2028 (STRAT-DGT-2025), specifically Workstream 1: Re-platforming NorthWind Online |
| Has this solution been reviewed against the organisation's capability model? | Yes — mapped to Digital Storefront, Order Management, Customer Identity, and Payment Processing capabilities |
| Does this solution duplicate any existing capability? | No — replaces the legacy NW-Commerce monolith which will be decommissioned |
| Capability | Shared Service / Platform | Reused? | Justification (if not reused) |
| --- | --- | --- | --- |
| Identity & Access (Customer) | AWS Cognito (customer tenant) | Yes | Corporate-approved customer IDP; replaces legacy custom auth |
| Identity & Access (Colleague) | Okta (corporate SSO) | Yes | Used for admin portal and operations tooling access |
| Payment Processing | Stripe | Yes | Existing group-wide contract; handles tokenisation to reduce PCI scope |
| Email / Transactional Messaging | SendGrid | Yes | Corporate-approved email service; shared with loyalty programme |
| CDN | Amazon CloudFront | Yes | Corporate landing zone standard |
| Customer Data Platform | Segment | Yes | Existing enterprise contract; feeds Salesforce Marketing Cloud |
| Monitoring & Logging | Datadog (corporate) | Yes | Corporate APM and log aggregation platform |
| CI/CD | GitHub Actions (corporate organisation) | Yes | Corporate standard |
| Container Platform | Amazon EKS | Yes | Corporate landing zone standard |
  • Customer-facing web storefront (Next.js SSR) and native mobile applications (iOS, Android)
  • Back-end microservices: catalogue, search, basket, checkout, order, customer, promotion
  • Payment integration via Stripe (Stripe Elements client-side tokenisation)
  • Customer identity and account management via AWS Cognito
  • AWS infrastructure: EKS, RDS Aurora PostgreSQL, ElastiCache Redis, OpenSearch, S3, CloudFront, WAF, SQS
  • Integration with SAP ERP (inventory, pricing, order hand-off), warehouse management (APP-0214), and loyalty platform (APP-0417)
  • All environments: development, test, staging, production, and DR
  • Event capture for Segment CDP
  • Warehouse management system modifications (APP-0214)
  • In-store EPOS (APP-0088)
  • Marketing cloud platform configuration (Salesforce Marketing Cloud, vendor-managed)
  • Corporate finance reporting integrations (handled by ERP team)
  • Back-office merchandising tooling (Phase 2, planned 2027)

The legacy NW-Commerce platform was built in 2016 on Oracle Commerce 11 and .NET Framework 4.7.2, hosted on Windows Server 2016 virtual machines in NorthWind’s private data centre in Basingstoke. It serves the current £620M/year online turnover.

Key limitations:

  • Peak capacity: Vertical scaling limits reached at approximately 1,800 orders/minute; Black Friday 2024 demand peaked at 2,400 orders/minute and the platform failed for 3 hours 12 minutes, losing an estimated £8.2M in sales.
  • Release velocity: Full-regression release cycle of 6 weeks; any code change requires full platform deployment.
  • Mobile experience: No mobile-specific APIs; the iOS and Android apps scrape the responsive website HTML, which is brittle and slow.
  • Vendor support: Oracle Commerce 11 has been unsupported since 2024; there is no patch stream for security or functional issues.
  • Operational cost: Annual hosting, licensing and operational support totals £4.1M including 11 FTEs.
  • PCI-DSS scope: The entire application stack is in PCI-DSS scope because cardholder data enters the application server prior to tokenisation.

  • Retained: SAP ERP (integration via APIs), warehouse management system (APP-0214), loyalty platform (APP-0417).
  • Replaced: Oracle Commerce 11, .NET monolith, on-premises Windows Server hosting.
  • Decommissioned: NW-Commerce application servers (after the 3-month parallel-run period).

| Decision / Constraint | Rationale | Impact |
| --- | --- | --- |
| AWS as hosting platform | Corporate Cloud Landing Zone is AWS-only; existing enterprise agreement | All infrastructure on AWS in eu-west-2 (London) |
| EKS for container orchestration | Existing team skills; corporate standard; portability across clouds | Microservices deployed as Kubernetes pods |
| Aurora PostgreSQL over MySQL | Superior JSONB support for product catalogues, stronger consistency model, better observability ecosystem | RDS Aurora PostgreSQL for all transactional data |
| Next.js SSR over client-only SPA | SEO is critical for e-commerce discovery; SSR improves Core Web Vitals (LCP) substantially | Storefront rendered server-side on AWS |
| Stripe for payments | Group-wide contract; Stripe Elements keeps cardholder data out of NorthWind systems, reducing PCI scope to SAQ A-EP | PCI-DSS scope reduced; dependency on Stripe for card payment flow |
| Data residency: UK | UK GDPR and corporate data policy require customer PII in the UK | eu-west-2 (London) primary; non-PII operational data only in eu-west-1 (Ireland) DR |
| Must deliver before Black Friday 2026 | Board directive following 2024 outage | Go-live milestone: 2026-10-01 (7 weeks prior to Black Friday) |
| Field | Value |
| --- | --- |
| Project Name | NorthWind Online Re-platform |
| Project Code / ID | PRJ-2025-112 |
| Project Manager | Fiona Bloggs |
| Estimated Solution Cost (Capex) | GBP 2,000,000 (delivery) |
| Estimated Solution Cost (Opex) | GBP 800,000 per annum (AWS, SaaS, support) |
| Target Go-Live Date | 2026-10-01 (full cut-over); phased roll-out from 2026-06-01 |

Selected criticality: Tier 2: High Impact

Justification: NorthWind Online is the primary digital sales channel, contributing £620M/year currently (projected £1.1bn by 2028). Failure during peak trading periods would cause:

  • Direct revenue loss of up to £30M per day during peak trading (Black Friday, Boxing Day, Cyber Week)
  • Breach of PCI-DSS obligations if security controls fail, with potential fines and card scheme sanctions
  • UK GDPR breach notification obligations if customer PII is exposed
  • Reputational damage in a competitive retail market
  • Note: failure is not immediately life-safety critical, so Tier 1 (reserved for in-store point-of-sale and safety systems) does not apply

| Stakeholder | Role / Group | Key Concerns | Relevant Views |
| --- | --- | --- | --- |
| Raj Bloggs | Head of Digital Commerce (Business Owner) | Revenue, conversion rate, time-to-market for new features, peak resilience | Executive Summary, Scenarios, Performance |
| Helen Doe | CTO | Strategic alignment, technology direction, cost | Executive Summary, Cost, Lifecycle |
| Jane Doe | Principal Security Architect | PCI-DSS, threat model, customer PII protection | Security View, Data View |
| Tom Bloggs | Data Protection Officer | UK GDPR, data sovereignty, DPIA, retention | Data View, Security View |
| Priya Doe | Solution Architect | Design integrity, standards compliance, maintainability | All views |
| Sally Doe | SRE Lead | Observability, incident response, on-call, peak readiness | Operational Excellence, Reliability |
| Fred Bloggs | Head of Digital Engineering | Microservice design, developer experience, CI/CD | Logical View, Integration, Lifecycle |
| Fiona Bloggs | Project Manager | Delivery milestones, budget, risks, dependencies | Executive Summary, Governance |
| Harriet Doe | Head of Marketing | Personalisation, event capture, SEO | Integration View, Scenarios |
| Dave Bloggs | Head of Customer Service | Order visibility, account self-service, refunds | Scenarios |
| Customers (c.12M) | End Users | Speed, availability, security, trust | Executive Summary, Scenarios, Performance |
| Retail merchandisers (c.80) | Internal admin users | Product-listing workflow, stock visibility | Scenarios, Logical View |
| Concern | Stakeholder(s) | Addressed In |
| --- | --- | --- |
| Peak trading availability and performance | Raj Bloggs, Sally Doe, Customers | 4.2 Reliability, 4.3 Performance, 3.3 Physical View |
| PCI-DSS compliance and card data protection | Jane Doe, Helen Doe | 3.5 Security View, 2.3 Compliance |
| UK GDPR and customer data protection | Tom Bloggs, Jane Doe | 3.4 Data View, 3.5 Security View |
| Revenue loss from downtime | Raj Bloggs, Helen Doe | 4.2 Reliability, 1.8 Business Criticality |
| Speed of feature delivery | Fred Bloggs, Harriet Doe | 5.1 CI/CD, 5.4 Release Management |
| Cost of AWS platform at peak | Helen Doe, Fiona Bloggs | 4.4 Cost Optimisation |
| Vendor lock-in to Stripe | Priya Doe, Helen Doe | 3.1.6 Technology & Vendor Lock-in, 6.3 Risks |
| Search quality and relevance | Harriet Doe, Raj Bloggs | 3.1 Logical View, 3.6 Scenarios |
| Mobile app parity with web | Raj Bloggs, Customers | 3.1 Logical View, 3.2 Integration |
| Regulation / Standard | Applicability | Impact on Design |
| --- | --- | --- |
| PCI-DSS v4.0 | Mandatory — platform accepts card payments | Scope reduced to SAQ A-EP via Stripe Elements tokenisation; PAN never traverses NorthWind systems. Network segmentation, encryption, audit logging and quarterly ASV scans still required |
| UK GDPR / Data Protection Act 2018 | Mandatory — platform processes customer PII at scale | DPIA completed, lawful basis documented, right-to-erasure supported, retention policies enforced |
| PSD2 / Strong Customer Authentication | Applicable — card payments above £30 require 3-D Secure 2 | Stripe handles SCA challenge flow; checkout UX must accommodate challenge redirect |
| Consumer Rights Act 2015 | Applicable — digital B2C contracts | Cooling-off period support, refund handling, clear terms presentation |
| WCAG 2.2 AA | Corporate accessibility policy | Storefront and mobile app must meet AA; automated accessibility testing in CI |
  • No FCA-regulated activities. Payment regulation (PSD2 SCA) is satisfied by Stripe acting as the acquirer.
| Standard | Version | Applicability |
| --- | --- | --- |
| NorthWind Information Security Standard | 3.4 | All sections — security controls, access management |
| NorthWind Cloud Landing Zone Standard | 2.1 | Physical View, Security View — AWS controls, tagging |
| NorthWind Data Classification Standard | 1.2 | Data View — classification and handling |
| PCI-DSS | 4.0 | Security View — card data flow, segmentation |
| OWASP ASVS | 4.0 L2 | Application security verification |

```mermaid
graph TD
  Web[Customers: Web Browser] --> CF[CloudFront CDN + WAF]
  Mob[Customers: Mobile App] --> CF
  CF --> Next[Next.js Storefront SSR]
  CF --> APIGW[API Gateway]
  Next --> APIGW
  APIGW --> Cat[Catalogue Service]
  APIGW --> Bas[Basket Service]
  APIGW --> Chk[Checkout Service]
  APIGW --> Ord[Order Service]
  APIGW --> Cus[Customer Service]
  APIGW --> Sea[Search Service]
  Cat --> Aur[(Aurora PostgreSQL)]
  Bas --> Red[(ElastiCache Redis)]
  Chk --> Aur
  Chk --> Stripe[Stripe Payments]
  Ord --> Aur
  Ord --> SQS[SQS Order Queue]
  Cus --> Cog[AWS Cognito]
  Cus --> Aur
  Sea --> OS[(OpenSearch)]
  SQS --> SAP[SAP ERP / Warehouse]
  Ord --> SG[SendGrid Email]
  APIGW --> Seg[Segment CDP]
```
Application architecture: Customers access Next.js storefront and mobile apps via CloudFront. The API Gateway routes to Node.js microservices on EKS (catalogue, basket, checkout, order, customer, search). Microservices use Aurora PostgreSQL, OpenSearch and Redis. Payments go to Stripe, emails via SendGrid, events to Segment CDP.
| Component | Type | Description | Technology | Owner |
| --- | --- | --- | --- | --- |
| Storefront Web | Web Application | Server-side rendered customer-facing storefront for SEO and performance | Next.js 14, React 18, TypeScript | Digital Commerce team |
| Mobile App (iOS, Android) | Mobile Application (native) | Native customer apps consuming platform APIs | Swift (iOS), Kotlin (Android) | Mobile team |
| API Gateway | Gateway | Single ingress for all microservices; request validation, throttling, auth | AWS API Gateway (REST) | Platform team |
| Catalogue Service | API Service | Product data, categories, pricing, availability | Node.js 20, NestJS, on EKS | Commerce team |
| Search Service | API Service | Faceted search, autocomplete, type-ahead, relevance ranking | Node.js 20, NestJS, on EKS | Commerce team |
| Basket Service | API Service | Customer basket state, promotion application | Node.js 20, NestJS, on EKS | Commerce team |
| Checkout Service | API Service | Checkout orchestration, Stripe integration, 3-D Secure flow | Node.js 20, NestJS, on EKS | Commerce team |
| Order Service | API Service | Order creation, hand-off to SAP, status tracking | Node.js 20, NestJS, on EKS | Commerce team |
| Customer Service | API Service | Customer profile, address book, consent, order history | Node.js 20, NestJS, on EKS | Commerce team |
| Promotion Service | API Service | Promotion rules engine, voucher validation | Node.js 20, NestJS, on EKS | Commerce team |
| Transactional Database | Database | Authoritative store for catalogue, orders, customer, promotions | Amazon RDS Aurora PostgreSQL 15 (Multi-AZ) | DBA team |
| Search Index | Search Engine | Product search index with synonyms and boosts | Amazon OpenSearch 2.x | Platform team |
| Basket Cache | Cache | Session-scoped basket state and rate-limiting counters | Amazon ElastiCache Redis 7.x (cluster mode) | Platform team |
| Order Queue | Queue | Decouples order hand-off to SAP from checkout response path | Amazon SQS (standard + DLQ) | Platform team |
| Static Asset Store | File Storage | Product images, merchandising assets, app bundles | Amazon S3 + CloudFront | Platform team |
| Customer Identity | Service | Customer sign-up, sign-in, MFA, password reset, social login | AWS Cognito (customer user pool) | Platform team |
| Service ID | Service Name | Capability ID | Capability Name |
| --- | --- | --- | --- |
| SVC-NWO-01 | Product Discovery | CAP-COMM-010 | Digital Storefront |
| SVC-NWO-02 | Basket & Checkout | CAP-COMM-011 | Online Order Capture |
| SVC-NWO-03 | Customer Account | CAP-CUS-004 | Customer Self-Service |
| SVC-NWO-04 | Order Fulfilment Hand-off | CAP-OPS-007 | Order Orchestration |
| SVC-NWO-05 | Payment Processing | CAP-FIN-003 | Card Payment Acceptance |
| Application Name | Application ID | Impact Type | Change Details | Comments |
| --- | --- | --- | --- | --- |
| Legacy NW-Commerce | APP-0412 | Decommission | Retired after 3-month parallel run | 2016 Oracle Commerce monolith |
| SAP ERP | APP-0001 | Modify (consume) | New order hand-off queue integration | Existing APIs; no SAP-side changes |
| Warehouse Management | APP-0214 | Use | Order events consumed via existing topic | No changes required |
| Loyalty Platform | APP-0417 | Use | Customer identity linkage via shared Cognito attribute | Minor attribute mapping update |
| Corporate Okta | APP-0099 | Use | Admin access for merchandisers and ops | Existing federation |
| Salesforce Marketing Cloud | APP-0601 | Use (indirect) | Customer events flow via Segment CDP | No direct integration from NorthWind Online |
| Pattern | Where Applied | Rationale |
| --- | --- | --- |
| Microservices | Domain-aligned services (catalogue, basket, checkout, order, customer, search, promotion) | Independent scaling, deployment, fault isolation; smaller blast radius during peak |
| API Gateway | AWS API Gateway fronting all services | Centralised throttling, WAF integration, auth enforcement, contract versioning |
| Strangler Fig | Transition from legacy NW-Commerce | Traffic gradually shifted per domain (search first, checkout last) via CloudFront routing rules |
| Backend-for-Frontend (BFF) | Mobile BFF service composing catalogue, basket and customer calls | Reduces round-trips for mobile clients on cellular networks; tailored payloads |
| Event-Driven (Pub-Sub) | Order events to SAP and Segment | Decouples downstream systems from checkout latency path |
| Cache-Aside | Catalogue and pricing reads via Redis | Reduces Aurora read load during peak; P95 latency improvement |
| Circuit Breaker | Stripe and SAP integrations | Prevents cascading failure when a downstream dependency degrades |
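The circuit-breaker pattern applied to the Stripe and SAP integrations can be sketched as a small state machine. This is an illustrative sketch only — class, parameter, and state names are assumptions, not the production Checkout Service code:

```typescript
// Minimal circuit-breaker sketch (illustrative, not the production implementation).
// After `failureThreshold` consecutive failures the breaker opens and fails fast;
// after `resetMs` it half-opens and allows a single trial call through.

type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,
    private readonly resetMs = 30_000,
    private readonly now: () => number = Date.now, // injectable clock for testing
  ) {}

  get currentState(): BreakerState {
    return this.state;
  }

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.resetMs) {
        // Downstream is presumed unhealthy: reject immediately instead of
        // tying up threads/sockets waiting on a degraded dependency.
        throw new Error("circuit open: failing fast");
      }
      this.state = "HALF_OPEN"; // reset window elapsed: allow one trial request
    }
    try {
      const result = await fn();
      this.state = "CLOSED"; // success closes the breaker and clears the count
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

In practice a production breaker would also distinguish timeouts from business errors and surface state-change metrics to Datadog; libraries such as `opossum` provide this off the shelf.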

3.1.6 Technology & Vendor Lock-in Assessment

| Component / Service | Vendor / Technology | Lock-in Level | Mitigation | Portability Notes |
| --- | --- | --- | --- | --- |
| AWS EKS | AWS (Kubernetes) | Low | Standard Kubernetes manifests; Helm charts | Portable to AKS, GKE, or self-managed |
| RDS Aurora PostgreSQL | AWS (PostgreSQL-compatible) | Moderate | Aurora-specific features avoided where possible; standard PostgreSQL schema | Migratable to standard PostgreSQL with minor effort (pg_dump / logical replication) |
| CloudFront + WAF | AWS | Low | Cache behaviours are declarative; rules documented | Replaceable with Cloudflare or Akamai |
| AWS Cognito | AWS | Moderate | Standard OIDC claims; customer identity data exportable | Migration to alternative IDP (e.g. Auth0, Okta CIAM) would require password reset cycle |
| Stripe | Stripe Inc. | High | Payment abstraction layer in Checkout Service isolates Stripe SDK; documented migration plan to alternative PSP | Migration would be a 6-9 month programme; vouchers/stored cards need reissue |
| SendGrid | Twilio | Low | Standard SMTP / REST API; templates are HTML | Easily swapped with AWS SES, Mailgun, Postmark |
| OpenSearch | AWS (Apache 2.0 fork) | Low | Standard Elasticsearch query DSL | Fully compatible with Elasticsearch 7.10 and OpenSearch self-hosted |
| Segment CDP | Twilio | Moderate | Thin event emission layer; tracking plan documented | Migration to alternative CDP requires event replay |

Primary data flow — Customer places an order:

  1. Customer browses the storefront; Next.js SSR calls Catalogue and Search services via API Gateway for product listings.
  2. Customer adds items to basket; Basket Service persists state to Redis (keyed by basket ID).
  3. Customer proceeds to checkout; Checkout Service validates the basket, calculates shipping and applies promotions.
  4. Browser loads Stripe Elements iframe; customer enters card details directly into Stripe-hosted input fields. PAN never reaches NorthWind systems.
  5. Stripe returns a payment method token to the browser; browser forwards the token to Checkout Service.
  6. Checkout Service calls Stripe’s PaymentIntent API with the token; Stripe performs 3-D Secure challenge if required.
  7. On successful authorisation, Order Service creates the order record in Aurora and emits an OrderCreated event to SQS.
  8. Order Service triggers a transactional email via SendGrid and a customer event to Segment CDP.
  9. A downstream consumer (SAP integration Lambda) processes the SQS queue and calls SAP’s REST API to create the sales order.
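Steps 5-9 of the flow above can be sketched as orchestration code. The interfaces below are hypothetical stand-ins for the real Stripe, Aurora, SQS and SendGrid clients — they illustrate only the ordering guarantee (payment confirmed, then order persisted, then event and email emitted), not the actual Checkout/Order Service code:

```typescript
// Hypothetical ports standing in for Stripe, Aurora, SQS and SendGrid clients.
interface PaymentGateway {
  confirm(token: string, amountPence: number): Promise<{ status: "succeeded" | "requires_action" }>;
}
interface OrderStore {
  create(order: { basketId: string; amountPence: number }): Promise<string>; // returns order ID
}
interface EventQueue {
  publish(event: object): Promise<void>;
}
interface Mailer {
  sendOrderConfirmation(orderId: string): Promise<void>;
}

async function placeOrder(
  deps: { payments: PaymentGateway; orders: OrderStore; queue: EventQueue; mail: Mailer },
  basketId: string,
  amountPence: number,
  paymentMethodToken: string, // opaque Stripe token — the PAN never reaches this code
): Promise<string> {
  // Steps 5-6: confirm payment with the token forwarded by the browser.
  const payment = await deps.payments.confirm(paymentMethodToken, amountPence);
  if (payment.status !== "succeeded") {
    throw new Error("payment requires further action (e.g. 3-D Secure challenge)");
  }
  // Step 7: persist the order, then emit OrderCreated for the SAP consumer.
  const orderId = await deps.orders.create({ basketId, amountPence });
  await deps.queue.publish({ type: "OrderCreated", orderId, basketId });
  // Step 8: transactional email; SAP hand-off (step 9) happens asynchronously
  // downstream of the queue, off the checkout latency path.
  await deps.mail.sendOrderConfirmation(orderId);
  return orderId;
}
```

Injecting the dependencies as interfaces is also what makes the orchestration unit-testable with stubs, and keeps the Stripe SDK isolated behind the payment abstraction noted in the lock-in assessment.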
| Source Component | Destination Component | Protocol / Encryption | Authentication Method | Purpose |
| --- | --- | --- | --- | --- |
| Next.js Storefront | API Gateway | HTTPS / TLS 1.3 | AWS SigV4 (server-side) | Render product and catalogue data server-side |
| Mobile App | API Gateway | HTTPS / TLS 1.3 | OAuth 2.0 (Cognito) + API key | Mobile client API access |
| API Gateway | Microservices (EKS) | HTTPS / TLS 1.2 within VPC | IAM (IRSA) | Route requests to service pods |
| Microservices | Aurora PostgreSQL | TCP/TLS 1.2 (PostgreSQL protocol) | IAM database authentication | Read/write authoritative data |
| Microservices | ElastiCache Redis | TLS 1.2 | AUTH token (Secrets Manager) | Cache and basket state |
| Search Service | OpenSearch | HTTPS / TLS 1.2 | IAM with fine-grained access | Search queries and index updates |
| Order Service | SQS | HTTPS / TLS 1.2 | IAM role | Publish order events |
| SAP Integration Lambda | SQS | HTTPS / TLS 1.2 | IAM role | Consume order events |
| Source Application | Destination Application | Protocol / Encryption | Authentication | Security Proxy | Purpose |
| --- | --- | --- | --- | --- | --- |
| Customer browser / mobile app | CloudFront | HTTPS / TLS 1.3 | None (public) | AWS WAF, Shield Standard | Public storefront and API access |
| Checkout Service | Stripe | HTTPS / TLS 1.3 | Stripe secret key (Secrets Manager) | NAT Gateway (fixed IP) | Payment authorisation and capture |
| Customer browser | Stripe (direct) | HTTPS / TLS 1.3 | Stripe publishable key | N/A (client-side) | Card tokenisation via Stripe Elements |
| Order Service | SendGrid | HTTPS / TLS 1.3 | API key (Secrets Manager) | NAT Gateway | Transactional email delivery |
| SAP Integration Lambda | SAP ERP | HTTPS / TLS 1.2 | OAuth 2.0 client credentials | Site-to-Site VPN to on-prem | Sales order creation |
| API Gateway / Storefront | Segment CDP | HTTPS / TLS 1.3 | Write key | N/A | Customer event capture |
| Admin users | Admin portal (CloudFront origin) | HTTPS / TLS 1.3 | Okta SSO (OIDC) | VPN + WAF | Merchandiser and operations access |
| User Type | Access Method | Authentication | Protocol |
| --- | --- | --- | --- |
| Retail customers (web) | Web browser, public Internet | AWS Cognito (email + password, optional social, optional MFA) | HTTPS / TLS 1.3 |
| Retail customers (mobile) | Native app (iOS / Android) | AWS Cognito (OAuth 2.0 authorisation code + PKCE) | HTTPS / TLS 1.3 |
| Merchandisers | Admin web portal | Okta SSO + MFA | HTTPS / TLS 1.3 |
| SRE / Operations | kubectl, AWS Console, Datadog | Okta SSO via AWS IAM Identity Center | HTTPS / TLS 1.3 |
| API / Interface | Type | Direction | Format | Version | Documentation |
| --- | --- | --- | --- | --- | --- |
| Catalogue API | REST | Exposed | JSON | v1 | Internal developer portal (Swagger) |
| Basket API | REST | Exposed | JSON | v1 | Internal developer portal |
| Checkout API | REST | Exposed | JSON | v1 | Internal developer portal |
| Order API | REST | Exposed | JSON | v1 | Internal developer portal |
| Customer API | REST | Exposed | JSON | v1 | Internal developer portal |
| Stripe PaymentIntents | REST | Consumed | JSON | 2024-06-20 | Stripe API reference |
| SendGrid Mail | REST | Consumed | JSON | v3 | SendGrid API reference |
| SAP Sales Order API | REST | Consumed | JSON | v2 | SAP team wiki |
| Segment Track | REST | Consumed | JSON | v1 | Segment API reference |

```mermaid
graph TD
  R53[Route 53] --> CF[CloudFront + WAF + Shield]
  CF --> ALB[Application Load Balancer]
  subgraph Primary[eu-west-2 London - 2 AZs]
      subgraph Public[Public Subnets]
          ALB
          NAT[NAT Gateways]
      end
      subgraph Private[Private Subnets]
          EKS[EKS Node Groups]
          Aurora[Aurora PostgreSQL Multi-AZ]
          Redis[ElastiCache Redis]
          OS[OpenSearch]
      end
  end
  ALB --> EKS
  EKS --> Aurora
  EKS --> Redis
  EKS --> OS
  EKS --> NAT
  NAT --> Stripe[Stripe]
  NAT --> SG[SendGrid]
  subgraph DR[eu-west-1 Ireland - Pilot Light]
      AuroraDR[Aurora Global Replica]
      OSDR[OpenSearch Replica]
  end
  Aurora -- Global DB --> AuroraDR
```
Deployment architecture: CloudFront fronts the platform with WAF and Shield. An Application Load Balancer distributes to EKS node groups across two Availability Zones in eu-west-2 (London). Aurora PostgreSQL, ElastiCache Redis, and OpenSearch are Multi-AZ. DR is a pilot-light deployment in eu-west-1 Ireland.
| Attribute | Selection |
| --- | --- |
| Hosting Venue Type | Public Cloud |
| Hosting Region(s) | UK (eu-west-2 London — primary), Ireland (eu-west-1 — DR, non-PII only) |
| Service Model | PaaS (EKS, Aurora, ElastiCache, OpenSearch) and SaaS (Stripe, SendGrid, Segment) |
| Cloud Provider | AWS |
| Account / Subscription Type | NorthWind AWS Organisation — nwo-prod workload account |
| Attribute | Detail |
| --- | --- |
| Container Platform | Amazon EKS 1.29 |
| Base Image(s) | node:20-alpine (hardened and signed via corporate base image pipeline) |
| Cluster Size | 2 managed node groups: application (4-24 nodes, auto-scaling via Karpenter) and platform (3 nodes) |
| Node Instance Type | c7g.xlarge (Graviton3, 4 vCPU, 8 GB RAM) for application; m7g.large for platform |
| Pod Resource Limits | Catalogue/Search: 1 vCPU / 1.5 GB; Basket/Checkout/Order: 750m vCPU / 1 GB; Customer/Promotion: 500m vCPU / 768 MB |
| Pod Replicas (steady state) | Catalogue: 6-24 (HPA); Search: 4-16; Basket: 6-30; Checkout: 4-20; Order: 4-16; Customer: 3-10; Promotion: 2-8 |
  • Anti-Malware — Amazon GuardDuty (runtime protection on EKS)
  • EDR — CrowdStrike Falcon container sensor on application nodes
  • Vulnerability Management — Amazon Inspector (container images, EC2 nodes)
| Question | Response |
| --- | --- |
| Is this an Internet-facing application? | Yes — customer-facing web and mobile |
| Outbound Internet connectivity required? | Yes — Stripe, SendGrid, Segment (via NAT Gateway with fixed Elastic IPs allowlisted by partners) |
| Cloud-to-on-premises connectivity required? | Yes — SAP ERP is still on-premises; Site-to-Site VPN with IPsec (corporate Direct Connect planned for 2027) |
| Wireless networking required? | No |
| Third-party / co-location connectivity required? | No (all third parties over public Internet with TLS) |
| Cloud network peering required? | Yes — VPC peering to the NorthWind Shared Services VPC for Datadog, Secrets Manager reach and corporate DNS |
| Attribute | Selection |
| --- | --- |
| User access method | Web (HTTPS), Mobile native apps |
| User locations | UK-predominant, Internet (global access permitted) |
| Administrator access method | AWS Console via IAM Identity Center; kubectl via EKS OIDC; bastion-less (SSM Session Manager for emergency OS access) |
| VPN required | Yes — for administrator access only (corporate VPN) |
| Direct Connect / ExpressRoute | No (planned 2027); currently Site-to-Site VPN to Basingstoke data centre for SAP |
| Protocol | Used? | Purpose |
| --- | --- | --- |
| HTTPS (TLS 1.2+) | Yes | All customer and API traffic (TLS 1.3 on CloudFront; TLS 1.2 minimum internally) |
| WebSocket | No | Not required for current use cases |
| SFTP | No | |
| JDBC | No | PostgreSQL protocol used instead of JDBC |
| TCP (other) | Yes | PostgreSQL and Redis within VPC (TLS) |
| gRPC | No | |
| Metric | Value |
| --- | --- |
| Peak egress bandwidth to Internet | 1.5 Gb/s (Black Friday peak validated 2025) |
| Peak ingress bandwidth from Internet | 400 Mb/s |
| Peak bandwidth to on-premises (SAP VPN) | 150 Mb/s |
| Traffic characteristics | Seasonal — very significant peaks during Black Friday, Cyber Monday, Boxing Day and January sale |
| Latency requirement | < 100ms P95 page load; < 200ms P95 API |
| Control | Implemented | Detail |
| --- | --- | --- |
| DDoS Protection | Yes | AWS Shield Advanced on CloudFront, ALB and Route 53 |
| Rate Limiting | Yes | WAF rate-based rules (2000 req/min per source IP); API Gateway per-route throttling |
| Web Application Firewall (WAF) | Yes | AWS WAF v2: AWS Managed Rules core set, OWASP Top 10, Known Bad Inputs, bot control |
| Bot / scraping controls | Yes | AWS WAF Bot Control managed rule group; CAPTCHA challenge for suspicious patterns |
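Beyond the managed WAF rules, the Basket Cache component also holds application-level rate-limiting counters in Redis. The usual shape is a fixed window per key (INCR plus EXPIRE); the sketch below mirrors that pattern in-process with an injectable clock — the class name and limits are illustrative, not the production values:

```typescript
// Fixed-window rate-limiter sketch mirroring the Redis INCR + EXPIRE pattern
// used for per-key counters. In production the Map would be a Redis hash so
// all pods share one view of the counts.
class FixedWindowLimiter {
  private counters = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private readonly limit: number,    // e.g. 2000 requests
    private readonly windowMs: number, // e.g. 60_000 for a per-minute window
    private readonly now: () => number = Date.now,
  ) {}

  allow(key: string): boolean {
    const t = this.now();
    const entry = this.counters.get(key);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      // First request in a fresh window: reset the counter.
      this.counters.set(key, { windowStart: t, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

Fixed windows allow short bursts at window boundaries; a sliding-window or token-bucket variant smooths that out at the cost of slightly more Redis state per key.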
| Environment | Description | Count & Venue | Compute Solution |
| --- | --- | --- | --- |
| Development | Shared dev cluster with preview environments per PR | 1x AWS (eu-west-2) | EKS (3 nodes, m7g.large), Aurora t4g.medium |
| Test / QA | Automated integration and contract tests | 1x AWS (eu-west-2) | EKS (3 nodes, m7g.large), Aurora t4g.large |
| Staging / Pre-Production | Production-mirror for release validation and load testing | 1x AWS (eu-west-2) | EKS (4-8 nodes, c7g.xlarge), Aurora r7g.large |
| Production | Live service | 1x AWS (eu-west-2), Multi-AZ | EKS (4-24 nodes, c7g.xlarge), Aurora r7g.xlarge Multi-AZ |
| DR | Pilot-light disaster recovery | 1x AWS (eu-west-1) | EKS (2 nodes, scaled up on failover), Aurora Global Database secondary |

Non-production environments scale down outside business hours.

| Question | Response |
| --- | --- |
| Hosting region chosen for low carbon intensity | eu-west-2 (London) — chosen primarily for UK customer proximity and data residency. AWS published carbon intensity for eu-west-2 is moderate; AWS commitment to 100% renewable matching by 2025 applies. DR region eu-west-1 (Ireland) operates at lower carbon intensity than the AWS European average. |
| Non-production environments auto-shutdown out of hours | Yes — dev and staging EKS clusters scale to 1-2 system nodes overnight (19:00-07:00 weekdays) and weekends; non-prod Aurora paused via Lambda cron. ~£14k/year saving on non-prod compute. |
| Compute family chosen for performance-per-watt | Yes — Graviton3 (c7g/m7g) throughout; AWS published data shows ~60% better performance-per-watt vs equivalent x86. CloudFront and S3 reduce origin compute. |
| Auto-scaling configured to release capacity when idle | Yes — Karpenter consolidates underutilised nodes; HPA on CPU + custom queue-depth metrics; nodes scaled down within 5 minutes of becoming idle. Black Friday peak fleet (~24 nodes) scales back to 8 within 2 hours of peak passing. |
| DR strategy proportionate | Pilot-light (Aurora Global Database secondary + minimal EKS) chosen over warm standby. RTO 4 hours, RPO 1 minute. Hot active-active was rejected: unnecessary for the business RTO and would have ~50% additional always-on compute and replication carbon cost. |

| Data Name | Store Technology | Authoritative? | Retention Period | Data Size | Classification | Personal Data? | Encryption Level | Key Management |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Product catalogue | Aurora PostgreSQL 15 | No (SAP is master) | Refreshed continuously | 8 GB | Internal | No | Storage (AES-256) | AWS KMS CMK |
| Customer profile | Aurora PostgreSQL 15 | Yes | 7 years after last activity | 40 GB | Restricted | Yes (name, email, address, phone) | Storage + Application (field-level for sensitive attributes) | AWS KMS CMK (annual rotation) |
| Order history | Aurora PostgreSQL 15 | Yes | 7 years (financial record) | 220 GB (growing 12 GB/month) | Restricted | Yes (delivery address, email) | Storage + Application (field-level) | AWS KMS CMK |
| Basket state | ElastiCache Redis 7.x | Yes (transient) | TTL 24 hours (anonymous); 30 days (signed-in) | 4 GB in-memory | Internal | Yes (items, no card data) | In-transit (TLS) + At-rest | AWS KMS (ElastiCache-managed) |
| Search index | OpenSearch 2.x | No (rebuilt from catalogue) | Continuous | 10 GB | Internal | No | Storage (AES-256) | AWS KMS |
| Product images | S3 | Yes | Life of product + 3 years | 1.2 TB | Public | No | Storage (SSE-S3) | AWS-managed |
| Stripe payment tokens | Aurora PostgreSQL 15 | No (Stripe is master) | Life of customer | 5 GB | Restricted | No (opaque tokens, not PAN) | Storage + Application (field-level) | AWS KMS CMK |
| Application logs | Datadog + S3 archive | No | 15 months (Datadog), 7 years (S3 for audit events) | 80 GB/month | Internal (logs) / Restricted (audit) | PII redacted at source | Storage | AWS KMS / Datadog-managed |
| Customer events | Segment CDP (SaaS) | Yes (streamed) | Governed by Segment contract (13 months) | ~500 GB/month | Internal | Yes (behavioural) | In-transit (TLS) | Segment-managed |
AttributeDetail
Storage ProductAmazon Aurora PostgreSQL (Multi-AZ + Global Database), ElastiCache Redis, OpenSearch, S3
Storage SizeAurora: 300 GB (growing); Redis: 8 GB; OpenSearch: 10 GB; S3: 1.2 TB
ReplicationAurora: 6-way replication across 3 AZs + cross-region replica (Global DB); Redis: primary + 1 replica per shard; OpenSearch: 3 data nodes across 3 AZs; S3: cross-region replication for audit data
Minimum RPO1 minute (Aurora continuous backup)
Classification LevelData TypesHandling Requirements
PublicProduct catalogue, images, merchandising copyOpen access, CDN-cacheable, versioning
InternalApplication logs (PII-redacted), infrastructure metrics, search indexInternal access only, standard encryption at rest, VPC-only reachability
RestrictedCustomer PII (profile, address, email), order history, payment tokens, audit logsEncryption at rest (storage + field-level for selected columns), TLS in transit, access audited, 7-year retention

No cardholder primary account number (PAN) is stored. PAN is tokenised by Stripe Elements at the browser; NorthWind stores only the opaque Stripe payment method token. This keeps the platform out of full PCI-DSS scope (SAQ A-EP applies).

StageDescriptionControls
Creation / IngestionCustomer data entered by customers at sign-up or during checkout; product data synchronised from SAP; events emitted to SegmentInput validation at API Gateway and service layer; PII fields tagged at schema level
ProcessingServices read customer and order data to fulfil requests; field-level decryption only at point of useColumn-level encryption for sensitive PII (Aurora client-side encryption); no PII in logs (structured logger strips marked fields)
StorageAurora (Multi-AZ), Redis (in-memory with persistence disabled for basket), OpenSearch, S3AES-256 at rest via KMS CMK; TLS 1.2 minimum in transit; Aurora automated backups + continuous WAL
Sharing / TransferOrder data to SAP (internal); customer events to Segment (SaaS); transactional email via SendGrid; no data to marketing without consentTLS in transit; API authentication; consent flags checked before event emission; data processing agreements with all third parties
ArchivalAudit logs move from Datadog to S3 after 15 months; S3 lifecycle transitions to Glacier Deep Archive after 1 yearS3 lifecycle policies; retrieval SLA 48 hours from Deep Archive
Deletion / PurgingCustomer erasure requests trigger an async purge job; order data retained 7 years for statutory reasons then deleted; basket data TTL-evictedRight-to-erasure job logs action; retention job scheduled monthly; deletion certificate generated for DPO
Assessment TypeIDStatusLink
Data Protection Impact Assessment (DPIA)DPIA-2025-091Approved by DPOCorporate Confluence / Compliance / DPIA
Legitimate Interest Assessment (LIA) — event analyticsLIA-2025-022ApprovedCorporate Confluence / Compliance / LIA

The DPIA identified a medium-risk processing activity (behavioural event capture for personalisation) which is mitigated by consent-gated event emission and a public-facing privacy portal where customers can view and manage their data.

ApproachSelected
Masked production data used in staging[x]

Production customer data is masked into a pseudonymised dataset via a scheduled AWS Glue job for staging use. Names, addresses, emails and phone numbers are replaced with synthetic but realistic values generated by the Faker library. Test and dev environments use entirely synthetic data.
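The masking job's key property — the same production value always maps to the same synthetic value, so cross-table joins survive masking — can be sketched in a few lines. This stdlib-only Python sketch stands in for the Glue/Faker implementation; the signing secret, field names and synthetic values are illustrative, not the real job.

```python
import hashlib

# Hypothetical stand-ins for Faker output: a small lookup table keyed by a
# stable hash keeps this sketch dependency-free.
SYNTHETIC_NAMES = ["Alex Smith", "Sam Jones", "Chris Taylor", "Jo Brown"]

def mask_record(record: dict, secret: str = "staging-mask-key") -> dict:
    """Replace direct identifiers with deterministic synthetic values.

    Determinism means the same production value always yields the same
    masked value, so referential integrity (orders joined to customers)
    is preserved across the masked dataset.
    """
    def stable_index(value: str, modulus: int) -> int:
        digest = hashlib.sha256((secret + value).encode()).hexdigest()
        return int(digest, 16) % modulus

    masked = dict(record)
    masked["name"] = SYNTHETIC_NAMES[stable_index(record["name"], len(SYNTHETIC_NAMES))]
    masked["email"] = f"user{stable_index(record['email'], 1_000_000)}@example.com"
    # Ofcom-reserved drama number range 07700 900000-900999
    masked["phone"] = f"+44 7700 900{stable_index(record['phone'], 1000):03d}"
    return masked

original = {"name": "Priya Doe", "email": "priya@example.org", "phone": "+44 1632 960001"}
masked = mask_record(original)
```

Running the job twice over the same source therefore produces identical staging datasets, which keeps masked refreshes diff-friendly.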

  • Yes — Aurora provides ACID transactions and foreign-key constraints; orders are reconciled nightly against SAP via a scheduled integrity job; discrepancies alert to Finance operations.
  • Yes (limited) — mobile apps cache the product catalogue and basket for offline browsing. No payment data or full PII (beyond display name) is cached. Mobile caches are encrypted at rest via platform keychain / keystore.
DestinationData TypeClassificationTransfer MethodProtection
StripePayment tokens and purchase amountRestricted (tokens are non-PII; opaque)REST API over HTTPS / TLS 1.3API key; IP allowlist; contractual DPA
SendGridCustomer email and order summaryRestrictedREST API over HTTPS / TLS 1.3API key; transactional-only templates; contractual DPA
Segment CDPBehavioural events with pseudonymous customer IDInternalREST API over HTTPS / TLS 1.3Write key; consent-gated emission; contractual DPA
SAP ERP (internal)Order and customer delivery dataRestrictedREST API over Site-to-Site VPNOAuth 2.0; internal network path
DatadogApplication logs (PII redacted)InternalTLS 1.3API key; redaction pipeline at source
  • Yes — customer PII and order data must remain in the UK (eu-west-2 London). The DR region (eu-west-1 Ireland) contains only operational telemetry. Aurora Global Database is configured to replicate non-PII schemas only; customer PII tables are replicated via a filtered logical replication stream terminated at a UK-only subsystem. Segment is configured to use its EU data plane; Stripe operates under UK and EU safeguards under standard contractual clauses.
QuestionResponse
Retention periods minimisedCustomer order data 7 years (HMRC); browsing/clickstream 25 months (legitimate interest basis); inactive customer accounts archived after 3 years inactivity (PII deleted at 5 years); session data ≤ 24 hours. Lifecycle policies enforce automated expiry.
Older data tiered to cold/archive storageYes — order archives transition S3 Standard → Intelligent-Tiering → Glacier IR (90 days) → Glacier Deep Archive (1 year). Aurora cold tables exported to S3 quarterly. ~75% of historical data sits in archive tiers.
Unused or duplicate replicasSingle Aurora primary + 1 DR replica; no read-replicas (OpenSearch + ElastiCache absorb read load). Quarterly review of S3 buckets via AWS Trusted Advisor.
Compression appliedBrotli on HTTPS (~70% reduction on JSON catalogue payloads); WebP/AVIF for product images (CloudFront origin transformation); Parquet+Snappy for Snowflake exports.
Cross-region replication justifiedAurora Global Database secondary required by DR RPO (1 min). Customer PII tables explicitly excluded from cross-region replication (sovereignty + reduced carbon cost).
Large data transfers off-peakNightly Snowflake export 02:00-04:00 UTC; weekly partner reconciliations Sunday 03:00 UTC; both align with low-carbon-intensity periods on the UK grid.
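As a rough illustration of why repetitive JSON catalogue payloads compress so well: the sketch below uses stdlib zlib (the platform uses Brotli at the CDN, which typically compresses further), and the payload shape is made up for the example.

```python
import json
import zlib

# Representative (invented) catalogue payload: 500 items with repeated keys
# and near-identical values, which DEFLATE-family codecs compress heavily.
catalogue = [
    {"sku": f"NW-{i:05d}", "name": "Widget", "price_gbp": 9.99,
     "attributes": {"colour": "blue", "size": "M"}}
    for i in range(500)
]
raw = json.dumps(catalogue).encode()
compressed = zlib.compress(raw, 9)          # Brotli in production; zlib shows the idea
ratio = 1 - len(compressed) / len(raw)      # fraction of bytes saved
```

On payloads like this the saving comfortably exceeds the ~70% quoted above; real catalogue data with more varied attribute values compresses somewhat less.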

QuestionResponse
Does the solution support regulated activities?Yes — accepts card payments under PCI-DSS v4.0 (scope reduced to SAQ A-EP via Stripe Elements)
Is the solution SaaS or third-party hosted?No — self-managed on AWS; key SaaS dependencies: Stripe, SendGrid, Segment
Has a third-party risk assessment been completed?Yes — AWS: TPRA-2024-001 (approved); Stripe: TPRA-2024-018 (approved); SendGrid: TPRA-2024-031 (approved); Segment: TPRA-2025-007 (approved)
Impact CategoryBusiness Impact if Compromised
ConfidentialityHigh — exposure of PII belonging to 12M customers would trigger ICO notification, potential GDPR fines (up to 4% of turnover = £112M), and severe brand damage
IntegrityHigh — manipulated prices or promotions could cause direct financial loss and regulatory scrutiny under consumer protection law
AvailabilityCritical — revenue loss up to £30M/day during peak trading; board-level visibility
Non-RepudiationMedium — order audit trail supports dispute resolution and card scheme chargeback defence

A STRIDE-based threat model was produced (SEC-TM-2025-044). Headline threats:

ThreatAttack VectorLikelihoodImpactMitigation
Credential stuffing on customer loginBots replaying leaked credentialsHighHighWAF Bot Control, rate limiting, Have-I-Been-Pwned password check at sign-up, optional MFA, device-fingerprint anomaly detection
Checkout injection / parameter tamperingManipulated basket or promotion parametersMediumHighServer-side price recalculation, signed basket IDs, input validation, audit log of pricing decisions
Card scraping via JavaScript injection (Magecart)Malicious third-party script on storefrontMediumCriticalStripe Elements isolates card entry in Stripe-controlled iframe; Content Security Policy blocks unauthorised scripts; Subresource Integrity on third-party scripts
DDoS on checkout pathVolumetric or application-layer attackMediumHighAWS Shield Advanced, WAF rate-based rules, CloudFront edge absorption
API abuse by compromised mobile appReverse-engineered app making unauthorised callsMediumMediumCognito token binding, per-device rate limits, app attestation (iOS DeviceCheck, Android Play Integrity)
Insider threat (admin misuse)Privileged user exfiltrates customer dataLowCriticalJust-in-time elevation via AWS IAM Identity Center, session recording, alerting on bulk PII queries
Access TypeRole(s)Destination(s)Authentication MethodCredential Protection
Customer sign-in (web)CustomerStorefront, APIsAWS Cognito (email + password, optional social, optional MFA)Cognito-managed password policy (12 char min, complexity, breach detection); hashed in Cognito
Customer sign-in (mobile)CustomerAPIs via mobile appOAuth 2.0 authorisation code + PKCERefresh tokens stored in platform keychain / keystore
Guest checkoutGuestCheckout APIsAnonymous session with signed basket tokenShort-lived token (30 min), bound to IP and user-agent
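The signed basket token in the guest-checkout row can be realised as an HMAC over the basket ID, expiry, client IP and user agent. A minimal Python sketch, assuming a symmetric signing key (in the real platform this would come from Secrets Manager); token format and names are illustrative.

```python
import hashlib
import hmac

SECRET = b"hypothetical-basket-signing-key"  # held in Secrets Manager in production

def issue_basket_token(basket_id: str, ip: str, user_agent: str, now: float) -> str:
    """Short-lived (30-minute) token binding a guest basket to the requesting client."""
    expires = int(now) + 30 * 60
    payload = f"{basket_id}|{expires}|{ip}|{user_agent}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{basket_id}|{expires}|{sig}"

def verify_basket_token(token: str, ip: str, user_agent: str, now: float) -> bool:
    basket_id, expires, sig = token.split("|")
    if int(expires) < now:
        return False                              # expired
    payload = f"{basket_id}|{expires}|{ip}|{user_agent}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)     # constant-time comparison

t0 = 1_700_000_000.0
token = issue_basket_token("basket-42", "203.0.113.7", "Mozilla/5.0", t0)
```

Because the IP and user agent are covered by the signature but not embedded in the token, a replayed token from a different client simply fails verification.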
Access TypeRole(s)Destination(s)Authentication MethodCredential Protection
MerchandisersCatalogue AdminAdmin portalOkta SSO + MFA (push / FIDO2)Corporate password policy (90-day rotation)
SRE / OperationsSRE EngineerAWS Console, kubectl, DatadogOkta SSO via IAM Identity Center; kubectl via EKS OIDCShort-lived session (8 hours); hardware MFA preferred
Service accountsMicroservices, LambdaAWS services, SaaS APIsIAM roles (IRSA for pods); short-lived Secrets Manager retrieval for Stripe/SendGrid API keysNo long-lived AWS credentials
ControlResponse
Does the application use SSO or group-wide authentication?Yes — Okta for internal; Cognito for customer
What is the unique identifier for user accounts?Internal: Okta user ID; Customer: Cognito sub (UUID)
What is the authentication flow?Internal: OIDC Authorization Code + PKCE; Customer: Cognito hosted UI (web) or native OAuth flow (mobile)
What are the credential complexity rules?Customer: 12 char min, mixed case, number, symbol; Cognito breach detection; Internal: Okta policy
What are the account lockout rules?Customer: 5 failed attempts in 10 minutes -> 30-minute lockout + optional CAPTCHA; Internal: Okta policy
How can users reset forgotten credentials?Customer: self-service via email link with time-limited token; Internal: Okta self-service with MFA
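The customer lockout rule (5 failed attempts in a 10-minute window triggers a 30-minute lock) amounts to a small sliding-window policy. Cognito enforces this natively; the Python sketch below, with illustrative class and method names, shows the logic.

```python
from collections import defaultdict, deque

WINDOW, MAX_FAILURES, LOCKOUT = 600, 5, 1800  # seconds, attempts, seconds

class LockoutPolicy:
    """Sketch of the customer rule: 5 failures in 10 minutes -> 30-minute lock."""

    def __init__(self):
        self.failures = defaultdict(deque)   # user -> timestamps of recent failures
        self.locked_until = {}               # user -> unlock time

    def record_failure(self, user: str, now: float) -> None:
        q = self.failures[user]
        q.append(now)
        while q and q[0] <= now - WINDOW:    # drop failures outside the window
            q.popleft()
        if len(q) >= MAX_FAILURES:
            self.locked_until[user] = now + LOCKOUT

    def is_locked(self, user: str, now: float) -> bool:
        return self.locked_until.get(user, 0) > now

policy = LockoutPolicy()
for i in range(5):                           # five rapid failures trip the lock
    policy.record_failure("cust-1", 100.0 + i)
```

The sliding window (rather than a fixed counter) means four failures spread over hours never lock an account, while a burst does.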
ControlResponse
How are sessions established after authentication?Customer: Cognito-issued JWT (ID, access, refresh); Internal: OIDC session cookie (HttpOnly, Secure, SameSite=Lax)
How are session tokens protected against misuse?Tokens signed by Cognito (RS256); access tokens 1-hour expiry; refresh rotation; bound to client IP for admin sessions
What are the session timeout and concurrency limits?Customer access token: 1 hour; refresh: 30 days rolling; Internal: 8 hour absolute
Access TypeRole / ScopeEntitlement StoreProvisioning Process
CustomersCustomer (owns own data only)Cognito groups + enforced by API authorisation middlewareSelf-service sign-up
MerchandisersCatalogue Admin, Promotion AdminOkta groups mapped to Cognito admin claimsRequest via ServiceNow; line-manager approval
SRE / OperationsSRE Engineer (full), Read-Only ObserverAWS IAM Identity Centre permission sets + Kubernetes RBACTerraform-managed; quarterly recertification
Service accountsService-specific least privilegeIAM policies attached to IRSA rolesTerraform; pre-commit policy check (tfsec)

3.5.3 Network Security & Perimeter Protection

ControlImplementation
Network segmentationVPC with public, private and data subnets across 2 AZs; security groups per service; NACLs as secondary layer; Kubernetes network policies for pod-to-pod
Ingress filteringCloudFront -> AWS WAF v2 (managed rules, rate limits, bot control) -> ALB; Shield Advanced
Egress filteringNAT Gateways with fixed Elastic IPs for partner allowlisting; egress security groups restrict destinations; VPC Flow Logs
Encryption in transitTLS 1.3 enforced on CloudFront; TLS 1.2 minimum everywhere else; ACM-managed public certificates; private CA for internal mTLS on service mesh
AttributeDetail
Encryption deployment levelStorage (all data stores) + Application (field-level for selected PII columns)
Key typeSymmetric
Algorithm / cipher / key lengthAES-256-GCM (field-level); AES-256 (Aurora, Redis, OpenSearch, S3)
Key generation methodAWS KMS (HSM-backed, FIPS 140-2 Level 3)
Key storageAWS KMS (customer-managed keys per data classification)
Key rotation scheduleAnnual automatic rotation (KMS); field-level encryption keys rotated every 12 months via re-encryption job
AttributeDetail
Secret storeAWS Secrets Manager (Stripe keys, SendGrid keys, Aurora credentials, Segment write keys)
Secret distributionRetrieved at runtime by services via AWS SDK + IRSA; never written to container images or environment variables at build time
Secret rotationAurora credentials: automatic 30-day rotation via Lambda; SaaS API keys: manual 90-day rotation with calendar reminders to owning engineer

3.5.5 Security Monitoring & Threat Detection

CapabilityImplementation
Security event loggingAll authentication events, authorisation failures, admin actions, WAF blocks, payment events, customer account changes. Logs forwarded to Datadog and archived to S3
SIEM integrationDatadog Cloud SIEM with custom detection rules; high-severity events mirrored to corporate Splunk for cross-platform correlation
Infrastructure event detectionAWS GuardDuty (all accounts); CloudTrail (all API calls); VPC Flow Logs
Security alertingPagerDuty for P1/P2; Slack channel for P3; security operations on-call 24x7 during peak trading windows

UC-01: Customer Places an Order (Card Payment)

AttributeDetail
Actor(s)Retail customer (signed-in or guest)
TriggerCustomer clicks “Pay now” at checkout
Pre-conditionsBasket is valid; customer has provided delivery and billing details; Stripe Elements has loaded
Main Flow1. Customer enters card details into Stripe Elements iframe; Stripe returns a payment method token to the browser. 2. Browser posts the token to Checkout Service. 3. Checkout Service revalidates basket and price server-side. 4. Checkout Service calls Stripe PaymentIntent.confirm with the token. 5. Stripe performs 3-D Secure challenge if required; customer completes in-browser. 6. On success, Checkout Service calls Order Service to create the order. 7. Order Service writes to Aurora and publishes OrderCreated to SQS. 8. Order Service triggers transactional email via SendGrid. 9. Customer redirected to order-confirmation page. 10. SAP integration Lambda consumes SQS and creates the sales order in SAP.
Post-conditionsCustomer sees confirmation; order visible in “My Orders”; SAP has sales order; email sent; event emitted to Segment
Views InvolvedLogical, Integration & Data Flow, Physical, Data, Security
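The main flow above can be sketched as one orchestration function with stubbed collaborators. Everything here is illustrative — the stub names (price_lookup, stripe_confirm, save_order) are not real APIs — but the ordering mirrors steps 3-8: server-side revalidation before payment, persistence before fan-out.

```python
def place_order(basket, payment_token, deps):
    """Checkout orchestration sketch mirroring UC-01 steps 3-8."""
    # Step 3: never trust client-supplied prices; recompute server-side.
    server_total = sum(deps["price_lookup"](l["sku"]) * l["qty"] for l in basket)
    payment = deps["stripe_confirm"](payment_token, server_total)     # step 4
    if payment["status"] != "succeeded":
        return {"ok": False, "reason": payment["status"]}             # e.g. 3DS declined
    order_id = deps["save_order"](basket, server_total)               # step 7: Aurora write
    deps["publish"]({"type": "OrderCreated", "order_id": order_id})   # SQS -> SAP Lambda
    deps["send_email"](order_id)                                      # step 8: SendGrid
    return {"ok": True, "order_id": order_id, "total": server_total}

events, emails, orders = [], [], []
deps = {
    "price_lookup": lambda sku: {"NW-1": 1999, "NW-2": 500}[sku],     # pence, server-side
    "stripe_confirm": lambda tok, amount: {"status": "succeeded"},
    "save_order": lambda basket, total: (orders.append(total) or f"ord-{len(orders)}"),
    "publish": events.append,
    "send_email": emails.append,
}
result = place_order([{"sku": "NW-1", "qty": 2}, {"sku": "NW-2", "qty": 1}], "tok_x", deps)
```

Injecting the collaborators as plain callables keeps the orchestration unit-testable without Stripe or Aurora present, which is how the checkout path can be exercised in CI.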

UC-02: Black Friday Traffic Surge

AttributeDetail
Actor(s)Retail customers (aggregate); SRE on-call
Trigger18:00 Black Friday launch; traffic surges from 200 to 2,400+ orders/min
Pre-conditionsPlatform warmed; capacity plan executed; “freeze” window in force (no deployments)
Main Flow1. CloudFront absorbs cacheable product-detail traffic at the edge. 2. HPAs scale Catalogue and Search pods (6 to 24 pods within 90 seconds). 3. Karpenter provisions additional EKS nodes. 4. Aurora read-replica auto-scaling adds 2 replicas within 3 minutes. 5. WAF rate-based rules throttle abusive IPs. 6. P95 API latency rises to 240ms but remains within SLA; error rate held below 0.05%. 7. SRE on-call monitors Datadog dashboard; no manual intervention required.
Post-conditionsPeak traffic absorbed without service degradation; post-event review captures metrics for next year
Views InvolvedLogical, Physical, Performance
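Step 2's pod scaling follows the standard Kubernetes HPA rule — desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric) — clamped to the configured bounds (6 and 24 here). A small sketch of that arithmetic:

```python
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 6,
                         max_replicas: int = 24) -> int:
    """Kubernetes HPA scaling rule with min/max clamping."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# e.g. 6 pods observing 280% CPU against a 70% target -> straight to the 24-pod cap
surge = hpa_desired_replicas(6, current_metric=280, target_metric=70)
```

The same formula drives the custom queue-depth metric; Karpenter then provisions nodes to fit whatever the HPA schedules.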

UC-03: Customer Requests Right-to-Erasure

AttributeDetail
Actor(s)Customer; DPO team
TriggerCustomer submits erasure request via privacy portal
Pre-conditionsCustomer is authenticated; consent model supports erasure request
Main Flow1. Customer submits request via Customer Service portal. 2. Customer Service queues an erasure job (SQS erasure queue). 3. Erasure Lambda anonymises PII in Aurora (customer record retained as pseudonymised placeholder for financial/order integrity); order records retain statutory minimum for 7 years. 4. Cognito account deleted. 5. Segment is sent a user.delete call to purge behavioural events. 6. SendGrid suppression list updated. 7. Customer receives confirmation email. 8. DPO notified via dashboard; audit record retained.
Post-conditionsCustomer PII removed within 30 days (UK GDPR statutory timeframe); audit trail preserved
Views InvolvedLogical, Data, Security
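Step 3 of the flow — anonymise the customer row, keep order rows for statutory retention, emit an audit record — can be sketched as below. Field names and the audit shape are illustrative, not the actual schema.

```python
def erase_customer(customer: dict, orders: list) -> tuple:
    """Pseudonymise a customer record while preserving order integrity.

    The row itself is retained (so order foreign keys still resolve) with
    its PII fields overwritten; orders are untouched for the 7-year
    statutory retention period.
    """
    placeholder = {k: "[erased]" for k in ("name", "email", "phone", "address")}
    anonymised = {**customer, **placeholder, "erased": True}
    audit = {
        "customer_id": customer["id"],
        "action": "right-to-erasure",
        "orders_retained": len(orders),    # statutory retention, not erased
    }
    return anonymised, audit

cust = {"id": "c-9", "name": "Jane Doe", "email": "jane@example.org",
        "phone": "07700 900123", "address": "1 High St"}
anon, audit = erase_customer(cust, orders=[{"id": "o-1"}, {"id": "o-2"}])
```

Keeping the pseudonymised placeholder row (rather than deleting it) is what lets the nightly SAP reconciliation job keep balancing historical orders after erasure.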

3.6.2 Architecture Decision Records (ADRs)


ADR-001: PostgreSQL (Aurora) over MySQL for Transactional Store

FieldContent
StatusAccepted
Date2025-08-05
ContextThe platform requires a relational database for catalogue, customer, order and promotion data. Both Aurora PostgreSQL and Aurora MySQL are approved under the Cloud Landing Zone Standard.
DecisionUse Amazon Aurora PostgreSQL 15.
Alternatives ConsideredAurora MySQL: Widely used at NorthWind but weaker JSONB support for semi-structured product attributes; the team found the MySQL JSON functions awkward for catalogue filtering. DynamoDB: Rejected because the data is strongly relational (customer -> orders -> order-lines) and multi-row ACID is a hard requirement for checkout.
ConsequencesPositive: rich JSONB for flexible catalogue attributes, stronger CTE and window function support for reporting, PostGIS available for store-locator if later needed, excellent observability via pg_stat_statements. Negative: less internal familiarity than MySQL; training investment needed for the ops team (closed via a 3-day workshop).
Quality Attribute TradeoffsPerformance: comparable (positive). Maintainability: PostgreSQL richer ecosystem for our data model (positive). Operational Excellence: increased training cost (negative, one-off).

ADR-002: Next.js SSR over Client-Only SPA for the Storefront

FieldContent
StatusAccepted
Date2025-08-12
ContextThe storefront must be highly discoverable via search engines (organic search is 42% of customer acquisition) and must deliver first-paint quickly on cellular networks.
DecisionUse Next.js 14 with server-side rendering (SSR) for category, product and landing pages; use incremental static regeneration (ISR) for campaign pages; use client-side rendering only for the account area.
Alternatives ConsideredClient-only SPA (React + Vite): Simpler operationally but poor SEO, slower first contentful paint, and heavy JavaScript bundle on mobile. Static site (Gatsby / Astro): Good for marketing pages but cannot handle the dynamic, personalised storefront.
ConsequencesPositive: strong SEO, improved Core Web Vitals (LCP improved from 3.1s to 1.4s in prototype), identical rendering for crawlers and users. Negative: additional server capacity for SSR (budget allocated); cache invalidation more complex than pure static.
Quality Attribute TradeoffsPerformance: major improvement (positive). Cost: increased compute for SSR (negative, quantified and accepted). Reliability: SSR failure could impact page rendering — mitigated by graceful fallback to client-side hydration.

ADR-003: Stripe Elements Tokenisation to Reduce PCI-DSS Scope

FieldContent
StatusAccepted
Date2025-09-02
ContextThe legacy platform is in full PCI-DSS scope (SAQ D) because cardholder data enters application servers. This imposes substantial audit and remediation cost. The target is SAQ A-EP via client-side tokenisation.
DecisionIntegrate Stripe Elements so that card data is entered into a Stripe-hosted iframe and never traverses NorthWind servers. Only opaque Stripe payment method tokens are stored.
Alternatives ConsideredDirect card acceptance into Checkout Service: Rejected — expands PCI scope to the entire platform. Stripe Checkout redirect: Rejected — breaks the custom checkout UX the business requires. Alternative PSP (Adyen, Worldpay): Evaluated; Stripe selected due to existing group-wide contract and superior developer experience.
ConsequencesPositive: SAQ A-EP scope achieved (annual audit cost reduced by an estimated £240k/year); reduced blast radius in the event of a storefront compromise. Negative: Stripe vendor lock-in is elevated (see R-002); Stripe outage would halt all card payments.
Quality Attribute TradeoffsSecurity: major reduction in scope and risk (positive). Cost: lower audit cost (positive); Stripe transaction fees higher than some alternatives (negative, small). Reliability: additional SaaS dependency (negative, mitigated by fallback messaging during Stripe outage).

Log TypeEvents LoggedLocal StorageRetention PeriodRemote Services
Application logsAPI request/response metadata (PII redacted), errors, business eventsstdout (container)EphemeralDatadog (15 months), S3 archive (7 years for audit)
Data store logsAurora slow query log, PostgreSQL error logRDS log files7 daysDatadog
Infrastructure logsEKS control plane, node logs, VPC Flow LogsCloudWatch Logs90 daysDatadog (subset)
Security event logsAuth events, admin actions, WAF blocks, GuardDuty findingsCloudWatch Logs + S37 years in S3Datadog Cloud SIEM + Splunk

4.1.2 Observability — Monitoring & Alerting

Alert CategoryTrigger ConditionNotification MethodRecipient
API error rate> 0.5% of requests over 5 minutesPagerDuty P1SRE on-call
Checkout conversion dropConversion rate < 80% of 7-day baselinePagerDuty P1SRE on-call + Commerce lead
LatencyP95 API latency > 400ms over 5 minutesPagerDuty P2SRE on-call
Stripe failure rate> 2% of payment attempts failingPagerDuty P1SRE + Payments lead
Aurora CPU> 85% for 10 minutesPagerDuty P2SRE + DBA
Peak-readiness drill failureAny scheduled drill failsSlack + EmailPlatform team + SRE
WAF rule trigger spike> 1000 blocks/min sustainedSlackSecurity ops
Certificate expiry< 30 days to expiryEmailPlatform team
CapabilityToolCoverage
Application Performance MonitoringDatadog APMAll microservices, Next.js storefront, Lambda
Infrastructure MonitoringDatadog + CloudWatchEKS, Aurora, ElastiCache, OpenSearch, API Gateway
Log AggregationDatadog LogsApplication, infrastructure, security logs
Distributed TracingDatadog APM tracingFull request tracing from CloudFront to Aurora
Real User MonitoringDatadog RUMStorefront and mobile app user experience
DashboardsDatadogExecutive, SRE, peak-readiness, per-service dashboards
AlertingPagerDutyP1-P3 alerts; on-call rotation
ProcedureDescriptionOwnerDocumentation
Incident responseP1: 15-min response, P2: 30-min; ITIL-aligned; blameless post-incident review within 48 hoursSRE Lead (Sally Doe)Corporate Confluence / Ops / Runbooks
Change managementAll changes via GitHub PR; production requires 2 approvals; change freeze from 1 November to 31 December (peak trading)SRE LeadCorporate Confluence
Peak-readiness drillMonthly load test at 2x current peak against staging; full game-day 4 weeks before Black FridaySRE Lead + PlatformCorporate Confluence
On-call24x7, 1-week rotation, 6-engineer pool; secondary on-call during Nov/Dec peakSRE LeadPagerDuty

4.2.1 Geographic Footprint & Disaster Recovery

QuestionResponse
Is the application deployed across multiple hosting venues for continuity?Yes — eu-west-2 (London) primary; eu-west-1 (Ireland) pilot-light DR
What is the DR strategy?Pilot Light. DR region has Aurora Global Database secondary (continuous replication, 1-minute RPO), minimum EKS node group (2 nodes), pre-provisioned OpenSearch snapshot restore. Scaled up on failover.
Are there data sovereignty requirements affecting geographic choices?Yes — PII must remain in UK (eu-west-2); DR carries only non-PII operational data; failover including PII requires DPO approval
AttributeResponse
Scaling capabilityFull auto-scaling — HPA on all services (CPU + custom request-rate metric); Karpenter for EKS node provisioning; Aurora read-replica auto-scaling
Scaling detailsValidated to 3x current peak (approx. 7,000 orders/min) during 2025 staging game-day. Cold-start expansion from baseline to peak in 4 minutes.
AttributeResponse
Dependencies adequately sized?Yes (confirmed) — Stripe SLA supports 10k TPS; SendGrid transactional sending limits raised to 500k/day by arrangement; SAP order queue sized for 5,000 orders/min peak (buffered via SQS)
Dependency detailsSQS buffering protects against SAP slow-down; circuit breakers prevent cascade failure. OpenSearch indexing throttles to 2,000 docs/sec during peak reindex.
  • Yes — designed with fault tolerance patterns:
    • Component failures: Each microservice runs 3+ replicas across 2 AZs; Kubernetes reschedules failed pods; pod disruption budgets enforced.
    • Graceful degradation: If Stripe is unavailable, the storefront disables the “Pay now” button and surfaces a clear message with a “Notify me” option; no partial orders are created.
    • Circuit breakers: Stripe (open after 5 failures, half-open after 30s) and SAP (open after 3 failures, half-open after 60s); opossum library.
    • Health checks: Kubernetes liveness (/health/live, 10s), readiness (/health/ready, 5s, checks DB + Redis reachability).
    • Testing: Monthly chaos tests (AWS Fault Injection Service: AZ blackout, pod kill, latency injection); quarterly DR failover drill.
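The circuit-breaker settings above (Stripe: open after 5 failures, half-open after 30 s) translate to a small state machine. This Python sketch mirrors the behaviour of the opossum breaker used in the services; it is illustrative, not the production code.

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; after `reset_after`
    seconds it goes half-open and allows a single trial call."""

    def __init__(self, threshold: int = 5, reset_after: float = 30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures, self.opened_at = 0, None

    def call(self, fn, now: float = None):
        now = time.monotonic() if now is None else now
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")    # fail fast, no downstream call
            self.opened_at = None                     # half-open: permit one trial
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now                  # trip (or re-trip) the breaker
            raise
        self.failures = 0                             # success resets the count
        return result

cb = CircuitBreaker(threshold=5, reset_after=30.0)
for _ in range(5):                                    # five failures trip the breaker
    try:
        cb.call(lambda: 1 / 0, now=0.0)
    except ZeroDivisionError:
        pass
```

Failing fast while open is what prevents a Stripe slow-down from exhausting checkout threads and cascading into the rest of the platform.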
Component / DependencyFailure ModeDetection MethodRecovery BehaviourUser Impact
Single EKS podCrash or OOMKubernetes liveness probeAutomatic restart; traffic drainedTransparent; in-flight requests may retry
Availability ZoneAZ outageCloudWatch + EKS node statusKarpenter provisions replacement nodes in healthy AZ (< 90s)Brief latency increase
Aurora primaryInstance failureAurora health checkAutomatic failover to replica (< 60s)30-60s elevated errors
ElastiCache Redis nodeFailureRedis cluster health checkFailover to replicaBrief cache miss spike; requests fall through to Aurora
StripeOutageHTTP 5xx or timeout; circuit breakerCheckout disabled with customer-facing message; browse/basket continue workingCustomers cannot complete new card purchases
SendGridOutageHTTP error / timeoutTransactional emails queued to SQS for retry; in-app confirmation still shownDelayed order-confirmation email
SAP ERPOutageVPN or HTTP failureOrders buffer in SQS DLQ; replay when SAP recoversNo customer impact; delayed fulfilment
CloudFrontRegional disruptionRoute 53 health checksRoute 53 DNS failover routes traffic directly to the regional origin; CloudFront regional disruptions are rareShort disruption
AttributeDetail
Backup strategyAurora continuous backup (point-in-time to any second within retention window); S3 versioning; OpenSearch daily snapshots
Backup product/serviceAWS Backup (centralised), Aurora automated backups, OpenSearch snapshot repository
Backup typeContinuous (Aurora WAL) + Daily snapshot (OpenSearch, Aurora cluster)
Backup frequencyAurora: continuous; OpenSearch: daily 02:00 UTC; S3: real-time versioning
Backup retentionAurora: 35 days; OpenSearch: 30 days; S3 versions: 90 days; audit logs: 7 years
ControlDetail
ImmutabilityAWS Backup Vault Lock (compliance mode, 35 days); S3 Object Lock on audit bucket
EncryptionAll backups encrypted with AWS KMS CMK; cross-region copies re-encrypted
Access controlRestore requires DBA or SRE Lead approval; cross-account backup vault in isolated security account
#ScenarioRecovery ApproachRTORPO
1Single AZ failureAutomatic: Karpenter + Aurora Multi-AZ failover5 minutes0
2Primary region failure (eu-west-2)Manual DR activation: promote Aurora Global DB secondary, scale EKS in eu-west-1, update Route 532 hours1 minute (async replication lag)
3Critical software defectAutomatic: Kubernetes rolls back to last healthy deployment; Argo Rollouts canary analysis15 minutes0
4Ransomware / destructive cyber-attackIsolate affected components; restore from immutable backups (Vault Lock); forensic investigation4 hoursWithin last hourly snapshot
5Accidental data deletionAurora point-in-time recovery1 hour1 minute

MetricTargetMeasurement Method
Storefront LCP (Core Web Vitals)< 1.8s (75th percentile)Datadog RUM
API response time P95< 200ms (steady state), < 400ms (peak)Datadog APM
Checkout success rate> 99.5%Datadog custom metric
Throughput (steady state)600 orders/minDatadog + API Gateway metrics
Throughput (Black Friday peak validated)3,000 orders/min sustained, 4,500 orders/min burstLoad test (k6) and production observation
Error rate< 0.1% 5xx at steady state, < 0.5% at peakAPI Gateway metrics
Search P95< 150msOpenSearch query latency
Cache hit ratio (catalogue)> 88%ElastiCache metrics
AttributeDetail
Performance testing approachMonthly load tests at 2x current peak; quarterly peak-readiness tests at 3x current peak; soak test (72 hours at steady state) before each major release
Testing toolsk6 (Grafana Cloud) for load generation; Datadog for observation
Testing environmentStaging (production-mirror); read-only production smoke tests off-peak
Testing frequencyMonthly (standard); weekly in September/October prior to Black Friday
MetricCurrent1 Year3 Years5 Years
Customers (active)12M13.5M17M21M
Peak orders per minute2,400 (2024 legacy)3,5005,5008,000
Data volume (Aurora)300 GB420 GB750 GB1.2 TB
Daily orders (average)180k220k320k450k
QuestionResponse
Will the current design scale to accommodate projected growth?Yes for 3 years. At 5-year horizon, Aurora vertical scaling is the primary concern; assessment of sharding via Aurora Limitless Database scheduled for 2028 review.
Are there known seasonal or cyclical demand patterns?Strongly seasonal. Black Friday week: 8x baseline; Christmas: 4x; January sale: 3x; Easter: 1.5x; payday (last working day): 1.3x. Capacity plan aligns with retail calendar.
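A hypothetical worked example of how the seasonal multipliers feed the capacity plan: apply each multiplier to a baseline rate, add headroom, and convert to pod counts. The baseline (300 orders/min), headroom (25%) and per-pod throughput (100 orders/min) are assumptions for illustration only.

```python
import math

# Seasonal multipliers from the table above.
MULTIPLIERS = {"Black Friday": 8, "Christmas": 4, "January sale": 3,
               "Easter": 1.5, "Payday": 1.3}

def required_pods(baseline_per_min: float, multiplier: float,
                  per_pod: float = 100, headroom: float = 0.25) -> int:
    """Pods needed to serve baseline * multiplier demand with headroom."""
    demand = baseline_per_min * multiplier * (1 + headroom)
    return math.ceil(demand / per_pod)

plan = {event: required_pods(300, m) for event, m in MULTIPLIERS.items()}
```

The output is the shape of the retail-calendar capacity plan: a pod target per event, which the HPA maximums and Karpenter node budgets are then sized against.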

  • Yes — detailed cost model produced using AWS Pricing Calculator and validated against 4 months of running-cost data in staging. Estimated annual opex is £800,000 (production + non-prod + SaaS). Reserved instance / Savings Plan commitment produces an approximately 22% saving versus pure on-demand.
Monthly Cost Breakdown (Production, steady state)
| Component | Monthly Cost (GBP) | Notes |
| --- | --- | --- |
| EKS cluster (Graviton nodes, 1-year Savings Plan) | 18,500 | 8-16 nodes average, 24 at peak |
| Aurora PostgreSQL (Multi-AZ, reserved) | 11,200 | r7g.xlarge primary + 2 replicas + Global DB |
| ElastiCache Redis (reserved) | 2,800 | 2 shards with replicas |
| OpenSearch | 3,400 | 3 x r7g.large data nodes |
| CloudFront + WAF + Shield Advanced | 5,600 | Shield Advanced £2,400/mo; WAF £400/mo; CloudFront ~£2,800/mo |
| API Gateway | 1,200 | Request-based pricing |
| SQS, EventBridge | 300 | Consumption-based |
| S3 + lifecycle | 500 | 1.2 TB + audit archive |
| NAT Gateway + data transfer | 1,400 | 2 NATs (Multi-AZ) + egress to Stripe/SendGrid/Segment |
| Datadog | 6,800 | APM + Logs + RUM + Cloud SIEM |
| Stripe | 14,000 | Blended rate (varies with volume); £0.20 + 1.4% domestic |
| SendGrid | 600 | Pro plan + transactional volume |
| Segment | 4,200 | Enterprise tier (enterprise contract, allocated to NWO) |
| Secrets Manager, KMS, Route 53, misc | 500 | |
| Total monthly (production) | 71,000 | |
| Total annual (production) | 852,000 | Offset by non-prod auto-shutdown and peak handling premium |
| Non-production environments | 5,500/month | Dev + Test + Staging, auto-shutdown outside hours |
| Target annual (all environments) | 800,000 | Target achieved via Savings Plans, Graviton and non-prod shutdown |
| Practice | Implementation |
| --- | --- |
| Cost monitoring | Corporate CloudHealth + Datadog cost dashboard; weekly review in the Platform team |
| Cost allocation | AWS tagging: Project (NWO), Environment, Service, CostCentre (CC-8821) |
| Reserved capacity | 1-year Savings Plan (partial upfront) on EKS; 1-year Reserved Instances on Aurora + ElastiCache |
| Rightsizing | Monthly Compute Optimizer review; quarterly pod resource-request review |
| Waste elimination | Non-prod auto-shutdown 19:00-08:00 weekdays, full weekends (£3k/month saved); Spot instances for non-prod nodes |
| Budget governance | AWS Budgets alerts at 80%/100% of monthly forecast; any incremental spend > £1,000/month requires Platform Lead approval |
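The budget governance thresholds above (80%/100% alerts, £1,000/month approval gate) can be sketched as a small evaluation step. This is an illustrative TypeScript sketch, not the AWS Budgets configuration itself; function names and the alert labels are assumptions.

```typescript
// Mirror the AWS Budgets thresholds described above: warn at 80% of the
// monthly forecast, alert on breach at 100%. Illustrative only — the real
// alerts are configured in AWS Budgets, not in application code.
type BudgetAlert = "none" | "warn-80" | "breach-100";

function budgetAlertLevel(spendGbp: number, forecastGbp: number): BudgetAlert {
  if (forecastGbp <= 0) throw new Error("forecast must be positive");
  const ratio = spendGbp / forecastGbp;
  if (ratio >= 1.0) return "breach-100";
  if (ratio >= 0.8) return "warn-80";
  return "none";
}

// Incremental spend above £1,000/month requires Platform Lead approval.
function requiresPlatformLeadApproval(incrementalMonthlyGbp: number): boolean {
  return incrementalMonthlyGbp > 1000;
}
```

For example, with the £71,000 production forecast above, month-to-date spend of £60,000 would trip the 80% warning.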

| Question | Response |
| --- | --- |
| Has the hosting location been chosen to reduce environmental impact? | Partially — eu-west-2 (London) was chosen primarily for data sovereignty; AWS London operates under AWS’s 100% renewable energy commitment achieved in 2023 |
| What is the expected workload demand pattern? | Variable — strong UK business-hours pattern with extreme seasonal peaks (Black Friday, Christmas) |
| Question | Response |
| --- | --- |
| Must the application be available continuously? | Yes — 24x7 customer-facing platform |
| Can the solution be shut down or scaled down during off-peak hours? | Yes — auto-scaling reduces steady-state capacity by ~40% overnight while maintaining a minimum of 3 replicas per service for HA |
| Are non-production environments configured to downscale or shut down when not in use? | Yes — dev, test and staging shut down outside office hours, saving approximately £3,000/month |
| Question | Response |
| --- | --- |
| Are resources rightsized to avoid overprovisioning? | Yes — pod resource requests are based on P95 observed usage; Karpenter consolidates workloads onto fewer nodes during low demand |
| Are the highest performance-per-watt hardware options used? | Yes — Graviton3 (ARM) instances throughout; approximately 60% better energy efficiency than equivalent x86 (AWS published data) |
| Are efficient networking patterns used? | VPC endpoints for S3 and SQS avoid NAT Gateway traffic; CloudFront caches 72% of storefront requests at the edge, reducing origin compute |

The application is developed internally by the Digital Commerce team.

| Attribute | Detail |
| --- | --- |
| Source control platform | GitHub Enterprise (NorthWind organisation) |
| CI/CD platform | GitHub Actions (corporate standard) |
| Build automation | GitHub Actions workflows on push and PR; npm + Docker multi-stage builds; signed images pushed to ECR |
| Deployment automation | Argo CD (GitOps) for Kubernetes; Terraform for infrastructure; Helm charts |
| Test automation | Unit (Jest), integration (Testcontainers), contract (Pact), accessibility (axe), performance smoke (k6) — all in CI |
| Control | Implementation |
| --- | --- |
| Security requirements | Captured in threat model (SEC-TM-2025-044); OWASP ASVS L2 baseline |
| SAST | SonarCloud (blocks merge on high/critical) |
| DAST | OWASP ZAP weekly scan against staging |
| SCA | Snyk (blocks merge on high/critical CVEs) |
| Container scanning | Snyk Container + Amazon Inspector (continuous on ECR) |
| Secure coding | Mandatory annual OWASP training; security champion in each squad; peer review on all PRs |
| Patch management | Critical CVE: 24h plan, 7-day deployment. High: 30 days. Medium/Low: next scheduled release. |
| Classification | Selected? | Description |
| --- | --- | --- |
| Replace | Yes | The legacy Oracle Commerce / .NET monolith is being entirely replaced with a cloud-native microservices platform |
| Attribute | Detail |
| --- | --- |
| Deployment strategy | Strangler Fig — traffic migrated domain-by-domain via CloudFront routing rules (search first, then catalogue, then basket, then checkout) |
| Data migration mode | Phased — customer accounts migrated in cohorts; order history back-loaded; product catalogue rebuilt from SAP |
| Data migration method | AWS DMS for customer and order data (Oracle -> Aurora); SAP IDoc stream for catalogue |
| Data volume to migrate | Approximately 240 GB (customer + order history) |
| End-user cutover approach | Phased — 5% traffic cohort for 4 weeks, then 25%, 50%, 100% over 8 weeks |
| External system cutover | Phased — SAP integration switched over cohort by cohort; loyalty platform integration continues across both |
| Maximum acceptable downtime | Minutes — hard cut-over windows are 5 minutes, always 03:00-03:05 UTC on a Tuesday |
| Rollback plan | CloudFront routing rules revert traffic to the legacy monolith within 5 minutes per cohort; legacy platform retained for 3 months after 100% cut-over |
| Transient infrastructure | Yes — AWS DMS replication tasks decommissioned after final cut-over |
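The phased end-user cutover (5% for 4 weeks, then 25%, 50%, 100% over 8 weeks) can be sketched as a traffic schedule. This is an illustrative TypeScript sketch; the split of the 8-week ramp into two 4-week steps is an assumption, and in production the routing is driven by CloudFront rules, not code.

```typescript
// Strangler Fig traffic schedule: percentage of a cohort's traffic routed to
// the new platform, by weeks since cut-over started. Assumes the 8-week ramp
// after the initial 5% window splits evenly: 4 weeks at 25%, 4 weeks at 50%.
function cohortTrafficPercent(weeksSinceStart: number): number {
  if (weeksSinceStart < 0) throw new Error("week must be >= 0");
  if (weeksSinceStart < 4) return 5;   // initial 5% cohort, 4 weeks
  if (weeksSinceStart < 8) return 25;  // first ramp step (assumed 4 weeks)
  if (weeksSinceStart < 12) return 50; // second ramp step (assumed 4 weeks)
  return 100;                          // full cut-over
}
```

Rollback at any step is the inverse: the CloudFront rule reverts the cohort to the legacy origin within the 5-minute window described above.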
| Test Type | Scope | Approach | Environment | Automated? |
| --- | --- | --- | --- | --- |
| Integration | Service-to-service, database, SaaS | Testcontainers in CI; full suite in staging | CI + Staging | Yes |
| Contract | Consumer-driven contracts | Pact broker | CI + Staging | Yes |
| Accessibility | WCAG 2.2 AA | axe-core + manual review | CI + Staging | Partial |
| Performance | Load, stress, soak, spike | k6 + Datadog | Staging | Yes |
| Security | SAST, DAST, SCA, annual pen test | Continuous + annual by external firm | CI + Staging + Prod | Partial (pen test manual) |
| DR | Failover, restore | Quarterly scripted drill | Prod + DR | Partial |
| Attribute | Detail |
| --- | --- |
| Release frequency | Multiple times daily for services (trunk-based with feature flags); fortnightly release train for coordinated changes; freeze from 1 November to 31 December (peak trading) |
| Release process | Feature branch -> PR (automated tests + 1 approval) -> merge to main -> auto-deploy to staging -> canary (5% for 15 min) -> full production via Argo Rollouts |
| Feature flags | LaunchDarkly used extensively for progressive roll-out, A/B testing and kill switches |
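The canary step in the release process (5% of traffic for 15 minutes before full promotion) implies a promote-or-rollback decision. A minimal sketch of that gate, assuming a simple error-rate comparison — the 0.5 percentage-point tolerance is an invented example value, and the real analysis runs in Argo Rollouts against Datadog metrics:

```typescript
// Canary gate: promote only if the canary's error rate does not exceed the
// stable baseline by more than a tolerance. Tolerance value is illustrative.
function canaryDecision(
  canaryErrorRatePct: number,
  baselineErrorRatePct: number,
  tolerancePct = 0.5,
): "promote" | "rollback" {
  return canaryErrorRatePct <= baselineErrorRatePct + tolerancePct
    ? "promote"
    : "rollback";
}
```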
| Attribute | Detail |
| --- | --- |
| Support model | L1: NorthWind Service Desk (customer-facing triage) + Tier 1 for system alerts; L2: SRE team; L3: Digital Commerce engineering; L4: Solution Architect / CTO |
| Support hours | 24x7 (SRE on-call); enhanced coverage November-January (double-up rota) |
| SLAs | External (customer-facing, published): 99.95% monthly availability excluding freeze windows. Internal: P1 response < 15 min, P2 < 30 min, P3 < 4 hours |
| Escalation paths | L1 -> L2 (15 min) -> L3 (30 min) -> L4 (1 hour). Security incidents: CISO notified immediately. |
| Question | Response |
| --- | --- |
| Non-prod auto-shutdown schedule | EKS dev/staging scale to system-only 19:00-07:00 weekdays + weekends; Aurora non-prod paused via Lambda cron; AWS Config rule alerts FinOps on non-prod resources running > 24h without exception. |
| Right-sizing review cadence | Quarterly via AWS Compute Optimizer + Datadog. Last review (Q1 2026) downgraded 24 over-provisioned pods, recovering ~£3,200/month. |
| Unused / orphaned resource reclamation | Weekly Lambda tags resources idle > 14 days; FinOps reviews and confirms before deletion. Scope: snapshots, EBS volumes, ELB targets, Lambda versions > 5 generations old. |
| Carbon footprint reported alongside cost | Yes — monthly FinOps + Sustainability review using the AWS Customer Carbon Footprint Tool; tracked against a 2026 baseline. Sustainability KPI not yet formalised (gap noted in 4.5 scoring). |
| Environment retirement actually deletes (vs stops) | Yes — the decommissioning runbook requires Terraform destroy + S3 emptying + KMS key-deletion scheduling; CMDB Retired status is set only after Cost Explorer confirms zero spend for 30 days. |
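The non-prod auto-shutdown schedule above (scaled down 19:00-07:00 on weekdays and all weekend) reduces to a simple window check. A hedged TypeScript sketch — the real schedule runs as a Lambda cron, and this sketch assumes local UK time without handling DST:

```typescript
// Is a non-prod environment inside its shutdown window?
// day: 0 = Sunday .. 6 = Saturday; hour: 0-23 (assumed local UK time).
function isNonProdShutdown(day: number, hour: number): boolean {
  if (day < 0 || day > 6 || hour < 0 || hour > 23) {
    throw new Error("day/hour out of range");
  }
  const weekend = day === 0 || day === 6;
  if (weekend) return true;           // full weekends
  return hour >= 19 || hour < 7;      // 19:00-07:00 on weekdays
}
```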
| Skill Area | Current Level | Action Required |
| --- | --- | --- |
| AWS (EKS, Aurora, networking) | Medium | Ongoing: AWS SA Associate certification for 4 engineers |
| Infrastructure as Code (Terraform) | High | None |
| CI/CD (GitHub Actions, Argo CD) | High | None |
| Node.js / NestJS | High | None |
| Next.js SSR | Medium | Workshop delivered 2025-09; ongoing community of practice |
| PostgreSQL DBA | Medium | Dedicated DBA allocated; advanced PostgreSQL training completed Q4 2025 |
| Security & compliance | Medium | Security champion training complete; annual OWASP refresh |
| Question | Response |
| --- | --- |
| Can the team fully operate and support this solution in production? | Fully capable |
| Concern | Approach |
| --- | --- |
| Keeping software versions current | EKS: upgraded within 60 days of a minor release; Aurora PostgreSQL: minor versions applied in the monthly maintenance window; Node.js: LTS tracked, upgraded within 90 days |
| Certificate management | ACM for public TLS (auto-renewal); AWS Private CA for internal mTLS |
| Dependency management | Snyk continuous monitoring; Dependabot PRs; quarterly dependency review |
| Attribute | Detail |
| --- | --- |
| Exit strategy | Microservices are containerised (Helm charts); PostgreSQL is standard; data is exportable; the storefront (Next.js) is portable to any Node.js host |
| Data portability | Aurora: pg_dump / logical replication; S3: standard APIs; Cognito: CSV export with a password reset required |
| Vendor lock-in assessment | Low-Moderate overall. Primary concerns are Stripe (High — see R-002) and Cognito (Moderate — migration requires a password reset cycle). All other components are standard and portable. |
| Exit timeline estimate | 6-9 months (3 months infrastructure + 3-6 months payment provider migration if Stripe is replaced) |

| ID | Constraint | Category | Impact on Design | Last Assessed |
| --- | --- | --- | --- | --- |
| C-001 | Must comply with PCI-DSS v4.0 by 31 March 2026 | Regulatory | SAQ A-EP scope achieved via Stripe Elements tokenisation; network segmentation and audit logging retained | 2026-03-01 |
| C-002 | All customer PII must remain in the UK | Regulatory | Primary region eu-west-2; DR limited to non-PII; Aurora Global DB filtered replication | 2026-01-15 |
| C-003 | Must deliver before Black Friday 2026 | Time | Fixed cut-over milestone 2026-10-01; scope prioritised accordingly | 2026-03-01 |
| C-004 | Must integrate with SAP ERP for order fulfilment | Technical | SQS-buffered asynchronous integration; existing SAP APIs consumed as-is | 2025-09-30 |
| C-005 | Corporate Cloud Landing Zone mandates AWS | Organisational | All hosting on AWS; Azure / GCP not permitted | 2025-07-14 |
| ID | Assumption | Impact if False | Certainty | Status | Owner | Evidence |
| --- | --- | --- | --- | --- | --- | --- |
| A-001 | Stripe will maintain UK PSD2 SCA compliance and current pricing through 2028 | Commercial model re-negotiation; possible re-platform | High | Open | Priya Doe | Stripe contract signed 2025-05-01 with 3-year fixed pricing |
| A-002 | SAP order API will handle 5,000 orders/min sustained during peak | Order backlog in SQS beyond SLA; customer confusion | Medium | Closed | Fred Bloggs | SAP team load-tested at 6,000 orders/min 2025-10-18 |
| A-003 | Customer mobile app adoption will reach 55% of sessions by 2027 | Over-investment in mobile BFF | Medium | Open | Raj Bloggs | Current: 47%; trending +2pp/quarter |

Risk identification:

| ID | Risk Event | Category | Severity | Likelihood | Owner |
| --- | --- | --- | --- | --- | --- |
| R-001 | Peak trading capacity insufficient; platform degrades or fails during Black Friday | Operational | Critical | Low | Sally Doe |
| R-002 | Vendor lock-in to Stripe creates commercial leverage or single-PSP exposure | Commercial | High | Medium | Priya Doe |
| R-003 | Customer PII data-residency breach via misconfigured Aurora Global DB replication | Compliance | High | Low | Tom Bloggs |
| R-004 | Third-party JavaScript (e.g., marketing tag) compromises storefront (Magecart-style) | Security | Critical | Medium | Jane Doe |
| R-005 | Mobile app store review delays or rejection blocks timely release | Delivery | Medium | Medium | Fred Bloggs |
| R-006 | AWS eu-west-2 regional outage during peak trading | Operational | Critical | Low | Sally Doe |

Risk response:

| ID | Mitigation Strategy | Mitigation Plan | Residual Risk | Last Assessed |
| --- | --- | --- | --- | --- |
| R-001 | Mitigate | Monthly load tests at 2x peak, quarterly at 3x; full game-day 4 weeks before Black Friday; peak-readiness sign-off gate; additional SRE on rota Nov-Dec | Low | 2026-03-01 |
| R-002 | Mitigate | Payment abstraction layer in Checkout Service isolates the Stripe SDK; documented 6-9 month migration plan to a secondary PSP; stored payment token strategy reviewed annually; Adyen considered for a dual-acquirer model from 2027 | Medium | 2026-03-01 |
| R-003 | Mitigate | Filtered Aurora logical replication (PII tables excluded); monthly compliance audit of replication; Terraform guardrails prevent inadvertent PII-table replication; DPO quarterly sign-off | Low | 2026-02-15 |
| R-004 | Mitigate | Strict Content Security Policy (script-src allowlist); Subresource Integrity on all third-party scripts; Stripe Elements isolates card entry in the Stripe iframe; quarterly client-side security audit; tag-manager discipline enforced by Marketing | Medium | 2026-03-01 |
| R-005 | Mitigate | Early submission 4 weeks before the hard deadline; in-flight review with Apple / Google developer support; progressive web app (PWA) fallback if native store delays occur | Low | 2026-03-01 |
| R-006 | Accept (with mitigation) | Pilot-light DR in eu-west-1; RTO 2 hours validated quarterly; customer-facing status page; accept 1-minute RPO | Medium | 2026-03-01 |
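The R-004 mitigation (strict CSP with a script-src allowlist, card entry isolated in the Stripe Elements iframe) can be sketched as a header builder. This is an illustrative TypeScript sketch: the directive set and allowlist entries are example values, not NorthWind's actual policy.

```typescript
// Build a Content-Security-Policy header value with a script-src allowlist,
// as described in the R-004 mitigation. Directive values are illustrative.
function buildCsp(scriptAllowlist: string[]): string {
  const scriptSrc = ["'self'", ...scriptAllowlist].join(" ");
  return [
    `default-src 'self'`,
    `script-src ${scriptSrc}`,          // only allowlisted third-party scripts
    `frame-src https://js.stripe.com`,  // Stripe Elements iframe for card entry
    `object-src 'none'`,
    `base-uri 'self'`,
  ].join("; ");
}

// Example: allow only Stripe's script host alongside first-party scripts.
const csp = buildCsp(["https://js.stripe.com"]);
```

Subresource Integrity hashes on each allowlisted script tag complete the defence: even an allowlisted host cannot serve a silently modified script.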
| ID | Dependency | Direction | Status | Owner | Evidence | Last Assessed |
| --- | --- | --- | --- | --- | --- | --- |
| D-001 | SAP ERP provisioned for cloud-origin order traffic (new API scope + bandwidth) | Inbound | Resolved | SAP team | SAP integration live; load test 2025-10-18 | 2025-10-31 |
| D-002 | Corporate Cognito customer user pool configured and DPIA-approved | Inbound | Resolved | Platform team | Cognito live 2025-08-15; DPIA-2025-091 approved | 2025-09-30 |
| D-003 | Stripe contract signed with UK acquiring and 3-year pricing | Inbound | Resolved | Procurement | Contract NW-PROC-2025-118 signed 2025-05-01 | 2025-05-01 |
| D-004 | Loyalty platform (APP-0417) supports Cognito identity attribute mapping | Inbound | Committed | Loyalty team | Integration in test; completion 2026-05-01 | 2026-03-01 |
| ID | Issue | Category | Impact | Owner | Resolution Plan | Status | Last Assessed |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I-001 | OpenSearch index rebuild time of 42 minutes blocks catalogue refresh cadence | Operational | Low | Sally Doe | Move to rolling reindex with dual-index alias swap; completion 2026-05-01 | In Progress | 2026-03-18 |
| I-002 | Mobile app iOS notification permissions prompt shown too early, depressing opt-in | Delivery | Low | Fred Bloggs | Reorder onboarding flow; A/B test via LaunchDarkly | In Progress | 2026-03-10 |
| Question | Response |
| --- | --- |
| Does this design create any exception to current policies and standards? | No |
| Question | Response |
| --- | --- |
| Does this design create an issue against the process library? | No |
| Question | Response |
| --- | --- |
| Does the design materially change the organisation’s technology risk profile? | Yes — it reduces PCI-DSS scope and operational risk by replacing the unsupported legacy platform, while introducing an elevated SaaS dependency on Stripe. Net impact assessed as favourable by Risk & Controls (RC-2025-118). |
| ADR # | Title | Status | Date | Impact |
| --- | --- | --- | --- | --- |
| ADR-001 | PostgreSQL (Aurora) over MySQL for Transactional Store | Accepted | 2025-08-05 | Determines data platform, tooling, team training |
| ADR-002 | Next.js SSR over Client-Only SPA for the Storefront | Accepted | 2025-08-12 | Determines rendering model and SEO strategy |
| ADR-003 | Stripe Elements Tokenisation to Reduce PCI-DSS Scope | Accepted | 2025-09-02 | Determines payment architecture and PCI scope (SAQ A-EP) |

| Term | Definition |
| --- | --- |
| Aurora | Amazon Aurora — AWS managed PostgreSQL / MySQL-compatible database |
| BFF | Backend-for-Frontend — a service tailored to a specific client (e.g., mobile) |
| CDP | Customer Data Platform (Segment, in this context) |
| CMA | Cardholder Authentication — card-scheme authentication step |
| Cognito | AWS customer identity and access management service |
| Core Web Vitals | Google’s user-experience metrics (LCP, INP, CLS) |
| HPA | Horizontal Pod Autoscaler — Kubernetes autoscaling mechanism |
| IRSA | IAM Roles for Service Accounts — pod-level IAM on EKS |
| LCP | Largest Contentful Paint — page-load performance metric |
| Magecart | Class of attack injecting malicious JavaScript to skim payment details |
| NWO | NorthWind Online — the subject of this SAD |
| PAN | Primary Account Number — the card number |
| PCI-DSS | Payment Card Industry Data Security Standard |
| PSD2 | Payment Services Directive 2 — European payments regulation |
| SAQ A-EP | PCI-DSS Self-Assessment Questionnaire A-EP — applicable to merchants using a third-party tokenisation iframe |
| SCA | Strong Customer Authentication — PSD2 multi-factor requirement |
| SSR | Server-Side Rendering — rendering HTML on the server before sending it to the browser |
| Strangler Fig | Migration pattern that gradually replaces a legacy system |
| TPP | Third-Party Provider (not used in this context; included for family-of-standards clarity) |
| Document | Version | Description | Location |
| --- | --- | --- | --- |
| NorthWind Information Security Standard | 3.4 | Corporate security standard | Corporate Confluence / Security |
| NorthWind Cloud Landing Zone Standard | 2.1 | AWS baseline controls, tagging, networking | Corporate Confluence / Cloud |
| NorthWind Data Classification Standard | 1.2 | Data classification and handling | Corporate Confluence / Data |
| PCI-DSS | 4.0 | Payment card industry security standard | https://www.pcisecuritystandards.org/ |
| UK GDPR | 2021 | UK General Data Protection Regulation | https://www.legislation.gov.uk/ |
| OWASP ASVS | 4.0 | Application Security Verification Standard | https://owasp.org/www-project-application-security-verification-standard/ |
| NWO Threat Model | SEC-TM-2025-044 | STRIDE-based threat model for NorthWind Online | Corporate Confluence / Security |
| DPIA — NorthWind Online | DPIA-2025-091 | Data Protection Impact Assessment | Corporate Confluence / Compliance |
| AWS Well-Architected Framework | 2025 | AWS best practice | https://aws.amazon.com/architecture/well-architected/ |
| Standard / Pattern ID | Name | Version | Applicability |
| --- | --- | --- | --- |
| PCI-DSS-4.0 | Payment Card Industry DSS | 4.0 | Security View |
| OWASP-ASVS-4.0 | Application Security Verification Standard | 4.0 L2 | Application security |
| WCAG-2.2-AA | Web Content Accessibility Guidelines | 2.2 AA | Storefront and mobile |
| 12-Factor | Twelve-Factor App | | Microservice design |
| Strangler Fig | Strangler Fig migration pattern | | Migration plan |
| Role | Name | Date | Signature / Approval Reference |
| --- | --- | --- | --- |
| Solution Architect | Priya Doe | 2026-03-18 | ARB-2026-NWO-011 |
| Head of Digital Engineering | Fred Bloggs | 2026-03-17 | ARB-2026-NWO-012 |
| Principal Security Architect | Jane Doe | 2026-03-17 | ARB-2026-NWO-013 |
| Data Protection Officer | Tom Bloggs | 2026-03-18 | DPO-2026-014 |
| SRE Lead | Sally Doe | 2026-03-17 | SRE-2026-NWO-009 |
| CTO | Helen Doe | 2026-03-18 | ARB-2026-NWO-APPROVED |
| Head of Digital Commerce | Raj Bloggs | 2026-03-18 | ARB-2026-NWO-APPROVED |

Assessment Summary

This SAD was assessed at Recommended depth — the expected level for a Tier 2 High Impact regulated system. The scores below reflect a well-documented architecture proportionate to a B2C e-commerce platform with PCI-DSS and UK GDPR obligations.

| Section | Score (0-5) | Assessor | Date | Notes |
| --- | --- | --- | --- | --- |
| 1. Executive Summary | 5 | Design Authority | 2026-03-18 | Clear business drivers with priority, strategic alignment with reuse documented, current-state architecture complete, revenue impact quantified |
| 3.1 Logical View | 4 | Design Authority | 2026-03-18 | Full component decomposition, design patterns with rationale, vendor lock-in assessed. Service mesh detail could be deeper |
| 3.2 Integration & Data Flow | 4 | Design Authority | 2026-03-18 | All internal and external integrations documented with protocols and auth; customer-event tracking plan referenced externally |
| 3.3 Physical View | 4 | Design Authority | 2026-03-18 | Deployment, hosting, networking, environments fully documented; peak bandwidth characterised from real Black Friday telemetry |
| 3.4 Data View | 4 | Design Authority | 2026-03-18 | All data stores classified with retention and encryption; DPIA approved; sovereignty addressed with filtered replication. Field-level encryption detail at Recommended depth, not exemplary |
| 3.5 Security View | 4 | Design Authority | 2026-03-18 | STRIDE threat model with 6 named threats and mitigations; PCI-DSS scope-reduction strategy documented; identity models comprehensive |
| 3.6 Scenarios | 4 | Design Authority | 2026-03-18 | Three architecturally significant use cases; three ADRs with alternatives and tradeoffs |
| 4.1 Operational Excellence | 4 | Design Authority | 2026-03-18 | Datadog APM/Logs/RUM, PagerDuty on-call, peak-readiness drills. Runbook library noted but detail out of this document |
| 4.2 Reliability | 4 | Design Authority | 2026-03-18 | Multi-AZ with pilot-light DR, RTO/RPO validated via quarterly drills, fault tolerance with circuit breakers, immutable backups |
| 4.3 Performance | 4 | Design Authority | 2026-03-18 | KPIs defined, load-testing cadence documented, 3-year capacity projection; 5-year horizon flagged for review |
| 4.4 Cost Optimisation | 5 | Design Authority | 2026-03-18 | Detailed monthly breakdown, Savings Plan + RI strategy, FinOps practices, tagging, rightsizing cadence |
| 4.5 Sustainability | 3 | Design Authority | 2026-03-18 | Graviton used, non-prod shutdown configured, right-sizing practised. Carbon KPIs not baselined (gap) |
| 5. Lifecycle | 4 | Design Authority | 2026-03-18 | CI/CD with security scanning, Strangler Fig migration plan, LaunchDarkly feature flags, team skills assessed, exit plan documented |
| 6. Decision Making | 4 | Design Authority | 2026-03-18 | 5 constraints, 3 assumptions (with evidence), 6 risks with mitigation, 4 dependencies tracked, 2 issues with resolution plans |
| Overall | 4 | Design Authority | 2026-03-18 | Recommended depth achieved. Proportionate, well-evidenced documentation for a Tier 2 High Impact regulated e-commerce platform. Lowest individual score 3 (Sustainability: carbon KPIs not baselined). |