Context

This document proposes revised SLA timelines for our Product Development team. The original timelines were built for a larger team with dedicated support capacity, but our current setup is 3 product developers and 1 Product Development lead. That means every support issue pulls someone directly off feature work, and there is no bench or backup rotation to absorb the hit.

We also do not have global coverage. The team operates during standard US business hours (9:00 AM to 5:00 PM Eastern Time), so all hour-based SLAs apply only within that window. Issues reported outside business hours have their SLA clock start at the beginning of the next business day.

The revised timelines are designed to be realistic for a team of this size and coverage model while still holding us accountable to fast, consistent support. The goal is to set expectations we can actually meet rather than ones that look good on paper but get missed regularly.

Original SLA Timelines

For reference, here are the timelines we started with:


Severity	Initial Response	Follow-up	Intent to Resolve	Description
1 - Critical	1 hour	1 hour	4 hours	Core system failure
2 - Major	1 hour	2 hours	1 business day	Impact on basic functionality
3 - Moderate	1 business day	2 business days	3 business days	False/inconsistent results
4 - Minor/Cosmetic	1 business day	2 business days	5 business days	Low impact / UI issues

Revised SLA Timelines

The What Changed column summarizes how each row differs from the original timelines. All hour-based timelines apply during US business hours only.


Severity	Initial Response	Follow-up	Intent to Resolve	What Changed
1 - Critical	1 hour	2 hours	8 business hours	Follow-up and resolution extended
2 - Major	2 hours	4 hours	2 business days	All timelines extended
3 - Moderate	1 business day	3 business days	5 business days	Follow-up and resolution extended
4 - Minor/Cosmetic	2 business days	5 business days	10 business days	All timelines extended
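
One way to keep the What Changed column in sync with the numbers is to hold both tables as data and derive the differences. The sketch below is illustrative only: the dictionary layout, the field names (response, follow_up, resolve), and the unit labels are assumptions made for this example, not an existing tool.

```python
# Each SLA value is (amount, unit). Units: "h" = hours, "bd" = business days.
# Layout and field names are hypothetical, chosen for illustration.
ORIGINAL = {
    1: {"response": (1, "h"), "follow_up": (1, "h"), "resolve": (4, "h")},
    2: {"response": (1, "h"), "follow_up": (2, "h"), "resolve": (1, "bd")},
    3: {"response": (1, "bd"), "follow_up": (2, "bd"), "resolve": (3, "bd")},
    4: {"response": (1, "bd"), "follow_up": (2, "bd"), "resolve": (5, "bd")},
}
REVISED = {
    1: {"response": (1, "h"), "follow_up": (2, "h"), "resolve": (8, "h")},
    2: {"response": (2, "h"), "follow_up": (4, "h"), "resolve": (2, "bd")},
    3: {"response": (1, "bd"), "follow_up": (3, "bd"), "resolve": (5, "bd")},
    4: {"response": (2, "bd"), "follow_up": (5, "bd"), "resolve": (10, "bd")},
}


def changed_fields(severity):
    """Return the names of the timelines that differ between the two tables."""
    before, after = ORIGINAL[severity], REVISED[severity]
    return [name for name in before if before[name] != after[name]]
```

Under this layout, changed_fields(1) lists only follow_up and resolve, which matches the Sev 1 row above.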

Justification by Severity

Severity 1: Critical

Response stays at 1 hour during business hours. Follow-up moves to 2 hours so we can give a real status update, not a rushed placeholder. Resolution moves to 8 business hours because a 4-hour fix window with no bench depth and no after-hours coverage means cutting corners on testing and deployment.

Severity 2: Major

Response moves to 2 hours to allow a clean context switch. Follow-up moves to 4 hours. Resolution moves to 2 business days because a 1-business-day fix requirement pulls someone entirely off feature work, and with 4 people that has a cascading effect on the sprint.

Severity 3: Moderate

Response stays at 1 business day. Follow-up and resolution get slight bumps (to 3 and 5 business days, respectively) because these issues queue behind Sev 1 and 2 work. A little more room upfront means we actually hit the SLA instead of consistently missing it.

Severity 4: Minor and Cosmetic

All three timelines are extended: response moves to 2 business days and follow-up to 5 business days. Resolution moves to 10 business days (or next sprint) so we can batch these into planned work instead of treating each one as an individual interrupt.

Additional Considerations

Business Hours Definition

SLA clocks run Monday through Friday, 9:00 AM to 5:00 PM Eastern Time only. Issues reported outside this window start their clock at 9:00 AM Eastern Time the next business day. This should be clearly communicated to any stakeholders or clients expecting after-hours support.
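
The clock rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: timestamps are naive datetimes already expressed in the team's local time, holidays are not modeled, and a report filed before 9:00 AM on a business day is assumed to start its clock at 9:00 AM that same day. The function names clock_start and add_business_hours are hypothetical.

```python
from datetime import datetime, time, timedelta

OPEN, CLOSE = time(9, 0), time(17, 0)  # business window from the definition above


def is_business_day(day):
    # Monday=0 .. Friday=4. Holidays are NOT modeled (assumption).
    return day.weekday() < 5


def clock_start(reported):
    """Return when the SLA clock starts for a report at `reported`."""
    day = reported.date()
    if is_business_day(day):
        if reported.time() < OPEN:
            # Assumption: early reports start at 9:00 AM the same day.
            return datetime.combine(day, OPEN)
        if reported.time() < CLOSE:
            return reported
    # After hours or weekend: roll forward to 9:00 AM on the next business day.
    day += timedelta(days=1)
    while not is_business_day(day):
        day += timedelta(days=1)
    return datetime.combine(day, OPEN)


def add_business_hours(reported, hours):
    """Return the deadline `hours` business hours after the clock starts."""
    current = clock_start(reported)
    remaining = timedelta(hours=hours)
    while True:
        end_of_day = datetime.combine(current.date(), CLOSE)
        available = end_of_day - current
        if remaining <= available:
            return current + remaining
        # Use up the rest of today, then continue on the next business day.
        remaining -= available
        current = clock_start(end_of_day)
```

For example, under these assumptions a Sev 1 issue reported on a Friday at 4:00 PM with an 8-business-hour resolution target comes due the following Monday at 4:00 PM.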

Dedicated Reporting Channel and On-Call Schedule

All breaking changes and bugs should be reported through a single dedicated channel (e.g. #bugs-and-incidents in Teams) rather than scattered across DMs, emails, or random threads. This gives the team a single place to triage from and creates a clear audit trail for what was reported and when.

The team should maintain a weekly on-call rotation so there is always one designated engineer responsible for monitoring the channel, triaging incoming issues, and owning the initial response. The on-call engineer is not expected to fix everything solo, but they are the first point of contact and responsible for pulling in the right people. The rotation should be visible to the full team and updated weekly.
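
A weekly rotation like this can be derived from the calendar rather than maintained by hand. The sketch below is one possible approach, not a prescribed one: the engineer names, the anchor date, and the choice to rotate only the three developers (leaving the lead out) are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical roster: assumes only the three developers rotate.
ROTATION = ["dev_a", "dev_b", "dev_c"]

# Hypothetical start of the rotation; must be a Monday so weeks align.
ANCHOR_MONDAY = date(2024, 1, 1)


def on_call_for(day):
    """Return who is on call during the Monday-to-Sunday week containing `day`."""
    weeks_since_anchor = (day - ANCHOR_MONDAY).days // 7
    return ROTATION[weeks_since_anchor % len(ROTATION)]
```

Counting weeks from a fixed Monday, rather than using the calendar week number, keeps the rotation even across year boundaries.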

Holidays and PTO

With a team of 4, even one person on PTO reduces capacity by 25%. During holidays or periods with multiple people out, SLA timelines may need to be temporarily extended. We should define which holidays are observed and have a plan for communicating reduced coverage windows in advance.

Escalation Path

If an SLA is at risk of being missed, there should be a clear escalation path. The Product Development lead owns escalation for Sev 1 and 2 issues. For Sev 3 and 4, the assigned engineer is responsible for flagging blockers before the SLA window closes.
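
The ownership rule above is simple enough to encode directly, for example in a triage bot or ticket automation. The following is a minimal sketch; the function name and the string used for the lead are hypothetical.

```python
def escalation_owner(severity, assigned_engineer):
    """Return who owns escalation when an SLA is at risk of being missed."""
    if severity in (1, 2):
        # Sev 1 and 2: the Product Development lead owns escalation.
        return "Product Development lead"
    if severity in (3, 4):
        # Sev 3 and 4: the assigned engineer flags blockers.
        return assigned_engineer
    raise ValueError(f"unknown severity: {severity}")
```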

Revisiting These Timelines

These SLAs should be reviewed quarterly or whenever team size changes. If we add dedicated support capacity or grow the engineering team, we can tighten the windows back up. The numbers here are right-sized for the team we have today.

Open Questions

• Is this SLA scoped to Product Development only? As we scale the Citizen Developers program, what does support look like for solutions built by that group? Will the Product Development team be responsible for triaging and resolving issues that come out of Citizen Developer work, or will there be a separate support model for that?

Summary

These changes are not about lowering the bar. They are about setting a bar we can consistently clear. Missed SLAs erode trust faster than generous ones. If the team grows or we add dedicated support capacity in the future, we can tighten these back up. For now, this gives us room to deliver reliable support while still shipping product.
