Observability Engineer
Company: TensorWave
Location: Las Vegas
Posted on: February 13, 2026
Job Description:
Our mission at TensorWave Cloud is to build seamless, secure,
reliable, and resilient AI infrastructure at scale, eliminating
barriers and challenging the status quo to empower builders and
support AI innovation.

About the role:
We are looking for an Observability Engineer who is deeply obsessed
with Grafana, Prometheus, and modern observability practices. This
role exists to ensure our systems are measurable, understandable,
and debuggable at all times. You will own the observability stack
end-to-end, from instrumentation standards to dashboards, alerts,
and signal quality, and work closely with infrastructure, platform,
and application teams to make sure nothing important fails
silently. If you think about metrics before features, believe bad
alerts are worse than no alerts, and treat Grafana dashboards as
first-class products, this role is for you.

Responsibilities:
- Own and evolve our observability and monitoring platform, with
  Grafana and Prometheus at its core
- Design, build, and maintain high-quality metrics pipelines using
  Prometheus and related tooling
- Create clear, actionable Grafana dashboards that tell a story,
  not just charts
- Define and maintain alerts that are meaningful, actionable, and
  low-noise
- Establish and enforce observability standards across services
  (metrics, logs, traces)
- Partner with engineering teams to instrument applications
  correctly
- Lead improvements to alerting strategies, SLOs, and SLIs
- Support incident response by helping teams quickly understand
  what broke and why
- Continuously evaluate and improve signal quality, cardinality,
  and cost
- Identify observability gaps and eliminate blind spots before they
  become outages

You Are Obsessed With:
- Grafana dashboards that instantly explain system health
- Prometheus metrics that are intentionally designed, not
  accidental
- Alerts that wake people up only when action is required
- Monitoring that scales with system complexity
- Observability as a product, not an afterthought

Required Experience:
- Strong hands-on experience with Grafana and Prometheus
- Deep understanding of metrics-based observability
- Experience designing monitoring and alerting systems at scale
- Strong knowledge of alerting best practices (burn rates,
  SLO-based alerts, noise reduction)
- Experience working with distributed systems and cloud or
  Kubernetes environments
- Ability to reason about system behavior using telemetry
- Comfortable working across teams to improve instrumentation and
  visibility

Preferred Experience:
- Experience with OpenTelemetry
- Familiarity with logs and traces (Loki, Tempo, Jaeger, etc.)
- Kubernetes observability experience
- Experience operating observability systems in high-scale or
  production-critical environments
- Infrastructure-as-Code experience (Terraform, Helm, etc.)

What We Bring:
- Mission-driven company
- Competitive Salary
- Stock Options
- 100% paid Medical, Dental, and Vision insurance
- Life and Voluntary Supplemental Insurance
- Short-Term Disability Insurance
- Flexible Spending Account
- 401(k)
- Flexible PTO
- Paid Holidays
- Parental Leave
- Mental Health Benefits through Spring Health

We’re looking for resilient, adaptable people to join our team,
people who believe in the mission and think at massive scale. The
solutions that worked on a handful of devices will not work at
Exascale. Be prepared to be pushed daily, to learn a lot, and
literally build the future.

TensorWave is an equal opportunity employer, committed to fostering
an inclusive and supportive workplace. All qualified applicants and
candidates will receive consideration for employment without regard
to race, color, religion, sex, disability, age, national origin, or
veteran status.