techone --guide=cloud-management
Cloud and server management: prevention, not firefighting
Long-term preventive maintenance for Azure, AWS, and self-hosted infrastructure. Monthly checks, backups, security patches. You see what is happening, what was fixed, and what is coming.
Last updated:
TL;DR
- What we do: monthly preventive maintenance of cloud and servers (security review, updates, backups with restore tests, monitoring, configuration checks).
- Why it works: most outages are visible weeks in advance. Regular checks catch them in a planned window, not at 2 AM.
- Who it fits: companies with a production system and no internal DevOps team that want predictable costs instead of firefighting.
- Environments: Azure, AWS, hybrid, and traditional on-premises. Same process; the tools depend on the environment.
- How to start: an initial audit and a written report with recommendations. Then you decide.
- Who delivers: a Prague-based team, CET timezone, EU contracts, projects in English. Long-term partnerships, not one-time fixes.
Why prevention works better than reaction
Business IT can be managed in two ways. Both work, but one costs significantly more money, sleep, and clients.
Friday afternoon. The e-commerce platform starts slowing down, orders get stuck, and by evening the database goes down. The weekend campaign runs degraded. Emergency mode all weekend: restarts, manual scaling, fixes on the fly. The team is burned out.
The worst part? The retrospective shows the warning signs were visible months in advance: growing load, missing indexes. Nobody was watching, and nobody asked.
We catch problems while they are still small
Missing index, filling disk, expiring certificate, failing backups. Each of these takes weeks or months to turn into an outage. Regular checks catch them in a planned window, not at 2 AM.
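A filling disk is one of the easiest of these to forecast. Below is a minimal sketch, not our production tooling. It assumes you already export one disk-usage sample per day from monitoring, and it assumes growth is roughly linear. The function name is hypothetical. It fits a least-squares line and estimates how many days remain.

```python
from typing import Optional, Sequence


def days_until_full(daily_used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Estimate days until a disk fills, assuming roughly linear growth.

    daily_used_gb: one usage sample per day, oldest first (hypothetical input
    exported from a monitoring tool).
    Returns None when there is too little data or usage is not growing.
    """
    n = len(daily_used_gb)
    if n < 2:
        return None
    x_mean = (n - 1) / 2
    y_mean = sum(daily_used_gb) / n
    # Least-squares slope: GB of growth per day.
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(daily_used_gb))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    if slope <= 0:
        return None
    # Project from the latest real sample rather than the fitted line.
    return (capacity_gb - daily_used_gb[-1]) / slope


print(days_until_full([400, 402, 404, 406, 408], capacity_gb=500))  # 46.0
```

A result of 46 days means the problem goes on next month's plan. If the result were 4 days, it would need action this week.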
You never pay for firefighting
Emergency fixes at night and on weekends cost several times more than planned work. Preventive maintenance turns "unexpected crises" into predictable monthly expenses. Every CFO prefers that.
We continuously suggest improvements
Regular checks are not just "is everything running". They are a chance to see where you can cut cloud costs, where to improve performance, what to automate. We come with recommendations, not invoices for fixes.
Transparency instead of "everything runs"
A monthly written report showing what was checked, what was fixed, what is planned next. You know what you are paying for and where your money goes.
The difference between reactive and proactive IT is not in the technology. It is that someone is regularly looking. When a system goes down, the audit does not ask "why did this happen". It asks "what did you do to prevent it".
Reactive IT vs preventive maintenance
The difference between the two approaches is clearest when you compare concrete parameters. Both work, but they produce different outcomes.
| Parameter | Reactive IT | Preventive maintenance |
|---|---|---|
| When problems get solved | Only after an outage | Weeks to months in advance, in a planned window |
| Typical response time | Hours, often nights and weekends | Planned, during business hours |
| Cost of a single fix | Several times higher (emergency rate, overtime) | Included in monthly retainer |
| Cost predictability | Low, crises come unexpectedly | High, fixed monthly fee |
| Audit documentation | Usually missing or fragmented | Monthly written report, auditable |
| Impact on the team | Stress, burnout, resignations | Calm operations, room to build |
| Client relationships | Apologies for outages, lost trust | Fewer outages, fewer apologies |
What we actually do every month
Preventive maintenance is not a one-time check. It is a regular process with seven core areas we go through with every client. Specific outputs vary by environment. An Azure application looks different from a dedicated server, but the areas are the same.
Security review
Access rights, users, network endpoints. Who has access to what, whether firewalls are in order, whether anyone's credentials have expired. We remove access that is no longer needed. Often the biggest security gap is not what you add, but what you forget to remove.
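Part of the network review can be scripted. The sketch below flags inbound rules that expose administrative or database ports to the entire internet. The `InboundRule` shape and the port list are simplified assumptions. Real Azure NSG or AWS security group exports carry more fields and would need to be mapped into this form first.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List

# Ports treated as sensitive here (an assumption; extend for your stack).
SENSITIVE_PORTS = {22: "SSH", 3389: "RDP", 1433: "SQL Server", 3306: "MySQL", 5432: "PostgreSQL"}


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, simplified firewall rule: source range and destination port."""
    source_cidr: str
    port: int


def exposed_sensitive_ports(rules: Iterable[InboundRule]) -> List[str]:
    findings = []
    for rule in rules:
        net = ipaddress.ip_network(rule.source_cidr, strict=False)
        # Prefix length 0 (0.0.0.0/0 or ::/0) means "any address on the internet".
        if net.prefixlen == 0 and rule.port in SENSITIVE_PORTS:
            findings.append(f"{SENSITIVE_PORTS[rule.port]} ({rule.port}) open to {net}")
    return findings


rules = [
    InboundRule("0.0.0.0/0", 22),
    InboundRule("10.0.0.0/8", 5432),
    InboundRule("0.0.0.0/0", 443),
    InboundRule("::/0", 3389),
]
for finding in exposed_sensitive_ports(rules):
    print(finding)
```

Port 443 open to the internet is normal for a web server, so it is deliberately not on the list. A database reachable only from the private `10.0.0.0/8` range is not flagged either.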
Updates and patches
Versions of individual components: database, runtime, libraries, operating system. Security patches monthly, service packs quarterly. What is pending, what is outdated, what the vendor no longer supports. Updating in small, regular steps is far less dramatic than letting the system age until it breaks.
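Once the end-of-support dates are collected, tracking what the vendor no longer supports comes down to comparing dates. The sketch below assumes you maintain that list by hand from vendor lifecycle pages. The component names and dates are placeholders, not real vendor data, and the 180-day warning window is an illustrative choice.

```python
from datetime import date
from typing import Dict


def support_warnings(end_of_support: Dict[str, date], today: date, warn_days: int = 180) -> Dict[str, str]:
    """Flag components that are past end of support or will be within warn_days."""
    warnings = {}
    for component, eol in end_of_support.items():
        days_left = (eol - today).days
        if days_left < 0:
            warnings[component] = "unsupported"
        elif days_left <= warn_days:
            warnings[component] = f"end of support in {days_left} days"
    return warnings


# Placeholder inventory; fill in real versions and dates from vendor lifecycle pages.
inventory = {
    "database-engine 11": date(2025, 1, 1),
    "runtime 8": date(2025, 9, 1),
    "os 22": date(2027, 4, 1),
}
print(support_warnings(inventory, today=date(2025, 6, 1)))
```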
Backups and restore tests
Checking that backups run on schedule and that they can actually be restored. The biggest surprise with backups is that the backup exists but the restore does not work. We test restoration at least once a quarter, more often for critical systems. We also check retention policies, off-site copies, and transaction log backups.
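The job history can show whether backups run on schedule. The sketch below assumes you can extract the completion times of successful backups, for example from backup job logs. It lists every stretch longer than the schedule interval plus a grace period. The function name and the one-hour grace period are assumptions. This only proves that a backup finished. It does not prove that the backup restores, which is why restore tests stay a separate step.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple

Gap = Tuple[Optional[datetime], datetime]


def missed_backup_windows(
    completed: Sequence[datetime],
    interval: timedelta,
    now: datetime,
    grace: timedelta = timedelta(hours=1),
) -> List[Gap]:
    """Return (last_good, next_seen) pairs where a scheduled backup did not land.

    `completed` holds finish times of successful backups (hypothetical input).
    A gap is any stretch longer than interval + grace, including the stretch
    from the latest backup up to `now`.
    """
    if not completed:
        return [(None, now)]  # no successful backup at all
    points = sorted(completed) + [now]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > interval + grace]


daily = [datetime(2025, 6, d, 2, 0) for d in (1, 2, 3, 5)]
print(missed_backup_windows(daily, timedelta(days=1), now=datetime(2025, 6, 5, 10, 0)))
```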
Monitoring and log review
Graphs of performance, storage, latency, and error rates for the past month. We look for anomalies: something that was fine last time and is starting to grow. Exceptions in application logs, unexpected restarts, slow queries. Problems are often visible weeks before they become incidents.
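"Something that was fine last time and is starting to grow" can be expressed as a simple month-over-month comparison. The sketch below assumes each metric is available as a list of samples per period. The metric names and the 20% threshold are illustrative. A metric that was zero and is now non-zero, such as errors appearing for the first time, is reported as infinite growth, so it cannot slip through.

```python
from statistics import mean
from typing import Dict, List


def growing_metrics(
    previous: Dict[str, List[float]],
    current: Dict[str, List[float]],
    threshold: float = 0.2,
) -> Dict[str, float]:
    """Return metrics whose average grew by at least `threshold` (0.2 = 20%).

    Values in the result are relative changes versus the previous period.
    """
    findings: Dict[str, float] = {}
    for name, values in current.items():
        before = previous.get(name)
        if not before or not values:
            continue  # nothing to compare against
        base, now = mean(before), mean(values)
        if base == 0:
            if now > 0:
                findings[name] = float("inf")  # appeared for the first time
            continue
        change = (now - base) / base
        if change >= threshold:
            findings[name] = round(change, 3)
    return findings


last_month = {"p95_latency_ms": [120, 130, 125], "disk_used_pct": [60, 61, 62], "error_rate": [0.0, 0.0]}
this_month = {"p95_latency_ms": [150, 160, 165], "disk_used_pct": [63, 63, 64], "error_rate": [0.01]}
print(growing_metrics(last_month, this_month))
```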
Environment configuration check
Whether the cloud or server matches the documentation, whether anyone made a quick config change and forgot to document it, whether costs are creeping up because nobody is watching. This is where we often find the biggest savings: forgotten resources, oversized instances, unused databases.
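At its core, comparing "what the documentation says" with "what is actually running" is a diff. The sketch below assumes both sides have been exported into flat key-to-value maps, for example from infrastructure-as-code files and from the provider's API. That format and the resource names are hypothetical.

```python
from typing import Dict, List


def config_drift(documented: Dict[str, str], live: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare documented settings with what is actually deployed."""
    return {
        # Exists in the environment but nobody wrote it down (often a forgotten resource).
        "undocumented": sorted(live.keys() - documented.keys()),
        # Documented but no longer present.
        "missing": sorted(documented.keys() - live.keys()),
        # Present in both, but the value was changed without updating the docs.
        "changed": sorted(k for k in documented.keys() & live.keys() if documented[k] != live[k]),
    }


documented = {"vm-web.size": "small", "db-main.tier": "standard"}
live = {"vm-web.size": "large", "db-main.tier": "standard", "vm-test.size": "large"}
print(config_drift(documented, live))
```

In this example, `vm-test` is the kind of forgotten resource that quietly adds to the monthly bill. The `vm-web` size change is either an undocumented quick fix or an oversized instance.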
Review of last month's changes
What changed, who changed it, why. Documentation and changelog updates. This sounds mundane, but it matters during audits and when handing the system over to a new person. A system nobody documented depends entirely on one person's memory.
Planning for next month
What is coming: known campaigns, new releases, seasonal load, upcoming migrations. What we propose for the next check. This is not only "what we will do", but also "what the client should prepare". For example, approvals for larger server purchases, or maintenance windows to plan ahead.
How the engagement works
Preventive maintenance starts with understanding what you are managing. There is no one-size-fits-all approach. A small Azure application needs a different approach than a multi-country e-commerce platform with dedicated infrastructure. If you are still deciding whether and how to move to the cloud in the first place, our cloud migration guide covers that. Preventive maintenance comes into play afterwards.
Initial audit
We go through your environment, identify what is in order, what needs attention, and what is missing entirely. The output is a written report with prioritized recommendations and a proposed scope of preventive maintenance.
Report scope design
We define what the monthly report will cover. For one client, compliance is critical; for another, capacity planning; for a third, cost optimization. We decide what to track, which metrics matter, and how quickly we respond to alerts. Scope can be adjusted as your environment evolves.
Monthly reports and actions
Each month we go through the defined areas, send a written report, and make any necessary adjustments. We discuss larger changes upfront and handle smaller ones (like applying patches) directly.
Continuous improvement
The scope is not static. As your environment changes, we adjust it. As we learn new things, we add them to the checks. The goal is not to maintain the status quo, but to gradually improve.
What prevention caught in practice
Theory sounds good, but the real value of prevention only shows in concrete results. Here are typical patterns that routine management catches before they cause an outage. We see them repeatedly across clients.
1. Missing index, two weeks before outage
Database monitoring showed average query time growing week over week. Log analysis revealed a table that had been growing by thousands of rows daily since the last release, and a critical query was scanning the whole thing. Adding the index took 10 minutes in a planned window. Without it, the system would have gone down once the table reached about 500,000 rows.
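The two signals in this case can be sketched as simple checks: query time that keeps rising week over week, and the number of days left before the table reaches the size at which the query is expected to fail. The 500,000-row limit comes from the example above. The current row count, daily growth, and weekly timings below are hypothetical, and in practice the limit itself would come from load testing or query plans. The real fix was still the index. These checks only make sure someone sees the problem in time.

```python
from typing import Optional, Sequence


def steadily_growing(weekly_avg_ms: Sequence[float], weeks: int = 3) -> bool:
    """True if average query time rose in each of the last `weeks` week-over-week steps."""
    recent = list(weekly_avg_ms[-(weeks + 1):])
    return len(recent) == weeks + 1 and all(b > a for a, b in zip(recent, recent[1:]))


def days_until_row_limit(current_rows: int, rows_per_day: float, limit_rows: int) -> Optional[float]:
    """Days until a table reaches the size at which a query is expected to fail."""
    if current_rows >= limit_rows:
        return 0.0
    if rows_per_day <= 0:
        return None  # not growing, no projected date
    return (limit_rows - current_rows) / rows_per_day


print(steadily_growing([40, 42, 55, 71, 96]))  # True
print(days_until_row_limit(430_000, 5_000, 500_000))  # 14.0
```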
2. Backups existed, restore did not work
For a new client, we ran a restore test as part of the initial audit. The backup looked fine, but the restored file was half the expected size. It turned out that six months earlier someone had changed the backup destination path, and only part of the database had been backed up since. Nobody knew, because nobody had verified it. We fixed the path and introduced monthly restore tests.
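A restore test only helps if someone checks the result. The sketch below shows the kind of comparison that would have caught this case. It assumes you record the expected backup size and per-table row counts at backup time. The function name and the 95% size threshold are illustrative.

```python
from typing import Dict, List


def verify_restore(
    expected_bytes: int,
    restored_bytes: int,
    expected_rows: Dict[str, int],
    restored_rows: Dict[str, int],
    min_size_ratio: float = 0.95,
) -> List[str]:
    """Return a list of problems found after a test restore (empty list = looks fine)."""
    problems = []
    ratio = restored_bytes / expected_bytes if expected_bytes else 0.0
    if ratio < min_size_ratio:
        problems.append(f"restored size is {ratio:.0%} of expected")
    for table, count in expected_rows.items():
        got = restored_rows.get(table)
        if got is None:
            problems.append(f"table {table} missing after restore")
        elif got < count:
            problems.append(f"table {table}: {got} rows, expected at least {count}")
    return problems


print(verify_restore(10_000, 5_000, {"orders": 100, "customers": 50}, {"orders": 100}))
```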
3. Former employee account, active for two years
During an access review, we found an admin account belonging to someone who had left the company two years earlier. The account still had full production rights, including access to backups and the database. Nobody had noticed because the departure had been handled only by HR, not by IT. We disabled the account immediately and added an IT access review step to the offboarding process.
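The core of such an access review is cross-checking accounts against the list of current staff and against recent activity. The sketch below assumes an account export from your identity provider and an active-staff list from HR. The `Account` record, the function name, and the 90-day idle limit are hypothetical. Service accounts will also show up here, and that is intended: each one needs a named owner.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Account:
    """Hypothetical account record, e.g. exported from an identity provider."""
    username: str
    owner_email: str
    is_admin: bool
    last_login: Optional[date]


def review_accounts(
    accounts: Iterable[Account],
    active_staff_emails: Iterable[str],
    today: date,
    max_idle_days: int = 90,
) -> List[Tuple[str, bool, List[str]]]:
    """Return (username, is_admin, reasons) for accounts that need a decision, admins first."""
    active = {email.lower() for email in active_staff_emails}
    findings = []
    for acc in accounts:
        reasons = []
        if acc.owner_email.lower() not in active:
            reasons.append("owner not in active staff list")
        if acc.last_login is None:
            reasons.append("never logged in")
        elif (today - acc.last_login).days > max_idle_days:
            reasons.append(f"no login in over {max_idle_days} days")
        if reasons:
            findings.append((acc.username, acc.is_admin, reasons))
    # Admin accounts first: they carry the most risk.
    findings.sort(key=lambda f: (not f[1], f[0]))
    return findings


accounts = [
    Account("jnovak", "jan@example.com", True, date(2023, 5, 2)),
    Account("petra", "Petra@example.com", False, date(2025, 5, 30)),
    Account("reports", "reports@example.com", False, None),
]
staff = ["petra@example.com", "reports@example.com"]
for finding in review_accounts(accounts, staff, today=date(2025, 6, 1)):
    print(finding)
```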
4. Certificate about to expire over the weekend
A monthly check showed that the SSL certificate for the main domain would expire in 12 days, on a Saturday. Auto-renewal was configured, but a recent web server update had changed the path to the renewal script and nobody had noticed. We fixed it during business hours. Otherwise, visitors on Saturday morning would have hit an invalid certificate error.
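The useful property of this check is that it looks at the certificate actually being served, not at the renewal configuration. A broken renewal therefore shows up as a shrinking number of days. The sketch below uses Python's standard `ssl` and `socket` modules. The 21-day warning threshold and the function names are assumptions, and the weekend check simply mirrors the Saturday detail in this example.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 21  # assumption: alert three weeks ahead


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days left for a certificate, given its notAfter field, e.g. 'Jun 14 12:00:00 2025 GMT'."""
    return (ssl.cert_time_to_seconds(not_after) - now.timestamp()) / 86400


def needs_attention(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    return days_until_expiry(not_after, now) < warn_days


def expires_on_weekend(not_after: str) -> bool:
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return expiry.weekday() >= 5  # 5 = Saturday, 6 = Sunday


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Read notAfter from the certificate the server presents right now.

    An already expired or otherwise invalid certificate makes the handshake
    raise ssl.SSLCertVerificationError, which should itself trigger an alert.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

In a scheduled job, you would call `fetch_not_after("example.com")` for each monitored domain and pass the result to `needs_attention` together with `datetime.now(timezone.utc)`.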
All these situations had one thing in common: they were visible in advance, but nobody was looking. Prevention does not rely on magic, only on someone regularly going through the critical points and noticing what is changing.
Who this is for
Preventive maintenance makes sense for some companies and not for others. And we say so upfront.
Good fit
- You have a production system your business depends on
- You have no internal DevOps team or it is overloaded
- You want predictable costs instead of surprises
- You are planning growth or expansion to new markets
- You need documentation and evidence for audit
- You are looking for a long-term partner, not a one-time fix
Not a fit
- You have an internal DevOps team covering your stack
- The project is not yet in production and has no users
- You are shopping for the cheapest offer
- A yearly audit is enough for you (this service is about regularity)
- You expect work without documentation or access
Frequently Asked Questions
How much does it cost?
Price depends on three factors: the size of your environment (number of servers, databases, applications), the scope of the report (what you want us to track), and the required response time (business hours vs 24/7 SLA). As a rule of thumb, a small Azure application with one database costs significantly less than a multi-country e-commerce platform with its own infrastructure. We start with an initial audit, then prepare a specific management proposal.
What is included in the monthly package?
Monitoring and alerting, incident response within SLA, regular updates and security patches, and a monthly report with findings and recommendations. The scope is tailored to what you actually need.
How quickly do you respond to an outage?
It depends on the SLA in your contract. Typical targets are 30 minutes during business hours and 15 minutes for critical systems with 24/7 coverage. Monitoring runs continuously, so we respond to most problems before you even notice them.
Do you support only cloud, or on-premises servers too?
Both. We manage Azure, AWS, dedicated servers, hybrid environments (cloud plus on-premises servers), and traditional self-hosted infrastructure. The process is the same, only the tools differ by environment.
Can you handle smaller infrastructure?
Yes. You do not need dozens of servers. We work with clients who run a single server and a single database. The scope adjusts to reality.
Our company is growing. How should we scale infrastructure?
Two paths: move to the cloud (pay for actual consumption, scale as needed), or containerize applications and orchestrate them (Docker, Kubernetes) on dedicated servers. We combine both approaches depending on the situation. We start with an audit and propose a plan. If you are currently planning the move to cloud, see our cloud migration service.
Can we start with just an audit, without committing long term?
Yes, the audit is a standalone product. We walk through your environment, analyze it and send you a written report with specific things to fix. Whether it turns into a long-term engagement is your decision. The audit stands on its own as technical analysis. For long-term operations, see our cloud and infrastructure management service.
How is this different from classic IT support?
Classic IT support is reactive: you call when something breaks. This is proactive: we call (or write) when we see a problem coming. Incident response is still part of the service, but the goal is to have as few incidents as possible.
Looking for a partner for cloud management?
Tell us what you run. We'll walk through your environment and propose how to manage it.
Book a discovery call