So, you want to build an intelligent network support center, but you’re not sure where to start?
Author: Seoid Ní Laoire
August 26, 2019
In network operations and support, the goal of many telcos is a ‘zero-touch’ network: a fully autonomous network that can operate, optimize and heal itself without human maintenance or intervention. If you’re reading this blog, this goal probably interests you too. As an industry, we’re still a long way from that kind of autonomy, but many of us are taking steps in that direction. If we can automate the complex, mundane tasks involved in day-to-day network operations, we can free people up to work on tasks that make the best use of their skills. AI, and machine learning in particular, is a powerful enabler of this, thanks to its ability to learn and adapt continuously and to handle complexity and massive data volumes.
I’ve spent the last year at Aspire working with a team to develop machine learning solutions across network operations and optimization. Based on our successes (and our failures), I’ve put together the following high-level steps to guide you, whether you’re a business leader or a technical engineer, in developing your own AI solutions for network operations.
Step 1. Define your goal
Is it to avoid critical incidents in your network, speed up your time to resolution or optimize customer experience?
Thirty percent of telcos say a lack of strategic direction is the greatest challenge to implementing AI. The key thing to remember is to think big, but start small and specific: pick one fault to predict, one network domain to analyse, one subset of your network to train on.
If you’re not sure where to start, examine immediate opportunities for reducing costs and increasing efficiency. But look to the future as well, since today’s problems may not be relevant tomorrow.
Step 2. Choose the right tool
We’ve touched on the benefits of machine learning already, but remember that it’s a difficult process and an investment of time and money. Think of it like a powerful medicine: it can work wonders, but unless you’re sick, don’t take it. So how do you decide whether machine learning is the right tool for your problem? There’s no substitute for hands-on experience here, but asking yourself these two questions can help:
- Is your task repetitive in nature?
- Can it be solved using simple, explicit rules?
If the answer is yes to the first and no to the second, you may have a candidate for machine learning.
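The two questions above amount to a simple screening rule. The minimal Python sketch below encodes it; the function name `is_ml_candidate` and the example tasks are hypothetical, and the rule is only a first filter, not a substitute for hands-on experience.

```python
def is_ml_candidate(repetitive: bool, solvable_with_simple_rules: bool) -> bool:
    """Two-question screen: repetitive AND not solvable with simple, explicit rules."""
    return repetitive and not solvable_with_simple_rules


# Hypothetical example tasks: (description, repetitive?, solvable with simple rules?)
tasks = [
    ("Restart a cell when it reports a known alarm code", True, True),
    ("Spot unusual traffic patterns across thousands of cells", True, False),
    ("Design a one-off network rollout plan", False, False),
]

for name, repetitive, simple_rules in tasks:
    verdict = "possible ML candidate" if is_ml_candidate(repetitive, simple_rules) else "probably not ML"
    print(f"{name}: {verdict}")
```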
Step 3. Have you got the right ingredients?
Data has been compared to gold, oil and water. Pick your preferred analogy: it’s the essential ingredient for digital transformation. So, what’s the key to tapping into this wealth? I’ll suggest three things here: dark data, automation and correlation.

The first, dark data, is a term for data that goes unused in an organisation, whether due to a lack of resources, processes or skills. It is often unstructured and unclean, making it a pain to work with. You’d be surprised how much of your data might be dark (a recent report published by Splunk estimates that, globally, 55% of organisations’ data is dark).

The second, automation, is essential. Put time into building a pipeline that automates the collection, storage and cleaning of your data. Trust me, this is the part that will take the most time, but it will pay off in the long term.

The last, correlation, is about gleaning intelligence from multiple data sources: network data, customer care data and, at a later stage, even security data.

If you can automate the collection, storage and cleaning of your data, mine your organisation’s dark data and combine disparate data flows into one holistic, unified picture of your network, you’ll be way ahead of the curve.
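To make the cleaning and correlation steps concrete, here’s a minimal Python sketch using only the standard library. It assumes two hypothetical sources: network KPI rows carrying a call-drop rate per cell, and customer care tickets tagged with a cell ID. The field names (`cell_id`, `drop_rate`, `timestamp`, `opened_at`) and sample records are invented for illustration, and a real pipeline would also cover storage and scheduling.

```python
from collections import defaultdict
from datetime import datetime


def to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour so sources line up."""
    return ts.replace(minute=0, second=0, microsecond=0)


def clean_kpis(raw_rows):
    """Normalise cell IDs, parse timestamps and values, and drop unusable rows."""
    cleaned = []
    for row in raw_rows:
        cell = (row.get("cell_id") or "").strip().upper()
        value = row.get("drop_rate")
        if not cell or value in (None, ""):
            continue  # skip rows with a missing cell ID or value
        try:
            ts = datetime.fromisoformat(row["timestamp"])
            rate = float(value)
        except (KeyError, ValueError):
            continue  # skip malformed timestamps or numbers
        cleaned.append({"cell_id": cell, "hour": to_hour(ts), "drop_rate": rate})
    return cleaned


def count_tickets(raw_tickets):
    """Count customer care tickets per (cell, hour)."""
    counts = defaultdict(int)
    for ticket in raw_tickets:
        cell = (ticket.get("cell_id") or "").strip().upper()
        try:
            ts = datetime.fromisoformat(ticket["opened_at"])
        except (KeyError, ValueError):
            continue
        if cell:
            counts[(cell, to_hour(ts))] += 1
    return counts


def correlate(kpis, ticket_counts):
    """Join network KPIs with ticket counts into one row per (cell, hour)."""
    return [
        {**k, "tickets": ticket_counts.get((k["cell_id"], k["hour"]), 0)}
        for k in kpis
    ]


# Made-up sample records, including deliberately messy ones.
raw_kpis = [
    {"cell_id": " cell-001", "timestamp": "2019-08-26T10:15:00", "drop_rate": "0.02"},
    {"cell_id": "CELL-002", "timestamp": "2019-08-26T10:40:00", "drop_rate": ""},  # missing value
    {"cell_id": "CELL-003", "timestamp": "not-a-date", "drop_rate": "0.05"},  # malformed time
]
raw_tickets = [
    {"cell_id": "cell-001 ", "opened_at": "2019-08-26T10:05:00"},
    {"cell_id": "CELL-001", "opened_at": "2019-08-26T10:55:00"},
]

for row in correlate(clean_kpis(raw_kpis), count_tickets(raw_tickets)):
    print(row["cell_id"], row["hour"].isoformat(), row["drop_rate"], row["tickets"])
# CELL-001 2019-08-26T10:00:00 0.02 2
```

Joining on both cell ID and hour is what lets a burst of customer complaints be read alongside the network KPIs for the same cell and time window.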
Step 4. Have you got the right skills to scale?
So, you’ve found your deep learning expert, or trained one in-house. They’ve taken four courses on Coursera and can confidently tell you the difference between a convolutional neural network and a recurrent neural network. They’ve even built and trained a model on their local machine or Google Colab, and it’s nailing its prediction accuracy! You’re excited. You ask them how you’re going to deploy this solution across an entire network. They look at you blankly: “Andrew Ng didn’t tell me how to do this…”
It turns out it’s pretty hard to scale a solution from a proof of concept to a full-scale, live deployment, and it takes a different skill set. Even from a pure ML perspective, you need to keep deployment in mind right from the start. Did you pick a model that will scale well? Does it train and respond quickly enough to operate in real time? Did you strike the right balance between model speed and accuracy? A study conducted by Nokia found that very few AI projects made it to live deployment, and many telcos remain stuck in “proof of concept purgatory”. If you don’t have the skills or resources to make that transition, partnering with someone who does can be the solution.
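One practical way to balance speed and accuracy is to agree on a latency budget up front and then pick the most accurate model that fits within it. The Python sketch below (standard library only, Python 3.8+) times single predictions and applies that rule. The candidate names, accuracy scores and latency figures are placeholders rather than measurements, and p95 latency is just one reasonable choice of metric.

```python
import statistics
import time
from typing import Callable, Optional, Sequence


def p95_latency_ms(predict: Callable, samples: Sequence, repeats: int = 5) -> float:
    """Time individual predictions and return the 95th-percentile latency in milliseconds."""
    timings = []
    for _ in range(repeats):
        for x in samples:
            start = time.perf_counter()
            predict(x)
            timings.append((time.perf_counter() - start) * 1000.0)
    # n=20 yields 19 cut points (5%, 10%, ..., 95%); the last one is the p95.
    # "inclusive" keeps the estimate within the observed range.
    return statistics.quantiles(timings, n=20, method="inclusive")[-1]


def pick_model(candidates: Sequence[dict], budget_ms: float) -> Optional[dict]:
    """Return the most accurate candidate whose p95 latency fits the budget, or None."""
    feasible = [c for c in candidates if c["p95_ms"] <= budget_ms]
    return max(feasible, key=lambda c: c["accuracy"], default=None)


# Placeholder figures for two hypothetical models (not real measurements).
candidates = [
    {"name": "small_model", "accuracy": 0.91, "p95_ms": 2.0},
    {"name": "large_model", "accuracy": 0.95, "p95_ms": 180.0},
]
print(pick_model(candidates, budget_ms=50.0)["name"])  # small_model

# Measuring a toy predictor: a single threshold check.
print(f"{p95_latency_ms(lambda x: x > 0.5, [0.1, 0.7, 0.4]):.4f} ms")
```

In practice you would fill in `accuracy` from a held-out validation set and `p95_ms` from `p95_latency_ms` run on hardware that resembles production.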
Step 5. Sounds a bit overwhelming?
Keep open source in mind. Google’s Chief Decision Scientist explains this well, comparing open source to using a microwave to cook your food. Most businesses are interested in applied machine learning: we only need to cook the food, not build a microwave from scratch. This shifts some of the emphasis from technical skills to decision-making skills, such as knowing ‘what’s worth cooking’ and what you’re going to do with the food once it’s made.
Putting It All Together
At Aspire, we take a proactive approach to network support. Machine learning helps us do that efficiently, at the deepest levels of the network. Think of the streaming giant Netflix: it learns the behavior of every individual user so it can predict and recommend shows you’ll like. Our approach is similar. We learn the behavior of every cell in the network so that we can detect abnormal behavior at the first sign and take action.

How did we do it? First, we defined our goal: automating the fault handling cycle. A good place to start was gathering data on fault “symptoms”, so we began with early fault detection.

The next step was to choose the right tool. Should we use rules-based thresholds, or was machine learning justified here? We wanted to detect abnormal cell behavior, but also to capture seasonal trends and predict future cell behavior, which made machine learning a worthwhile investment.

Like most other telcos, we have struggled with inconsistent and fragmented data, and it remains a challenge, but we now automate the collection and correlation of multiple data sources. We use this data for fault detection and prediction, and it forms a key component of the dataset for our work in fault diagnosis.

Our data science team has a diverse set of skills, with backgrounds in physics, computer science, telecoms and psychology. We focus on continual learning and collaboration, upskilling existing staff and partnering with university researchers.

Lastly, we achieved all this using a combination of open source tools, adapted to our needs and combined with our own proprietary scripts. We’re not a ‘zero-touch’ support center: our automation tools and predictive algorithms work alongside our engineers, making the most of their expertise.
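The difference between a fixed, rules-based threshold and learning each cell’s own behavior can be sketched in a few lines of Python. This is not Aspire’s model: it swaps in a deliberately simple technique, a per-cell, per-hour-of-day mean and standard deviation with a z-score style test, and it does not forecast future behavior. The cell ID, the traffic figures, the static limit of 500 and the `k=3.0` cut-off are all made-up illustrations.

```python
import statistics
from collections import defaultdict

STATIC_LIMIT = 500  # hypothetical fixed rule: alert if active users exceed 500


def fit_baselines(history):
    """Learn a (mean, stdev) baseline for every (cell, hour-of-day) pair.

    history: iterable of (cell_id, hour_of_day, value) tuples.
    """
    buckets = defaultdict(list)
    for cell, hour, value in history:
        buckets[(cell, hour)].append(value)
    # statistics.stdev needs at least two points, so skip sparse buckets.
    return {
        key: (statistics.mean(vals), statistics.stdev(vals))
        for key, vals in buckets.items()
        if len(vals) >= 2
    }


def is_anomalous(baselines, cell, hour, value, k=3.0, min_std=1e-9):
    """Flag values more than k standard deviations from this cell's usual level at this hour."""
    if (cell, hour) not in baselines:
        return False  # no learned behavior yet, so make no call
    mean, std = baselines[(cell, hour)]
    return abs(value - mean) > k * max(std, min_std)


# A week of made-up history for one cell: quiet at 04:00, busy at 20:00.
history = []
for day in range(7):
    history.append(("CELL-A", 4, 50 + day))
    history.append(("CELL-A", 20, 800 + 10 * day))

baselines = fit_baselines(history)

for hour, value in [(20, 840), (4, 5)]:
    print(
        f"{hour:02d}:00 value={value}: "
        f"static rule={'ALERT' if value > STATIC_LIMIT else 'ok'}, "
        f"per-cell baseline={'ALERT' if is_anomalous(baselines, 'CELL-A', hour, value) else 'ok'}"
    )
# 20:00 value=840: static rule=ALERT, per-cell baseline=ok
# 04:00 value=5: static rule=ok, per-cell baseline=ALERT
```

In this toy example the fixed limit raises an alert on a perfectly normal evening peak and stays silent when the cell goes nearly dead overnight, while the per-cell baseline handles both cases. Capturing trends over time and predicting future behavior would need a proper time-series model on top of this.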
If I had one piece of parting advice, it would be this: AI/ML is a new and exciting technology, and as with any new technology, we should be comfortable with the risk of failure, embrace it as a learning opportunity and keep pushing forward. That’s how we keep innovating at Aspire.