We have built a natural language processing (NLP) solution that, with minimal setup, can be customized to handle tickets for any new company. It includes pipelines that retrain the models on the entities that matter to each company and adapt the models to changes in vocabulary.
Following are the modules built as part of the pipeline:
- Preprocessing is an essential part of any NLP task: the text is cleaned up and brought into the format required by the information extraction models. This includes normalizing different tenses of words, normalizing synonyms, correcting spelling, etc.
- In the context of this challenge, each mail also has to be segmented so that every block of text is identified as header, signature, or body.
- Spelling is corrected and abbreviations are normalized using regular expressions.
- The processed text is then tokenized, that is, split into a list of words, using popular open source libraries such as NLTK and spaCy.
- Users of technology companies often attach error log messages to their mails to explain the problem. These logs need to be separated from the actual content of the mail so that support engineers can easily identify the error and provide an appropriate solution.
- A machine learning classifier labels each sentence in the text as either part of an error log or not.
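The steps in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the deployed pipeline: the abbreviation table, the header and signature patterns, the hand-picked line features, the tiny training set, and all function names are assumptions made for the example. A regular-expression tokenizer stands in for NLTK or spaCy, a small Naive Bayes model stands in for the error log classifier, and lines are classified instead of sentences to keep the sketch short.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical abbreviation table; a real one would be built per company.
ABBREVIATIONS = {"pls": "please", "asap": "as soon as possible", "env": "environment"}
HEADER_RE = re.compile(r"^(from|to|cc|subject|date):", re.IGNORECASE)
SIGNATURE_RE = re.compile(r"^(--\s*$|(best |kind )?regards\b)", re.IGNORECASE)


def split_mail(raw):
    """Heuristically label each non-empty line as header, body, or signature."""
    parts = {"header": [], "body": [], "signature": []}
    section = "header"
    for line in (raw_line.strip() for raw_line in raw.splitlines()):
        if not line:
            continue
        if section == "header" and not HEADER_RE.match(line):
            section = "body"
        if section == "body" and SIGNATURE_RE.match(line):
            section = "signature"
        parts[section].append(line)
    return parts


def normalize(text):
    """Lowercase the text and expand whole-word abbreviations."""
    return re.sub(r"\b[a-z]+\b",
                  lambda m: ABBREVIATIONS.get(m.group(0), m.group(0)),
                  text.lower())


def tokenize(text):
    """Regex stand-in for a library tokenizer; keeps tokens like 2.2.4 whole."""
    return re.findall(r"[a-z0-9]+(?:['.-][a-z0-9]+)*", text)


def line_features(line):
    """Coarse present/absent features that tend to separate logs from prose."""
    symbols = sum(not (c.isalnum() or c.isspace()) for c in line)
    checks = {
        "timestamp": re.search(r"\d{2}:\d{2}:\d{2}", line),
        "log_keyword": re.search(r"\b(ERROR|WARN|FATAL|Exception|Traceback)\b", line),
        "path": re.search(r"[/\\][\w.-]+[/\\]", line),
        "symbol_heavy": symbols > 0.15 * max(len(line), 1),
    }
    return [name if hit else "no_" + name for name, hit in checks.items()]


class LogLineClassifier:
    """Naive Bayes with add-one smoothing over the features above."""

    def fit(self, lines, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for line, label in zip(lines, labels):
            for feature in line_features(line):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)
        return self

    def predict(self, line):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            scores[label] = math.log(count / total) + sum(
                math.log((self.feature_counts[label][f] + 1) / denom)
                for f in line_features(line))
        return max(scores, key=scores.get)


# Made-up training lines, for illustration only.
TRAIN_LINES = [
    "2019-03-01 10:15:32 ERROR ucp-controller: connection refused",
    "Traceback (most recent call last):",
    'File "/usr/lib/python3/site.py", line 12, in main',
    "10:15:33 WARN /var/log/docker.log: disk usage at 95%",
    "Hi team, the upgrade failed again this morning.",
    "Could you please take a look at this issue?",
    "We are running the cluster on Ubuntu.",
    "Thanks for the quick response yesterday.",
]
TRAIN_LABELS = ["log"] * 4 + ["prose"] * 4

MAIL = """From: alice@example.com
Subject: UCP upgrade failed

Pls check our env ASAP.
2019-03-01 10:15:32 ERROR ucp-controller: connection refused

Regards,
Alice"""

parts = split_mail(MAIL)
clf = LogLineClassifier().fit(TRAIN_LINES, TRAIN_LABELS)
log_lines = [line for line in parts["body"] if clf.predict(line) == "log"]
prose_tokens = [tokenize(normalize(line))
                for line in parts["body"] if clf.predict(line) == "prose"]
```

Here `log_lines` collects the attached error output so it can be shown to the engineer separately, while `prose_tokens` holds the cleaned, tokenized message text that would be passed on to the downstream models.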
Named Entity Recognition
- As part of information extraction, identifying the terms of interest in the text makes it possible to assign appropriate tags to each ticket.
- From the above example, we can infer that the ticket is related to Docker’s UCP, and the ticket would then be assigned to an engineer working on Ubuntu.
- This tagging is done using statistical models such as Conditional Random Fields (CRF) as well as deep learning models for NLP such as Long Short-Term Memory (LSTM) networks.
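To make the tagging step more concrete, the sketch below shows the two pieces that usually surround a sequence model such as a CRF: per-token feature dictionaries going in, and BIO tags being collapsed into ticket tags coming out. The feature set, the `PRODUCT` and `OS` labels, and the function names are assumptions for illustration, and the tag sequence is written by hand where a trained model's prediction would normally be.

```python
def token_features(tokens, i):
    """Build a feature dict for token i, in the style CRF toolkits consume."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "lower": word.lower(),
        "suffix3": word[-3:].lower(),
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
    }
    # Neighbouring words give the model local context.
    if i > 0:
        features["prev_lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next_lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def bio_to_entities(tokens, tags):
    """Collapse BIO tags into (entity text, entity type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type)
        if starts_new or tag == "O":
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
        if starts_new:
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
    if current:
        entities.append((" ".join(current), current_type))
    return entities


tokens = ["Docker", "UCP", "fails", "on", "Ubuntu", "16.04"]
# Hand-written tags standing in for the output of a trained CRF or LSTM.
tags = ["B-PRODUCT", "I-PRODUCT", "O", "O", "B-OS", "I-OS"]

sentence_features = [token_features(tokens, i) for i in range(len(tokens))]
ticket_tags = bio_to_entities(tokens, tags)
```

The resulting `ticket_tags` list is what a routing step could use to assign the ticket to an engineer with the matching expertise.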
- Issues raised by users to support teams vary in severity, and it is important to address the most dissatisfied customers first and with utmost priority to keep the overall customer satisfaction level intact.
- As a result, it is important to identify the degree of sentiment expressed in a mail and assign tickets to appropriately experienced personnel.
- The classification model built to address this issue gives a sentiment score. In addition to the score coming from the machine learning model, it takes into account the context of words in the text, the presence of expletives, and a few other rules.
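One way to combine a model score with rules of this kind is sketched below. It is only an illustration under stated assumptions: the word lists, the adjustment weights, and the `dissatisfaction_score` function are hypothetical, the "context" rule is reduced to a check for a negation directly before a negative word, and `model_score` is assumed to be a value in [0, 1] produced elsewhere by the machine learning model.

```python
import re

# Hypothetical word lists and weights, chosen only for illustration.
EXPLETIVES = {"damn", "crap", "hell"}
NEGATIVE_WORDS = {"broken", "unacceptable", "terrible", "useless", "worst"}
NEGATIONS = {"not", "no", "never"}


def rule_adjustment(text):
    """Return a signed adjustment based on expletives and negated context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    adjustment = 0.0
    for i, token in enumerate(tokens):
        if token in EXPLETIVES:
            adjustment += 0.3
        elif token in NEGATIVE_WORDS:
            # "not broken" should lower the score rather than raise it.
            negated = i > 0 and tokens[i - 1] in NEGATIONS
            adjustment += -0.1 if negated else 0.2
    return adjustment


def dissatisfaction_score(text, model_score):
    """Combine the model score with the rules and clamp to [0, 1]."""
    combined = model_score + rule_adjustment(text)
    return max(0.0, min(1.0, combined))


angry = dissatisfaction_score(
    "This is unacceptable, the damn cluster is broken again", 0.5)
calm = dissatisfaction_score("The cluster is not broken anymore", 0.5)
```

With the same model score, the first mail ends up with a higher value than the second, so it would be routed first and to more experienced personnel.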
This pipeline has been customized and deployed for multiple technology companies, drastically improving their efficiency in serving their customers.