At RavenPack, we’re in the business of Giving Meaning to Unstructured Data. Leaving the marketing hyperbole aside, we’re constantly looking to innovate and improve our text processing algorithms, our models, and the data we feed them. Over the years, we’ve iterated through many approaches and improvements to our core Named Entity Recognition (NER), Classification, and Sentiment Analysis tasks, with varying degrees of success.
The final approach we adopted always came down to the basics: precision vs recall, especially at our scale. This became such an obsession for us that we built our own text processing library from the ground up in LISP (we’re always hiring LISP programmers!) to ensure accuracy and consistency in our output.
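To make the precision vs recall trade-off concrete, here is a minimal Python sketch. It is not RavenPack's LISP library or its evaluation code; the function name `precision_recall` and the entity sets are hypothetical and exist only for illustration. It compares a strict detector (few, safe predictions) with a loose one (many predictions, some wrong) against the same hand-labeled reference set.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two collections of detected items."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: what fraction of our predictions were correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what fraction of the correct items did we find?
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical entity mentions, invented for this example.
gold = {"Apple Inc.", "Tim Cook", "Cupertino", "Nasdaq"}
strict = {"Apple Inc.", "Tim Cook"}
loose = {"Apple Inc.", "Tim Cook", "Cupertino", "Nasdaq", "iPhone", "Monday"}

strict_pr = precision_recall(strict, gold)  # every prediction right, half the entities missed
loose_pr = precision_recall(loose, gold)    # every entity found, two spurious predictions

print(f"strict: precision={strict_pr[0]:.2f} recall={strict_pr[1]:.2f}")
print(f"loose:  precision={loose_pr[0]:.2f} recall={loose_pr[1]:.2f}")
```

The strict detector never produces a wrong entity but misses half of them, while the loose one finds everything at the cost of false positives. At large volumes, even a small drop in precision can translate into a large absolute number of wrong outputs, which is one reason the trade-off matters more at scale.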
To better benchmark our performance, we have built the largest sentiment analysis dataset, with millions of sentences each labeled with a score between -1 and 1. It is a balanced sample drawn from over 20 years of our classified news data.
“And it couldn’t have come at a better time. NLP’s ImageNet moment had arrived, and we had all the right ingredients to push the envelope at RavenPack further.”
Over a series of articles, we plan to share some of our findings with the NLP community at large. We’re actively building our Text Analysis infrastructure and APIs for use across the community and would love your feedback.
Continue reading at https://medium.com/ravenpack/machine-learning-ravenpack-improving-sentiment-models-with-better-inputs-473f87a31b74