
Building upon an internal data science initiative, GSE IT began investigating innovative ways of using machine learning (ML), specifically natural language processing (NLP), to analyze large amounts of text.
The goal is to find ways that technology and machine learning can supplement classroom instruction, policymaking, and the personalization of learning experiences.

We began by trying natural language processing services from several cloud providers, including Google’s Cloud Natural Language, Microsoft’s Cognitive Services, and IBM Watson. We were able to process simple texts through these services and get back results based on each vendor’s algorithms and datasets.
We then tried building our own algorithm in-house, using the Stanford Question Answering Dataset (SQuAD) to train our model. We also used questions from the Stanford Mobile Inquiry-based Learning Environment to rate and classify questions.

spaCy
spaCy is a free, open-source library for natural language processing in Python. We use it for creating word vectors from sentences.
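
As a rough illustration of what creating a vector from a sentence involves, the sketch below averages per-word vectors into a single sentence vector, using only the Python standard library. The three-dimensional vectors in TOY_VECTORS and the helper name sentence_vector are hypothetical and chosen for readability; they are not taken from our project code or from any trained model.

```python
# Toy 3-dimensional word vectors. These values are made up for illustration;
# real pretrained vectors typically have hundreds of dimensions.
TOY_VECTORS = {
    "what": [0.1, 0.3, 0.5],
    "is": [0.0, 0.2, 0.4],
    "photosynthesis": [0.9, 0.1, 0.2],
}


def sentence_vector(sentence, vectors, dim=3):
    """Return the element-wise average of the vectors of known words."""
    # Very naive tokenization: lowercase, drop trailing punctuation, split on spaces.
    tokens = sentence.lower().rstrip("?.!").split()
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        # No known words: fall back to a zero vector of the expected size.
        return [0.0] * dim
    # zip(*known) groups the values of each dimension across all word vectors.
    return [sum(column) / len(known) for column in zip(*known)]


vec = sentence_vector("What is photosynthesis?", TOY_VECTORS)
print(vec)
```

In spaCy itself, a comparable value is available as doc.vector after loading a pipeline that ships with word vectors (for example en_core_web_md); by default it is the average of the token vectors.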

Keras
Keras is a high-level neural networks API, written in Python. We use it for creating the classification model of questions.

TensorFlow
TensorFlow™ is an open-source software library for high-performance numerical computation. We use it as the computational backend for Keras.

Cloud Deployment
Amazon Web Services provides servers pre-installed with popular deep learning frameworks such as Keras and TensorFlow.

Stanford High Performance Cluster
Stanford also acquires and deploys high performance computing resources for faculty and student research use.

SQuAD
SQuAD is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles. We use this as the base dataset for our question bank.
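
To show how questions might be pulled out of SQuAD for a question bank, here is a minimal sketch that walks the nested JSON layout used by SQuAD v1.1 (data, paragraphs, qas, answers). The sample record and the function name extract_questions are hypothetical and exist only for illustration; the real dataset is a much larger file with the same shape.

```python
import json

# A single made-up record in the SQuAD v1.1 JSON layout (not real SQuAD data).
SAMPLE = """
{
  "version": "1.1",
  "data": [
    {
      "title": "Sample_Article",
      "paragraphs": [
        {
          "context": "Plants convert light into chemical energy through photosynthesis.",
          "qas": [
            {
              "id": "q1",
              "question": "What process converts light into chemical energy?",
              "answers": [{"text": "photosynthesis", "answer_start": 50}]
            }
          ]
        }
      ]
    }
  ]
}
"""


def extract_questions(squad):
    """Flatten a SQuAD-style dict into a list of question records."""
    records = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                records.append({
                    "title": article["title"],
                    "question": qa["question"],
                    # Keep only the answer strings; offsets stay in the source data.
                    "answers": [answer["text"] for answer in qa["answers"]],
                })
    return records


questions = extract_questions(json.loads(SAMPLE))
for record in questions:
    print(record["question"], "->", record["answers"])
```

Flattening the records this way keeps each question next to its article title and answers, which is a convenient shape for later rating or classification steps.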

We are continually improving our algorithm to achieve better results when classifying open-ended text, and we are expanding our use of NLP and ML to other educational use cases.
We are also building chatbots: one with embedded learning proficiency levels to help English tutors identify the written proficiency of their trainees, and another to help our community navigate our buildings through our digital kiosk.