NLP | Office of Innovation & Technology

Chatbots for education

Project Type:

Tech Innovation Initiative

Topics:

Deep learning

Natural Language Processing

Client:

Internal project

Duration:

Ongoing

Overview

Building on an internal data science initiative, GSE IT began investigating innovative uses of machine learning (ML), specifically natural language processing (NLP), to analyze large amounts of text.

The goal is to find ways technology and machine learning can help supplement classroom instruction, policymaking, and personalization of learning experiences.

Process

We began by trying natural language processing services from several cloud providers, including Google’s Cloud Natural Language, Microsoft’s Cognitive Services, and IBM Watson. We were able to process simple texts through these services and get back results based on each vendor’s own algorithm and dataset.

We then tried building our own algorithm in-house, using the Stanford Question Answering Dataset (SQuAD) to train our model. We also used questions from the Stanford Mobile Inquiry-based Learning Environment to rate and classify questions.

spaCy
spaCy is a free, open-source library for Natural Language Processing in Python. We use it to create word vectors from sentences.

spacy.io
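As a rough illustration of the word-vector step, the sketch below builds a blank English pipeline and registers three hand-made, 3-dimensional vectors so it runs without downloading a model. The toy vectors and the helper names `build_toy_pipeline` and `sentence_vector` are assumptions for illustration only; a real setup would more likely load a pretrained pipeline that ships with vectors, and this page does not say which one is used.

```python
import numpy as np
import spacy


def build_toy_pipeline():
    """Blank English pipeline with a few hand-made vectors (illustrative only)."""
    nlp = spacy.blank("en")
    toy_vectors = {
        "what": [1.0, 0.0, 0.0],
        "causes": [0.0, 1.0, 0.0],
        "rain": [0.0, 0.0, 1.0],
    }
    for word, values in toy_vectors.items():
        # Register a vector for each word in the pipeline's vocabulary.
        nlp.vocab.set_vector(word, np.asarray(values, dtype="float32"))
    return nlp


def sentence_vector(nlp, text):
    """Return the Doc vector, which by default averages the token vectors."""
    return nlp(text).vector


nlp = build_toy_pipeline()
vec = sentence_vector(nlp, "what causes rain")
print(vec.shape)
```

Averaging token vectors gives one fixed-length vector per sentence, which is a convenient input shape for a downstream classifier.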

Keras
Keras is a high-level neural networks API, written in Python. We use it for creating the classification model of questions.

keras.io
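A minimal sketch of what a question classifier in Keras could look like is shown here, assuming each question has already been reduced to a fixed-length vector. The layer sizes, the 300-dimensional input, the five hypothetical question classes, and the random training data are all placeholders and not the project's actual architecture or data.

```python
import numpy as np
from tensorflow import keras


def build_question_classifier(vector_dim, num_classes):
    """Small feed-forward classifier over sentence vectors (hypothetical sizes)."""
    model = keras.Sequential([
        keras.layers.Input(shape=(vector_dim,)),
        keras.layers.Dense(64, activation="relu"),
        # Softmax yields one probability per question class.
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Random stand-in data: 32 "questions" as 300-d vectors with labels 0..4.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 300)).astype("float32")
y = rng.integers(0, 5, size=32)

model = build_question_classifier(vector_dim=300, num_classes=5)
model.fit(X, y, epochs=2, batch_size=8, verbose=0)
probs = model.predict(X[:4], verbose=0)
print(probs.shape)
```

With real data, `X` would hold the sentence vectors and `y` the class assigned to each question.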

TensorFlow
TensorFlow™ is an open source software library for high performance numerical computation. We use it as the calculation system behind Keras.

tensorflow.org

Cloud Deployment
Amazon Web Services provides servers pre-installed with popular deep learning frameworks such as Keras and TensorFlow.

aws.amazon.com/machine-learning/amis

Stanford High Performance Cluster
Stanford also acquires and deploys high-performance computing resources for faculty and student research.

hpcc.stanford.edu

SQuAD
SQuAD is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles. We use this as the base dataset for our question bank.

rajpurkar.github.io/SQuAD-explorer
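To show how questions might be pulled out of SQuAD for a question bank, the sketch below walks the nested layout used by the SQuAD v1.1 JSON files (articles, then paragraphs, then question/answer pairs). The sample record is made up to match that layout and is not an actual dataset entry, and `iter_questions` is a hypothetical helper name.

```python
import json

# Made-up record that follows the SQuAD v1.1 layout; not an actual dataset entry.
SAMPLE_JSON = """
{
  "version": "1.1",
  "data": [
    {
      "title": "Water_cycle",
      "paragraphs": [
        {
          "context": "Rain forms when water vapor condenses into droplets.",
          "qas": [
            {
              "id": "example-0001",
              "question": "What condenses into droplets to form rain?",
              "answers": [{"text": "water vapor", "answer_start": 16}]
            }
          ]
        }
      ]
    }
  ]
}
"""


def iter_questions(squad):
    """Yield one flat record per question in a SQuAD-style dictionary."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield {
                    "title": article["title"],
                    "question": qa["question"],
                    "answers": [a["text"] for a in qa["answers"]],
                }


question_bank = list(iter_questions(json.loads(SAMPLE_JSON)))
print(question_bank[0]["question"])
```

For the real dataset, the same function could be applied to the result of `json.load` on a downloaded SQuAD file.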

Outcome

We are continually improving our algorithm to achieve better results when classifying open-ended text, and are expanding our use of NLP and ML to other educational use cases.

We are building chatbots with embedded learning proficiency levels to help English tutors gauge their trainees' written proficiency, and to help our community navigate our buildings through our digital kiosk.