Supercharge Your Natural Language Processing: Kensho Derived Wikimedia Dataset
Wednesday, April 22 @ 6:00 pm - 8:30 pm
Free
6:00 Networking, drinks and food
6:30 Supercharge Your Natural Language Processing With the Kensho Derived Wikimedia Dataset (KDWD)
Speaker: Gabriel Altay, PhD, Kensho ML Engineer
The Kensho R&D group recently released the Kensho Derived Wikimedia Dataset (KDWD), making it easier for everyone to use publicly available data for natural language processing (NLP). The KDWD combines articles from English Wikipedia with knowledge base entries from Wikidata and provides them in standard CSV and JSON formats.
Gabriel will walk you through this dataset so you and your teams can get started quickly on domain-specific research or on enriching your existing technology. He will then work through some real-world examples and Python notebooks. The dataset is hosted on Kaggle, so you can also experiment with it from home.
Bio: Gabriel is a former computational astronomer who earned his PhD in physics from Carnegie Mellon University. With more than a decade of experience with massive datasets and machine learning, he now mines unstructured data at Kensho (https://www.kensho.com) using the latest NLP techniques.
7:30 Questions, hands-on time with the data, and networking