
[AI Alliance] GneissWeb: Preparing High Quality Data for LLMs at Scale

Thursday 8 May 2025

This event has finished.

Started 17:00

Finished 19:00


Organized by Data, Cloud and AI in Lisbon


Venue: Online/Virtual

Address: Online event on your device
Portugal

About this event

**Details**

IBM recently released GneissWeb, a large dataset yielding around 10 trillion tokens that caters to the data quality and quantity requirements of training Large Language Models. In this talk I will take a deep dive into the philosophy behind this dataset, where it stands with respect to the other datasets available, how to recreate it using the tools IBM has open sourced, and some performance figures obtained with it. This talk follows up on the one given by Shahrokh Daijavad of IBM in March.

**Prerequisites**

This is a follow-up to our March 6, 2025 session “Introducing GneissWeb — a state-of-the-art LLM pre-training dataset”:

* Check the [GitHub show notes](https://github.com/The-AI-Alliance/community/blob/main/events/office-hours/2025-03-06__gneissweb.md)

* Re-watch on [YouTube](https://www.youtube.com/watch?v=O3Bocouv5hs&list=PLx3IPY60uZ16yyAJZvdOUFJNJv3MU3Rmy&index=2)

**About the presenter**

Bishwaranjan Bhattacharjee ([LinkedIn](https://www.linkedin.com/in/bishwaranjan-bhattacharjee-7460855)), Senior Technical Staff Member and Master Inventor, IBM Research

**About the AI Alliance**

The [AI Alliance](https://thealliance.ai) is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society as a whole rather than a select few big players.


This page last updated Sunday 4 May 2025 at 19:17.


