2019 GTC San Jose

S9449 - Building a Distributed GPU DataFrame with Python

Session Speakers
Session Description

We'll discuss the GPU Open Analytics Initiative, an effort to develop a GPU data frame that can handle a large-scale data-analytics workflow and support out-of-core cases in which the data is larger than GPU memory. We'll describe how we divided the problem into two parts, developing an elementary single-GPU data frame to handle in-memory use cases, and then combining multiple single-GPU data frames into a distributed multi-GPU data frame for out-of-core use cases. We'll briefly introduce our distributed GPU data frame and its capabilities. We'll then explain how we scaled out by using Dask, a distributed computation framework in Python, to orchestrate the single-GPU data frames and achieve out-of-core capability with minimal effort. Our idea can be generalized to build custom distributed GPU computation by composing single-GPU libraries.
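The composition idea in the description can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speakers' implementation: `SingleDeviceFrame`, `DistributedFrame`, and `partition_rows` are hypothetical names, plain Python lists stand in for GPU memory, and a standard-library thread pool stands in for the Dask scheduler. It shows only the general pattern of splitting data into partitions that each fit on one device, computing a partial result per partition, and combining the partials.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass
class SingleDeviceFrame:
    """Hypothetical stand-in for a single-GPU data frame holding one partition."""
    columns: Dict[str, List[float]]

    def partial_sum_count(self, column: str) -> Tuple[float, int]:
        # Per-partition work: everything needed here fits in one device's memory.
        values = self.columns[column]
        return sum(values), len(values)


def partition_rows(values: Iterable[float], partition_size: int,
                   column: str = "x") -> Iterator[SingleDeviceFrame]:
    """Split a stream of values into frames of at most partition_size rows."""
    batch: List[float] = []
    for value in values:
        batch.append(value)
        if len(batch) == partition_size:
            yield SingleDeviceFrame({column: batch})
            batch = []
    if batch:
        # Emit the final, possibly smaller, partition.
        yield SingleDeviceFrame({column: batch})


class DistributedFrame:
    """Hypothetical wrapper that composes many single-device frames."""

    def __init__(self, partitions: Iterable[SingleDeviceFrame]) -> None:
        # Assumption for brevity: partitions are kept in a list. A real
        # out-of-core system would load each partition lazily instead.
        self.partitions = list(partitions)

    def mean(self, column: str, workers: int = 2) -> float:
        # The thread pool plays the role of the orchestrator (Dask in the talk):
        # it runs the per-partition step, then the partials are combined.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(
                pool.map(lambda p: p.partial_sum_count(column), self.partitions)
            )
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        return total / count


frame = DistributedFrame(partition_rows([1.0, 2.0, 3.0, 4.0, 5.0], partition_size=2))
print(len(frame.partitions), frame.mean("x"))
```

The global mean is never computed on any single partition; each partition returns only a `(sum, count)` pair, which is why the full dataset does not need to fit in one device's memory at once.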


Additional Information
Accelerated Data Science
Accelerated Data Science, Tools/Libraries
Software
Intermediate technical
Talk
50 minutes
Session Schedule