2019 GTC San Jose

S9542 - Tensor Core Programmability and Profiling for AI and HPC Applications

Session Description

Tensor Cores, introduced with the Volta GPU architecture, achieve up to 125 TFLOPS of throughput by mixing half- and single-precision floating-point operations. We'll show how to take advantage of Tensor Cores in deep learning and HPC applications. We'll discuss how mixed precision reduces memory use during deep learning training and deployment, which allows for larger model sizes. We'll also demonstrate how to program matrix-multiply-and-accumulate operations on Tensor Cores for HPC applications. Finally, using NVIDIA Nsight, we'll profile an application to understand its use of Tensor Cores.
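
To make the mixed-precision idea concrete, here is a minimal CPU-side sketch in plain Python. It is not Tensor Core code and not material from the session. It only emulates the numerical pattern described above, D = A x B + C with half-precision inputs and a single-precision accumulator, by rounding values through the standard `struct` module. The helper names (`round_to`, `fp16`, `fp32`, `mma`) are hypothetical, and the rounding and summation order of real hardware may differ.

```python
import struct


def round_to(fmt: str, x: float) -> float:
    """Round x to the precision of a struct format ('<e' = fp16, '<f' = fp32)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def fp16(x: float) -> float:
    """Nearest IEEE 754 half-precision value (2 bytes of storage)."""
    return round_to("<e", x)


def fp32(x: float) -> float:
    """Nearest IEEE 754 single-precision value (4 bytes of storage)."""
    return round_to("<f", x)


def mma(a, b, c, accumulate=fp32):
    """Emulated D = A x B + C: inputs rounded to fp16, running sum kept in
    the precision chosen by `accumulate` (fp32 by default).

    Assumption: a simple left-to-right summation; hardware may order and
    round the partial sums differently.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    d = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = accumulate(c[i][j])
            for p in range(inner):
                # The product of two fp16 values fits exactly in fp32.
                acc = accumulate(acc + fp16(a[i][p]) * fp16(b[p][j]))
            row.append(acc)
        d.append(row)
    return d


if __name__ == "__main__":
    a = [[2048.0, 1.0]]
    b = [[1.0], [1.0]]
    c = [[0.0]]
    print("fp32 accumulator:", mma(a, b, c))
    print("fp16 accumulator:", mma(a, b, c, accumulate=fp16))
    print("bytes per value:", struct.calcsize("<e"), "vs", struct.calcsize("<f"))
```

The example sums 2048 + 1. Half precision cannot represent 2049, so an fp16 accumulator loses the small addend, while the fp32 accumulator keeps it. Storing inputs in half precision still halves their memory footprint (2 bytes versus 4), which is the saving the description refers to.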


Additional Information
HPC and AI
AI Application Deployment/Inference, HPC and AI
General
Intermediate technical
Talk
50 minutes