Amazon releases S3 plugin for PyTorch

Amazon today launched a plugin for Facebook’s PyTorch machine learning framework that’s designed to help data scientists access datasets stored in Amazon Web Services (AWS) Simple Storage Service (S3) buckets. Amazon says the plugin is designed for low latency and provides streaming data capabilities for datasets of any size, eliminating the need to provision local storage capacity.

“With this feature available in PyTorch deep learning containers, [users] can take advantage of using data from S3 buckets directly with PyTorch dataset and dataloader APIs without needing to download it first on local storage,” Amazon wrote in a blog post. “The Amazon S3 plugin for PyTorch provides a native experience of using data from Amazon S3 to PyTorch without adding complexity in … code.”

The S3 plugin for PyTorch provides a way to transfer data from S3 in parallel as well as support for streaming data from archive files. Amazon says that because the plugin is an implementation of PyTorch’s internal interfaces, no changes in existing code are required to make it work with S3.
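To illustrate what streaming data from an archive file means in practice, here is a minimal sketch using only Python's standard library. It is not the plugin's implementation: the helper names `iter_tar_members` and `build_demo_archive` are hypothetical, and an in-memory buffer stands in for a byte stream that would otherwise arrive from S3.

```python
import io
import tarfile


def iter_tar_members(fileobj):
    """Yield (member_name, payload_bytes) pairs from a tar stream, one at a time."""
    # Mode "r|*" reads the archive strictly front to back, so it also works
    # on non-seekable streams such as a network response body.
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            # In stream mode the payload must be read before advancing.
            yield member.name, archive.extractfile(member).read()


def build_demo_archive():
    """Build a small in-memory tar file (a stand-in for a remote object)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, payload in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


samples = list(iter_tar_members(build_demo_archive()))
```

Because each member is yielded as soon as it is read, a training loop could begin consuming samples before the whole archive has been transferred, which is the general idea behind avoiding a full local download.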

The plugin itself is file-format agnostic and presents objects in S3 as a binary buffer, or blob. Users can apply additional transformations on the data received from S3 and extend the plugin to consume data from S3 and perform data processing as needed.
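As a rough sketch of what applying a transformation to a binary blob could look like, the example below wraps an iterable of raw objects and decodes each one as CSV. Everything here is an assumption made for illustration: the `(name, bytes)` sample shape, the `TransformedBlobs` wrapper, the `decode_csv_blob` function, and the `example-bucket` URL are hypothetical and are not taken from the plugin's API.

```python
import csv
import io


def decode_csv_blob(sample):
    """Turn a (name, raw_bytes) pair into (name, rows of floats)."""
    name, blob = sample
    text = blob.decode("utf-8")
    rows = [[float(value) for value in row]
            for row in csv.reader(io.StringIO(text)) if row]
    return name, rows


class TransformedBlobs:
    """Iterable that applies a transform to each raw sample lazily."""

    def __init__(self, source, transform):
        self.source = source        # any iterable of (name, bytes) pairs
        self.transform = transform  # callable applied to each pair

    def __iter__(self):
        for sample in self.source:
            yield self.transform(sample)


# Hypothetical stand-in for objects streamed from a bucket.
fake_objects = [("s3://example-bucket/part-0.csv", b"1.0,2.0\n3.0,4.0\n")]
decoded = list(TransformedBlobs(fake_objects, decode_csv_blob))
```

Keeping the decoding step separate from the transport is what lets a file-format-agnostic source serve images, text, or tabular data alike: only the transform needs to know the format.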

“Laying the foundation to access datasets while training can be critical for many enterprises that are looking to eliminate storing data locally and still get the desired performance. With availability of the S3 plugin for PyTorch, [users] can now stream data from S3 buckets and perform the large-scale data processing needed for training in PyTorch,” Amazon continued. “The S3 plugin for PyTorch was designed for ease of use and flexibility with PyTorch.”

PyTorch growth

PyTorch has seen rapid uptake in the data science and developer community since its release in October 2016. In 2019, the number of contributors to the platform grew more than 50% year-over-year to nearly 1,200. And an analysis conducted by The Gradient found that every major AI conference in 2019 had a majority of papers implemented in PyTorch, with the volume of PyTorch citations in papers growing by more than 194% in the first half of 2019.

A number of leading machine learning projects are built on top of PyTorch, including Uber’s Pyro and HuggingFace’s Transformers. Software developer Preferred Networks joined the ranks in 2019 with a pledge to move to PyTorch in the future. More recently, OpenAI said it would adopt PyTorch for all of its projects going forward.
