Learn how to access your S3 account in a few easy steps. This guide will tell you how to configure and connect to your data source, and will provide details about setting various permissions.
Data Source Basics
All data sources have a protocol and a label that you use to reference your data. In this guide the protocol is S3. The label is assigned automatically to your data connection as a unique identifier, and you can change it later if you wish.
Configure a New Data Connection to S3
To create a new data connection, navigate to Algorithmia’s Data Portal and open the “New Data Source” drop-down, which lists several options.
Select “Amazon S3” and a form will open to configure an S3 connection. Here you will need to enter your AWS credentials.
For programmatic access to S3, authorization is done using an AWS Access Key ID and AWS Secret Access Key. You can learn more about these credentials, including how to create access keys for both root and IAM users, in the AWS Security Credentials documentation. The documentation also includes information about using access keys with S3 and best practices for managing access keys.
NOTE: While an algorithm NEVER sees credentials used to access data in S3, it is recommended that you provide an access key that:
- Can only list, get, and put objects to S3 (i.e., cannot perform other operations on your account)
- Can only access the paths in S3 that you want Algorithmia to access
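The two recommendations above map onto an IAM policy attached to the user that owns the access key. The following is a minimal sketch, not an official Algorithmia policy: the function name `build_least_privilege_policy`, the bucket `my-example-bucket`, and the prefix `algorithmia-data` are hypothetical, and the action names are standard AWS S3 actions.

```python
import json


def build_least_privilege_policy(bucket, prefix=""):
    """Return an IAM policy dict allowing only list, get, and put under one prefix.

    `bucket` and `prefix` are placeholders; substitute your own values.
    """
    # Normalise the prefix so "data", "/data" and "data/" behave the same.
    prefix = prefix.strip("/")
    object_path = f"{prefix}/*" if prefix else "*"

    list_statement = {
        "Sid": "ListRestrictedPrefix",
        "Effect": "Allow",
        "Action": ["s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{bucket}"],
    }
    if prefix:
        # Only allow listing keys that live under the chosen prefix.
        list_statement["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}/*"]}}

    return {
        "Version": "2012-10-17",
        "Statement": [
            list_statement,
            {
                "Sid": "GetAndPutObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{object_path}"],
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = build_least_privilege_policy("my-example-bucket", "algorithmia-data")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy. If you later enable write access and expect deletes to work through the connection, the key would presumably also need `s3:DeleteObject`, which this sketch deliberately leaves out.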
The S3 connector supports access to S3 buckets with server-side KMS encryption enabled.
Setting Labels For Data Connections
You will need to provide a unique label for your S3 data connector, editable in the “Label” field.
Labels must be unique because you may add multiple connections to the same S3 account, and each one needs its own label for later reference in your algorithm. Multiple connections to the same source let you set different access permissions on each connection (e.g., read from one file and write to a different folder).
NOTE: The unique label follows the protocol: “s3+unique_label://restricted_path”
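To make the URI shape concrete, here is a small sketch of a helper that assembles a data source URI. It assumes the protocol prefix is `s3` and that a custom label is appended with `+`; the helper name `data_uri` and the label `readonly` are hypothetical and not part of any Algorithmia client.

```python
def data_uri(path, label=None, protocol="s3"):
    """Build a data source URI such as s3+label://bucket/key.

    Assumption: a connection with a custom label is addressed as
    "<protocol>+<label>://<path>", and as "<protocol>://<path>" otherwise.
    """
    scheme = f"{protocol}+{label}" if label else protocol
    # Drop any leading slash so the result never contains ":///".
    return f"{scheme}://{path.lstrip('/')}"


print(data_uri("Algorithmia/team.jpg"))
print(data_uri("Algorithmia/team.jpg", label="readonly"))
```

Keeping the URI construction in one place makes it easy to switch an algorithm between connections that have different permissions by changing only the label.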
Setting Path Restrictions for S3 Folder and File Access
The default path restrictions are set to allow access to all paths in your S3 account, but you may want to restrict your algorithm’s access to specific folders or files:
- Access to a single file: “Algorithmia/team.jpg”
- Access to everything in a specific folder: “Algorithmia/*”
NOTE: “Algorithmia*” might match more than you’d like, so if you want to match a directory exactly, end with a “/”.
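The difference between these patterns can be illustrated with a glob-style matcher. This sketch uses Python's `fnmatch` module purely as a stand-in; it is an assumption that Algorithmia's own matching behaves similarly, and the candidate paths are made up for the example.

```python
from fnmatch import fnmatchcase


def is_allowed(path, restriction):
    """Return True if `path` matches the glob-style `restriction`.

    Stand-in only: the real path-restriction check runs on Algorithmia's side.
    """
    return fnmatchcase(path, restriction)


# Hypothetical object paths in an S3 account.
candidates = [
    "Algorithmia/team.jpg",
    "Algorithmia/reports/q1.csv",
    "Algorithmia-private/keys.txt",
]

for restriction in ("Algorithmia/team.jpg", "Algorithmia/*", "Algorithmia*"):
    allowed = [p for p in candidates if is_allowed(p, restriction)]
    print(f"{restriction!r} allows {allowed}")
```

Under this stand-in, only the pattern without the trailing “/” also lets through the unrelated `Algorithmia-private` bucket, which is the over-matching the note warns about.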
Here we are setting our path restrictions to everything in the S3 bucket “Algorithmia”:
Setting Read and Write Access
The default access for your data source is set to read only, but you can change this to read and write access by checking the “Write Access” box.
NOTE: Write access also allows deleting anything in the path you specified in the previous step, so be sure that you want read-write-delete access to the path you set in “Path Restriction”. In addition, if your data source has read/write privileges, any algorithm that you call also has read/write privileges to your data source.
Accessing Your Data
First create an Algorithmia client with your API key:
client = Algorithmia.client("YOUR_API_KEY")
For example, to retrieve and print a file’s contents in Python:
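A minimal sketch of such a retrieval, using the Algorithmia Python client's `client.file(...)`, `exists()`, and `getString()` calls. The helper name `read_text` and the object path `s3://Algorithmia/notes.txt` are hypothetical; replace them, and the API key placeholder, with your own values.

```python
def read_text(client, uri):
    """Return the contents of the file at `uri` as a string.

    `client` is expected to be an Algorithmia client, e.g. the result of
    Algorithmia.client("YOUR_API_KEY").
    """
    data_file = client.file(uri)
    # Fail early with a clear error if the path does not exist or is not visible.
    if not data_file.exists():
        raise FileNotFoundError(uri)
    return data_file.getString()


def main():
    # Imported here so the helper above can be reused without the package installed.
    import Algorithmia  # pip install algorithmia

    client = Algorithmia.client("YOUR_API_KEY")
    # Hypothetical path; it must fall inside the connection's path restriction.
    print(read_text(client, "s3://Algorithmia/notes.txt"))
```

Call `main()` once the placeholders are filled in. For binary objects such as images, the client's `getBytes()` would be the analogous call in place of `getString()`.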
The above examples work when accessing data from a local script or application code. If you’re writing an algorithm and accessing a data source from inside the algorithm, create the client without an API Key parameter:
client = Algorithmia.client()
If you’re using the Data API to call an algorithm that takes a file or directory as input, you can also provide it a file or directory from one of your data sources:
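As a rough sketch of that pattern, the data source URI is simply passed as the algorithm's input. The helper name `run_on_file`, the algorithm name `algorithm_author/AlgorithmName/1.0.0`, and the object path are hypothetical placeholders; `client.algo(...).pipe(...).result` is the Algorithmia Python client's usual call chain.

```python
def run_on_file(client, algo_name, uri):
    """Call `algo_name` with a data source URI as its input and return the result.

    The algorithm itself must accept a file or directory URI as input.
    """
    response = client.algo(algo_name).pipe(uri)
    return response.result


def main():
    # Imported here so the helper above can be reused without the package installed.
    import Algorithmia  # pip install algorithmia

    client = Algorithmia.client("YOUR_API_KEY")
    # Both the algorithm name and the S3 path below are placeholders.
    result = run_on_file(
        client,
        "algorithm_author/AlgorithmName/1.0.0",
        "s3://Algorithmia/team.jpg",
    )
    print(result)
```

Only the URI travels to the algorithm; the file is then fetched through the Algorithmia API under the permissions of your data connection.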
NOTE: An algorithm you call can only access your own data sources. This means that it is NOT possible for an algorithm to read data from your S3 and write that data to an account controlled by another algorithm author (another Algorithmia user). Algorithms do NOT have direct access to any credentials associated with your data sources, and can only access data from a data source using the Algorithmia API.
Data Source Routes and Data API Routes
Once a data source connection has been created and configured, all of the Algorithmia client code for interacting with the Data API for file or directory creation, deletion, and listing will function identically with a data source route and a data API route. The one exception to this is that we do not support generic ACLs for data sources, so the only way to update permissions for a data source is through the data portal where you created your data source connection.
If you’re implementing a new client or using cURL it is preferred to use the following URL structure:
We have tested data source paths in all of our Algorithmia clients. Note, however, the minimum client versions that support them:
- Python support was added in version 1.0.4
- NodeJS support was added in version 0.3.5
If you have any questions about Algorithmia please get in touch!