Amazon S3¶
- object-based storage
- maximum object size is 5 TB
- a single PUT is limited to 5 GB; use multipart upload for anything larger
- objects also carry metadata, tags, and an optional version ID
- private by default
- strong read-after-write consistency
- 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix
S3 vs EBS¶
Requester Pays¶
- the requester pays the request and data transfer costs; the bucket owner still pays for storage
- use it to share large data sets with other accounts
- requesters must be authenticated AWS identities (no anonymous access)
S3 event notifications¶
- trigger an action (e.g. a Lambda function) on API requests to the bucket (see the sketch after the targets list below)
- can also use EventBridge for more targets and filtering options
- EventBridge capabilities - archive and replay events, reliable delivery
- multiple destinations - Step Functions, Kinesis Streams / Firehose
Targets¶
- Lambda
- SNS
- SQS
- EventBridge
- delivery can occasionally take a minute or longer
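A minimal sketch of wiring the Lambda target above with the CLI; the bucket name and function ARN are hypothetical, and the function's resource policy must separately allow S3 to invoke it:

```bash
# Invoke a Lambda function for every .jpg object created in the bucket
aws s3api put-bucket-notification-configuration \
  --bucket my-example-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:process-upload",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}}
    }]
  }'
```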
S3 Access Logs¶
- monitor traffic requests to an S3 bucket
- can be used for access and security audits
- understand your S3 bill
MFA-protected API Access¶
- enforce MFA for access to S3 resources
MFA delete¶
- requires MFA to permanently delete an object version
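A sketch of turning on MFA Delete alongside versioning, assuming a hypothetical bucket and MFA device; note this call must be made with the bucket owner's root credentials:

```bash
# "--mfa" is the device serial/ARN followed by the current 6-digit code
aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::111122223333:mfa/root-account-mfa-device 123456"
```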
Versioning¶
- once object versioning is enabled, it can't be disabled - only suspended
- when versioning is enabled, each object has a key (its name) and a unique, S3-generated version ID
- by default a delete issues a "delete marker" (a soft delete that hides all previous versions)
- when object versioning is suspended, AWS still charges you for all the object versions previously generated
- if an object is updated, the old version still persists within the bucket
- if an object is deleted, the new current version is a delete marker, and the old versions still persist within the bucket
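Enabling versioning and inspecting versions from the CLI might look like this; the bucket name and prefix are placeholders:

```bash
# Turn versioning on (from then on it can only be suspended, not disabled)
aws s3api put-bucket-versioning \
  --bucket my-example-bucket \
  --versioning-configuration Status=Enabled

# List every version and delete marker under a prefix
aws s3api list-object-versions --bucket my-example-bucket --prefix reports/
```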
Commands¶
S3 Sync¶
- copies objects between buckets
- performs a diff and copies only new or changed objects
- uses size and last-modified time to decide what to update
- on versioned buckets it copies only the current version
aws s3 sync s3://DOC-EXAMPLE-BUCKET-SOURCE s3://DOC-EXAMPLE-BUCKET-TARGET
Storage classes¶
- objects can transition to the Infrequent Access tiers only after 30 days in S3 Standard
- all classes offer 11 9's durability; availability varies by class (S3 Standard is 99.99%)
- One Zone-IA is the notable outlier at 99.5% availability (still 11 9's durability, but in a single AZ)
| Storage Class | Lifecycle Transitions | Use Cases | Minimum Capacity Charge | Minimum Storage Duration Charge | Retrieval Fee | First Byte Latency | Availability Zones Supported |
|---|---|---|---|---|---|---|---|
| S3 Standard | all other S3 storage tiers | Frequently accessed data that requires low latency | None | None | None | Milliseconds | At least 3 |
| S3 Standard-IA | Intelligent-Tiering and colder tiers | Backups, disaster recovery, and infrequently accessed data | 128 KB | 30 days | Per GB | Milliseconds | At least 3 |
| S3 Intelligent-Tiering | One Zone-IA and colder tiers | Data with changing access patterns or unpredictable usage | None | 30 days | None | Milliseconds | At least 3 |
| S3 One Zone-IA | Glacier Flexible Retrieval and colder tiers | Storing secondary backups or replicas of easily reproduced data | 128 KB | 30 days | Per GB | Milliseconds | Single Availability Zone |
| S3 Glacier (legacy name for Flexible Retrieval) | Glacier Flexible Retrieval and colder tiers | Archiving rarely accessed data with long-term retention | 40 KB | 90 days | Per GB | Minutes to hours | At least 3 |
| S3 Glacier Instant Retrieval | Glacier Flexible Retrieval and colder tiers | Rapid access to archived data at a slightly higher cost | 128 KB | 90 days | Per GB | Milliseconds | At least 3 |
| S3 Glacier Flexible Retrieval | Glacier Deep Archive | Bulk retrieval of large amounts of data at lower cost | 40 KB | 90 days | Per GB | Minutes to hours | At least 3 |
| S3 Glacier Deep Archive | No transitions | Long-term data retention with extremely low-cost requirements | 40 KB | 180 days | Per GB | 12 to 48 hours | At least 3 |
| S3 Outposts | No transitions | Hybrid cloud storage extending S3 to on-premises infrastructure | None | None | None | Milliseconds | Depends on the Outpost configuration |
Storage Class | Use Cases | Ideal Use Case Scenario |
---|---|---|
S3 Standard | Frequently accessed, important, non-replaceable data | Dynamic website content, mobile applications, real-time analytics, content distribution |
S3 Intelligent-Tiering | Long-lived data with changing or unknown access patterns | Data with varying access frequencies, cost optimization for unknown access patterns |
S3 Standard-IA | Long-lived, infrequently accessed data | Long-term storage of infrequently accessed data, compliance and regulatory data |
S3 One Zone-IA | Long-lived, infrequently accessed, non-critical, replaceable data | Secondary copies of on-premises data, transitional data or short-term storage |
S3 Glacier Instant Retrieval | Long-lived data, rarely accessed (e.g. once per quarter) | Rapid retrieval of archived data for urgent access or frequent retrieval |
S3 Glacier Flexible Retrieval | Archival data where frequent or real-time access isn't needed (e.g. yearly) | Bulk retrieval of large datasets, data migration, data analysis |
S3 Glacier Deep Archive | Archival data that rarely if ever needs to be accessed (e.g. legal or regulatory data storage) | Long-term data archiving, digital preservation, compliance data |
S3 Outposts | Hybrid cloud storage extending S3 to on-premises infrastructure | Hybrid cloud environments, applications with data residency requirements |
S3 Standard¶
- The default storage class in S3, and a sensible default for most workloads.
- S3 Standard is region resilient, and can tolerate the failure of an AZ.
- can transition to all s3 tiers.
- Objects are replicated to at least 3 AZs when they are uploaded.
- 99.999999999% durability
- 99.99% availability
- Offers low latency and high throughput.
- No minimums, delays, or penalties.
- Billing is storage fee, data transfer fee, and request based charge.
All of the other storage classes trade away some of these properties in exchange for lower cost.
S3 Standard-IA¶
Designed for data that isn't accessed often: long-term storage, backups, and disaster recovery files, where keeping the data safe matters most.
- Designed for less frequent rapid access when it is needed.
- can transition to:
- S3 Intelligent Tiering
- One Zone IA
- Glacier Instant retrieval
- Glacier Flexible retrieval
- Glacier Deep Archive.
- Cheaper rate to store data you will rarely need, but if you do need it, you need it quickly.
- ~54% cheaper than S3 standard.
- Minimum billable object size of 128 KB.
- Cost benefits might be negated for smaller objects.
- 30 days minimum duration charge per object.
- Retrieval fee for every GB of data retrieved from this class.
- 99.9% availability, slightly lower than standard S3.
One Zone-IA¶
Great choice for secondary copies of primary data or backup copies.
If data is easily creatable from a primary data set, this would be a great place to store the output from another data set.
- Designed for data that is accessed less frequently but needed quickly.
- can transition to Glacier Flexible retrieval & Glacier Deep Archive.
- 80% of the base cost of Standard-IA.
- Same minimum size and duration fee as Standard-IA
- Data is only stored in a single AZ, no 3+ AZ replication.
- 99.5% availability, lower than Standard-IA
S3 Glacier Instant Retrieval¶
Suitable for applications that require infrequent access to archived data but need fast retrieval times when accessed.
- Like S3 Standard-IA: cheaper storage, more expensive retrieval, longer minimum duration.
- can transition to Glacier Flexible retrieval & Glacier Deep Archive.
- Has per GB data retrieval fee, cost increases with frequent data access.
- should be used for long-lived data accessed about once per quarter that still needs millisecond access.
- Minimum duration charge of 90 days; objects can be stored for less, but the minimum billing always applies.
- Minimum capacity charge of 128 KB per object.
S3 Glacier Flexible Retrieval¶
Archival data where frequent or real-time access isn't needed (e.g. yearly). Minutes-to-hours retrieval.
- can transition to Glacier Deep archive.
- faster data retrieval than Deep Archive, with retrieval times starting at minutes.
- Minimum duration charge of 90 days; objects can be stored for less, but the minimum billing always applies.
- Minimum capacity charge of 40 KB per object.
- objects cannot be made publicly accessible
- any access of data (beyond object metadata) requires a retrieval process.
- The retrieval process first restores the data to S3 Standard-IA temporarily.
- Retrieval Process offers three retrieval options which vary in speeds:
- Expedited: Data retrieval (1-5 mins)
- Standard: (3-5 hours)
- Bulk: (5-12 hours)
- Faster = more expensive $$$
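A sketch of kicking off one of those retrievals with the CLI, using a hypothetical bucket and key; the restored copy is kept for the requested number of days:

```bash
# Start a temporary restore; Tier is Expedited, Standard, or Bulk
aws s3api restore-object \
  --bucket my-example-bucket \
  --key archive/2020-logs.tar.gz \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Expedited"}}'

# The Restore header on head-object shows whether the restore has finished
aws s3api head-object --bucket my-example-bucket --key archive/2020-logs.tar.gz
```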
Provisioned Retrieval Capacity¶
- guarantees Expedited retrieval capacity, with up to 150 MB/s of retrieval throughput
Glacier Deep Archive¶
Archival data that rarely if ever needs to be accessed - hours or days for retrieval.
- The retrieval process first restores the data to S3 Standard-IA temporarily.
- The retrieval process offers two retrieval options which vary in speed:
- Standard: data retrieval within 12 hours
- Bulk: up to 48 hours
- First byte latency = hours or days
- Objects cannot be made publicly accessible
- any access of data (beyond object metadata) requires a retrieval process.
Lifecycle Rules¶
Transition Actions¶
- move objects to a different storage tier after a set time
- can transition to Infrequent Access only after 30 days
- Automates the moving of objects between the different storage tiers.
- Can be used in conjunction with versioning.
- Lifecycle rules can be applied to both current and previous versions of an object.
These actions can be classified as follows:
- Transition actions – define when objects transition to another storage class. For example, you may choose to transition objects to the STANDARD_IA storage class 30 days after creation, or archive objects to the GLACIER storage class one year after creation.
- Expiration actions – specify when objects expire; Amazon S3 then deletes the expired objects on your behalf.
Transfer Limitations for Lifecycle Rules¶
Expiration Actions¶
- delete access logs
- delete old versions of files
- delete incomplete multipart uploads
- can apply to specific prefixes (paths) or the full bucket
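A sketch combining a transition and an expiration action in one lifecycle rule; the bucket name, prefix, and day counts are illustrative only:

```bash
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-then-expire-logs",
    "Status": "Enabled",
    "Filter": {"Prefix": "logs/"},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 365, "StorageClass": "GLACIER"}
    ],
    "Expiration": {"Days": 730},
    "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
  }]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-example-bucket \
  --lifecycle-configuration file://lifecycle.json
```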
Transfer Acceleration¶
- reqs: acceleration must be enabled on the bucket; the bucket name can't contain periods and needs to be DNS compatible
- fast transfer over long distances
- uses CloudFront edge locations
- routes via an optimized network path
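Enabling acceleration and using it for a copy might look like this (bucket name hypothetical; the accelerate endpoint is the documented global one):

```bash
# One-time: enable acceleration on the bucket
aws s3api put-bucket-accelerate-configuration \
  --bucket my-example-bucket \
  --accelerate-configuration Status=Enabled

# Route a transfer through the accelerate endpoint
aws s3 cp ./big-file.bin s3://my-example-bucket/ \
  --endpoint-url https://s3-accelerate.amazonaws.com
```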
Multipart Uploads¶
- transmit an object in chunks
- if one chunk fails, only that chunk needs to be retransmitted
- should be considered for objects over 100 MB
- maximum of 10,000 parts; each part can range from 5 MB to 5 GB
- the last part can be smaller than 5 MB
- parts can fail and be restarted.
- improved throughput
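The high-level `aws s3` commands already perform multipart uploads automatically; a sketch of tuning when and how they split (threshold and chunk size below are illustrative):

```bash
# Split any upload bigger than 100 MB into 100 MB parts
aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 100MB

# This cp now runs as a parallel multipart upload under the hood
aws s3 cp ./video.mp4 s3://my-example-bucket/uploads/video.mp4
```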
S3 Bucket Policies¶
- allow or deny permissions on a bucket and its objects
- bucket policies attach to buckets (identity policies attach to users, groups, and roles)
- can grant access to other AWS accounts
- can restrict based on multiple request conditions (source IP, time of day, SSL)
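A sketch of a bucket policy combining cross-account access with an SSL condition; the account ID and bucket name are placeholders:

```bash
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowCrossAccountReadOverTLS",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::444455556666:root"},
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-example-bucket/*",
    "Condition": {"Bool": {"aws:SecureTransport": "true"}}
  }]
}
EOF
aws s3api put-bucket-policy --bucket my-example-bucket --policy file://policy.json
```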
ACL¶
- can also use ACL to grant another account access
S3 Replication¶
There are two types of S3 replication available.
- Cross-Region Replication (CRR)
- Allows the replication of objects from a source bucket to a destination bucket in different AWS regions.
- Same-Region Replication (SRR)
- Allows the replication of objects from a source bucket to a destination bucket in the same AWS region.
Why use replication¶
- SRR use cases: log aggregation; syncing production and test accounts
- CRR use cases: resilience with strict sovereignty requirements; global resilience improvements; latency reduction
- versioning must be enabled on both buckets
- can be cross-region or same-region
- the destination can be a bucket in another account
- replication is asynchronous
- use Batch Replication to replicate objects that already existed before the rule was created
- replication can't be chained (a replica is not replicated onward)
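A minimal replication rule as a sketch; the IAM role, source bucket, and destination bucket ARN are hypothetical, and versioning must already be enabled on both sides:

```bash
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
  "Rules": [{
    "ID": "replicate-everything",
    "Status": "Enabled",
    "Priority": 1,
    "Filter": {},
    "DeleteMarkerReplication": {"Status": "Disabled"},
    "Destination": {"Bucket": "arn:aws:s3:::my-example-bucket-replica"}
  }]
}
EOF
aws s3api put-bucket-replication \
  --bucket my-example-bucket \
  --replication-configuration file://replication.json
```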
Encryption¶
Buckets aren't encrypted, objects are. Objects in the same bucket can each use a different encryption method.
Client-Side encryption¶
- Objects are encrypted by the client before they leave the client.
- Data is ciphertext the entire time it is in transit.
- AWS has no way to see into the data.
- The encryption burden is on the customer, not AWS.
Server-Side encryption¶
- Data is encrypted in transit using HTTPS
- Data inside the tunnel is still in its original unencrypted form.
- Data reaches S3 server in plain text form.
- After S3 sees the data, it is then encrypted.
- AWS will handle some or all of these processes.
Encryption Option | Use Case | Customer Responsibilities | AWS Responsibilities | Tradeoffs |
---|---|---|---|---|
SSE-S3 (default) | Encryption at rest using S3-managed keys; no admin overhead | No additional key management tasks required | Encryption, decryption, key management (generation and rotation) | Limited control over the encryption keys; keys are managed by AWS |
SSE-KMS | Encryption at rest using AWS KMS managed keys | Configure and manage the KMS keys | Encryption, decryption | More control and additional security features through KMS; customer manages and controls the KMS keys |
SSE-C (customer-provided keys) | Flexible; customer generates and rotates keys themselves but doesn't want to handle encryption and decryption client-side | Generate, manage, and rotate the encryption keys | Encryption, decryption | Offloads CPU usage from the client; complete control and ownership of key management; good for heavily regulated environments |
Server-Side Encryption with Amazon S3 Managed Keys - SSE-S3 (AES-256) - default¶
- AWS managed keys
- server-side encryption of objects
- fully managed by AWS
- applied by default; encryption is preserved on replication
SSE-S3 Caveats¶
- Not good for regulatory environment where keys and access must be controlled.
- No control over key material or rotation.
- No role separation.
- A full S3 admin can decrypt data and open objects.
SSE-C¶
- Customer is responsible for the keys themselves.
- Customer still needs to generate and manage the key.
- can use same key for all objects or individual keys for every single object.
- The S3 service manages the actual encryption and decryption.
- Offloads CPU requirements for encryption.
- S3 will see the unencrypted object throughout this process.
SSE-KMS¶
- you must specify the KMS key to use when writing objects (e.g. when copying into a new bucket)
- an IAM role is needed that can decrypt with the source key and encrypt with the new key
- heavy use can hit KMS throttling limits; request a Service Quotas increase if needed
- can use multi region kms keys
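Uploading a single object under a specific KMS key is one flag pair on `aws s3 cp`; the key alias is a placeholder:

```bash
aws s3 cp ./report.pdf s3://my-example-bucket/reports/report.pdf \
  --sse aws:kms --sse-kms-key-id alias/my-app-key
```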
Bucket keys¶
- offloads some of the KMS work (fewer KMS calls, lower cost) when used in conjunction with KMS
- CloudTrail KMS events now show the bucket, not the object
- works with replication - the object's encryption is maintained
- if plaintext is replicated to a bucket using bucket keys, the object is encrypted at the destination side (the ETag changes)
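A sketch of enabling a bucket key as part of default SSE-KMS encryption (bucket name and key alias hypothetical):

```bash
aws s3api put-bucket-encryption \
  --bucket my-example-bucket \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "alias/my-app-key"
      },
      "BucketKeyEnabled": true
    }]
  }'
```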
Bucket Policies¶
- common use cases:
- grant access to other AWS accounts
- anonymous access to a bucket.
- identity policies are limited to identities in a single account (the bucket owner's)
- resource policies can allow/deny the same or different accounts
- resource policies can allow/deny anonymous principals
Security Hints¶
Hints for choosing one over the other, based on the scenario:
- Identity: controlling many different resources
- Identity: you have a preference for IAM
- Identity: same-account access
- Bucket: when you need to control security only for S3
- Bucket: anonymous or cross-account access
- ACLs: NEVER - unless you must
Use cases¶
- compliance
- latency
- replication across accounts
- log aggregation
- sync data to test account from prod
Performance¶
- 100-200 ms first-byte latency
- 3,500 PUT/COPY/POST/DELETE requests per second per prefix
- 5,500 GET/HEAD requests per second per prefix
- use multipart upload for parallel uploads
- recommended for files > 100 MB
- MUST be used for files > 5 GB
- use S3 Transfer Acceleration to upload to an edge location instead of directly to S3; also useful for downloads when files are bigger than 1 GB
- byte range fetches
- parallelize GETs by requesting specific byte ranges.
- better resilience in case of failures - only the failed byte range needs to be retried
- speeds up downloads
- can be used to retrieve only partial data (e.g. the head of a file)
- effectively a multipart download
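A byte-range fetch from the CLI as a sketch; bucket, key, and range are illustrative:

```bash
# Fetch only the first 1 KiB of an object (e.g. a file header)
aws s3api get-object \
  --bucket my-example-bucket \
  --key data/large.csv \
  --range "bytes=0-1023" \
  ./head.csv
```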
Select & Glacier Select¶
- Retrieve less data using SQL by performing server-side filtering
- can filter by rows & columns (simple SQL statements)
- less network transfer
- less CPU cost client-side.
- filter data server-side to reduce network cost (e.g. return 100 rows out of a 1-million-row CSV)
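A sketch of a server-side filter with S3 Select over a CSV object; the bucket, key, and SQL are illustrative:

```bash
aws s3api select-object-content \
  --bucket my-example-bucket \
  --key data/large.csv \
  --expression "SELECT s.id, s.amount FROM S3Object s WHERE s.region = 'eu-west-1'" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  ./filtered.csv
```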
Batch operations¶
- modify object metadata and properties in bulk
- copy objects between buckets
- encrypt all unencrypted objects
- modify ACL and tags
- restore objects from glacier
- invoke lambda for each object
- use S3 Select to filter the object list, then run the batch operation
- S3 Batch operations manages:
- retries
- tracks progress
- sends completion notifications
- generate reports
Vault¶
- objects cannot be modified or deleted
- used for compliance
- Can lock the vault policy from future edits
Server Access Logging¶
- use this when CloudTrail does not provide enough detail
Features¶
- logs include the HTTP Referer header
- logs include turn-around time
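Turning access logging on is a single call; the bucket names are placeholders, and the target bucket must allow the S3 logging service to write to it:

```bash
aws s3api put-bucket-logging \
  --bucket my-example-bucket \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "my-example-log-bucket",
      "TargetPrefix": "access-logs/my-example-bucket/"
    }
  }'
```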
Pre-Signed URLs¶
- URL Expiration
- S3 Console - 1 min to 720 mins (12 hours)
- AWS CLI - up to 604,800 secs (168 hours); default 3,600
- users given a pre-signed URL inherit the permissions of the user that generated the URL for GET / PUT
- use cases
- Allow only logged-in users to download a premium video from your S3 bucket.
- allow an ever-changing list of users to download files by generating URLs dynamically
- temporarily allow a user to upload a file to a precise location in your S3 bucket
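Generating a pre-signed GET URL from the CLI (bucket and key hypothetical):

```bash
# URL valid for one hour; default is 3600 seconds, max 604800 (7 days)
aws s3 presign s3://my-example-bucket/premium/video.mp4 --expires-in 3600
```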
Access Points (AP)¶
- simplify security management for s3 buckets
- each AP has
- its own DNS name (Internet origin or VPC origin)
- an access point policy (similar to a bucket policy)
- manage security at scale
Best practices¶
- Use the appropriate AWS Region:
- Choose the AWS Region closest to your users or application to minimize latency and improve performance.
- Select the right S3 storage class:
- Choose the appropriate S3 storage class based on your data access patterns and requirements.
- For frequently accessed data, use S3 Standard or S3 Intelligent-Tiering.
- For infrequently accessed or long-term archival data, consider S3 Standard-IA, S3 One Zone-IA, or S3 Glacier.
- Enable S3 Transfer Acceleration:
- S3 Transfer Acceleration uses optimized edge locations to speed up data uploads and downloads.
- Enable it for faster transfer speeds, especially for large files or over long distances.
- Optimize object key names:
- Avoid using sequential or timestamp-based object key names as it can lead to performance limitations.
- Use a random or hashed naming pattern to distribute objects across multiple partitions for better performance.
- Leverage S3 multipart upload:
- For large file uploads (typically over 100 MB), use multipart upload to improve performance and resiliency.
- Multipart upload allows parallelization of upload parts and supports resumable uploads.
- Enable S3 byte-range fetches:
- For applications that need to retrieve partial objects, use byte-range fetches to fetch only the required portions of an object.
- This reduces the amount of data transferred, improving performance and reducing costs.
- Enable S3 Transfer Manager:
- Use the AWS SDK's S3 Transfer Manager to optimize file transfers by parallelizing the upload or download of multiple parts.
- Utilize S3 Select and Glacier Select:
- S3 Select allows you to retrieve only the required data from objects using SQL-like queries, reducing data transfer and processing time.
- Glacier Select enables data retrieval from S3 Glacier archives based on specific query criteria, reducing retrieval times.
- Enable S3 Cross-Region Replication (CRR):
- If you have users or applications in different regions, enable S3 CRR to replicate data across regions for faster access.
- Monitor and optimize bucket performance:
- Monitor S3 metrics, such as request latency, request rates, and data transfer rates, to identify performance bottlenecks.
- Consider adjusting your configuration, such as spreading requests across more key prefixes, to optimize performance.