TruePrivacy + Amazon S3
Scan S3 buckets for files containing personal data.
Overview
Amazon S3 is AWS's object storage service, widely used for storing files, exports, backups, and logs — many of which contain personal data. TruePrivacy connects to S3 using an IAM Role to scan buckets and objects for files containing personal data, classify the data found, and add it to your data inventory.
This integration helps organizations discover personal data that has accumulated in S3 over time — including customer data exports, analytics log files, HR document archives, and backup files — ensuring it is governed as part of the overall privacy programme.
What TruePrivacy can do
Data types accessed
- CSV and JSON data exports
- Log files containing user identifiers
- Document files with personal information
- Database backup files
- Analytics event files
DSR capabilities
- Identify which S3 files contain a specific data subject's records
- Flag files requiring manual review for deletion
- Classify personal data found in scanned files
How it works
1. Create an IAM Role for TruePrivacy with S3 read access to the buckets you want to scan, using AWS's cross-account role assumption pattern.
2. TruePrivacy scans your specified S3 buckets, sampling file contents to identify personal data in CSV, JSON, Parquet, and text formats.
3. Discovered files and their personal data content are classified and added to your data inventory.
4. Alerts are raised for buckets or files presenting elevated risk (public access, unencrypted data, no retention policy).
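The risk conditions named in the last step can be pictured as a simple rule check per bucket. The sketch below is illustrative only: `BucketSummary` and `risk_flags` are hypothetical names, and it assumes the three bucket properties have already been collected. It is not TruePrivacy's actual alerting logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BucketSummary:
    """Hypothetical summary of one bucket's settings (not a TruePrivacy type)."""
    name: str
    public_access: bool
    encrypted: bool
    has_retention_policy: bool


def risk_flags(bucket: BucketSummary) -> List[str]:
    """Return the elevated-risk conditions that apply to this bucket."""
    flags = []
    if bucket.public_access:
        flags.append("public access")
    if not bucket.encrypted:
        flags.append("unencrypted data")
    if not bucket.has_retention_policy:
        flags.append("no retention policy")
    return flags


if __name__ == "__main__":
    # Example bucket name is made up for illustration.
    exports = BucketSummary("customer-exports", public_access=True,
                            encrypted=False, has_retention_policy=True)
    print(risk_flags(exports))
```

A bucket with no flags would raise no alert under this sketch; how the real product weights or combines these conditions is not stated here.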
Frequently asked questions
TruePrivacy needs s3:ListBucket and s3:GetObject permissions on the buckets you want to scan. We use cross-account IAM role assumption — you create an IAM Role in your account and TruePrivacy assumes it for scanning. No long-lived credentials are stored.
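As a rough illustration, the two policy documents for such a role might look like the output of the sketch below. The account ID `111122223333`, the external ID, and the bucket name are placeholders, and the use of an `sts:ExternalId` condition is an assumption based on AWS's general guidance for third-party access. Use the values TruePrivacy provides during setup.

```python
import json
from typing import Dict, List


def trust_policy(trusted_account_id: str, external_id: str) -> Dict:
    """Trust policy letting another AWS account assume the role.

    The ExternalId condition is an assumption here, not a documented
    TruePrivacy requirement.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


def read_only_policy(bucket_names: List[str]) -> Dict:
    """Permissions policy granting only s3:ListBucket and s3:GetObject."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # ListBucket applies to the bucket ARN itself.
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": [f"arn:aws:s3:::{b}" for b in bucket_names]},
            # GetObject applies to the objects inside the bucket.
            {"Effect": "Allow", "Action": "s3:GetObject",
             "Resource": [f"arn:aws:s3:::{b}/*" for b in bucket_names]},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(trust_policy("111122223333", "example-external-id"), indent=2))
    print(json.dumps(read_only_policy(["example-exports-bucket"]), indent=2))
```

Listing buckets explicitly, rather than using a wildcard resource, keeps the role scoped to the buckets you actually want scanned.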
TruePrivacy currently scans CSV, JSON, JSONL, Parquet, TSV, and plain text files. Binary files (images, videos, compiled binaries) and encrypted files are flagged but not content-scanned. Compressed files (gzip, zip) are decompressed and scanned where supported.
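The handling rules above can be pictured as a lookup on the object key's extension. This is a simplified sketch: `classify_key` and the extension sets are hypothetical, the mapping of "plain text" to `.txt` and `.log` is an assumption, and a real scanner would also need to inspect content, since encryption cannot be detected from a file name.

```python
from pathlib import PurePosixPath

# Assumed extension mappings for the formats listed above.
SCANNABLE = {".csv", ".json", ".jsonl", ".parquet", ".tsv", ".txt", ".log"}
COMPRESSED = {".gz", ".zip"}


def classify_key(key: str) -> str:
    """Decide how an S3 object key would be handled, by extension only."""
    suffixes = [s.lower() for s in PurePosixPath(key).suffixes]
    if not suffixes:
        return "flag_only"
    last = suffixes[-1]
    if last in COMPRESSED:
        return "decompress_then_scan"
    if last in SCANNABLE:
        return "scan"
    # Images, videos, compiled binaries, and anything unrecognized.
    return "flag_only"


if __name__ == "__main__":
    for key in ["exports/users.csv", "logs/app.log.gz", "media/photo.png"]:
        print(key, "->", classify_key(key))
```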
TruePrivacy uses a sampling strategy for large buckets — scanning a representative sample of files to identify personal data patterns, then flagging the bucket for inclusion in your data inventory. For high-risk buckets, you can configure full scanning of every file.
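One way to picture the difference between sampled and full scanning is the sketch below. It assumes uniform random sampling over object keys with a fixed sample size, which is an illustration only; the actual method TruePrivacy uses to pick a representative sample is not described here, and `choose_keys` is a hypothetical helper.

```python
import random
from typing import Iterable, List


def choose_keys(keys: Iterable[str], sample_size: int,
                full_scan: bool = False, seed: int = 0) -> List[str]:
    """Pick which object keys to content-scan.

    Returns every key when full_scan is set or the bucket is small;
    otherwise a uniform random sample (an assumed strategy).
    """
    all_keys = list(keys)
    if full_scan or len(all_keys) <= sample_size:
        return all_keys
    rng = random.Random(seed)  # seeded so repeated runs pick the same files
    return rng.sample(all_keys, sample_size)


if __name__ == "__main__":
    keys = [f"logs/2024/{i:04d}.json" for i in range(1000)]
    print(len(choose_keys(keys, sample_size=50)))
    print(len(choose_keys(keys, sample_size=50, full_scan=True)))
```

A sample can miss personal data that appears in only a few files, which is why full scanning is the safer choice for high-risk buckets.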
Connect TruePrivacy to Amazon S3 today
Start your free trial and connect Amazon S3 in 15 minutes.