I wonder if this is the same service that they use to scan public GitHub repositories for secret AWS keys. I'll admit that I've accidentally committed a private key to a public repo before, and I received an email from AWS letting me know about it shortly after.
I suppose it's in Amazon's best interest not to have people hijacking accounts and spinning up as many EC2 instances as possible to mine Bitcoin.
Not from GitHub itself, but the TruffleHog project on GitHub might be of interest to you. There is also SourceClear, which does the same kind of secret scanning for GitHub repos.
Note: AWS also monitors access key use and API call thresholds to keep you informed.
>The San Diego-based startup, co-founded by a team that includes two former NSA employees
>Harvest.ai’s flagship, patent-pending AI product is called MACIE Analytics. It uses AI to monitor how a customer’s intellectual property is being accessed in real-time, assessing who is looking at, copying or moving particular documents, and where they are when they’re doing this, in order to identify suspicious patterns of behavior and flag potential data breaches before they’ve taken place. It bills the service as a way to combat the risk of insider attacks.
Did they get the idea after seeing what happens at the NSA with contractors/whoever downloading data to wherever?
Data Insight is targeted at more user-oriented unstructured content repositories (CIFS, NFS, SharePoint, OneDrive, SharePoint Online, Box), but the fundamentals are very similar: content classification, data profiling, risk scoring, access pattern anomaly detection, and access control remediation.
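For a rough idea of what "access pattern anomaly detection" boils down to in these products, here's a minimal sketch: a per-user baseline with a z-score threshold. The log format and model are hypothetical; real products are far more sophisticated.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical access log: one (user, resource) tuple per access event.
events = [
    ("alice", "finance/q3.xlsx"), ("alice", "finance/q3.xlsx"),
    ("bob", "finance/q3.xlsx"),
]

def daily_counts(events):
    """Collapse raw events into per-user access counts for one day."""
    counts = defaultdict(int)
    for user, _resource in events:
        counts[user] += 1
    return counts

def flag_anomalies(history, today, threshold=3.0):
    """Flag users whose volume today deviates sharply from their baseline.

    history: {user: [count_day1, count_day2, ...]} from past logs.
    today:   {user: count} for the current day.
    """
    findings = []
    for user, count in today.items():
        baseline = history.get(user, [])
        if len(baseline) < 2:
            continue  # not enough data to model this user yet
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (count - mu) / sigma > threshold:
            findings.append((user, count, mu))
    return findings
```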
Classic, selling the poison and the cure. Access controls shouldn't be so convoluted and opaque that it requires a separate service to analyze your configurations. Crazy that we've made such a mess of the security landscape that we need AI systems to tell us if we're leaking info.
Not the case. I've seen seasoned developers (not to single them out) make simple, stupid mistakes with S3 bucket ACLs, permissions, and policies. The issue has to do with the sheer laziness of the "let's create unstructured data buckets, write once, and forget it all" mentality. At some point, this sort of service can be useful in identifying the "crown jewels" within the buckets. Beyond that, buckets deny access by default, so I can't agree with your assertion that AWS is somehow making this convoluted in order to sell more services and drive vendor lock-in.
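Catching the most common of those mistakes doesn't even need a paid service. A minimal boto3 sketch that flags buckets whose ACLs grant access to everyone (assumes credentials that can read the bucket ACLs):

```python
import boto3

# The classic mistake: an ACL grant to the AllUsers or AuthenticatedUsers group.
PUBLIC_GRANTEES = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}

def public_grants(s3, bucket_name):
    """Return any ACL grants on the bucket that apply to everyone."""
    acl = s3.get_bucket_acl(Bucket=bucket_name)
    return [
        g for g in acl["Grants"]
        if g["Grantee"].get("URI") in PUBLIC_GRANTEES
    ]

s3 = boto3.client("s3")
for bucket in s3.list_buckets()["Buckets"]:
    grants = public_grants(s3, bucket["Name"])
    if grants:
        print(bucket["Name"], [g["Permission"] for g in grants])
```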
Yes, thanks for the link, but I fail to see the connection. This tool scans public HTTPS endpoints, using keywords from its dictionary, to discover misconfigured buckets. AWS doesn't manage the bucket permissions/ACL; the customer does. AWS's shared-responsibility model clearly defines all of this. The customer is responsible for the bucket ACL, and the same would apply if I ran my stack in a data center and configured Apache/NGINX with open directory indexes that allowed anyone to traverse them.
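Roughly what such scanners do, sketched in Python with requests (the bucket names are hypothetical; this only sees what any anonymous client on the internet can see):

```python
import requests

# Probe candidate bucket names against the public S3 endpoint, unauthenticated.
# 200 with a ListBucketResult body means the bucket is publicly listable;
# 403 means it exists but denies anonymous access; 404 means the name is free.
wordlist = ["acme-backups", "acme-logs", "acme-dev"]  # hypothetical names

for name in wordlist:
    r = requests.get(f"https://{name}.s3.amazonaws.com/", timeout=5)
    if r.status_code == 200 and "<ListBucketResult" in r.text:
        print(f"{name}: publicly listable")
    elif r.status_code == 403:
        print(f"{name}: exists, access denied")
```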
If you have data that matters, it needs dual controls. The idea that a company would place PII on a publicly accessible site protected only by an ACL is ridiculous.
Instead of futzing with machine learning, use network or crypto controls to prevent access, and have a different chain of command manage that access in your company.
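For example, a deny-by-default S3 bucket policy pinned to a single VPC endpoint is one such network control. A sketch with boto3 (the bucket name and endpoint ID are hypothetical, and a broad Deny like this can lock out admins too, so apply with care):

```python
import json
import boto3

# Deny all S3 actions on the bucket unless the request arrives through
# one specific VPC endpoint. Everything else, including the console, is cut off.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyOutsideVpce",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::example-pii-bucket",
            "arn:aws:s3:::example-pii-bucket/*",
        ],
        "Condition": {
            "StringNotEquals": {"aws:sourceVpce": "vpce-0123456789abcdef0"}
        },
    }],
}

boto3.client("s3").put_bucket_policy(
    Bucket="example-pii-bucket", Policy=json.dumps(policy)
)
```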
CloudTrail is indeed very cheap for customers: we record nearly all API calls and access to AWS resources and deliver these events to our subscribing customers. The events are delivered for free, with the exception of S3 and Lambda "data events" (object-level gets and puts, and function invocations), which are billed at a very low rate.
(We recently released our AWS Lambda integration — you can now record all Lambda function invocations with us!)
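For anyone wanting to turn those data events on, it's a per-trail setting. A minimal boto3 sketch (the trail and bucket names are hypothetical):

```python
import boto3

# Management events are recorded by default; S3 object-level and Lambda
# invocation "data events" are opt-in per trail and billed separately.
cloudtrail = boto3.client("cloudtrail")
cloudtrail.put_event_selectors(
    TrailName="my-trail",  # hypothetical trail name
    EventSelectors=[{
        "ReadWriteType": "All",
        "IncludeManagementEvents": True,
        "DataResources": [
            # Object-level gets/puts for one bucket:
            {"Type": "AWS::S3::Object", "Values": ["arn:aws:s3:::my-bucket/"]},
            # Invocations for all Lambda functions in the account:
            {"Type": "AWS::Lambda::Function", "Values": ["arn:aws:lambda"]},
        ],
    }],
)
```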
Disclaimer: I’m a Software Engineer with the AWS CloudTrail team.
If I'm reading this right, you now have two paid services for detecting CloudTrail anomalies: GuardDuty, which is nosebleed expensive, and Macie, which is practically free. What's the difference between the two?
Macie analyzes a subset of CloudTrail, not all actions, and is about historical behavior (though for high-severity actions it is more point-in-time).
GuardDuty is looking for specific threats/attacks and can combine multiple sources of telemetry for more advanced correlation, e.g. a combination of VPC Flow Logs + CloudTrail + DNS that triggers an alert when taken together, where a single CloudTrail event would not.
Within CT, what are examples of things Macie will catch, vs. things you'd need GD to catch?
If GD weren't so expensive, I wouldn't really care that much. But GD is so expensive that it can be hard to recommend, which is especially weird since the pricing for Macie CT is so low, and even weirder when you note that the pricing for Macie S3 is so high!
It flagged a healthy amount of data that looked like PII based on data ranges, plus potential secrets in buckets, CSVs, JSON files, and CloudTrail dumps, but it also generated reports on dummy data; without fingerprinting of the live data, it wouldn't know what's real or not. The CloudTrail feature is also useful, since it provides user behavior analytics based on usage.
Google's Data Loss Prevention is provided on G Suite and Google Cloud Platform (GCP). Both products use the same unified classifier codebase. G Suite DLP allows admins to enforce policy on Gmail and Drive files. On GCP, the Data Loss Prevention API allows developers to classify and redact sensitive data in virtually any data source in real-time or at-rest (e.g. Google Cloud Storage, BigQuery, AWS Redshift, AWS S3, Salesforce, Slack, on-prem, custom apps, etc.).
DLP API scans are not limited to 20MB and can scale up to virtually any size. API results can be used for programmatic automation of alerts, IAM/ACL settings, or other remediation and can be sent automatically into BigQuery for detailed analysis or reporting. In addition to classification, Google’s DLP API provides data masking tools for structured and unstructured data including format-preserving encryption, bucketing, and tokenization. This helps developers reduce unnecessary PII when collecting, storing, or sharing data.
(Note: I am the Product Manager for DLP API at Google Cloud)
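To make the API part concrete, here is a minimal inspection call with the google-cloud-dlp Python client (the project ID and sample text are made up; real scans can also point at GCS, BigQuery, etc.):

```python
from google.cloud import dlp_v2

# Inspect a string for a couple of built-in infoTypes and print the findings.
client = dlp_v2.DlpServiceClient()
response = client.inspect_content(
    request={
        "parent": "projects/my-project",  # hypothetical project ID
        "inspect_config": {
            "info_types": [
                {"name": "EMAIL_ADDRESS"},
                {"name": "US_SOCIAL_SECURITY_NUMBER"},
            ],
            "include_quote": True,  # return the matched text itself
        },
        "item": {"value": "Contact jane@example.com, SSN 123-45-6789"},
    },
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood, finding.quote)
```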
On the one hand, AWS Macie only scans S3. Google DLP API works on S3, Gmail, Drive, GCS, DynamoDB, Redshift, BigQuery, Slack, SQL, Oracle, Oracle RAC, Zendesk, Twilio, Salesforce, and everything you can point an API at. If you want to use the same engine to test all your repos then Google DLP API is the right solution for you.
On the other hand, Macie has a GUI wizard. DLP API is an API. So if you can't code and just want to scan S3 then Macie might be for you, until Google DLP builds a GUI, if there's demand for that.
Someone should do a comparison of how successful each engine is at picking up sensitive data. I suspect Google DLP will be tuned better, but someone should do the test on a dummy data set and release results. That would be the most interesting comparison.
https://aws.amazon.com/macie/pricing/