AWS Big Data Study Notes – AWS Machine Learning and IoT
This cheat sheet covers AWS Machine Learning (ML) and IoT services.
Machine Learning (ML) Services
- Amazon Comprehend: natural language processing (NLP) service that uses machine learning to find insights and relationships in text
- Amazon Lex: service for building conversational interfaces such as chatbots; it uses deep learning for automatic speech recognition (ASR) to convert speech to text, and natural language understanding (NLU) to recognize the intent of the text
- Amazon Polly: turns text into lifelike speech
- Amazon Rekognition: image and video analysis
- Amazon Transcribe: automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to their applications
- Amazon Textract: automatically extracts text and data from scanned documents and forms
- Amazon Translate: neural machine translation service that delivers fast, high-quality, and affordable language translation
- Amazon Forecast: time-series forecasting service based on the same technology used at Amazon.com
- Amazon Personalize: creates individualized product and content recommendations for customers using your applications
- Amazon Comprehend Medical: natural language processing service that extracts relevant medical information from unstructured text using advanced machine learning models
AWS Machine Learning (ML)
- Datasources contain metadata associated with data inputs to ML
- ML Models generate predictions using the patterns extracted from the input data
- Evaluations measure the quality of ML models
- Batch Predictions asynchronously generate predictions for multiple input data observations
- Real-time Predictions synchronously generate predictions for individual data observations
- Pre-split the data – You can split the data into two input locations before uploading it to Amazon Simple Storage Service (Amazon S3), then create two separate datasources from them.
- Amazon ML sequential split – You can tell Amazon ML to split your data sequentially when creating the training and evaluation datasources.
- Amazon ML random split – You can tell Amazon ML to split your data using a seeded random method when creating the training and evaluation datasources.
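The sequential and seeded random splits above can be sketched in plain Python. This is only an illustration of the idea, not Amazon ML's implementation; the function names and the `train_percent` and `seed` parameters are assumptions made for this example.

```python
import random


def sequential_split(rows, train_percent=70):
    """Keep the original order: the first part trains, the rest evaluates."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


def seeded_random_split(rows, train_percent=70, seed=42):
    """Shuffle a copy with a fixed seed so the split is reproducible."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten observations
train_seq, eval_seq = sequential_split(rows)
train_rnd, eval_rnd = seeded_random_split(rows)
```

A sequential split suits time-ordered data, where evaluating on the most recent observations is the point. A seeded random split suits data whose order carries no meaning, and the fixed seed keeps the training and evaluation sets from overlapping across runs.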
- ML Models
- Supervised learning
- Regression: predict a numeric value
- Multiclass: predict values that belong to a limited, pre-defined set of permissible values.
- Binary: predict values that can only have one of two states, such as true or false
- Unsupervised learning – not supported
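As a rough sketch of how the three supervised model types relate to the target you want to predict: the mapping below assumes the target attribute has been declared as NUMERIC, BINARY, or CATEGORICAL in the datasource schema. The helper name `choose_model_type` is hypothetical.

```python
# Assumed mapping from the target attribute's declared type to the model type.
MODEL_TYPE_BY_TARGET = {
    "NUMERIC": "REGRESSION",      # predict a numeric value
    "BINARY": "BINARY",           # predict one of two states
    "CATEGORICAL": "MULTICLASS",  # predict one of a limited set of values
}


def choose_model_type(target_attribute_type):
    """Return the model type for a target attribute type (hypothetical helper)."""
    key = target_attribute_type.upper()
    if key not in MODEL_TYPE_BY_TARGET:
        raise ValueError(f"no supervised model type for target type {key!r}")
    return MODEL_TYPE_BY_TARGET[key]
```

For example, a house price target would be NUMERIC and lead to a regression model, while an "is this email spam" target would be BINARY.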
Amazon SageMaker
- Build, train, and deploy machine learning models quickly. SageMaker removes the 100 GB training data limit of Amazon Machine Learning
- SageMaker provides fully-managed and pre-built Jupyter notebooks to address common use cases
- SageMaker supports reinforcement learning (RL) in multiple frameworks, including TensorFlow and MXNet, as well as custom-developed frameworks
- GPU-accelerated deep learning and hyperparameter tuning jobs that scale out as needed. P2, P3, and G3 instance types are suited to deep learning
IoT Virtuous Cycle
- Amazon FreeRTOS – IoT operating system for microcontrollers
- Local connectivity libraries (Wi-Fi and Ethernet)
- Cloud connectivity and security libraries
- Over-the-air updates with code signing
- AWS IoT Greengrass – Extend AWS IoT to the edge
- Connectors: pre-built integrations with third-party services, on-premises software, and AWS services
- Secrets Manager: securely store, access, rotate, and manage secrets at the edge
- Core capabilities: local messages and triggers, local actions, data and state sync, security, local resource access, over-the-air updates, machine learning inference
- Hardware Security Integration: store private device keys on a hardware secure element
- AWS IoT Device SDK (run on devices), AWS IoT API (HTTP/HTTPS requests), AWS SDKs (language-specific API), AWS CLI (commands)
- AWS IoT Device Tester: Windows/Linux/Mac test automation tool for connected devices
- AWS IoT Core – Secure device connectivity and messaging at scale
- Security and identity: devices authenticate with X.509 certificates, AWS IAM credentials, or third-party authentication via Amazon Cognito. Custom authentication and authorization strategies are also supported (e.g. JSON Web Token verification, OAuth provider callout)
- Device gateway: Allows secure, low-latency, low-overhead, bi-directional communication between connected devices and cloud and mobile applications
- Message broker: publish through an HTTP REST interface, or publish and subscribe using the MQTT protocol directly or MQTT over WebSocket.
- Three MQTT patterns: point-to-point, broadcast, and fan-in
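The three MQTT patterns can be illustrated with a tiny in-memory broker. This is a toy sketch, not the AWS IoT message broker: `TinyBroker`, `topic_matches`, and all topic names are invented for the example, and the wildcard handling assumes `#` only appears as the last level of a filter.

```python
def topic_matches(topic_filter, topic):
    """MQTT-style match: '+' matches one level, '#' matches all remaining levels."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, part in enumerate(f_parts):
        if part == "#":
            return True
        if i >= len(t_parts):
            return False
        if part != "+" and part != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)


class TinyBroker:
    """Keeps one inbox (a list) per subscription and copies matching messages in."""

    def __init__(self):
        self.subscriptions = []  # list of (topic_filter, inbox)

    def subscribe(self, topic_filter):
        inbox = []
        self.subscriptions.append((topic_filter, inbox))
        return inbox

    def publish(self, topic, payload):
        for topic_filter, inbox in self.subscriptions:
            if topic_matches(topic_filter, topic):
                inbox.append((topic, payload))


broker = TinyBroker()

# Point-to-point: one publisher, one subscriber on a device-specific topic.
thermo_inbox = broker.subscribe("devices/thermo-1/commands")
broker.publish("devices/thermo-1/commands", "reboot")

# Broadcast: one publisher, many subscribers on the same topic.
fleet_a = broker.subscribe("fleet/announcements")
fleet_b = broker.subscribe("fleet/announcements")
broker.publish("fleet/announcements", "update available")

# Fan-in: many publishers, one subscriber using a single-level wildcard.
collector = broker.subscribe("devices/+/telemetry")
broker.publish("devices/thermo-1/telemetry", 21.5)
broker.publish("devices/thermo-2/telemetry", 19.0)
```

The patterns differ only in how topics and subscriptions are arranged; the broker logic is the same in all three cases.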
- Rules engine: Enables continuous processing of data sent by connected devices. Configure rules to filter and transform the data. Configure rules to route the data to other AWS services
- A rule consists of: rule name, description, SQL statement (simple SQL syntax to filter messages received on an MQTT topic), SQL version, one or more actions (what to do when the rule matches, e.g. write data to DynamoDB/S3/Kinesis, invoke a Lambda function, publish to an SNS topic), and an error handling action
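To make the rule components concrete, here is a sketch that assembles a rule definition as a Python dictionary. The field names follow the topic rule payload shape as I understand it from the AWS IoT API, so check them against the API reference before relying on them. The helper name, the topic, and every ARN are placeholders invented for this example.

```python
import json


def build_rule_payload(sql, lambda_arn, error_topic_arn, role_arn, description=""):
    """Assemble a rule definition: SQL, SQL version, one action, one error action."""
    return {
        "sql": sql,
        "awsIotSqlVersion": "2016-03-23",
        "description": description,
        "ruleDisabled": False,
        # Action run when a message matches the SQL statement.
        "actions": [{"lambda": {"functionArn": lambda_arn}}],
        # Action run if the main action fails.
        "errorAction": {
            "sns": {
                "targetArn": error_topic_arn,
                "roleArn": role_arn,
                "messageFormat": "RAW",
            }
        },
    }


payload = build_rule_payload(
    # Filter messages on a hypothetical telemetry topic, keeping hot readings only.
    sql="SELECT temperature, topic(2) AS device_id "
        "FROM 'sensors/+/telemetry' WHERE temperature > 30",
    lambda_arn="arn:aws:lambda:us-east-1:123456789012:function:example-handler",
    error_topic_arn="arn:aws:sns:us-east-1:123456789012:example-rule-errors",
    role_arn="arn:aws:iam::123456789012:role/example-iot-rule-role",
    description="Forward hot readings to Lambda",
)
payload_json = json.dumps(payload, indent=2)
```

The rule name is supplied separately when the rule is created, for instance as the `ruleName` argument next to `topicRulePayload` in the boto3 `create_topic_rule` call.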
- Device shadow: Enables cloud and mobile applications to query data sent from devices and send commands to devices, using a simple REST API, while letting AWS IoT Core handle the underlying communication with the devices.
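A device shadow is a JSON document with a `desired` state (what applications want) and a `reported` state (what the device last said). The sketch below shows the idea of computing the difference between the two for top-level keys only. It is a simplification for illustration, and `shadow_delta` is a hypothetical helper, not part of any AWS SDK.

```python
_MISSING = object()  # sentinel so a missing key differs from any real value


def shadow_delta(shadow):
    """Return the desired settings the device has not yet reported."""
    state = shadow.get("state", {})
    desired = state.get("desired", {})
    reported = state.get("reported", {})
    return {
        key: value
        for key, value in desired.items()
        if reported.get(key, _MISSING) != value
    }


# An application asks for the LED to be on while the device still reports off.
shadow = {
    "state": {
        "desired": {"led": "on", "interval": 60},
        "reported": {"led": "off", "interval": 60},
    }
}
pending = shadow_delta(shadow)

# The device applies the change and reports its new state.
shadow["state"]["reported"].update(pending)
remaining = shadow_delta(shadow)
```

Because the application only writes `desired` and the device only writes `reported`, a device that was offline can catch up by reading the difference when it reconnects.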
- Registry: Register devices and associate up to three custom attributes with each one. You can also associate certificates and MQTT client IDs with each device, and organize devices into groups.
- AWS IoT Device Management – Maintain fleet health at scale
- Batch fleet provisioning
- Real-time fleet indexing and search
- Fine-grained device logging & monitoring
- Over-the-air updates
- AWS IoT Device Defender
- Audit device configurations
- Monitor device behavior
- Identify anomalies
- Generate alerts
- Investigate and mitigate security issues
- AWS IoT Things Graph – Connect devices and web services with little to no code
- AWS IoT Analytics – Run and operationalize sophisticated analytics on massive volumes of IoT data
- Process, enrich, store, analyze, and visualize IoT data in one service
- AWS IoT SiteWise – Collect data with the local gateway, and structure and search IoT data from industrial equipment at scale
- AWS IoT Events – Detect and respond to events from data across IoT sensors and applications