Data Governance For Dummies
by Jonathan Reichental
How to build and maintain strong data organizations—the Dummies way
Data Governance For Dummies offers an accessible first step for decision makers into understanding how data governance works and how to apply it to an organization in a way that improves results and doesn't disrupt. Prep your organization to handle the data explosion (if you know, you know) and learn how to manage this valuable asset. Take full control of your organization's data with all the info and how-tos you need. This book walks you through making accurate data readily available and maintaining it in a secure environment. It serves as your step-by-step guide to extracting every ounce of value from your data.
• Identify the impact and value of data in your business
• Design governance programs that fit your organization
• Discover and adopt tools that measure performance and need
• Address data needs and build a more data-centric business culture
This is the perfect handbook for professionals in the world of data analysis and business intelligence, plus the people who interact with data on a daily basis. And, as always, Dummies explains things in terms anyone can understand, making it easy to learn everything you need to know.
Data Governance for Managers: The Driver of Value Stream Optimization and a Pacemaker for Digital Transformation (Management for Professionals)
by Lars Michael Bollweg
Professional data management is the foundation for the successful digital transformation of traditional companies. Unfortunately, many companies fail to implement data governance because they do not fully understand the complexity of the challenge (organizational structure, employee empowerment, change management, etc.) and therefore do not include all aspects in the planning and implementation of their data governance. This book explains the driving role that a responsive data organization can play in a company's digital transformation. Using proven process models, the book takes readers from the basics, through planning and implementation, to regular operations and measuring the success of data governance. All the important decision points are highlighted, and the advantages and disadvantages are discussed in order to identify digitization potential, implement it in the company, and develop customized data governance. The book will serve as a useful guide for interested newcomers as well as for experienced managers.
Data Governance Success: Growing and Sustaining Data Governance
by Rupa Mahanti
While good data is an enterprise asset, bad data is an enterprise liability. Data governance enables you to effectively and proactively manage data assets throughout the enterprise by providing guidance in the form of policies, standards, processes, and rules, and by defining roles and responsibilities outlining who will do what with respect to data. While implementing data governance is not rocket science, it is not a simple exercise. There is a lot of confusion around what data governance is, and a lot of challenges in the implementation of data governance. Data governance is not a project or a one-off exercise but a journey that involves a significant amount of effort, time, investment, and cultural change, as well as a number of factors to take into consideration to achieve and sustain data governance success. Data Governance Success: Growing and Sustaining Data Governance is the third and final book in the Data Governance series and discusses the following:
• Data governance perceptions and challenges
• Key considerations when implementing data governance to achieve and sustain success
• Strategy and data governance
• Different data governance maturity frameworks
• Data governance – people and process elements
• Data governance metrics
This book shares the combined knowledge related to data and data governance that the author has gained over years of working in different industrial and research programs and projects associated with data, processes, and technologies, along with the unique perspectives of thought leaders and data experts gathered through interviews. It will be highly beneficial for IT students, academicians, information management and business professionals, and researchers looking to enhance their knowledge to support and succeed in data governance implementations. The book is technology agnostic and contains a balance of concepts, examples, and illustrations, making it easy for readers to understand and relate to their own specific data projects.
Data Grab: The New Colonialism of Big Tech and How to Fight Back
by Ulises A. Mejias Nick Couldry
A compelling argument that the extractive practices of today’s tech giants are the continuation of colonialism—and a crucial guide to collective resistance. Large technology companies like Meta, Amazon, and Alphabet have unprecedented access to our daily lives, collecting information when we check our email, count our steps, shop online, and commute to and from work. Current events are concerning—both the changing owners (and names) of billion-dollar tech companies and regulatory concerns about artificial intelligence underscore the sweeping nature of Big Tech’s surveillance and the influence such companies hold over the people who use their apps and platforms. As trusted tech experts Ulises A. Mejias and Nick Couldry show in this eye-opening and convincing book, this vast accumulation of data is not the accidental stockpile of a fast-growing industry. Just as nations stole territories for ill-gotten minerals and crops, wealth, and dominance, tech companies steal personal data important to our lives. It’s only within the framework of colonialism, Mejias and Couldry argue, that we can comprehend the full scope of this heist. Like the land grabs of the past, today’s data grab converts our data into raw material for the generation of corporate profit against our own interests. Like historical colonialism, today’s tech corporations have engineered an extractive form of doing business that builds a new social and economic order, leads to job precarity, and degrades the environment. These methods deepen global inequality, consolidating corporate wealth in the Global North and engineering discriminatory algorithms. Promising convenience, connection, and scientific progress, tech companies enrich themselves by encouraging us to relinquish details about our personal interactions, our taste in movies or music, and even our health and medical records. Do we have any other choice? Data Grab affirms that we do. To defy this new form of colonialism we will need to learn from previous forms of resistance and work together to imagine entirely new ones. Mejias and Couldry share the stories of voters, workers, activists, and marginalized communities who have successfully opposed unscrupulous tech practices. An incisive discussion of the digital media that’s transformed our world, Data Grab is a must-read for anyone concerned about privacy, self-determination, and justice in the internet age.
Data, Information, and Time: The DIT Model (SpringerBriefs in Computer Science)
by Hermann Kopetz
This SpringerBrief presents the data-information-and-time (DIT) model that precisely clarifies the semantics behind the terms data and information and their relations to the passage of real time. According to the DIT model, a data item is a symbol that appears as a pattern (e.g., visual, sound, gesture, or any bit pattern) in physical space. It is generated by a human or a machine in the current contextual situation and is linked to a concept in the human mind or a set of operations of a machine. An information item delivers the sense or the idea that a human mind extracts out of a given natural language proposition that contains meaningful data items. Since the given tangible, intangible, and temporal context are part of the explanation of a data item, a change of context can have an effect on the meaning of data and the sense of a proposition. The DIT model provides a framework to show how the flow of time can change the truth-value of a proposition. This book compares our notions of data, information, and time in differing contexts: in human communication, in the operation of a computer system, and in a biological system. In the final section, a few simple examples demonstrate how the lessons learned from the DIT model can help to improve the design of a computer system.
Data Infrastructure Management: Insights and Strategies
by Greg Schulz
This book looks at various application and data demand drivers, along with data infrastructure options spanning legacy on-premises, public cloud, hybrid, software-defined data center (SDDC), software data infrastructure (SDI), container, and serverless, as well as Infrastructure as a Service (IaaS) and IT as a Service (ITaaS), together with related technology, trends, tools, techniques, and strategies. Filled with example scenarios, tips, and strategy considerations, the book covers frequently asked questions and answers to aid strategy as well as decision-making.
Data Integration in the Life Sciences: 11th International Conference, DILS 2015, Los Angeles, CA, USA, July 9-10, 2015, Proceedings (Lecture Notes in Computer Science #9162)
by Naveen Ashish Jose-Luis Ambite
This book constitutes the proceedings of the 11th International Conference on Data Integration in the Life Sciences, DILS 2015, held in Los Angeles, CA, USA, in July 2015. The 24 papers presented in this volume were carefully reviewed and selected from 40 submissions. They are organized in topical sections named: data integration technologies; ontology and knowledge engineering for data integration; biomedical data standards and coding; medical research applications; and graduate student consortium.
Data Integration in the Life Sciences: 13th International Conference, DILS 2018, Hannover, Germany, November 20-21, 2018, Proceedings (Lecture Notes in Computer Science #11371)
by Sören Auer Maria-Esther Vidal
This book constitutes revised selected papers from the 13th International Conference on Data Integration in the Life Sciences, DILS 2018, held in Hannover, Germany, in November 2018. The 5 full, 8 short, 3 poster and 4 demo papers presented in this volume were carefully reviewed and selected from 22 submissions. The papers are organized in topical sections named: big biomedical data integration and management; data exploration in the life sciences; biomedical data analytics; and big biomedical applications.
Data Integration in the Life Sciences: 12th International Conference, DILS 2017, Luxembourg, Luxembourg, November 14-15, 2017, Proceedings (Lecture Notes in Computer Science #10649)
by Marcos Da Silveira Cédric Pruski Reinhard Schneider
This book constitutes the proceedings of the 12th International Conference on Data Integration in the Life Sciences, DILS 2017, held in Luxembourg, in November 2017. The 5 full papers and 5 short papers presented in this volume were carefully reviewed and selected from 16 submissions. They cover topics such as: life science data modelling; analysing, indexing, and querying life sciences datasets; annotating, matching, and sharing life sciences datasets; privacy and provenance of life sciences datasets.
Data Integration Life Cycle Management with SSIS: A Short Introduction By Example
by Andy Leonard
Build a custom BimlExpress framework that generates dozens of SQL Server Integration Services (SSIS) packages in minutes. Use this framework to execute related SSIS packages in a single command. You will learn to configure SSIS catalog projects, manage catalog deployments, and monitor SSIS catalog execution and history. Data Integration Life Cycle Management with SSIS shows you how to bring DevOps benefits to SSIS integration projects. Practices in this book enable faster time to market, higher quality of code, and repeatable automation. Code will be created that is easier to support and maintain. The book teaches you how to more effectively manage SSIS in the enterprise environment by drawing on the art and science of modern DevOps practices.
What You'll Learn
• Generate dozens of SSIS packages in minutes to speed your integration projects
• Reduce the execution of related groups of SSIS packages to a single command
• Successfully handle SSIS catalog deployments and their projects
• Monitor the execution and history of SSIS catalog projects
• Manage your enterprise data integration life cycle through automated tools and utilities
Who This Book Is For
Database professionals working with SQL Server Integration Services in enterprise environments. The book is especially useful to those readers following, or wishing to follow, DevOps practices in their use of SSIS.
Data Intelligence and Cognitive Informatics: Proceedings of ICDICI 2021 (Algorithms for Intelligent Systems)
by Robert Bestak I. Jeena Jacob Selvanayaki Kolandapalayam Shanmugam
The book is a collection of peer-reviewed best selected research papers presented at the International Conference on Data Intelligence and Cognitive Informatics (ICDICI 2021), organized by SCAD College of Engineering and Technology, Tirunelveli, India, during July 16–17, 2021. This book discusses new cognitive informatics tools, algorithms, and methods that mimic the mechanisms of the human brain, which lead to an impending revolution in understanding the large amount of data generated by various smart applications. The book includes novel work in the data intelligence domain, which combines with the increasing efforts of artificial intelligence, machine learning, deep learning, and cognitive science to study and develop a deeper understanding of information processing systems.
Data Intelligence and Cognitive Informatics: Proceedings of ICDICI 2023 (Algorithms for Intelligent Systems)
by I. Jeena Jacob Selwyn Piramuthu Przemyslaw Falkowski-Gilski
The book is a collection of peer-reviewed best selected research papers presented at the International Conference on Data Intelligence and Cognitive Informatics (ICDICI 2023), organized by SCAD College of Engineering and Technology, Tirunelveli, India, during June 27–28, 2023. This book discusses new cognitive informatics tools, algorithms, and methods that mimic the mechanisms of the human brain, which lead to an impending revolution in understanding the large amount of data generated by various smart applications. The book includes novel work in the data intelligence domain, which combines with the increasing efforts of artificial intelligence, machine learning, deep learning, and cognitive science to study and develop a deeper understanding of information processing systems.
Data Intelligence and Cognitive Informatics: Proceedings of ICDICI 2022 (Algorithms for Intelligent Systems)
by I. Jeena Jacob Selvanayaki Kolandapalayam Shanmugam Ivan Izonin
The book is a collection of peer-reviewed best selected research papers presented at the International Conference on Data Intelligence and Cognitive Informatics (ICDICI 2022), organized by SCAD College of Engineering and Technology, Tirunelveli, India, during July 6–7, 2022. This book discusses new cognitive informatics tools, algorithms, and methods that mimic the mechanisms of the human brain, which lead to an impending revolution in understanding the large amount of data generated by various smart applications. The book includes novel work in the data intelligence domain, which combines with the increasing efforts of artificial intelligence, machine learning, deep learning, and cognitive science to study and develop a deeper understanding of information processing systems.
Data Intelligence and Cognitive Informatics: Proceedings of ICDICI 2020 (Algorithms for Intelligent Systems)
by I. Jeena Jacob Selvanayaki Kolandapalayam Shanmugam Selwyn Piramuthu Przemyslaw Falkowski-Gilski
This book discusses new cognitive informatics tools, algorithms, and methods that mimic the mechanisms of the human brain, which lead to an impending revolution in understanding the large amount of data generated by various smart applications. The book is a collection of peer-reviewed best selected research papers presented at the International Conference on Data Intelligence and Cognitive Informatics (ICDICI 2020), organized by SCAD College of Engineering and Technology, Tirunelveli, India, during 8–9 July 2020. The book includes novel work in the data intelligence domain, which combines with the increasing efforts of artificial intelligence, machine learning, deep learning, and cognitive science to study and develop a deeper understanding of information processing systems.
Data-Intensive Computing
by Ian Gorton Deborah K. Gracio
The world is awash with digital data from social networks, blogs, business, science and engineering. Data-intensive computing facilitates understanding of complex problems that must process massive amounts of data. Through the development of new classes of software, algorithms and hardware, data-intensive applications can provide timely and meaningful analytical results in response to exponentially growing data complexity and associated analysis requirements. This emerging area brings many challenges that are different from traditional high-performance computing. This reference for computing professionals and researchers describes the dimensions of the field, the key challenges, the state of the art and the characteristics of likely approaches that future data-intensive problems will require. Chapters cover general principles and methods for designing such systems and for managing and analyzing the big data sets of today that live in the cloud, and describe example applications in bioinformatics and cybersecurity that illustrate these principles in practice.
Data Intensive Computing for Biodiversity (Studies in Computational Intelligence #485)
by Sarinder K. Dhillon Amandeep S. Sidhu
This book is focused on the development of a data integration framework for retrieval of biodiversity information from heterogeneous and distributed data sources. The data integration system proposed in this book links remote databases in a networked environment, supports heterogeneous databases and data formats, links databases hosted on multiple platforms, and provides data security for database owners by allowing them to keep and maintain their own data and to choose information to be shared and linked. The book is a useful guide for researchers, practitioners, and graduate-level students interested in learning state-of-the-art development for data integration in biodiversity.
Data-Intensive Radio Astronomy: Bringing Astrophysics to the Exabyte Era (Astrophysics and Space Science Library #472)
by Eleni Vardoulaki Marta Dembska Alexander Drabent Matthias Hoeft
Radio astronomy is irreversibly moving towards the exabyte era. With the advent of all-sky radio observations, efficient tools and methods to manage the large data volume generated have become imperative. This book brings together the knowledge of several different research fields to present an overview of current state-of-the-art methods in data-intensive radio astronomy. Its approach is comprehensive and data-centric, offering a coherent look at the four distinct parts of the data lifecycle:
• Data creation, storage and archives
• Data processing
• Post-processing and data analysis
• Data access and reuse
Large data management has been the topic of discussion within the astronomical community for decades. Some relevant areas explored in this volume are: ongoing technological innovations in interferometers and computing facilities; difficulties and possible solutions for the huge processing demands of radio telescope projects such as LOFAR, MeerKAT, and ASKAP; concepts for reliable and fast storage for archiving; and more. Written by experts across astrophysics, high-energy particle physics, data science, and computer science, this volume will help researchers and advanced students better understand the current state of data-intensive radio astronomy and tackle the major problems that may arise from future instruments.
Data-Intensive Science (Chapman and Hall/CRC Computational Science Series #18)
by Terence Critchlow Kerstin Kleese Van Dam
Data-intensive science has the potential to transform scientific research and quickly translate scientific progress into complete solutions, policies, and economic success. But this collaborative science is still lacking the effective access and exchange of knowledge among scientists, researchers, and policy makers across a range of disciplines. Bringing together leaders from multiple scientific disciplines, Data-Intensive Science shows how a comprehensive integration of various techniques and technological advances can effectively harness the vast amount of data being generated and significantly accelerate scientific progress to address some of the world's most challenging problems. In the book, a diverse cross-section of application, computer, and data scientists explores the impact of data-intensive science on current research and describes emerging technologies that will enable future scientific breakthroughs. The book identifies best practices used to tackle challenges facing data-intensive science as well as gaps in these approaches. It also focuses on the integration of data-intensive science into standard research practice, explaining how components in the data-intensive science environment need to work together to provide the necessary infrastructure for community-scale scientific collaborations. Organizing the material based on a high-level, data-intensive science workflow, this book provides an understanding of the scientific problems that would benefit from collaborative research, the current capabilities of data-intensive science, and the solutions to enable the next round of scientific advancements.
Data-ism: Inside the Big Data Revolution
by Steve Lohr
Coal, iron ore and oil were the fuel of the Industrial Revolution. Today's economies and governments are powered by something far less tangible: the explosive abundance of digital data. Steve Lohr, the New York Times' chief technology reporter, charts the ascent of Data-ism, the dominating philosophy of the day in which data is at the forefront of everything and decisions of all kinds are based on data analysis rather than experience and intuition. Taking us behind the scenes and introducing the DOPs (Data-Oriented People), the key personalities behind this revolution, he reveals how consuming the bits and bytes of the masses is transforming the nature of business and governance in unforeseen ways. But what are we losing in the process, and what new dangers await?
Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else
by Steve Lohr
By one estimate, 90 percent of all of the data in history was created in the last two years. In 2014, International Data Corporation calculated the data universe at 4.4 zettabytes, or 4.4 trillion gigabytes. That much information, in volume, could fill enough slender iPad Air tablets to create a stack two-thirds of the way to the moon. Now, that's Big Data. Coal, iron ore, and oil were the key productive assets that fueled the Industrial Revolution. The vital raw material of today's information economy is data. In Data-ism, New York Times reporter Steve Lohr explains how big-data technology is ushering in a revolution whose proportions promise to be the basis of the next wave of efficiency and innovation across the economy. But more is at work here than technology. Big data is also the vehicle for a point of view, or philosophy, about how decisions will be—and perhaps should be—made in the future. Lohr investigates the benefits of data while also examining its dark side. Data-ism is about this next phase, in which vast Internet-scale data sets are used for discovery and prediction in virtually every field. It shows how this new revolution will change decision making—by relying more on data and analysis, and less on intuition and experience—and transform the nature of leadership and management. Focusing on young entrepreneurs at the forefront of data science as well as on giant companies such as IBM that are making big bets on data science for the future of their businesses, Data-ism is a field guide to what is ahead, explaining how individuals and institutions will need to exploit, protect, and manage data to stay competitive in the coming years. With rich examples of how the rise of big data is affecting everyday life, Data-ism also raises provocative questions about policy and practice that have wide implications for everyone. The age of data-ism is here. But are we ready to handle its consequences, good and bad?
The Data Journalism Handbook: How Journalists Can Use Data to Improve the News
by Jonathan Gray Lucy Chambers Liliana Bounegru
When you combine the sheer scale and range of digital information now available with a journalist’s "nose for news" and her ability to tell a compelling story, a new world of possibility opens up. With The Data Journalism Handbook, you’ll explore the potential, limits, and applied uses of this new and fascinating field. This valuable handbook has attracted scores of contributors since the European Journalism Centre and the Open Knowledge Foundation launched the project at MozFest 2011. Through a collection of tips and techniques from leading journalists, professors, software developers, and data analysts, you’ll learn how data can be either the source of data journalism or a tool with which the story is told—or both.
• Examine the use of data journalism at the BBC, the Chicago Tribune, the Guardian, and other news organizations
• Explore in-depth case studies on elections, riots, school performance, and corruption
• Learn how to find data from the Web, through freedom of information laws, and by "crowd sourcing"
• Extract information from raw data with tips for working with numbers and statistics and using data visualization
• Deliver data through infographics, news apps, open data platforms, and download links
Data Jujitsu: The Art of Turning Data into Product
by DJ Patil
Acclaimed data scientist DJ Patil details a new approach to solving problems in Data Jujitsu. Learn how to use a problem's "weight" against itself to:
• Break down seemingly complex data problems into simplified parts
• Use alternative data analysis techniques to examine them
• Use human input, such as Mechanical Turk, and design tricks that enlist the help of your users to take shortcuts around tough problems
• Learn more about the problems before starting on the solutions—and use the findings to solve them, or determine whether the problems are worth solving at all.
Data Lake Analytics on Microsoft Azure: A Practitioner's Guide to Big Data Engineering
by Harsh Chawla Pankaj Khattar
Get a 360-degree view of how the journey of data analytics solutions has evolved from monolithic data stores and enterprise data warehouses to data lakes and modern data warehouses. This book includes comprehensive coverage of:
• How to architect data lake analytics solutions by choosing suitable technologies available on Microsoft Azure
• The advent of microservices applications covering ecommerce or modern solutions built on IoT, and how real-time streaming data has completely disrupted this ecosystem
• How these data analytics solutions have been transformed from solely understanding the trends from historical data to building predictions by infusing machine learning technologies into the solutions
Data platform professionals who have been working on relational data stores, non-relational data stores, and big data technologies will find the content in this book useful. The book can also help you start your journey into the data engineer world, as it provides an overview of advanced data analytics and touches on data science concepts and the various artificial intelligence and machine learning technologies available on Microsoft Azure.
What You Will Learn
You will understand:
• Concepts of data lake analytics, the modern data warehouse, and advanced data analytics
• Architecture patterns of the modern data warehouse and advanced data analytics solutions
• Phases—such as Data Ingestion, Store, Prep and Train, and Model and Serve—of data analytics solutions and the technology choices available on Azure under each phase
• In-depth coverage of real-time and batch mode data analytics solutions architecture
• Various managed services available on Azure, such as Synapse Analytics, Event Hubs, Stream Analytics, Cosmos DB, and managed Hadoop services such as Databricks and HDInsight
Who This Book Is For
Data platform professionals, database architects, engineers, and solution architects
Data Lake Development with Big Data
by Beulah Salome Purra Pradeep Pasupuleti
Explore architectural approaches to building Data Lakes that ingest, index, manage, and analyze massive amounts of data using Big Data technologies.
About This Book
* Comprehend the intricacies of architecting a Data Lake and build a data strategy around your current data architecture
* Efficiently manage vast amounts of data and deliver it to multiple applications and systems with a high degree of performance and scalability
* Packed with industry best practices and use-case scenarios to get you up-and-running
Who This Book Is For
This book is for architects and senior managers who are responsible for building a strategy around their current data architecture, helping them identify the need for a Data Lake implementation in an enterprise context. The reader will need a good knowledge of master data management, information lifecycle management, data governance, data product design, data engineering, and systems architecture. Also required is experience of Big Data technologies such as Hadoop, Spark, Splunk, and Storm.
What You Will Learn
* Identify the need for a Data Lake in your enterprise context and learn to architect a Data Lake
* Learn to build various tiers of a Data Lake, such as data intake, management, consumption, and governance, with a focus on practical implementation scenarios
* Find out the key considerations to be taken into account while building each tier of the Data Lake
* Understand Hadoop-oriented data transfer mechanisms to ingest data in batch, micro-batch, and real-time modes
* Explore various data integration needs and learn how to perform data enrichment and data transformations using Big Data technologies
* Enable data discovery on the Data Lake to allow users to discover the data
* Discover how data is packaged and provisioned for consumption
* Comprehend the importance of including data governance disciplines while building a Data Lake
In Detail
A Data Lake is a highly scalable platform for storing huge volumes of multistructured data from disparate sources with centralized data management services. It eliminates the need for up-front modeling and rigid data structures by allowing schema-less writes. Data Lakes make it possible to ask complex far-reaching questions to find out hidden data patterns and relationships. This book explores the potential of Data Lakes and the architectural approaches to building data lakes that ingest, index, manage, and analyze massive amounts of data using batch and real-time processing frameworks. It guides you on how to go about building a Data Lake that is managed by Hadoop and accessed as required by other Big Data applications such as Spark, Storm, Hive, and so on, to create an environment in which data from different sources can be meaningfully brought together and analyzed. Data Lakes can be viewed as having three capabilities--intake, management, and consumption. This book will take readers through each of these processes of developing a Data Lake and guide them (using best practices) in developing these capabilities. It will also explore often ignored, yet crucial, considerations while building Data Lakes, with the focus on how to architect data governance, security, data quality, data lineage tracking, metadata management, and semantic data tagging. By the end of this book, you will have a good understanding of building a Data Lake for Big Data. You will be able to utilize Data Lakes for efficient and easy data processing and analytics.
Style and approach
Data Lake Development with Big Data provides architectural approaches to building a Data Lake. It follows a use case-based approach where practical implementation scenarios of each key component are explained. It also helps you understand how these use cases are implemented in a Data Lake. The chapters are organized in a way that mimics the sequential data flow evidenced in a Data Lake.
Data Lake for Enterprises
by Tomcy John Pankaj Misra
A practical guide to implementing your enterprise data lake using Lambda Architecture as the base.
About This Book
• Build a full-fledged data lake for your organization with popular big data technologies using the Lambda architecture as the base
• Delve into the big data technologies required to meet modern day business strategies
• A highly practical guide to implementing enterprise data lakes with lots of examples and real-world use-cases
Who This Book Is For
Java developers and architects who would like to implement a data lake for their enterprise will find this book useful. If you want to get hands-on experience with the Lambda Architecture and big data technologies by implementing a practical solution using these technologies, this book will also help you.
What You Will Learn
• Build an enterprise-level data lake using the relevant big data technologies
• Understand the core of the Lambda architecture and how to apply it in an enterprise
• Learn the technical details around Sqoop and its functionalities
• Integrate Kafka with Hadoop components to acquire enterprise data
• Use Flume with streaming technologies for stream-based processing
• Understand stream-based processing with reference to Apache Spark Streaming
• Incorporate Hadoop components and know the advantages they provide for enterprise data lakes
• Build fast, streaming, and high-performance applications using ElasticSearch
• Make your data ingestion process consistent across various data formats with configurability
• Process your data to derive intelligence using machine learning algorithms
In Detail
The term "Data Lake" has recently emerged as a prominent term in the big data industry. Data scientists can make use of it in deriving meaningful insights that can be used by businesses to redefine or transform the way they operate. Lambda architecture is also emerging as one of the very eminent patterns in the big data landscape, as it not only helps to derive useful information from historical data but also correlates real-time data to enable business to take critical decisions. This book tries to bring these two important aspects—data lake and lambda architecture—together. This book is divided into three main sections. The first introduces you to the concept of data lakes, the importance of data lakes in enterprises, and getting you up-to-speed with the Lambda architecture. The second section delves into the principal components of building a data lake using the Lambda architecture. It introduces you to popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and ElasticSearch. The third section is a highly practical demonstration of putting it all together, and shows you how an enterprise data lake can be implemented, along with several real-world use-cases. It also shows you how other peripheral components can be added to the lake to make it more efficient. By the end of this book, you will be able to choose the right big data technologies using the lambda architectural patterns to build your enterprise data lake.
Style and approach
The book takes a pragmatic approach, showing ways to leverage big data technologies and lambda architecture to build an enterprise-level data lake.