Data Pipelines Pocket Reference: Moving And Processing Data For Analytics
by James Densmore
Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack.
You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.
You'll learn: what a data pipeline is and how it works; how data is moved and processed on modern data infrastructure, including cloud platforms; common tools and products used by data engineers to build pipelines; how pipelines support analytics and reporting needs; and considerations for pipeline maintenance, testing, and alerting.
Data Pipelines with Apache Airflow
by Julian de Ruiter, Bas Harenslak
Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines.
Summary: A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology: Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.
About the book: Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You'll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline's needs.
What's inside: Build, test, and deploy Airflow pipelines as DAGs; Automate moving and transforming data; Analyze historical datasets using backfilling; Develop custom components; Set up Airflow in production environments.
About the reader: For DevOps, data engineers, machine learning engineers, and sysadmins with intermediate Python skills.
About the author: Bas Harenslak and Julian de Ruiter are data engineers with extensive experience using Airflow to develop pipelines for major companies. Bas is also an Airflow committer.
Table of Contents
PART 1 - GETTING STARTED: 1 Meet Apache Airflow; 2 Anatomy of an Airflow DAG; 3 Scheduling in Airflow; 4 Templating tasks using the Airflow context; 5 Defining dependencies between tasks
PART 2 - BEYOND THE BASICS: 6 Triggering workflows; 7 Communicating with external systems; 8 Building custom components; 9 Testing; 10 Running tasks in containers
PART 3 - AIRFLOW IN PRACTICE: 11 Best practices; 12 Operating Airflow in production; 13 Securing Airflow; 14 Project: Finding the fastest way to get around NYC
PART 4 - IN THE CLOUDS: 15 Airflow in the clouds; 16 Airflow on AWS; 17 Airflow on Azure; 18 Airflow in GCP
Data Plane Development Kit (DPDK): A Software Optimization Guide to the User Space-Based Network Applications
by Heqing Zhu
This book brings together the insights and practical experience of some of the most experienced Data Plane Development Kit (DPDK) technical experts. It details DPDK trends, data packet processing, hardware acceleration, and packet processing and virtualization, as well as the practical application of DPDK in the fields of SDN, NFV, and network storage. The book also devotes considerable space to core software algorithms, the advanced optimization methods adopted in DPDK, detailed practical experience, and guidance on how to use DPDK.
Data Points
by Nathan Yau
A fresh look at visualization from the author of Visualize This.
Whether it's statistical charts, geographic maps, or the snappy graphical statistics you see on your favorite news sites, the art of data graphics or visualization is fast becoming a movement of its own. In Data Points: Visualization That Means Something, author Nathan Yau presents an intriguing complement to his bestseller Visualize This, this time focusing on the graphics side of data analysis. Using examples from art, design, business, statistics, cartography, and online media, he explores both standard (and not so standard) concepts and ideas about illustrating data.
Shares intriguing ideas from Nathan Yau, author of Visualize This and creator of flowingdata.com, with over 66,000 subscribers. Focuses on visualization, data graphics that help viewers see trends and patterns they might not otherwise see in a table. Includes examples from the author's own illustrations, as well as from professionals in statistics, art, design, business, computer science, cartography, and more. Examines standard rules across all visualization applications, then explores when and where you can break those rules.
Create visualizations that register at all levels, with Data Points: Visualization That Means Something.
The Data Preparation Journey: Finding Your Way with R (Chapman & Hall/CRC Data Science Series)
by Martin Hugh Monkman
The Data Preparation Journey: Finding Your Way With R introduces the principles of data preparation within a systematic approach that follows a typical data science or statistical workflow. With that context, readers will work through practical solutions to resolving problems in data using the statistical and data science programming language R. These solutions include examples of complex real-world data, adding greater context and exposing the reader to greater technical challenges. This book focuses on the Import to Tidy to Transform steps. It demonstrates how “Visualise” is an important part of Exploratory Data Analysis, a strategy for identifying potential problems with the data prior to cleaning.
This book is designed for readers with a working knowledge of data manipulation functions in R or other programming languages. It is suitable for academics for whom analyzing data is crucial, businesses who make decisions based on the insights gleaned from collecting data from customer interactions, and public servants who use data to inform policy and program decisions. The principles and practices described within The Data Preparation Journey apply regardless of the context.
Key Features: Includes an R package containing the code and data sets used in the book; Comprehensive examples of data preparation from a variety of disciplines; Defines the key principles of data preparation, from access to publication.
Data Preprocessing in Data Mining (Intelligent Systems Reference Library #72)
by Francisco Herrera, Salvador García, Julián Luengo
Data Preprocessing in Data Mining addresses one of the most important issues within the well-known Knowledge Discovery from Data process. Data taken directly from the source will likely contain inconsistencies and errors and, most importantly, will not be ready for a data mining process. Furthermore, the increasing amount of data in recent science, industry, and business applications calls for more complex tools to analyze it. Data preprocessing makes it possible to adapt the data to fulfill the input demands of each data mining algorithm. It includes data reduction techniques, which aim to reduce the complexity of the data by detecting or removing irrelevant and noisy elements. This book reviews the tasks that fill the gap between data acquisition from the source and the data mining process. It takes a comprehensive look from a practical point of view, covering basic concepts and surveying the techniques proposed in the specialized literature. Each chapter is a stand-alone guide to a particular data preprocessing topic, from basic concepts and detailed descriptions of classical algorithms to an exhaustive catalog of recent developments. The in-depth technical descriptions make this book suitable for technical professionals, researchers, and senior undergraduate and graduate students in data science, computer science, and engineering.
Data Preprocessing with Python for Absolute Beginners: Take your first steps in data preparation with Python
by AI Sciences OU
This book is dedicated to data preparation and explains how to perform different data preparation techniques on various datasets using data preparation libraries written in the Python programming language.
Key Features: A crash course in Python to fill any gaps in prerequisite knowledge and a solid foundation on which to build your new skills; A complete data preparation pipeline for your guided practice; Three real-world projects covering each major task to cement your learned skills in data preparation, classification, and regression.
Book Description: The book follows a straightforward approach. It is divided into nine chapters. Chapter 1 introduces the basic concept of data preparation and installation steps for the software that we will need to perform data preparation in this book. Chapter 1 also contains a crash course on Python, followed by a brief overview of different data types in Chapter 2. You will then learn how to handle missing values in the data, while the categorical encoding of numeric data is explained in Chapter 4.
The second half of the course presents data discretization and describes the process of handling outliers. Chapter 7 demonstrates how to scale features in the dataset. Subsequent chapters teach you to handle mixed and DateTime data types, balance data, and practice resampling. A full data preparation final project is also available at the end of the book.
Different types of data preprocessing techniques are explained theoretically, followed by practical examples in each chapter. Each chapter also contains an exercise that students can use to evaluate their understanding of the chapter's concepts.
By the end of this course, you will have built a solid working knowledge of data preparation, the first step toward any data science or machine learning career and an essential skill set for any aspiring developer.
The code bundle for this course is available at https://www.aispublishing.net/book-data-preprocessing
What you will learn: Explore different libraries for data preparation; Understand data types; Handle missing data; Encode categorical data; Discretize data; Learn to handle outliers; Practice feature scaling; Handle mixed and DateTime variables and imbalanced datasets; Employ your new skills to complete projects in data preparation, classification, and regression.
Who this book is for: In addition to beginners in data preparation with Python, this book can also be used as a reference manual by intermediate and experienced programmers. It contains data preprocessing code samples using multiple data visualization libraries.
Data Privacy: A runbook for engineers
by Nishant Bhajaria
Engineer privacy into your systems with these hands-on techniques for data governance, legal compliance, and surviving security audits.
In Data Privacy you will learn how to: Classify data based on privacy risk; Build technical tools to catalog and discover data in your systems; Share data with technical privacy controls to measure reidentification risk; Implement technical privacy architectures to delete data; Set up technical capabilities for data export to meet legal requirements like Data Subject Access Requests (DSAR); Establish a technical privacy review process to help accelerate the legal Privacy Impact Assessment (PIA); Design a Consent Management Platform (CMP) to capture user consent; Implement security tooling to help optimize privacy; Build a holistic program that will get support and funding from the C-Level and board.
Data Privacy teaches you to design, develop, and measure the effectiveness of privacy programs. You'll learn from author Nishant Bhajaria, an industry-renowned expert who has overseen privacy at Google, Netflix, and Uber. The terminology and legal requirements of privacy are all explained in clear, jargon-free language. The book's constant awareness of business requirements will help you balance trade-offs and ensure your users' privacy can be improved without spiraling time and resource costs.
About the technology: Data privacy is essential for any business. Data breaches, vague policies, and poor communication all erode a user's trust in your applications. You may also face substantial legal consequences for failing to protect user data. Fortunately, there are clear practices and guidelines to keep your data secure and your users happy.
About the book: Data Privacy: A runbook for engineers teaches you how to navigate the trade-offs between strict data security and real-world business needs. In this practical book, you'll learn how to design and implement privacy programs that are easy to scale and automate. There's no bureaucratic process, just workable solutions and smart repurposing of existing security tools to help set and achieve your privacy goals.
What's inside: Classify data based on privacy risk; Set up capabilities for data export that meet legal requirements; Establish a review process to accelerate privacy impact assessment; Design a consent management platform to capture user consent.
About the reader: For engineers and business leaders looking to deliver better privacy.
About the author: Nishant Bhajaria leads the Technical Privacy and Strategy teams for Uber. His previous roles include head of privacy engineering at Netflix, and data security and privacy at Google.
Table of Contents
PART 1 PRIVACY, DATA, AND YOUR BUSINESS: 1 Privacy engineering: Why it's needed, how to scale it; 2 Understanding data and privacy
PART 2 A PROACTIVE PRIVACY PROGRAM: DATA GOVERNANCE: 3 Data classification; 4 Data inventory; 5 Data sharing
PART 3 BUILDING TOOLS AND PROCESSES: 6 The technical privacy review; 7 Data deletion; 8 Exporting user data: Data Subject Access Requests
PART 4 SECURITY, SCALING, AND STAFFING: 9 Building a consent management platform; 10 Closing security vulnerabilities; 11 Scaling, hiring, and considering regulations
Data Privacy: Foundations, New Developments and the Big Data Challenge
by Vicenç Torra
This book offers a broad, cohesive overview of the field of data privacy. It discusses, from a technological perspective, the problems and solutions of the three main communities working on data privacy: statistical disclosure control (those with a statistical background), privacy-preserving data mining (those working with databases and data mining), and privacy-enhancing technologies (those involved in communications and security).
Presenting different approaches, the book describes alternative privacy models and disclosure risk measures as well as data protection procedures for respondent, holder, and user privacy. It also discusses specific data privacy problems and solutions for readers who need to deal with big data.
Data Privacy and Crowdsourcing: A Comparison of Selected Problems in China, Germany and the United States (Advanced Studies in Diginomics and Digitalization)
by Lars Hornuf, Sonja Mangold, Yayun Yang
This open access book describes the most important legal sources and principles of data privacy and data protection in China, Germany, and the United States. The authors collected privacy statements from more than 400 crowdsourcing platforms, which allowed them to empirically evaluate their data privacy and data protection practices. The book compares the practices in the three countries and develops empirically grounded policy recommendations.
"A profound analysis on workers' privacy in new forms of work in China, Germany, and the United States." (Prof. Dr. Wolfgang Däubler, University of Bremen)
"This is a comprehensive and timely book for legal and business scholars as well as practitioners, especially with the increasingly important role of raw data in machine learning and artificial intelligence." (Professor Mingfeng Lin, Georgia Institute of Technology)
Data Privacy and Trust in Cloud Computing: Building trust in the cloud through assurance and accountability (Palgrave Studies in Digital Business & Enabling Technologies)
by Theo Lynn, John G. Mooney, Lisa van der Werff, Grace Fox
This open access book brings together perspectives from multiple disciplines including psychology, law, IS, and computer science on data privacy and trust in the cloud. Cloud technology has fueled rapid, dramatic technological change, enabling a level of connectivity that has never been seen before in human history. However, this brave new world comes with problems. Several high-profile cases over the last few years have demonstrated cloud computing's uneasy relationship with data security and trust. This volume explores the numerous technological, process, and regulatory solutions presented in academic literature as mechanisms for building trust in the cloud, including GDPR in Europe. The massive acceleration of digital adoption resulting from the COVID-19 pandemic is introducing new and significant security and privacy threats and concerns. Against this backdrop, this book provides a timely reference and organising framework for considering how we will assure privacy and build trust in such a hyper-connected, digitally dependent world. This book presents a framework for assurance and accountability in the cloud and reviews the literature on trust, data privacy and protection, and ethics in cloud computing.
Data Privacy for the Smart Grid
by Rebecca Herold, Christine Hertzog
Data Privacy for the Smart Grid provides easy-to-understand guidance on data privacy issues and the implications for creating privacy risk management programs, along with the privacy policies and practices required to ensure Smart Grid privacy. It addresses privacy in electric, natural gas, and water grids from two different perspectives, one from a Smart Grid expert and another from a privacy and information security expert. While considering privacy in the Smart Grid, the book also examines the data created by Smart Grid technologies and machine-to-machine applications.
Data Privacy Games
by Lei Xu, Chunxiao Jiang, Yi Qian, Yong Ren
With the growing popularity of “big data”, the potential value of personal data has attracted more and more attention. Applications built on personal data can create tremendous social and economic benefits. Meanwhile, they bring serious threats to individual privacy. The extensive collection, analysis, and transaction of personal data make it difficult for individuals to keep their privacy safe. People now show more concern about privacy than ever before. How to strike a balance between the exploitation of personal information and the protection of individual privacy has become an urgent issue.
In this book, the authors use methodologies from economics, especially game theory, to investigate solutions to this balance issue. They investigate the strategies of stakeholders involved in the use of personal data and try to find the equilibrium. The book proposes a user-role based methodology to investigate the privacy issues in data mining, identifying four different types of users, i.e. four user roles, involved in data mining applications. For each user role, the authors discuss its privacy concerns and the strategies that it can adopt to solve the privacy problems.
The book also proposes a simple game model to analyze the interactions among data provider, data collector, and data miner. By solving the equilibria of the proposed game, readers can get useful guidance on how to deal with the trade-off between privacy and data utility. Moreover, to elaborate the analysis of the data collector's strategies, the authors propose a contract model and a multi-armed bandit model respectively. The authors discuss how the owners of data (e.g. an individual or a data miner) deal with the trade-off between privacy and utility in data mining. Specifically, they study users' strategies in collaborative-filtering-based recommendation systems and distributed classification systems. They build game models to formulate the interactions among data owners and propose learning algorithms to find the equilibria.
Data Privacy Management and Autonomous Spontaneous Security: 8th International Workshop, DPM 2013, and 6th International Workshop, SETOP 2013, Egham, UK, September 12-13, 2013, Revised Selected Papers (Lecture Notes in Computer Science #8247)
by William M. Fitzgerald, Nora Cuppens-Boulahia, Joaquin Garcia-Alfaro, Georgios Lioudakis, Simon Foley
This book constitutes the revised selected papers of the 8th International Workshop on Data Privacy Management, DPM 2013, and the 6th International Workshop on Autonomous and Spontaneous Security, SETOP 2013, held in Egham, UK, in September 2013 and co-located with the 18th European Symposium on Research in Computer Security (ESORICS 2013). The volume contains 13 full papers selected out of 46 submissions and 1 keynote lecture from the DPM workshop, and 6 full papers together with 5 short papers selected among numerous submissions to the SETOP workshop. The papers cover topics related to the management of privacy-sensitive information and automated configuration of security, focusing in particular on system-level privacy policies, administration of sensitive identifiers, data integration and privacy, engineering authentication and authorization, mobile security, and vulnerabilities.
Data Privacy Management, and Security Assurance: 10th International Workshop, DPM 2015, and 4th International Workshop, QASA 2015, Vienna, Austria, September 21-22, 2015. Revised Selected Papers (Lecture Notes in Computer Science #9481)
by Alessandro Aldini, Fabio Martinelli, Joaquin Garcia-Alfaro, Guillermo Navarro-Arribas, Neeraj Suri
This book constitutes the revised selected papers of the 10th International Workshop on Data Privacy Management, DPM 2015, and the 4th International Workshop on Quantitative Aspects in Security Assurance, QASA 2015, held in Vienna, Austria, in September 2015, co-located with the 20th European Symposium on Research in Computer Security, ESORICS 2015. In the DPM 2015 workshop edition, 39 submissions were received. In the end, 8 full papers, accompanied by 6 short papers, 2 position papers, and 1 keynote, were presented in this volume. The QASA workshop series responds to the increasing demand for techniques to deal with quantitative aspects of security assurance at several levels of the development life-cycle of systems and services, from requirements elicitation to run-time operation and maintenance. QASA 2015 received 11 submissions, of which 4 papers are presented in this volume as well.
Data Privacy Management and Security Assurance: 11th International Workshop, DPM 2016 and 5th International Workshop, QASA 2016, Heraklion, Crete, Greece, September 26-27, 2016, Proceedings (Lecture Notes in Computer Science #9963)
by Alessandro Aldini, Fabio Martinelli, Neeraj Suri, Vicenç Torra, Giovanni Livraga
This book constitutes the refereed proceedings of the 11th International Workshop on Data Privacy Management, DPM 2016, and the 5th International Workshop on Quantitative Aspects in Security Assurance, QASA 2016, held in Heraklion, Crete, Greece, in September 2016. 9 full papers and 4 short papers out of 24 submissions are included in the DPM 2016 Workshop. They are organized around areas related to the management of privacy-sensitive information, such as translation of high-level business goals into system-level privacy policies; administration of sensitive identifiers; and data integration and privacy engineering. The QASA workshop centers around research topics with a particular emphasis on techniques for service-oriented architectures, including aspects of dependability, privacy, risk, and trust. Three full papers and one short paper out of 8 submissions are included in QASA 2016.
Data Privacy Management, Autonomous Spontaneous Security, and Security Assurance: 9th International Workshop, DPM 2014, 7th International Workshop, SETOP 2014, and 3rd International Workshop, QASA 2014, Wroclaw, Poland, September 10-11, 2014. Revised Selected Papers (Lecture Notes in Computer Science #8872)
by Alessandro Aldini, Fabio Martinelli, Joaquin Garcia-Alfaro, Neeraj Suri, Jordi Herrera-Joancomartí, Emil Lupu, Joachim Posegga
This book constitutes the revised selected papers of the 9th International Workshop on Data Privacy Management, DPM 2014, the 7th International Workshop on Autonomous and Spontaneous Security, SETOP 2014, and the 3rd International Workshop on Quantitative Aspects in Security Assurance, QASA 2014, held in Wroclaw, Poland, in September 2014, co-located with the 19th European Symposium on Research in Computer Security (ESORICS 2014). The volume contains 7 full and 4 short papers plus 1 keynote talk from the DPM workshop; 2 full papers and 1 keynote talk from the SETOP workshop; and 7 full papers and 1 keynote talk from the QASA workshop, selected out of 52 submissions. The papers are organized in topical sections on data privacy management; autonomous and spontaneous security; and quantitative aspects in security assurance.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2022 International Workshops, DPM 2022 and CBT 2022, Copenhagen, Denmark, September 26–30, 2022, Revised Selected Papers (Lecture Notes in Computer Science #13619)
by Joaquin Garcia-Alfaro, Guillermo Navarro-Arribas, Nicola Dragoni
This book constitutes the refereed proceedings and revised selected papers from the ESORICS 2022 International Workshops on Data Privacy Management, Cryptocurrencies and Blockchain Technology, DPM 2022 and CBT 2022, which took place in Copenhagen, Denmark, during September 26–30, 2022.
For DPM 2022, 10 full papers out of 21 submissions have been accepted for inclusion in this book. They were organized in topical sections as follows: differential privacy and data analysis; regulation, artificial intelligence, and formal verification; and leakage quantification and applications. The CBT 2022 workshop accepted 7 full papers and 3 short papers from 18 submissions. The papers were organized in the following topical sections: Bitcoin, lightning network and scalability; anonymity, fault tolerance and governance; and short papers.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2020 International Workshops, DPM 2020 and CBT 2020, Guildford, UK, September 17–18, 2020, Revised Selected Papers (Lecture Notes in Computer Science #12484)
by Joaquin Garcia-Alfaro, Guillermo Navarro-Arribas, Jordi Herrera-Joancomarti
This book constitutes the revised selected post-conference proceedings of the 15th International Workshop on Data Privacy Management, DPM 2020, and the 4th International Workshop on Cryptocurrencies and Blockchain Technology, CBT 2020, held in conjunction with the 25th European Symposium on Research in Computer Security, ESORICS 2020, in Guildford, UK, in September 2020.
For the CBT Workshop, 8 full and 4 short papers were accepted out of 24 submissions. The selected papers are organized in the following topical headings: Transactions, Mining, Second Layer, and Inter-bank Payments. The DPM Workshop received 38 submissions, from which 12 full and 5 short papers were selected for presentation. The papers focus on Second Layer, Signature Schemes, Formal Methods, Privacy, SNARKs, and Anonymity.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2017 International Workshops, DPM 2017 and CBT 2017, Oslo, Norway, September 14-15, 2017, Proceedings (Lecture Notes in Computer Science #10436)
by Joaquin Garcia-Alfaro, Guillermo Navarro-Arribas, Jordi Herrera-Joancomartí, Hannes Hartenstein
This book constitutes the refereed conference proceedings of the 12th International Workshop on Data Privacy Management, DPM 2017, held in conjunction with the 22nd European Symposium on Research in Computer Security, ESORICS 2017, and the First International Workshop on Cryptocurrencies and Blockchain Technology (CBT 2017), held in Oslo, Norway, in September 2017. The DPM Workshop received 51 submissions, from which 16 full papers were selected for presentation. The papers focus on challenging problems such as translation of high-level business goals into system-level privacy policies, administration of sensitive identifiers, data integration, and privacy engineering. From the CBT Workshop, six full papers and four short papers out of 27 submissions are included. The selected papers cover aspects of identity management, smart contracts, soft- and hardforks, proof-of-work and proof-of-stake, as well as network layer aspects and the application of blockchain technology for secure connect event ticketing.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2021 International Workshops, DPM 2021 and CBT 2021, Darmstadt, Germany, October 8, 2021, Revised Selected Papers (Lecture Notes in Computer Science #13140)
by Joaquin Garcia-Alfaro, Guillermo Navarro-Arribas, Miguel Soriano, Jose Luis Muñoz-Tapia
This book constitutes the refereed proceedings and revised selected papers from the 16th International Workshop on Data Privacy Management, DPM 2021, and the 5th International Workshop on Cryptocurrencies and Blockchain Technology, CBT 2021, which were held online on October 8, 2021, in conjunction with ESORICS 2021. The workshops were initially planned to take place in Darmstadt, Germany, and changed to an online event due to the COVID-19 pandemic.
The DPM 2021 workshop received 25 submissions and accepted 7 full and 3 short papers for publication. These papers were organized in topical sections as follows: risks and privacy preservation; policies and regulation; privacy and learning. For CBT 2021, 6 full papers and 6 short papers were accepted out of 31 submissions. They were organized in topical sections as follows: mining, consensus and market manipulation; smart contracts and anonymity.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2019 International Workshops, DPM 2019 and CBT 2019, Luxembourg, September 26–27, 2019, Proceedings (Lecture Notes in Computer Science #11737)
by Cristina Pérez-Solà, Guillermo Navarro-Arribas, Alex Biryukov, Joaquin Garcia-Alfaro
This book constitutes the refereed conference proceedings of the 14th International Workshop on Data Privacy Management, DPM 2019, and the Third International Workshop on Cryptocurrencies and Blockchain Technology, CBT 2019, held in conjunction with the 24th European Symposium on Research in Computer Security, ESORICS 2019, in Luxembourg in September 2019. For the CBT Workshop, 10 full and 8 short papers were accepted out of 39 submissions. The selected papers are organized in the following topical headings: lightning networks and level 2; smart contracts and applications; and payment systems, privacy and mining. The DPM Workshop received 26 submissions, from which 8 full and 2 short papers were selected for presentation. The papers focus on privacy preserving data analysis; field/lab studies; and privacy by design and data anonymization.
Chapter 2, “Integral Privacy Compliant Statistics Computation,” and Chapter 8, “Graph Perturbation as Noise Graph Addition: a New Perspective for Graph Anonymization,” of this book are available open access under a CC BY 4.0 license at link.springer.com.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2018 International Workshops, DPM 2018 and CBT 2018, Barcelona, Spain, September 6-7, 2018, Proceedings (Lecture Notes in Computer Science #11025)
by Ruben Rios, Giovanni Livraga, Jordi Herrera-Joancomartí, Joaquin Garcia-Alfaro
This book constitutes the refereed conference proceedings of the 2nd International Workshop on Cryptocurrencies and Blockchain Technology, CBT 2018, and the 13th International Workshop on Data Privacy Management, DPM 2018, held in conjunction with the 23rd European Symposium on Research in Computer Security, ESORICS 2018, in Barcelona, Spain, in September 2018. From the CBT Workshop 7 full and 8 short papers out of 39 submissions are included. The selected papers cover aspects of identity management, smart contracts, soft- and hardforks, proof-of-work and proof-of-stake, as well as network layer aspects and the application of blockchain technology for secure connect event ticketing. The DPM Workshop received 36 submissions from which 11 full and 5 short papers were selected for presentation. The papers focus on challenging problems such as translation of high-level business goals into system-level privacy policies, administration of sensitive identifiers, data integration and privacy engineering.
Data Processing Techniques and Applications for Cyber-Physical Systems (Advances in Intelligent Systems and Computing #1088)
by Neil Yen, Chuanchao Huang, Yu-Wei Chan
This book covers cutting-edge and advanced research on data processing techniques and applications for Cyber-Physical Systems. Gathering the proceedings of the International Conference on Data Processing Techniques and Applications for Cyber-Physical Systems (DPTA 2019), held in Shanghai, China on November 15–16, 2019, it examines a wide range of topics, including: distributed processing for sensor data in CPS networks; approximate reasoning and pattern recognition for CPS networks; data platforms for efficient integration with CPS networks; and data security and privacy in CPS networks. Outlining promising future research directions, the book offers a valuable resource for students, researchers and professionals alike, while also providing a useful reference guide for newcomers to the field.
Data Processing with Optimus: Supercharge big data preparation tasks for analytics and machine learning with Optimus using Dask and PySpark
by Dr. Argenis Leon, Luis Aguirre
Written by the core Optimus team, this comprehensive guide will help you to understand how Optimus improves the whole data processing landscape.

Key Features
- Load, merge, and save small and big data efficiently with Optimus
- Learn Optimus functions for data analytics, feature engineering, machine learning, cross-validation, and NLP
- Discover how Optimus improves other data frame technologies and helps you speed up your data processing tasks

Book Description
Optimus is a Python library that works as a unified API for data cleaning, processing, and merging data. It can be used for handling small and big data on your local laptop or on remote clusters using CPUs or GPUs.

The book begins by covering the internals of Optimus and how it works in tandem with the existing technologies to serve your data processing needs. You'll then learn how to use Optimus for loading and saving data from text data formats such as CSV and JSON files, exploring binary files such as Excel, and for columnar data processing with Parquet, Avro, and OCR. Next, you'll get to grips with the profiler and its data types - a unique feature of Optimus Dataframe that assists with data quality. You'll see how to use the plots available in Optimus such as histogram, frequency charts, and scatter and box plots, and understand how Optimus lets you connect to libraries such as Plotly and Altair. You'll also delve into advanced applications such as feature engineering, machine learning, cross-validation, and natural language processing functions and explore the advancements in Optimus. Finally, you'll learn how to create data cleaning and transformation functions and add a hypothetical new data processing engine with Optimus.

By the end of this book, you'll be able to improve your data science workflow with Optimus easily.

What you will learn
- Use over 100 data processing functions over columns and other string-like values
- Reshape and pivot data to get the output in the required format
- Find out how to plot histograms, frequency charts, scatter plots, box plots, and more
- Connect Optimus with popular Python visualization libraries such as Plotly and Altair
- Apply string clustering techniques to normalize strings
- Discover functions to explore, fix, and remove poor quality data
- Use advanced techniques to remove outliers from your data
- Add engines and custom functions to clean, process, and merge data

Who this book is for
This book is for Python developers who want to explore, transform, and prepare big data for machine learning, analytics, and reporting using Optimus, a unified API to work with Pandas, Dask, cuDF, Dask-cuDF, Vaex, and Spark. Although not necessary, beginner-level knowledge of Python will be helpful. Basic knowledge of the CLI is required to install Optimus and its requirements. For using GPU technologies, you'll need an NVIDIA graphics card compatible with NVIDIA's RAPIDS library, which is compatible with Windows 10 and Linux.