Data Modeling with Microsoft Power BI: Self-Service and Enterprise Data Warehouse with Power BI
by Markus Ehrenmueller-Jensen
Data modeling is the single most overlooked feature in Power BI Desktop, yet it's what sets Power BI apart from other tools on the market. This practical book serves as your fast-forward button for data modeling with Power BI, Analysis Services tabular, and SQL databases. It serves as a starting point for data modeling, as well as a handy refresher.
Author Markus Ehrenmueller-Jensen, founder of Savory Data, shows you the basic concepts of Power BI's semantic model with hands-on examples in DAX, Power Query, and T-SQL. If you're looking to build a data warehouse layer, chapters with T-SQL examples will get you started. You'll begin with simple steps and gradually solve more complex problems.
This book shows you how to:
- Normalize and denormalize with DAX, Power Query, and T-SQL
- Apply best practices for calculations, flags and indicators, time and date, role-playing dimensions and slowly changing dimensions
- Solve challenges such as binning, budget, localized models, composite models, and key value with DAX, Power Query, and T-SQL
- Discover and tackle performance issues by applying solutions in DAX, Power Query, and T-SQL
- Work with tables, relations, set operations, normal forms, dimensional modeling, and ETL
Data Modeling with Tableau: A practical guide to building data models using Tableau Prep and Tableau Desktop
by Kirk Munroe
Save time analyzing volumes of data using best practices to extract, model, and create insights from your data
Key Features
- Master best practices in data modeling with Tableau Prep Builder and Tableau Desktop
- Apply Tableau Server and Cloud to create and extend data models
- Build organizational data models based on data and content governance best practices
Book Description
Tableau is unlike most other BI platforms that have a single data modeling tool and enterprise data model (for example, LookML from Google's Looker). That doesn't mean Tableau doesn't have enterprise data governance; it is both robust and very flexible. This book will help you build a data-driven organization with the proper use of Tableau governance models.
Data Modeling with Tableau is an extensive guide, complete with step-by-step explanations of essential concepts, practical examples, and hands-on exercises. As you progress through the chapters, you will learn the role that Tableau Prep Builder and Tableau Desktop each play in data modeling. You'll also explore the components of Tableau Server and Cloud that make data modeling more robust, secure, and performant. Moreover, by extending data models for Ask and Explain Data, you'll gain the knowledge required to extend analytics to more people in your organization, leading to better data-driven decisions. Finally, this book covers the entire Tableau stack and the techniques required to build the right level of governance into Tableau data models for the right use cases.
By the end of this Tableau book, you'll have a firm understanding of how to leverage data modeling in Tableau to benefit your organization.
What you will learn
- Showcase Tableau published data sources and embedded connections
- Apply Ask Data in data cataloging and natural language query
- Exhibit features of Tableau Prep Builder with hands-on exercises
- Model data with Tableau Desktop through examples
- Formulate a governed data strategy using Tableau Server and Cloud
- Optimize data models for Ask and Explain Data
Who this book is for
This book is for data analysts and business analysts who are looking to expand their data skills, offering a broad foundation to build better data models in Tableau for easier analysis and better query performance.
It will also benefit individuals responsible for making trusted and secure data available to their organization through Tableau, such as data stewards and others who work to take enterprise data and make it more accessible to business analysts.
Data Privacy Games
by Yi Qian, Chunxiao Jiang, Lei Xu, Yong Ren
With the growing popularity of “big data”, the potential value of personal data has attracted more and more attention. Applications built on personal data can create tremendous social and economic benefits. Meanwhile, they bring serious threats to individual privacy. The extensive collection, analysis and transaction of personal data make it difficult for individuals to keep their privacy safe. People now show more concern about privacy than ever before. How to strike a balance between the exploitation of personal information and the protection of individual privacy has become an urgent issue.
In this book, the authors use methodologies from economics, especially game theory, to investigate solutions to the balance issue. They investigate the strategies of stakeholders involved in the use of personal data, and try to find the equilibrium. The book proposes a user-role based methodology to investigate the privacy issues in data mining, identifying four different types of users, i.e. four user roles, involved in data mining applications. For each user role, the authors discuss its privacy concerns and the strategies that it can adopt to solve the privacy problems.
The book also proposes a simple game model to analyze the interactions among data provider, data collector and data miner. By solving the equilibria of the proposed game, readers can get useful guidance on how to deal with the trade-off between privacy and data utility. Moreover, to elaborate the analysis of the data collector’s strategies, the authors propose a contract model and a multi-armed bandit model respectively. The authors discuss how the owners of data (e.g. an individual or a data miner) deal with the trade-off between privacy and utility in data mining. Specifically, they study users’ strategies in collaborative filtering based recommendation systems and distributed classification systems. They build game models to formulate the interactions among data owners, and propose learning algorithms to find the equilibria.
Data Privacy Management, Cryptocurrencies and Blockchain Technology: ESORICS 2017 International Workshops, DPM 2017 and CBT 2017, Oslo, Norway, September 14-15, 2017, Proceedings (Lecture Notes in Computer Science #10436)
by Joaquin Garcia-Alfaro, Guillermo Navarro-Arribas, Jordi Herrera-Joancomartí, Hannes Hartenstein
This book constitutes the refereed conference proceedings of the 12th International Workshop on Data Privacy Management, DPM 2017, in conjunction with the 22nd European Symposium on Research in Computer Security, ESORICS 2017, and the First International Workshop on Cryptocurrencies and Blockchain Technology (CBT 2017), held in Oslo, Norway, in September 2017. The DPM Workshop received 51 submissions from which 16 full papers were selected for presentation. The papers focus on challenging problems such as translation of high-level business goals into system level privacy policies, administration of sensitive identifiers, data integration and privacy engineering. From the CBT Workshop six full papers and four short papers out of 27 submissions are included. The selected papers cover aspects of identity management, smart contracts, soft- and hardforks, proof-of-works and proof of stake as well as network layer aspects and the application of blockchain technology for secure connect event ticketing.
Data Privacy and Crowdsourcing: A Comparison of Selected Problems in China, Germany and the United States (Advanced Studies in Diginomics and Digitalization)
by Lars Hornuf, Sonja Mangold, Yayun Yang
This open access book describes the most important legal sources and principles of data privacy and data protection in China, Germany and the United States. The authors collected privacy statements from more than 400 crowdsourcing platforms, which allowed them to empirically evaluate their data privacy and data protection practices. The book compares the practices in the three countries and develops empirically-grounded policy recommendations.
"A profound analysis on workers' privacy in new forms of work in China, Germany, and the United States." Prof. Dr. Wolfgang Däubler, University of Bremen
"This is a comprehensive and timely book for legal and business scholars as well as practitioners, especially with the increasingly important role of raw data in machine learning and artificial intelligence." Professor Mingfeng Lin, Georgia Institute of Technology
Data Privacy and GDPR Handbook
by Sanjay Sharma
The definitive guide for ensuring data privacy and GDPR compliance
Privacy regulation is increasingly rigorous around the world and has become a serious concern for senior management of companies regardless of industry, size, scope, and geographic area. The General Data Protection Regulation (GDPR) imposes complex, elaborate, and stringent requirements for any organization or individuals conducting business in the European Union (EU) and the European Economic Area (EEA), while also addressing the export of personal data outside of the EU and EEA. This recently-enacted law allows the imposition of fines of up to 4% of global annual revenue for privacy and data protection violations. Despite the massive potential for steep fines and regulatory penalties, there is a distressing lack of awareness of the GDPR within the business community. A recent survey conducted in the UK suggests that only 40% of firms are even aware of the new law and their responsibilities to maintain compliance. The Data Privacy and GDPR Handbook helps organizations strictly adhere to data privacy laws in the EU, the USA, and other jurisdictions around the world. This authoritative and comprehensive guide includes the history and foundation of data privacy, the framework for ensuring data privacy across major global jurisdictions, a detailed framework for complying with the GDPR, and perspectives on the future of data collection and privacy practices.
- Comply with the latest data privacy regulations in the EU, EEA, US, and others
- Avoid hefty fines, damage to your reputation, and losing your customers
- Keep pace with the latest privacy policies, guidelines, and legislation
- Understand the framework necessary to ensure data privacy today and gain insights on future privacy practices
The Data Privacy and GDPR Handbook is an indispensable resource for Chief Data Officers, Chief Technology Officers, legal counsel, C-Level Executives, regulators and legislators, data privacy consultants, compliance officers, and audit managers.
Data Privacy and Trust in Cloud Computing: Building trust in the cloud through assurance and accountability (Palgrave Studies in Digital Business & Enabling Technologies)
by Theo Lynn, John G. Mooney, Grace Fox, Lisa van der Werff
This open access book brings together perspectives from multiple disciplines including psychology, law, IS, and computer science on data privacy and trust in the cloud. Cloud technology has fueled rapid, dramatic technological change, enabling a level of connectivity that has never been seen before in human history. However, this brave new world comes with problems. Several high-profile cases over the last few years have demonstrated cloud computing's uneasy relationship with data security and trust. This volume explores the numerous technological, process and regulatory solutions presented in academic literature as mechanisms for building trust in the cloud, including GDPR in Europe. The massive acceleration of digital adoption resulting from the COVID-19 pandemic is introducing new and significant security and privacy threats and concerns. Against this backdrop, this book provides a timely reference and organising framework for considering how we will assure privacy and build trust in such a hyper-connected digitally dependent world. This book presents a framework for assurance and accountability in the cloud and reviews the literature on trust, data privacy and protection, and ethics in cloud computing.
Data Privacy for the Smart Grid
by Rebecca Herold, Christine Hertzog
Privacy for the Smart Grid provides easy-to-understand guidance on data privacy issues and the implications for creating privacy risk management programs, along with privacy policies and practices required to ensure Smart Grid privacy. It addresses privacy in electric, natural gas, and water grids from two different perspectives of the topic, one from a Smart Grid expert and another from a privacy and information security expert. While considering privacy in the Smart Grid, the book also examines the data created by Smart Grid technologies and machine-to-machine applications.
Data Processing for the AHP/ANP (Quantitative Management #1)
by Yong Shi, Yi Peng, Gang Kou, Daji Ergu
The positive reciprocal pairwise comparison matrix (PCM) is one of the key components used to convert qualitative and/or intangible attributes into measurable quantities. This book examines six understudied issues of PCM, i.e. consistency test, inconsistent data identification and adjustment, data collection, missing or uncertain data estimation, and sensitivity analysis of rank reversal. The maximum eigenvalue threshold method is proposed as the new consistency index for the AHP/ANP. An induced bias matrix model (IBMM) is proposed to identify and adjust the inconsistent data, and estimate the missing or uncertain data. Two applications of the IBMM, one to risk assessment and decision analysis and one to task scheduling and resource allocation in a cloud computing environment, are introduced to illustrate the model.
Data Protection Law: A Comparative Analysis of Asia-Pacific and European Approaches
by Robert Walters, Leon Trakman, Bruno Zeller
This book provides a comparison and practical guide for academics, students, and the business community of the current data protection laws in selected Asia Pacific countries (Australia, India, Indonesia, Japan, Malaysia, Singapore, Thailand) and the European Union.
The book shows how over the past three decades the range of economic, political, and social activities that have moved to the internet has increased significantly. This technological transformation has resulted in the collection of personal data, its use and storage across international boundaries at a rate with which governments have been unable to keep pace. The book highlights challenges and potential solutions related to data protection issues arising from cross-border problems in which personal data is being considered as intellectual property, within transnational contracts and in anti-trust law. The book also discusses the emerging challenges in protecting personal data and promoting cyber security. The book provides a deeper understanding of the legal risks and frameworks associated with data protection law for local, regional and global academics, students, businesses, industries, the legal profession and individuals.
Data Protection in a Post-Pandemic Society: Laws, Regulations, Best Practices and Recent Solutions
by Chaminda Hewage, Yogachandran Rahulamathavan, Deepthi Ratnayake
This book offers the latest research results and predictions in data protection with a special focus on post-pandemic society. This book also includes various case studies and applications on data protection. It includes the Internet of Things (IoT), smart cities, federated learning, Metaverse, cryptography and cybersecurity. Data protection has burst onto the computer security scene due to the increased interest in securing personal data. Data protection is a key aspect of information security where personal and business data need to be protected from unauthorized access and modification. Stolen personal information has been used for many purposes such as ransom, bullying and identity theft. Due to the wider usage of the Internet and social media applications, people make themselves vulnerable by sharing personal data. This book discusses the challenges associated with personal data protection before, during and after the COVID-19 pandemic. Some of these challenges are caused by technological advancements (e.g. Artificial Intelligence (AI)/Machine Learning (ML) and ChatGPT). In order to preserve the privacy of the data involved, novel techniques such as zero-knowledge proofs, fully homomorphic encryption and multi-party computation are being deployed. The tension between data privacy and data utility drives innovation in this area, where numerous start-ups around the world have started receiving funding from government agencies and venture capitalists. This fuels the adoption of privacy-preserving data computation techniques in real applications, and the field is rapidly evolving. Researchers and students studying/working in data protection and related security fields will find this book useful as a reference.
Data Protection in the Financial Services Industry
by Mandy Webster
Privacy and data protection are now important issues for companies across the financial services industry. Financial records are amongst the most sensitive for many consumers and the regulator is keen to promote good data handling practices in an industry that is looking towards increased customer profiling, for both risk management and opportunity spotting. Mandy Webster's Data Protection in the Financial Services Industry explains how to manage privacy and data protection issues throughout the customer cycle, from making contact to seeking additional business from current customers. She also looks at the precise role of the Financial Services Authority and its response to compliance or non-compliance. Each of the Eight Principles of the Data Protection Act is reviewed and explained.
Data Protection vs. Freedom of Information
by Paul Ticher
The Freedom of Information Act (FOI) was a milestone in UK legislation and, for the first time, the lid was legally lifted on a lot of what the UK government was doing in the name of the citizens of the country. While the FOI applies only to public sector organisations, it covers a wide range of information. The Data Protection Act, which applies equally in both the public and private sector, had already given individuals the right to find out what information was being held about them, and to insist on having that information kept accurate and up to date. Of course, the Data Protection Act also placed an obligation on organisations to protect the personal data of those people about whom they collected this information and to ensure that this data was not disclosed, either deliberately or accidentally, to anyone not entitled to see it.
Clear and practical guidance for data governance professionals
Inevitably, information that could and should be disclosed pursuant to a freedom of information enquiry could quite conceivably also contain information that the data controller must protect and herein lies a challenge for those in the public sector. Data management frameworks must be designed with two apparently contradictory objectives in mind: ensuring that information that might have to be disclosed pursuant to an FOI enquiry can quickly be found and provided, while simultaneously ensuring that personal data that has to be protected remains protected. This is a key data governance issue and, until now, there has been little useful guidance on how to tackle this issue for those charged with designing processes and infrastructure that meets these two sets of legal requirements. This pocket guide focuses on and addresses this critical issue, providing clear and practical guidance for data governance professionals on how to resolve this conundrum.
Data Protection: Ensuring Data Availability
by Preston de Guise
This is the fundamental truth about data protection: backup is dead. Or rather, backup and recovery, as a standalone topic, no longer has relevance in IT. As a standalone topic, it’s been killed off by seemingly exponential growth in storage and data, by the cloud, and by virtualization. So what is data protection? This book takes a holistic, business-based approach to data protection. It explains how data protection is a mix of proactive and reactive planning, technology and activities that allow for data continuity. It shows how truly effective data protection comes from a holistic approach considering the entire data lifecycle and all required SLAs. Data protection is neither RAID nor is it continuous availability, replication, snapshots or backups; it is all of them, combined in a considered and measured approach to suit the criticality of the data and meet all the requirements of the business. The book also discusses how businesses seeking to creatively leverage their IT investments and to drive through cost optimization are increasingly looking at data protection as a mechanism to achieve those goals. In addition to being a type of insurance policy, data protection is becoming an enabler for new processes around data movement and data processing. This book arms readers with information critical for making decisions on how data can be protected against loss in the cloud, on-premises, or in a mix of the two. It explains the changing face of recovery in a highly virtualized data center and techniques for dealing with big data. Moreover, it presents a model for where data recovery processes can be integrated with IT governance and management in order to achieve the right focus on recoverability across the business.
Data Protection: Governance, Risk Management, and Compliance
by David G. Hill
Failure to appreciate the full dimensions of data protection can lead to poor data protection management, costly resource allocation issues, and exposure to unnecessary risks. Data Protection: Governance, Risk Management, and Compliance explains how to gain a handle on the vital aspects of data protection.
The author begins by building the foundatio
Data Quality Engineering in Financial Services: Applying Manufacturing Techniques to Data
by Brian Buzzelli
Data quality will either make you or break you in the financial services industry. Missing prices, wrong market values, trading violations, client performance restatements, and incorrect regulatory filings can all lead to harsh penalties, lost clients, and financial disaster. This practical guide provides data analysts, data scientists, and data practitioners in financial services firms with the framework to apply manufacturing principles to financial data management, understand data dimensions, and engineer precise data quality tolerances at the datum level and integrate them into your data processing pipelines.
You'll get invaluable advice on how to:
- Evaluate data dimensions and how they apply to different data types and use cases
- Determine data quality tolerances for your data quality specification
- Choose the points along the data processing pipeline where data quality should be assessed and measured
- Apply tailored data governance frameworks within a business or technical function or across an organization
- Precisely align data with applications and data processing pipelines
- And more
Data Quality Management in the Data Age: Excellence in Data Quality for Enhanced Digital Economic Growth (SpringerBriefs in Service Science)
by Haiyan Yu
This book addresses data quality management for data markets, including foundational quality issues in modern data science. By clarifying the concept of data quality, its impact on real-world applications, and the challenges stemming from poor data quality, it will equip data scientists and engineers with advanced skills in data quality management, with a particular focus on applications within data markets. This will help them create an environment that encourages potential data sellers with high-quality data to join the market, ultimately leading to an improvement in overall data quality. High-quality data, as a novel factor of production, has assumed a pivotal role in driving digital economic development. The acquisition of such data is particularly important for contemporary decision-making models. Data markets facilitate the procurement of high-quality data and thereby enhance the data supply. Consequently, potential data sellers with high-quality data are incentivized to enter the market, an aspect that is particularly relevant in data-scarce domains such as personalized medicine and services. Data scientists have a pivotal role to play in both the intellectual vitality and the practical utility of high-quality data. Moreover, data quality control presents opportunities for data scientists to engage with less structured or ambiguous problems. The book will foster fruitful discussions on the contributions that various scientists and engineers can make to data quality and the further evolution of data markets.
Data Quality in Southeast Asia: Analysis of Official Statistics and Their Institutional Framework as a Basis for Capacity Building and Policy Making in the ASEAN
by Manuel Stagars
This book explores the reliability of official statistical data in the ASEAN (the Association of Southeast Asian Nations), and the benefits of a better vocabulary to discuss the quality of publicly available data to address the needs of all users. It introduces a rigorous method to disaggregate and rate data quality into principal factors containing a total of ten dimensions, which serves as the basis for a discussion on the opportunities and challenges for data quality, capacity building programs and data policy in Southeast Asia. Tools to standardize and monitor statistical capacity and data quality are presented, as well as methods and data sources to analyse data quality. The book analyses data quality in Indonesia, Malaysia, Singapore, the Philippines, Thailand, Vietnam, Brunei, Laos, Cambodia, and Myanmar, before concluding with thoughts on Open Data and the ASEAN Economic Community (AEC).
Data Quality: Empowering Businesses with Analytics and AI
by Prashanth Southekal
Discover how to achieve business goals by relying on high-quality, robust data
In Data Quality: Empowering Businesses with Analytics and AI, veteran data and analytics professional Prashanth Southekal delivers a practical and hands-on discussion on how to accelerate business results using high-quality data. In the book, you’ll learn techniques to define and assess data quality, discover how to ensure that your firm’s data collection practices avoid common pitfalls and deficiencies, improve the level of data quality in the business, and guarantee that the resulting data is useful for powering high-level analytics and AI applications.
The author shows you how to:
- Profile for data quality, including the appropriate techniques, criteria, and KPIs
- Identify the root causes of data quality issues in the business and review the 16 common root causes that degrade data quality in the organization
- Formulate the reference architecture for data quality, including practical design patterns for remediating data quality
- Implement the 10 best data quality practices and the required capabilities for improving operations, compliance, and decision-making capabilities in the business
An essential resource for data scientists, data analysts, business intelligence professionals, chief technology and data officers, and anyone else with a stake in collecting and using high-quality data, Data Quality: Empowering Businesses with Analytics and AI will also earn a place on the bookshelves of business leaders interested in learning more about what sets robust data apart from the rest.
Data Rules: Reinventing the Market Economy (Acting with Technology)
by Jannis Kallinikos, Cristina Alaimo
A new social science framework for studying the unprecedented social and economic restructuring driven by digital data.
Digital data have become the critical frontier where emerging economic practices and organizational forms confront the traditional economic order and its institutions. In Data Rules, Cristina Alaimo and Jannis Kallinikos establish a social science framework for analyzing the unprecedented social and economic restructuring brought about by data. Working at the intersection of information systems and organizational studies, they draw extensively on intellectual currents in sociology, semiotics, cognitive science and technology, and social theory. Making the case for turning “data-making” into an area of inquiry of its own, the authors uncover how data are deeply implicated in rewiring the institutions of the market economy.
The authors associate digital data with the decentering of organizations. As they point out, centered systems make sense only when firms (and formal organizations more broadly) can keep the external world at arm’s length and maintain a relative operational independence from it. These patterns no longer hold. Data transform the production of goods and services to an endless series of exchanges and interactions that defeat the functional logics of markets and organizations. The diffusion of platforms and ecosystems is indicative of these broader transformations. Rather than viewing data as simply a force of surveillance and control, the authors place the transformative potential of data at the center of an emerging socioeconomic order that restructures society and its institutions.
Data Science Concepts and Techniques with Applications
by Muhammad Summair Raza, Usman Qamar
This book comprehensively covers the topic of data science. Data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. This book synthesizes both fundamental and advanced topics of a research area that has now reached maturity. The chapters of this book are organized into three sections:
The first section is an introduction to data science. Starting from the basic concepts, the book highlights the types of data, its use, its importance and the issues normally faced in data analytics, followed by a discussion of the wide range of applications of data science and its widely used techniques.
The second section is devoted to the tools and techniques of data science. It consists of data pre-processing, feature selection, classification and clustering concepts as well as an introduction to text mining and opinion mining.
Finally, the third section of the book focuses on two programming languages commonly used for data science projects, i.e. Python and R.
Although this book primarily serves as a textbook, it will also appeal to industrial practitioners and researchers due to its focus on applications and references. The book is suitable for both undergraduate and postgraduate students as well as those carrying out research in data science. It can be used as a textbook for undergraduate students in computer science, engineering and mathematics. It is also accessible to undergraduate students from other areas with adequate background. The more advanced chapters can be used by postgraduate researchers intending to gather a deeper theoretical understanding.
Data Science Essentials in Python: Collect - Organize - Explore - Predict - Value
by Dmitry Zinoviev
Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python.
Data science is one of the fastest-growing disciplines in terms of academic research, student enrollment, and employment. Python, with its flexibility and scalability, is quickly overtaking the R language for data-scientific projects. Keep Python data-science concepts at your fingertips with this modular, quick reference to the tools used to acquire, clean, analyze, and store data.
This one-stop solution covers essential Python, databases, network analysis, natural language processing, elements of machine learning, and visualization. Access structured and unstructured text and numeric data from local files, databases, and the Internet. Arrange, rearrange, and clean the data. Work with relational and non-relational databases, data visualization, and simple predictive analysis (regressions, clustering, and decision trees). See how typical data analysis problems are handled. And try your hand at your own solutions to a variety of medium-scale projects that are fun to work on and look good on your resume.
Keep this handy quick guide at your side whether you're a student, an entry-level data science professional converting from R to Python, or a seasoned Python developer who doesn't want to memorize every function and option.
What You Need:
You need a decent distribution of Python 3.3 or above that includes at least NLTK, Pandas, NumPy, Matplotlib, Networkx, SciKit-Learn, and BeautifulSoup. A great distribution that meets the requirements is Anaconda, available for free from www.continuum.io. If you plan to set up your own database servers, you also need MySQL (www.mysql.com) and MongoDB (www.mongodb.com). Both packages are free and run on Windows, Linux, and Mac OS.
Data Science Fundamentals with R, Python, and Open Data
by Marco Cremonini
An introduction to the essential concepts and techniques of R and Python needed to start data science projects. Organized with a strong focus on open data, Data Science Fundamentals with R, Python, and Open Data discusses concepts, techniques, tools, and first steps to carry out data science projects, with a focus on Python and RStudio, reflecting a clear industry trend towards the integration of the two. The text examines intricacies and inconsistencies often found in real data, explaining how to recognize them and guiding readers through possible solutions, and enables readers to handle real data confidently and apply transformations to reorganize, index, aggregate, and elaborate it. The book is highly interactive, with a companion website hosting supplementary material, including the datasets used in the examples and complete running code (R scripts and Jupyter notebooks) for all examples. Exam-style and multiple-choice questions support readers’ active learning. Each chapter presents one or more case studies.
Written by a highly qualified academic, Data Science Fundamentals with R, Python, and Open Data discusses sample topics such as:
- Data organization and operations on data frames, covering reading CSV datasets and common errors, and slicing, creating, and deleting columns in R
- Logical conditions and row selection, covering selection of rows with logical conditions and operations on dates, strings, and missing values
- Pivoting operations and wide-form to long-form transformations, indexing by groups with multiple variables, and indexing by group and aggregations
- Conditional statements and iterations, multicolumn functions and operations, data frame joins, and handling data in list/dictionary format
Data Science Fundamentals with R, Python, and Open Data is a highly accessible learning resource for students from heterogeneous disciplines where data science and quantitative, computational methods are gaining popularity, along with hard sciences not closely related to computer science, and medical fields using stochastic and quantitative models.
Data Science Landscape: Towards Research Standards And Protocols (Studies in Big Data #38)
by Usha Mujoo Munshi and Neeta Verma
This edited volume deals with different contours of data science, with special reference to data management for the research innovation landscape. Data is becoming pervasive in all spheres of human, economic, and development activity. In this context, it is important to take stock of what is being done in the data management area and to begin to prioritize, consider, and formulate the adoption of a formal data management system, including citation protocols, for use by research communities in different disciplines, and also to address various technical research issues. The volume thus focuses on some of these issues, drawing typical examples from various domains. The idea for this work germinated from the two-day workshop on “Big and Open Data – Evolving Data Science Standards and Citation Attribution Practices”, an international workshop led by ICSU-CODATA and attended by over 300 domain experts. The workshop focused on two priority areas: (i) Big and Open Data: Prioritizing, Addressing and Establishing Standards and Good Practices and (ii) Big and Open Data: Data Attribution and Citation Practices. This important international event was part of a worldwide initiative led by ICSU and the CODATA Data Citation Task Group. In all, there are 21 chapters (with the 21st chapter addressing four different core aspects) written by eminent researchers in the field. They deal with key issues that need attention in data management practices and protocols (S&T, institutional, financial, sustainability, legal, IPR, data protocols, community norms, and others), and aim to coordinate activities in the area and promote common practices and standards for the research community globally. In addition to the aspects touched on above, national and international perspectives on data and its various contours are also portrayed through case studies in this volume.