Janert P.K. Data Analysis with Open Source Tools

PDF file
Size: 15.39 MB

Added by user Anatol on 31.10.2011 21:05
Description edited 31.10.2011 23:57

Publisher: O’Reilly Media | 2010 | Pages: 538
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve, rather than rely on tools to think for you.
Use graphics to describe data with one, two, or dozens of variables
Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
Mine data with computationally intensive methods such as simulation and clustering
Make your conclusions understandable through reports, dashboards, and other metrics programs
Understand financial calculations, including the time-value of money
Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
Become familiar with different open source programming environments for data analysis
Including:

Data Analysis
What’s in This Book
What’s with the Workshops?
What’s with the Math?
What You’ll Need
What’s Missing
Graphics: Looking at Data
A Single Variable: Shape and Distribution
Dot and Jitter Plots
Histograms and Kernel Density Estimates
The Cumulative Distribution Function
Rank-Order Plots and Lift Charts
Only When Appropriate: Summary Statistics and Box Plots
Workshop: NumPy
Further Reading
Two Variables: Establishing Relationships
Scatter Plots
Conquering Noise: Smoothing
Logarithmic Plots
Banking
Linear Regression and All That
Showing What’s Important
Graphical Analysis and Presentation Graphics
Workshop: matplotlib
Further Reading
Time As a Variable: Time-Series Analysis
Examples
The Task
Smoothing
Don’t Overlook the Obvious!
The Correlation Function
Optional: Filters and Convolutions
Workshop: scipy.signal
Further Reading
More Than Two Variables: Graphical Multivariate Analysis
False-Color Plots
A Lot at a Glance: Multiplots
Composition Problems
Novel Plot Types
Interactive Explorations
Workshop: Tools for Multivariate Graphics
Further Reading
Intermezzo: A Data Analysis Session
A Data Analysis Session
Workshop: gnuplot
Further Reading
Analytics: Modeling Data
Guesstimation and the Back of the Envelope
Principles of Guesstimation
How Good Are Those Numbers?
Optional: A Closer Look at Perturbation Theory and Error Propagation
Workshop: The GNU Scientific Library (GSL)
Further Reading
Models from Scaling Arguments
Models
Arguments from Scale
Mean-Field Approximations
Common Time-Evolution Scenarios
Case Study: How Many Servers Are Best?
Why Modeling?
Workshop: Sage
Further Reading
Arguments from Probability Models
The Binomial Distribution and Bernoulli Trials
The Gaussian Distribution and the Central Limit Theorem
Power-Law Distributions and Non-Normal Statistics
Other Distributions
Optional: Case Study—Unique Visitors over Time
Workshop: Power-Law Distributions
Further Reading
What You Really Need to Know About Classical Statistics
Genesis
Statistics Defined
Statistics Explained
Controlled Experiments Versus Observational Studies
Optional: Bayesian Statistics—The Other Point of View
Workshop: R
Further Reading
Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That
How to Average Averages
The Standard Deviation
Least Squares
Further Reading
Computation: Mining Data
Simulations
A Warm-Up Question
Monte Carlo Simulations
Resampling Methods
Workshop: Discrete Event Simulations with SimPy
Further Reading
Finding Clusters
What Constitutes a Cluster?
Distance and Similarity Measures
Clustering Methods
Pre- and Postprocessing
Other Thoughts
A Special Case: Market Basket Analysis
A Word of Warning
Workshop: Pycluster and the C Clustering Library
Further Reading
Seeing the Forest for the Trees: Finding Important Attributes
Principal Component Analysis
Visual Techniques
Kohonen Maps
Workshop: PCA with R
Further Reading
Intermezzo: When More Is Different
A Horror Story
Some Suggestions
What About Map/Reduce?
Workshop: Generating Permutations
Further Reading
Applications: Using Data
Reporting, Business Intelligence, and Dashboards
Business Intelligence
Corporate Metrics and Dashboards
Data Quality Issues
Workshop: Berkeley DB and SQLite
Further Reading
Financial Calculations and Modeling
The Time Value of Money
Uncertainty in Planning and Opportunity Costs
Cost Concepts and Depreciation
Should You Care?
Is This All That Matters?
Workshop: The Newsvendor Problem
Further Reading
Predictive Analytics
Topics in Predictive Analytics
Some Classification Terminology
Algorithms for Classification
The Process
The Secret Sauce
The Nature of Statistical Learning
Workshop: Two Do-It-Yourself Classifiers
Further Reading
Epilogue: Facts Are Not Reality
Appendix Programming Environments for Scientific Computation and Data Analysis
Software Tools
A Catalog of Scientific Software
Writing Your Own
Further Reading
Appendix Results from Calculus
Common Functions
Calculus
Useful Tricks
Notation and Basic Math
Where to Go from Here
Further Reading
Appendix Working with Data
Sources for Data
Cleaning and Conditioning
Sampling
Data File Formats
The Care and Feeding of Your Data Zoo
Skills
Terminology
Further Reading
About the Author
Colophon

See also

Cuesta H. Practical Data Analysis

Section: Artificial Intelligence → Data Mining

Packt Publishing, 2013. — 339 p. — ISBN: 978-1-78328-099-5. Transform, model, and visualize your data through hands-on projects, developed in open source tools. Overview 1) Explore how to analyze your data in various innovative ways and turn them into insight. 2) Learn to use the D3js visualization tool for exploratory data analysis. 3) Understand how to work with graphs and...

9.75 MB
Added 12.03.2014 01:32
Description edited 12.03.2014 07:08

Han J., Kamber M. Data Mining: Concepts and Techniques

Section: Artificial Intelligence → Data Mining

Second Edition. — Morgan Kaufmann, 2006. — 743 p. This book explores the concepts and techniques of data mining, a promising and flourishing frontier in database systems and new database applications. Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored in...

27.31 MB
Date added unknown
Description edited 05.09.2010 13:58

Nisbet R., Elder J., Miner G. Handbook of Statistical Analysis and Data Mining Applications

Section: Artificial Intelligence → Data Mining

Academic Press, 2009. — 864 p. — ISBN: 0123747651. Robert Nisbet, Pacific Capital Bank Corporation, Santa Barbara, CA, USA; John Elder, Elder Research, Inc. and the University of Virginia, Charlottesville, USA; Gary Miner, StatSoft, Inc., Tulsa, OK, USA. Description: The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book...

41.49 MB
Date added unknown
Description edited 08.05.2010 23:04

Provost Foster, Fawcett Tom. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking

Section: Finance and Economics → Statistical Analysis of Economic Data

O’Reilly, 2013. — 408 p. — ISBN: 978-1449361327. Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand...

15.75 MB
Added 14.11.2013 21:00
Description edited 21.11.2022 03:49

Tufféry S. Data Mining and Statistics for Decision Making

Section: Computer Literature → R

Wiley – 2011, 704 pages ISBN: 0470688297, 9780470688298 Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need...

6.18 MB
Added 15.04.2012 21:21
Description edited 22.06.2017 16:47

Müller A., Guido S. Introduction to Machine Learning with Python (Russian edition)

Section: Artificial Intelligence → Machine Learning

Moscow: O’Reilly Media, 2017. — 392 p. Machine learning has become an integral part of many commercial and research projects, but the field is not the exclusive preserve of large companies with powerful analytics teams. Even if you are still new to Python, this book will introduce you to practical ways of building machine learning systems. With all...

13.28 MB
Added 20.02.2017 01:10
Description edited 14.07.2024 20:39
