On March 11th, I attended the third webinar in the SAS: The Mavens of Big Data webinar series, "A Geek's Guide to Big Data: The Hadoop Ecosystem." The presenter, Tamara Dull, Director of Emerging Technologies at SAS Best Practices, explained some of the big data lingo and walked through the Hadoop ecosystem.
Among the definitions Tamara provided:
Open Source: The term "open source" refers to something that can be modified because its design is publicly accessible.
Open Source Project: A collection of related functions that is 100% open source, developed by volunteers, and managed and distributed by an open source community. Firefox and Hadoop are examples. Hadoop is used to store and process data.
Open Source Distribution: A collection of related projects (such as Hive or Pig) that may contain both open and closed source (proprietary) projects or modules. Distributions are managed and distributed by vendors.
Open Source Data: Data that can be freely used, reused and distributed with the requirement to attribute and share alike.
Big data is not new. What is new is that we now have the technology to store and process both structured and unstructured data, such as email, social media data, XML data, video, photos, GPS data, sensor data and mobile data. Hadoop has emerged as the best way to handle massive amounts of data, including not only structured data but also complex, unstructured data.
The Hadoop platform consists of two key services: a reliable, distributed file system called the Hadoop Distributed File System (HDFS) and a high-performance parallel data processing engine called Hadoop MapReduce. For most businesses running big data projects, however, Hadoop is in most cases an ecosystem of related projects rather than just these two services. The Hortonworks Data Platform is a good example of the Hadoop ecosystem. Learn more about it here.
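For readers who want a feel for what "parallel data processing" means in MapReduce, here is a minimal sketch in plain Python. It was not part of the webinar and it is not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and everything runs in a single process. In a real cluster the input would sit in HDFS and the map and reduce steps would run on many machines at once.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In Hadoop the framework does this between map and reduce;
    here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two tiny "documents" standing in for files stored in HDFS.
    docs = ["big data is not new", "Hadoop stores big data"]
    print(word_count(docs))
```

The point of splitting the work this way is that each map call only needs its own record and each reduce call only needs the values for its own key, which is what lets a framework spread the calls across many machines.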
According to a report from Wikibon, a professional community that solves technology and business problems through open source sharing of free advisory knowledge, only 36% of companies surveyed are using Hadoop, and mostly as a proof of concept.
The final webcast, Generating Value From Your Data, is on 3/25. If you are not already signed up, I highly recommend that you do so. The previous three webinars, all excellent, can be viewed on demand.
As a member of the International Legal Technology Association's Business Management Content Creation team, I'm putting together a webinar series on big data and legal from a variety of angles. These pre-recorded webinars will be available to both members and non-members. Some topics include:
Big Data & Legal Prediction
Big Data & eDiscovery
Big Data and Pricing
Big Data and Privacy Issues
Big Data - Beginning Statistical Analysis
Big Data - Advanced Statistical Analysis
Stay tuned to learn more.
Originally published on LinkedIn.