
A Geek's Guide to Big Data

By Deborah Dobson posted 03-16-2015 16:23

  

On March 11th, I attended the third webinar in the SAS: The Mavens of Big Data webinar series, "A Geek's Guide to Big Data: The Hadoop Ecosystem." Tamara Dull, the Director of Emerging Technologies, SAS Best Practices, was the presenter and gave a helpful overview of big data lingo and Hadoop.

Tamara provided big data definitions and discussed the Hadoop ecosystem.

Open Source: The term "open source" refers to something that can be modified because its design is publicly accessible.

Open Source Project: A collection of related functions that is 100% open source, developed by volunteers, and managed and distributed by an open source community. Firefox and Hadoop are examples. Hadoop is used to store and process data.

Open Source Distribution: A collection of related projects (such as Hive or Pig) that may contain both open and closed source (proprietary) projects or modules. They are managed and distributed by vendors.

Open Source Data: Data that can be freely used, reused and distributed with the requirement to attribute and share alike.

Big data is not new. What is new is that we now have the technology to store and process both structured and unstructured data, such as email, social media data, XML data, video, photos, GPS data, sensor data and mobile data. Hadoop has emerged as the best way to handle massive amounts of data, including not only structured data but also complex, unstructured data.

The Hadoop platform consists of two key services: a reliable, distributed file system called the Hadoop Distributed File System (HDFS) and a high-performance parallel data processing engine called Hadoop MapReduce. For most businesses running big data projects, though, Hadoop is really an ecosystem of related projects built around that core. The Hortonworks Data Platform is a good example of the Hadoop ecosystem. Learn more about it here.
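For readers who want to see what "parallel data processing" means in MapReduce terms, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It was not part of the webinar and it does not use Hadoop or HDFS at all; it runs in memory on a single machine, and the function names (map_phase, shuffle, reduce_phase, word_count) are my own illustrative choices. On a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over a list of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines of a file stored in HDFS.
    lines = ["big data is not new", "Hadoop stores big data"]
    print(word_count(lines))
```

The point of the pattern is that each map call only looks at one record and each reduce call only looks at one key, which is what would let a framework spread the work across many machines.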

According to a report from Wikibon (a professional community that solves technology and business problems through open source sharing of free advisory knowledge), only 36% of companies surveyed are using Hadoop, and mostly as a proof of concept.

The final webcast, "Generating Value From Your Data," is on 3/25. If you are not already signed up, I highly recommend that you do so. The previous three webinars were excellent and can be viewed on demand.

As a member of the International Legal Technology Association's Business Management Content Creation team, I'm putting together a webinar series on big data and legal from a variety of angles. These pre-recorded webinars will be available to both members and non-members. Some topics include:

Big Data & Legal Prediction
Big Data & eDiscovery
Big Data and Pricing
Big Data and Privacy Issues
Big Data - Beginning Statistical Analysis
Big Data - Advanced Statistical Analysis

Stay tuned to learn more.

Originally published on LinkedIn.


Comments

03-26-2015 10:13

Thank you for this blog post. Open Source Software is always of interest to me and I missed the fact that this Big Data series was happening. I'm going to check out the webinars on demand now.
Cheers!