Host: Data Science Dojo
Guest: George Firican
Celebrating Data Literacy Month by attending an event hosted by @Data Science Dojo
George Firican is an award-winning Data Governance Leader and Founder of LightsOnData. He talks about #data, #analytics, #datagovernance, and #datamanagement
This presentation was all about the 10 V's of Big Data. George is a VIP who shares a Vast amount of facts about big data, including some Vibrant examples
However, you can't jump right in without talking about the elephant in the room
How big is big data?
About 40,000 smartphone apps are added to the app store each month
There are 40,000 Google search queries every second
Facebook has 2.2 billion monthly active users
Over 1 billion people consume coffee every day (we know George is one of them ☕)
294 billion emails are sent on a daily basis
The Milky Way has 300 billion stars
This sure does make data appear to be big, but George thinks that this term can be misleading and likes to think of it as complex data instead
Many of us are familiar with the 3 V's
- Volume
- Velocity
- Variety
Basically, there is a lot of data, it is generated all the time, and there are many different kinds
The other V's are a little more complex, and George gives perfect examples of each
Variability
This is often confused with variety
A coffee shop sells different types of coffee; this is a variety of offerings
However, if you go to that coffee shop 3 days in a row, order the same thing, but each day it tastes different... that is variability
One other really neat example George shared was that variability of data is what makes sentiment analysis so complicated
Since the same word can have several different meanings, it is very hard for a computer to identify its sentiment without context, which involves human interpretation (for now, until the singularity 🤖)
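To make the sentiment point concrete, here is a minimal sketch of a word-by-word scorer that ignores context. The lexicon, its scores, and the two sentences are made up for illustration; they are not from George's talk or from any real sentiment library.

```python
# Hypothetical word-level sentiment lexicon (illustrative values only).
LEXICON = {"sick": -1, "great": 1, "terrible": -1, "love": 1}


def naive_sentiment(text: str) -> int:
    """Sum per-word scores, ignoring context entirely."""
    cleaned = text.lower()
    for mark in ",.!?":
        cleaned = cleaned.replace(mark, " ")
    return sum(LEXICON.get(word, 0) for word in cleaned.split())


slang = "That concert was sick!"     # a human reads this as praise
literal = "I was sick all weekend."  # a human reads this as negative

# Both sentences get the same score because the scorer only sees the word.
print(naive_sentiment(slang), naive_sentiment(literal))
```

The scorer cannot tell the two uses of "sick" apart, which is the variability problem in miniature.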
Veracity
When you get a box of chocolates, how do you estimate how good they are without tasting them?
You can look at them, or see who makes them
and only if they came from Switzerland do you know they are good 😂
The strange part with this V is that as the previous V's increase, Veracity tends to decrease... plummeting into the unknown
Validity
How accurate and correct is the data for its intended use?
A watch works fine for telling time, but it is not accurate to the millisecond, so you would likely not be able to break a tie in a race
Also, if you are looking at registration data from a conference, you may only see the person's job title or company. Yes, this is accurate, but if you want to know the names of the individuals, it is not valid
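The watch example can be sketched in a few lines. The runner names and finish times below are invented for illustration, and `watch_reading` is a hypothetical helper that mimics a watch showing whole seconds only.

```python
import math

# Hypothetical finish times in seconds (made-up values).
finish_times = {"runner_a": 9.582, "runner_b": 9.587}


def watch_reading(seconds: float) -> int:
    """Simulate a watch that only shows whole seconds."""
    return math.floor(seconds)


readings = {name: watch_reading(t) for name, t in finish_times.items()}

# On the watch the race looks like a tie, so the data is not valid for this use.
tie_on_watch = len(set(readings.values())) == 1

# With millisecond data the winner is clear.
winner = min(finish_times, key=finish_times.get)

print(readings, tie_on_watch, winner)
```

The whole-second readings are accurate as far as they go; they are just not valid for picking a winner.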
Volatility
How old can the data be before it is considered irrelevant, historic, or not useful?
You may need the data to be really old if you are analyzing a trend
On the flip side, you may need only the most current data, which could make even day-old data too old (or as George calls it, dark data)
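One way to picture volatility is as a maximum age that depends on the use case. The sketch below assumes a hypothetical `still_useful` helper, made-up record dates, and arbitrary window sizes; none of these come from the talk.

```python
from datetime import date, timedelta


def still_useful(record_date: date, today: date, max_age: timedelta) -> bool:
    """A record is useful only if it is no older than max_age."""
    return today - record_date <= max_age


today = date(2024, 1, 10)  # fixed date so the example is reproducible
records = [date(2019, 1, 10), date(2024, 1, 9), date(2024, 1, 10)]

trend_window = timedelta(days=365 * 10)  # trend analysis wants old data
live_window = timedelta(hours=12)        # a live view rejects day-old data

trend = [d for d in records if still_useful(d, today, trend_window)]
live = [d for d in records if still_useful(d, today, live_window)]

print(len(trend), len(live))
```

The same three records are all useful for the trend analysis, while only today's record survives the live window.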
Vulnerability
This is all about security and privacy (topics near and dear to my heart)
Some types of data are considered to be PII (personally identifiable information)
Sharing your Twitter handle alone is fine, but if you pair it with your name, it becomes personal. Therefore, things can get real bad in the event of a data breach
Visualization
It is very challenging to visualize big data
Not only is it nearly impossible to draw a traditional graph when trying to plot a billion points, it is also not scalable and hurts response time
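A common workaround, which is my own illustration and not something George prescribed, is to aggregate the points into a small grid and plot the counts instead of every point. The sketch below uses random data and a much smaller point count as a stand-in for a billion; `N_POINTS` and `GRID` are arbitrary assumptions.

```python
import random
from collections import Counter

random.seed(0)
N_POINTS = 100_000  # stand-in for "a billion", kept small so it runs quickly
GRID = 10           # 10 x 10 grid of bins

counts = Counter()
for _ in range(N_POINTS):
    x, y = random.random(), random.random()
    # Map each coordinate in [0, 1) to a bin index, clamped to the last bin.
    col = min(int(x * GRID), GRID - 1)
    row = min(int(y * GRID), GRID - 1)
    counts[(col, row)] += 1

# A chart now needs at most GRID * GRID values, however many points came in.
print(len(counts), sum(counts.values()))
```

The number of values handed to the chart stays fixed as the data grows, which is what keeps response time under control.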
Value
Data is meaningless if it does not bring value, but how to express value is the hard part. George shared a few examples on how collecting data can be valuable to a company.
They all seem a little sketchy IMO, but I understand the value (to an extent, but I am super cautious about data with my audit and cybersecurity background)
One example is the epic story of How Target Figured Out A Teen Girl Was Pregnant Before Her Father
In Conclusion
To learn more from George, check out his online courses and follow him on LinkedIn
Information from the host, Data Science Dojo:
For upcoming webinars & crash courses, please visit here
For the webinar recordings & queries, professional networking, and data science resources, join us here on LinkedIn
Follow us on Instagram for data science sliders, infographics, memes, and short videos