By Monica Kay Royal

Event Review: IMPACT 2022

Updated: Oct 26, 2022

The Data Observability Summit

Hosted by Monte Carlo


Monte Carlo hosts an annual Data Observability event which spotlights some of the industry’s most prominent voices, as well as the broader community of data leaders paving the way forward for reliable data.


This was a hybrid event, with virtual attendees on the Hopin platform (I love this platform) and three in-person welcome receptions in New York, the Bay Area, and London. The event spanned two days and included 8 keynote sessions and 18 breakout sessions across two tracks, Data Leaders and Technical Architects. Both days ended with a virtual entertainment session by OrchKeystra, who brought the best of jams!! What a perfect way to end a day.


Day 1


The session started a little rocky with some technical difficulties, but honestly I haven’t been to a conference that doesn’t have these little hiccups. Oddly, I think it makes things more human.


Before the session officially kicked off, we saw a chat between Barr and some of her partners, recorded 1 week prior to the event, about the right way to kick things off. There were many ideas, a lot centered around Halloween (my favorite holiday, so I was excited). I love what they settled on: “Data Engineer Remix: Emails Mad at Me! ("Somebody Mad at Me" Parody)”. This was the funniest parody I have heard recently. Check out the Halloween jam here, courtesy of Jacq Riseling and Will Robins.


Barr Moses, CEO & Co-Founder at Monte Carlo, formally kicked us off by introducing Data Observability as the topic and mentioning how hot it is to be a Data Engineer in tech right now. Data Observability is an organization’s ability to understand the health of the data in its systems. This makes data more important than ever, but we all know that data still has its bad days. Barr shared that Unity, the cross-platform game engine, lost $100M because of bad data, and that Equifax assigned wrong credit scores to millions of customers. Goes to show, bad data spares no one.


Some good news: a new Gartner Research Report reflects growing interest in data observability 🏆



Separating Signal from the Noise: The Prediction Paradox

Nate Silver

Founder and editor-in-chief, FiveThirtyEight


Nate’s book “The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t” gave us a great quote to start:


If the future was easy to predict, we wouldn’t need data analysis

Today, Nate shared with us the 11 Guidelines for High-Quality Data Analysis (the first three of which came from his book):


  1. Think probabilistically

  2. Know where you’re coming from

  3. Try, and err


The takeaway from these first three items is that we live in a world where uncertainty is all around us. We all have different perspectives, so we need to think about what happens when we come together as a company or as competitors. Typically, a crowd is wiser than the individual. But when you learn new information, there is a balance in how to weigh it against the information that you already know. Do you take it at face value and move on, or do you update your prior knowledge (think Bayes’ Theorem)?
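For anyone who wants the mechanics of that update, here is a minimal sketch in Python; every number is invented purely for illustration:

```python
# Updating a belief with Bayes' Theorem: P(H|E) = P(E|H) * P(H) / P(E).
prior = 0.30                 # P(H): belief before the new information
p_evidence_if_true = 0.80    # P(E|H)
p_evidence_if_false = 0.20   # P(E|not H)

# Law of total probability gives P(E)
p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)

posterior = p_evidence_if_true * prior / p_evidence
print(f"belief: {prior:.0%} -> {posterior:.0%}")  # 30% -> 63%
```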


Nate then references a great complementary book: The Wisdom of Crowds by James Surowiecki.

Three takeaways / overlapping items:

  • Diversity: crowds should be diverse in things such as skill sets, perspectives, etc.

  • Independence: if people can’t provide their own point of view, they can’t bring much to the table

  • Trust: everyone abides by the rules and works in everyone’s best interest
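To see why a diverse, independent crowd tends to beat the individual, here is a toy simulation; the noise level and numbers are made up:

```python
import random

random.seed(0)
truth = 100.0
# 1,000 independent guessers: each unbiased but individually noisy
guesses = [truth + random.gauss(0, 20) for _ in range(1000)]

crowd_estimate = sum(guesses) / len(guesses)
avg_individual_error = sum(abs(g - truth) for g in guesses) / len(guesses)

print(f"crowd error:        {abs(crowd_estimate - truth):.2f}")  # well under 1
print(f"typical solo error: {avg_individual_error:.2f}")         # around 16
```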


The remaining guidelines:


  4. Define the problem

  5. Treat data as guilty until proven innocent

  6. Kick the tires

  7. Break it down

  8. Learn the Rules of the Game (which I have now just lost, and you as well, IYKYK)

  9. Don’t lead with your gut

  10. Don’t be afraid to be critical

  11. Focus on process, not results


Basically, you need to define the problem so everyone knows the goal and is set up to be successful in finding the solution. Once you find a solution, make sure that it is the right one. Test it, question it, ask for a second opinion, visualize it, phone a friend, anything you need to do to make sure that the solution makes sense. Know that there will be obstacles, hills, holes, and maybe some wet spots you have to leap over. Just be sure to set yourself up for success at the beginning, be prepared and enjoy the process.



Extraction and Abstraction: A Conversation with Lloyd Tabb

Co-founder of Looker, Architect of LookML and Malloy


Rafa Jara Simkin, Enterprise Sales Leader @ Monte Carlo, talks with Lloyd about his journey, from how he became one of the most well-known entrepreneurs in the data space to how he created LookML and Malloy.



When you first built LookML, what problems were data teams facing?


Lloyd has been a CTO at a lot of companies, a role responsible for making sure everyone in the organization knows what is going on, and the best way to do that is through data. He wanted to build tooling so that people could analyze the data and understand what is going on in real time. Looker was his 4th product, and he mentioned that the first versions of LookML were not very pretty. He learned that it is hard to make solutions look nice and have people understand how to use them.


Did you see LookML becoming as big as it is?


He didn’t know what kind of company it was going to be, whether it would be a software company, a product company, a services company, or even a consulting company. He was never sure it was going to work until it was working.


Outside of solving a problem, what is the biggest takeaway you can share?


He saw Looker as more than just providing software. He saw it as an education company, with the job of teaching new ways to work with and look at data. He knew that you need to create a safe environment for customers and make them successful in order for you to be successful.


Can you share your perspective on data reliability, quality, and management with different tools in the data stack?


Building reliability in software is really hard. This includes things like version control, testing of changes, having a good team, etc. Continuous integration and continuous testing paired with good engineering software practices are super important.


What is Malloy?


Honestly, this one was hard to capture on paper.

What I was able to capture was that Malloy is fully open source, with a goal of being used everywhere: anywhere you are typing SQL, you will be using Malloy. Currently, Malloy supports BigQuery, DuckDB, and PostgreSQL.

Lloyd’s advice: just try it. Here is a link


Do you have any entrepreneur tips?

  • Acquire lots of different types of skills

  • When you are small, know that you will not have enough people to do it all very well. When you start to grow, you can hire people to fill the missing gaps. Loved this analogy: it’s like throwing a bunch of paintballs at a wall. You want the whole wall to be covered. Some of the wall is covered very well but there are some holes… that is where you can focus to hire.

  • Have a clear sense of mission, what you are doing, what you are trying to achieve, and what is your value proposition. Do not focus on the how up front, just know what problem you are trying to solve.

  • Listen to your customers



Build Your House on Rock, not Sand

Head Of Analytics, Understood


The basis for this session was that data needs a strong foundation, otherwise there will be chaos.


Yay analogies! 🏡

If you ask a friend who just purchased a new home, ‘What is your favorite thing?’

  • Answers: ‘The finished basement’, ‘The upstairs game room’, ‘The farm sink in the kitchen’

  • NOT Answers: ‘The floors don’t creak’, ‘The water is not brown’, ‘The heater doesn’t catch on fire’


People are rarely ever focused on what matters the most, things like these are just expected to exist and work well. But without a strong foundation, your house (data) is going to collapse.


The other thing is that a lot of executives have FOMO with all the hype and buzzwords flying in and out of the scene. They see the things that get people’s attention and they don’t want to be left behind. However, sometimes they skip the part that matters… do we even have the data (or more importantly, the right data / good data). No one wants to experience the tragic garbage in, garbage out solution.


Without paying attention to the things that need attention, you will get nowhere

Q&A Time!

Do you recommend a data quality tool like Informatica Data Quality, if you have a data observability tool in place?


Depends. First, figure out what it is that you want to look at and what questions you are asking. (This guy has all the analogies!)

If you have 10 windows in your house, do you want 30 monitors or just 1 monitor on the back door?

Think through what is in your stack, where the failure points are, and how to best mitigate those failures.


What tips do you have for data teams trying to make "unsexy" ideas like data governance and MDM more attractive?


Talk about what is going to happen if they do not pay attention to these things. Think about ‘what’s in it for them’ and explain why it is important to them in order to get people on board. From Monica’s experience: if you can put it in money terms, you have a better shot.


When should you implement a Data Observability solution?


There isn’t a straight answer to this, but do not wait until your data is broken (this is reactive). The earlier you can get on it, the easier it will be.



Data Fitness for a Healthy Business

Shailvi Wakhlu, Sr Director of Data, Strava


There was A LOT of really good stuff here! Shailvi walks us through definitions of bad data, why businesses should care, where bad data is introduced, how to identify bad data, how to figure out the reason for bad data, and how to prevent bad data!


What is Bad Data?

  • Inaccurate data, incomplete data, misleading data

Bad Data is NOT

  • Data showing you something you don’t want to see


Why should business care?

  • Bad data is costly in many different ways

  • The more widespread bad data is in organizations, the worse it becomes

  • It lowers trust

  • Damaging to your reputation

  • Loss of revenue/increase in costs

  • Bias = harm (hidden biases cause harm to real people, costing money and affecting lives)

  • Liabilities

  • Productivity & morale


Speaking of morale, people should be an organization's first priority. If your people are low on morale, everyone is going to have a bad time.


Bad data can be introduced at any of the phases of the Data Lifecycle:

Define > Log > Transform > Analyze > Share


“Definition” phase

  • Uneven feature definition

  • Myopic definition

  • Incorrect input parameters

Problems arise with how strict or loose a definition is (example: organizations want to know how many users they have, but each department can define users differently)


“Logging” phase

  • Incorrect tracking (classification)

  • Faulty pipeline

  • Inconsistent timeframes (stale data)


“Transforming” phase

  • Un-intuitive rules (clear data dictionaries help)

  • Meaningless aggregations (if all transformations happen at once, you can’t work backwards to get back to the raw data)

  • Logical errors

As any baker will tell you, a lot can go wrong between the raw ingredients and the finished product.


“Analyzing” phase

  • Ambiguous problem definition (everyone needs to be on the same page to answer the same question)

  • Inapplicable model / formulae

  • Biased algorithm (can cause harm to underrepresented groups in the population)


“Sharing” phase

  • Faulty reporting

  • Misinterpreted results

  • Unintended downstream use


Diagnosing Bad Data

How bad data shows up

  • Results that don’t match

  • Suspicious results




Look for the obvious reasons

  • Was it bad data, or bad assumptions?

  • Was the dataset built for your use case?

  • Was there a known bug or pipeline issue?


THEN, go through the data lifecycle in reverse order

  • At which phase is the data still accurate?

  • In the phase where it breaks, what could the causes/reasons be?




Shailvi shares this cool Sanity Checklist.

It really helps to discuss this with others as well, to validate initial assumptions.





Curing bad data is a cross-functional effort

Prevention is better than cure!

  • Reconcile terminologies

  • Automate everything - no manual stuff

  • Simplify your curated datasets

  • Have a consistent governance & ownership philosophy

  • Audit at regular intervals



How to Make Data Governance Fun for Everyone

Tiankai Feng, Head of Product Data Governance, Adidas


First off, if you know Tiankai, you know that he makes data fun for everyone!!

We started the session off with a song, all about Data Governance (a great way for us data professionals to explain what we do to our parents/spouse 😂)


Jacq Riseling then asks all the juicy Data Governance questions and Tiankai shares everything!



How do you incentivize data governance?


Data Governance is a long-term initiative, which can be hard to incentivize. Some don’t think that it is worth pursuing or even want to talk about it.

A couple things to help:

  • Start with the right mindset

  • Get commitment from a stakeholder

Tiankai thinks that Data Governance needs some rebranding; it is oftentimes seen as the police, but Data Governance has the whole company’s interests at heart and is a great thing!



How to get stakeholders to care?


Let them know what’s in it for THEM. Different teams will have different answers, so cater your communications to your audience.

As I like to see it, this is not a Halloween costume; there is no one-size-fits-all approach.


What about external stakeholders?


This depends on the industry. Financial services and pharmaceuticals are highly regulated and must follow certain laws, so there it is really just about making it happen. Non-regulated industries, like retail, make things a bit harder because there needs to be a mutual agreement to care about Data Governance. Again, it is all about communication and sharing what’s in it for them.


What is the role of empathy in Data Governance?


There are a lot of definitions of Data Governance, but in summary it is all about making data usable in the right way for the right people. We should empathize with the user side because they are the ones trying to use the data.

So think about the users, ask the right questions and actively listen.


How do you make Data Governance more creative, and how do you make data fun for leadership, peers, collaborators, and external stakeholders?


  • Use analogies and examples

  • Use unconventional and unexpected formats like songs, podcasts, marketing campaigns with prizes, and gamification

Lyric from his song: ‘If data is the force, we are the Jedi Council’

Because the Data Governance council sounds boring… and Star Wars is way more fun!



What is the role of Data Quality in Data Governance?


Data Quality is a key outcome of good Data Governance. Data Quality often turns into a KPI that Data Governance is measured on. What counts as good Data Quality can be tricky to define and needs to be agreed upon by everyone.

Data Quality can be quite an emotional topic, and people can get frustrated by it. Data Quality is sometimes used as a catch-all for any bad data, even when the data is not actually wrong, just ‘different than expected’.


Top 3 Data Governance challenges:

  • Data Ownership: Data Governance team cannot fix everything alone

  • Data Quality: everyone needs to agree on the definition

  • Perception of Data Governance: can result in getting Data Governance involved too late


Day 2


Starting at the end, I just wanted to say how much I loved closing out both days rocking out to the music stylings of OrchKeystra. Y’all should really go check him out!!


Back to Day 2


What's Next for the Modern Data Stack?

Founder's Panel


Tristan Handy, CEO & Founder at dbt Labs


George Fraser, CEO at Fivetran


CEO Founder


Lior Gavish, Co-founder at Monte Carlo kicked us off with a panel to talk about What’s Next for the Modern Data Stack. Unfortunately, I only caught the very end because my computer decided to reboot itself, my earbuds needed to charge, and my mouse died (no animals were harmed). After I got my life back in order, I was able to catch the end where there was some discussion around data contracts and it seems that they have evolved since I was dealing with contracts back in my day (not really that long ago). It seems that they are more complicated now as they have to deal with more than just establishing accountability and ownership for the data. I would be interested in learning more here.



Data Reliability in the Lakehouse:

Fireside Chat


Barr Moses, CEO and Co-Founder, Monte Carlo


Ali Ghodsi, CEO of Databricks


Can you take us through the Databricks origin story?


Back at UC Berkeley, Ali and his team started getting more and more funding from Silicon Valley, and at that time they got to see what other tech companies were working on. They noticed all the cool things those companies were doing: leveraging data, using data science and machine learning, and predicting things like couples breaking up (Facebook, circa 2009). He wanted to bring that type of tech to the world and give it to everyone. Through a lot of trial and error, they decided to ‘just do it themselves’ and started their own company.

One of my favorite ‘failures’ Ali shared was that they tried sending students into orgs as intern trojan horses to try to get the org to adopt the technology. It was worth a try!


How do you think about your customers, when thinking about building a community?


Being open source gives you the opportunity to have a lot of people using and downloading your software. This is an advantage, as it allows you to start building a community and getting recognized. However, it is still difficult to get adoption at the same level as B2C companies. So the next question was how to monetize. They saw a gap between users downloading and using their software, and monitoring and configuring that software to best fit the org. This is where they found their secret sauce: create the open source software, get mass usage, then offer the SaaS version and charge for the maintenance. EASY!


What was it like to move into the role of CEO?


Going from a tech expert to managing a bunch of functions you know little about was quite different and there were a lot of questions that needed to be explored.

  • What is a good CEO?

  • How do you know what good looks like?

  • How do you give advice on something you know very little about?

  • How do you get your team to respect you and get along with each other while executing towards a common goal?


How to crack the code:

  • Learn really really fast

  • Get out of your comfort zone

  • Talk to people who have done it and are great at it

  • Meet as many expert CEOs as you can and pick their brains; you will start seeing patterns

At the end of the day, each company has a strategy, and you need to make things fit into that strategy.


Lakehouse: How did we get here? Why now?


What has changed the most is the amount of data. Data warehouses cannot be the answer to everything because not everything can be stored in a data warehouse anymore. This was the start of the switch to data lakes. However, depending on your use case you need more than just a data lake. BI needs a warehouse structure but data science, ML, and real time use cases can live outside of the data warehouse.


These two solutions started creating problems, because in some instances you now have two copies of your data, one in the lake and one in the data warehouse. This is a problem because the policies are a little different (files vs. tables) and you need to manage changes in two locations. Databricks wanted to simplify things and did so by making data useful where it was, in the lake, by basically building a house on that lake.


What are the trends and adoption of AI and ML?


Most organizations were really excited about the AI buzzword back when Databricks started. But at that time, organizations didn’t have a clue what was involved; they didn’t have the readiness or understand the basics.


So then everyone started collecting data, building strong data teams, and understanding the data stack. This led to the data community becoming more and more important for the organizations.


What is Unity Catalog and how does it transform data governance?


Unity Catalog is about specifying who has access to the data. It is a unified governance solution for all data and AI assets, including files, tables, machine learning models, and dashboards in your lakehouse on any cloud.


As an example, it is helpful if you are a regulated company, since you need to focus on data quality and privacy. Your data needs to be appropriately locked down, encrypted, and provisioned on an as-needed basis.


What’s next on the roadmap, what are you looking forward to?


The biggest goal is to simplify the lakehouse data stack: simpler autoML, simpler data processing, and the ability to share dashboards and integrate with other tools.


What is your favorite book?


He described a book for successful leaders, about how they build up habits that are pretty bad, and how to change those bad habits.



Making Data Teams Greater Than The Sum Of Their Parts

SVP, Head of Data and Insights at New York Times


High performing teams:

  • Achieve their goals

  • Stick around


To do this they:

  • Communicate clearly

  • Trust and support each other

  • Have the right set of skills within the team

  • Have complementary skills (as individuals) and make use of them (as a team)


Team Values:

  • Independence

  • Integrity

  • Curiosity

  • Respect

  • Collaboration

  • Excellence


How to hire for values:

Ask how they solved a particular problem:

  • Listen to their response: are they bluffing?

  • Were they interested in the solution, or did they just want to get the job done?

  • Did they include the right people in the solution and the communication?



Building a Data Culture at a High-growth Startup

Director of Data & Analytics at Cribl


Building a data culture is an ongoing process and you can look at it from the perspective of a data philosophy.


Data Philosophy

  • Data is a Product

  • Customers First


If you create a report and no one uses it, does it provide value?

Philosophy in Action

Communication

Over-communicate: this provides visibility and lets stakeholders see what the team is tracking.

  • Public channel

  • Announcement



Documentation

Absolutely necessary. Try to dedicate 30 min/week; it makes answering questions easier for both parties. Include a last-updated timestamp, use standardized terminology, and have stakeholders sign off.

  • Guides

  • Definition docs


Transparency

Every employee has access to data. A ticket system handles requests and prioritization so stakeholders know the SLA; if there are too many requests, ask the stakeholders to select the priorities.

  • Data for all

  • Prioritization


Q&A Time!

How do you convince people to adopt your data culture?


Generally, data folks tend to be less biased, but it can be a little harder if people are comfortable with a particular tool. When stakeholders want to use different technologies, it is important to investigate whether that tool fits and where it fits. Leadership values are very important in adopting a data culture; support means everything.



Building trust around data in a fast growing scale up business

Global Data Governance Lead at Contentsquare


Data Governance Strategist at Contentsquare


Upside: skyrocketing growth, and with it a need for near-real-time business performance monitoring

Challenge: teams, processes, and data evolving and growing extremely fast

Problem: manual data checks, time-consuming and low-efficiency internal reporting, dashboard downtimes


How are we cracking and solving the problem?

Our purpose is to make data accessible, understandable, and reliable.


Little spoiler, the solution is…

… to treat every data team output as a data product.


Execution of Data Governance and Data Quality

  • Data prioritization

  • Data catalog and data lineage

  • Data protection

  • Data quality

  • Data policies and processes


To get your data quality to a good place, you can identify bad data by conducting data quality checks. This can be done by implementing data quality rules, which involve monitoring, alerting, and a process to fix the data.
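As a toy illustration of what one such rule could look like, here is a minimal sketch in Python; the table, column, and threshold are all invented, and a real setup would lean on an observability tool or framework rather than hand-rolled checks:

```python
import sqlite3

# A minimal data quality rule: monitor a column's null rate and alert
# when it crosses a threshold. Table/column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.99), (2, None), (3, 12.50), (4, None)])

def check_null_rate(conn, table, column, threshold=0.01):
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL").fetchone()[0]
    rate = nulls / total if total else 0.0
    if rate > threshold:
        # in real life: page on-call, post to a channel, open a ticket
        print(f"ALERT: {table}.{column} null rate is {rate:.0%}")
    return rate

check_null_rate(conn, "orders", "order_total")  # fires: 50% nulls
```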


This can come with a few challenges, including alert fatigue and a lack of ownership and user engagement.


Key Takeaways:

  • Clear ownership in Data Governance can be a game changer for data quality

  • Make the data collection side accountable too (operations, information systems); technical/data teams are not the only ones responsible for data issues

  • Break down your big issues (a high volume of alerts = zero engagement from the business side)

  • Balance value creation and foundations to yield performance reporting scalability



9 Predictions for Data in 2023

Tomasz Tunguz, Venture Capitalist


Mei Tao, Product, Monte Carlo


  1. Cloud Manages 70% of Data Workloads by 2024

  2. Data Workloads Segment by Use

  3. Metrics Layers Unify Data Architectures

  4. LLMs Change the Role of the Data Engineer

  5. WASM Becomes Essential in Data

  6. Notebooks Win 20% of Excel Users with Data Apps

  7. Cloud-Prem Becomes a Norm

  8. Data Observability

  9. Decade of Data Continues


Q&A Time!

What do you think the impact of synthetic data will be? Where will the value be: data amplification or privacy?


Synthetic data is data that you create artificially rather than collect.

This would provide value with privacy and computer vision use cases.

Privacy use cases: since capturing and using PII includes a lot of restrictions, it would be easiest to create anonymized data to use for the models (like fake credit card and social security numbers).

Computer vision use cases: when trying to train self-driving cars during bad weather conditions, there are not a lot of hours of data for extreme situations.
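For the privacy case, here is a minimal sketch using the Faker library (pip install faker); the fields chosen are just for illustration:

```python
from faker import Faker

# Generate realistic-looking records that contain no real person's PII.
fake = Faker()
synthetic_customers = [
    {
        "name": fake.name(),
        "ssn": fake.ssn(),
        "credit_card": fake.credit_card_number(),
        "email": fake.email(),
    }
    for _ in range(5)
]

for row in synthetic_customers:
    print(row)  # safe to use for testing or model development
```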


Which one of your predictions are you most confident of? Which one will have the largest impact?


Tomasz has put together these predictions for about 5 years and averaged a success rate of about half!

  • Tomasz believes the cloud prediction is the most straightforward

  • Mei likes #8



The Role of Decision Bias at Data-Driven Companies

Daniel Kahneman, Nobel Prize-winning economist and author of Thinking, Fast and Slow and Noise: A Flaw in Human Judgment


Barr Moses, CEO and Co-Founder, Monte Carlo



Daniel got the chat really excited, many were star struck.

Barr asks Daniel all the questions we are dying to know!


Intuition is flawed and needs to be controlled

How do you define human intuition?


Daniel says that the standard definition is kind of biased: knowing something without knowing how or why you know it. He mentions that it is biased because, to truly know something, it has to be true.


He says intuition is best defined as believing you know something without knowing why you believe it. He also mentions that many of our intuitions are on the mark, while some are poor.


What does poor intuition look like?


He shares an example from his book:

Imagine a young woman graduating from university.

Fact: she was reading fluently when she was 4 years old.

Question: What is her GPA?


What is striking is that everyone has a number in mind, and a pretty good idea: not a 4.0, but probably better than a 3.8. How did that number get generated? It was an intuitive number; it just came to us.


Some may think they have a good basis: when you hear about a young person like this, you have an idea of how clever she was at a young age. You can put a percentile number on that, probably mid-to-high 90s, and map GPAs onto the same percentiles. However, that is DEAD WRONG.


You cannot have a best guess at all about her GPA, because there is very little information and lots of things could have happened to her since she was 4 years old.

The proper way to guess is to figure out the average GPA at the university and base it on that.
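Here is a quick sketch of that correction, in the spirit of the "regress toward the mean" recipe from Thinking, Fast and Slow; all the numbers are invented:

```python
mean_gpa = 3.2        # baseline: the average GPA at the university
intuitive_gpa = 3.8   # the GPA that "matches" the impressive fact
correlation = 0.2     # assumed (weak) link between reading at 4 and GPA

# Move from the baseline toward the intuitive guess only in proportion
# to how predictive the evidence actually is.
prediction = mean_gpa + correlation * (intuitive_gpa - mean_gpa)
print(prediction)  # 3.32 -- much closer to the average than to 3.8
```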


What is the role of intuition when it comes to AI and ML?


The interesting thing with ML is that its output is much like intuition. It is a black box, and you don’t know where the prediction came from; you are not sure how all the data was used to get to it. This looks very much like intuitive prediction.


Something else interesting: AI predictions are much more accurate than people’s intuitions. They just use the data better than humans are capable of.


What can ONLY humans do?


AI is a baby; it is only about 10 years old, and this is just the beginning. You can’t see the limits, so Daniel said he can’t imagine what humans can do that is beyond AI. What he does know is that what we have in our heads is a computer; therefore, other computers can simulate what we can do. However, it is going to take a long time.


He goes on to say that AI will not be like humans, it will be better than humans. In the foreseeable future, humans are going to be in charge, and try to control that beast. But in a few decades, the capabilities of AI will increase greatly and a lot of people are worried about that.


What is the difference between system 1 and system 2 thinking?


There are two ways ideas come to mind

Question: 2 + 2? An answer comes to mind without a second thought.

Question: 17 × 24? Nothing immediately comes to mind, but you can generate an answer (408) if you think hard and long enough.


System 1: thoughts that happen to you, intuitive thoughts.

System 2: more controlled, deliberately generated reasoning, like preparing your tax return or reading a map.


Daniel shares an interesting problem with the interaction between system 1 and system 2 thinking. He says that a lot of the time we tend to think we are all system 2, conscious and controlled people aware of everything we do. But a lot of the time we are just accepting our system 1 thinking.


Puzzle Example:
A bat and a ball cost $1.10 in total. 
The bat costs $1.00 more than the ball. 
How much does the ball cost?

People often fail: system 1 generates an answer, and system 2 just accepts it. Only some people slow down, get skeptical, and confirm whether the answer is correct. This predicts a lot about someone and their style of thinking.
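(For the record: if the ball costs x, then x + (x + $1.00) = $1.10, so 2x = $0.10 and the ball costs $0.05, not the intuitive $0.10.)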


Can you train yourself to think a certain way, with a certain system?


It is difficult to improve the way you think. But you can learn to recognize situations where your intuitive thinking tends to be wrong, and slow down in those situations.


Be suspicious of your own intuition


The Future of Data Observability

Head of Products at Monte Carlo


Head of Product Marketing at Monte Carlo


Biswaroop and Jon closed out the conference with a big announcement on the future of data observability, introducing the Data Reliability Dashboard. This tracks data platform user metrics, number of data quality monitors, table uptime, and recent incidents, providing key indicators of data health over time.


The Data Reliability Dashboard will focus on three main areas that will help leaders better understand the data quality efforts that are happening in their organization:

  1. Stack Coverage

  2. Data Quality Metrics

  3. Incident Metrics and Usage


The team at Monte Carlo also announced two other new capabilities to be excited about:

  1. Visual Incident Resolution

  2. Integration with Power BI


Another great conference with great people, great learnings, and great music!!




Happy Learning!!
