Monica Kay Royal

Event Review: Big Data & Analytics West Summit


The only big data event dedicated to trends and technology celebrating data innovation in Western North America, hosted in Vancouver for the 3rd year. This year’s edition was a hybrid in-person / virtual event featuring interactive sessions, curated breakout learning, and face-to-face networking opportunities.


As a virtual attendee, I must say that this event was hosted on a very interactive and fun platform by Hubilo. There were all kinds of opportunities to get involved, including the main stage, exhibitors hall, social wall, Q&A, and a leaderboard! I tried my best to stay at the top and promote this review 🤓 I also really liked the pop-out feature for the main stage, so you can still view/listen in while perusing the rest of the platform.


As for the speakers and companies that were represented at this conference, I realized that I have not been to many conferences outside of the States and I learned about so many new companies out there! I may have gotten carried away with this summary because of how much I learned 😊


The event didn’t start until later in the morning for me, so that gave me some time to visit all the exhibitors and check out some neat products.


Product Highlights:


Scott Taylor kicked off the event with the utmost enthusiasm as he shared that the hotel is already playing Christmas music in the elevators. After some housekeeping items, a thank you to the sponsors, and a much deserved shout-out to the Hubilo app, we were on our way to an exciting and educational day!



Day 1


Keynote Panel: Tools To Build A Data-Driven Culture

Leveraging Processes And Methods To Cultivate A Data-Driven Culture Within Your Organization


Andrea Hanley, Director Of Data And Analytics, Calgary Airport Authority

Melisa Albas, Director Of Enterprise Insights, First West Credit Union

Andrew Hall, Vice President, Data & Analytics, Canada Drives Group

Moderator- Mark Lowerison, Co-Founder And Chief Technology Officer, B153 LTD


This panel brought together a diverse set of individuals from a range of industries, which made it extra interesting to hear their data stories.


One highlight that I found particularly interesting was the application of data analysis in airports. Andrea shared that with data, you are able to determine which shops and which items contribute the most revenue. Additionally, you are able to tie those sales to who was purchasing the items based on the location of the arrival gates. So, if you ever experienced a last minute gate change, they might be trying to get you closer to a particular store to purchase some items 😉


Andrew works at a familiar company called Canada Drives, which I gathered is a Canadian version of Carvana. It was particularly fun to discover that other companies exist out there trying to make the car buying experience better for consumers. Data culture is huge at this company, helping make sure that both consumers and the company can make better decisions.


Melisa shared an example of how data culture and community building led to a massive decrease in call wait times at First West Credit Union. When they first started building their data culture, they were able to get leaders to join and become mentors who contributed to making improvements through communication and collaboration. Goes to show that great things can be accomplished when communities work together.


Mark asked about some data challenges, and the overall consensus was the human side of data culture. That, and access to the data (as several people in the audience agreed).


Industry Expert: Evolution Of Governance

Exploring The Future Of Governance


Kimberly Nevala, Director Of Business Strategies, SAS Best Practices & Advisory Business Solution Manager, SAS


Kimberly shared with us how the ethos of your data ecosystem shapes the right approach for success.


Three key takeaways:

  1. Gain a real sense of clarity and understand your organization’s values and how decisions are made. One way to do this is not by reading the mission statement but by figuring out who and what actually drive decision making and change in the organization.

  2. Define the risk appetite of the organization. This can be either risk tolerant or risk averse, but the factors that go into this are the mindset of leadership, how your organization is regulated, and the relationship with employees and customers/partners.

  3. Understand how good your organization is at creating guardrails for making decisions, even outside of analytics. Something that can help here is to identify items such as standards, policies, and procedures.


Over the years we have all seen new data trends and buzzwords, ‘data driven’ being one of Kimberly’s least favorites. This is because no one really understands what exactly it means, and I completely agree. The main objective is to better use data and analytics to improve decision making.


What I found the most notable about this session was the way that people and organizations perceive uncertainty. Kimberly has seen organizations reacting to uncertainty by trying to control everything, trying to predict what is going to happen in the future and make rules, policies, and procedures to help. The problem here is that this is where data governance goes to die. Good governance shouldn’t be viewed as a strict rulebook, it should enable teams to be confident in making decisions, even in the face of uncertainty.


Kimberly mentioned that it can be really easy to get overwhelmed by changes, new capabilities, and understanding how to control and manage them at the organization level. It can also be hard to not get stuck in your own head about uncertainty, but the only way we can be effective is by starting. And the only way to start is to just start 🙂


Case Study: Healthcare NLP

Healthcare Applications With Natural Language Processing


Raymond Ng, Director Of Data Science Institute, Canada Research Chair In Data Science And Analytics, University Of British Columbia


NLP is one of my all time favorite subjects; I have had some pretty cool projects in this space throughout my career. For those new to the topic, natural language processing is the ability of a computer to understand text and audio in a similar way to humans. There are several complexities to this, however, like sentiment analysis (and the sarcasm, as Scott called it), which is what makes the topic extra fun!


Raymond’s expertise is the application of NLP in healthcare. While this is still predominantly based on structured data, there are huge opportunities to use these techniques with unstructured data such as medical reports, text messages, and even audio.


One of the biggest advancements in NLP this decade was the creation of pre-trained language models, such as Google’s BERT (Bidirectional Encoder Representations from Transformers). BERT was created by Google in 2018 and pre-trained on English Wikipedia, which contains 2,500 million words. Since then, there have been many variants designed to be ‘fine-tunable’ for specific tasks and domains (BioBERT, PubMed BERT, RoBERTa).
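
For the curious, here is a minimal sketch of my own (not from the session) of how one of these pre-trained models can be loaded and pointed at a downstream task using the Hugging Face transformers library; the checkpoint names are public ones and the sample sentence is made up:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pre-trained BERT checkpoint; swap in a domain-specific variant
# such as "dmis-lab/biobert-v1.1" for biomedical text
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # attach a 2-class fine-tuning head
)

# Tokenize a sample clinical-style sentence and run a forward pass
inputs = tokenizer("Patient reports mild chest pain.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2]), an untrained head ready for fine-tuning
```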


Raymond shared a couple use cases which I thought were fascinating.

  • NLP used to predict returns to medical facilities by assessing mental health through social media posts

  • NLP used to predict dementia by the changes in patient language patterns over time


Keynote Case Study: An Organization in Transition

Redefining Who We Are With Data


Ryan Hum, CIO & VP Of Data, Canada Energy Regulator


This session was both informative and inspirational, a crowd favorite, as Ryan shared the power of data. CER is the organization that oversees how energy moves in Canada and works to keep energy moving safely across the country. They receive a massive amount of physical historical documentation, stored in binders and boxes off-site and oftentimes forgotten. And the digital documentation that they do have is not easily searchable because the file names are hardly descriptive of the information inside.


Ryan and his team were able to create a search tool, called BERDI (Biophysical, Socio-Economic, and Regional Data and Information), which provides easy access to this documentation, including regulatory data on Canada’s land and water, weather and wildlife, species at risk, environmental protection, public safety, and more.


The most notable thing about this tool was how others reacted to it. Many Blackfoot Elders in the community expressed how it could benefit the community and were overjoyed that others can now access and learn from all that has been collected over the years.




Ryan shared the term used for museums, which are sometimes viewed as ‘death lodges’, and expressed how happy he was that the BERDI tool was able to save this data.




What started as a data journey to improve efficiency turned into the use of data to build trust, create dialogue, and build relationships. Ryan might have coined a new data term






Data Strategy

Recovering Your Data And Analytics Function After Covid


Karen Beckmann, Senior Director, Information Technology, Rocky Mountaineer


Rocky Mountaineer offers luxury train journeys through Western Canada and the American Southwest. When Covid hit, the company screeched to a halt as they could not operate for 18 months. It was predicted that the tourism sector could be set back by $1 trillion. During this time they lost profit, people, knowledge, and had to find a way to pivot.


One thing they were able to do was launch a US route in 2021, but this wasn’t achieved easily. Staff was at 50% of what it was prior to the pandemic, the great resignation hit, the reservation system was not finished, and it was not connected to finance.


When Karen joined in April 2021, she needed to figure out a new approach and have a plan to react. Once the organization and team were on the right tracks, they were able to start building value through a vision to implement data as a service.


This new vision was viewed as a stool, with 3 legs:

  • Data / Report Governance

  • Data Architecture

  • DataOps Build Squad


After stabilizing the stool, they were able to put the cherry on top: self-service. They realized that they can never meet everyone’s needs, so they needed help from leadership to build communities and to communicate, train, and mentor others through the self-service journey. They defined success once they saw an increase in trust, engagement, and efficiency, as well as a decrease in ticket reconciliation requests.


Industry Expert: Data Virtualization

Enabling A Data Mesh With Data Virtualization


David Kerner, Regional Vice-President And General Manager, Denodo Canada


Two big clarifications:

  • Data mesh is something you build, not buy

  • Data mesh is an organizational construct, not a technological construct


In the traditional monolithic approach to data, you copy your data from different sources and move it to a data store, like a warehouse or a lake, using an ETL pipeline. This can get very messy very quickly and comes with its challenges, as you need cross-functional domain oriented source teams, hyper-specialized data & ML platform engineers, and cross-functional domain oriented consumer teams. Data mesh can help.


Data mesh moves from a centralized data infrastructure managed by a single team to a distributed organizational paradigm. This does assume you will be working with a distributed data architecture, but the good news is that no one has ever fully succeeded in getting their data into one place. Some important factors of a data mesh are:


  • Organizational units (domains) are responsible for managing and exposing their own data. They know the data best, so they are fully equipped for this!

  • Data should be treated as a product and be easily discoverable, understandable, secured, and usable by other domains.

  • Domains are able to create and manage the data products themselves through a self-serve data platform.

  • Data products created by the different domains need to interoperate with each other. This requires agreement about the semantics!


Using data virtualization to enable your data mesh provides additional benefits:

  • Reusability, since these platforms are able to expose data to each type of consumer in the form most suitable for them

  • Polyglot consumption, which is a fancy way to say that data consumers can access data using any technology, not just SQL

  • Top-down modeling, which helps with the interoperability

  • Data Marketplace, think of it as a shopping mall to get your data products


While the benefits here are impactful, it is important to note that implementation is a process and it will not happen overnight.


Afternoon Keynote: Data For Your Benefit

How Credit Unions Are Flipping The Script On The Use Of Data


Jeremy Coughlin, VP Of Enterprise Analytics, Coast Capital Savings Federal Credit Union


I’m sure many have heard ‘Data is the new oil’ but have you heard of any of these?



Jeremy opens his session by sharing the evolution of data. My favorite, ‘Data is the new bacon’... the good stuff in the sandwich. Even though the way we think about data and use it has changed throughout the years, defining responsibility is still not very clear and ownership continues to be a hot topic, especially in the security and privacy world.



Open Banking is a concept that provides a secure way to access consumers’ financial data, all contingent on customer consent. Driven by regulatory, technology, and competitive dynamics, Open Banking calls for the democratization of customer data to non-bank third parties, consumers, etc. It basically recognizes that you own your data.


Open Banking is just the beginning. We can expect to start seeing Open Finance with non-banking data like insurance, pensions, and taxes. We can also expect to see Open Data for all other data like email, social media, and even health. But this can only work if there is trust in the organizations.


If you are interested in learning more, you can reach out to Jeremy or join the Vancouver Analytics Board


Industry Expert: Alteryx

Analytics For All, All For One – The Four Components Of Enterprise Analytics


Chris Smallwood, Manager, Solutions Engineering, Alteryx


The analytics gap is wide, and growing. While 92% of organizations continue to invest heavily in AI and analytics, only 19% feel that they’ve truly established a data driven culture.


According to the International Institute of Analytics, most companies still use localized analytics for reports, scoring at an average of 2.2 on the typical analytic maturity scale.




More surprising statistics:





How do we close this analytics gap? By making analytics FOR ALL (the 4 E’s)

  • make analytics Easy

  • cover Everything

  • be Everywhere

  • enable Everyone


Chris shared the top trends he sees for 2023: everyone investing in some kind of cloud migration strategy, an increase in unstructured data, and the growth of geospatial data.


Industry Expert: Carto

The Rise Of The Cloud Native Spatial Analysis


Eva Adler, Data Scientist, Carto


Eva agrees with Chris’ predictions, but she believes spatial data is already HUGE! It’s like they planned this handoff before their sessions. She works at Carto, the leading cloud native Location Intelligence platform.


Location intelligence comes from visualizing and analyzing volumes of data in the context of location. Viewing information in the context of location, like on a map, app, or dashboard, can provide unique insights to help make sustainable and resilient decisions.


Traditionally, location data has been siloed away, stored in warehouses separate from organizations’ operational data, and not connected to the rest of the data architecture. But a location intelligence platform paired with a cloud native architecture enables the use of this data in an easy way, with SQL.


Eva got into some details around geospatial hierarchical indexes, a system that uses a hexagonal grid to make spatial queries run faster, which was quite fascinating. However, I cannot do this part justice, so I will link to CARTO so you can learn more.
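
To make the hexagonal grid idea a bit more concrete, here is a tiny sketch of my own, assuming the index works like the open-source H3 grid (which CARTO supports); it requires `pip install h3` and uses the v4 API:

```python
import h3

lat, lng = 49.2827, -123.1207  # Vancouver, a made-up example point

# The same point maps to one hexagonal cell per resolution; coarser
# resolutions cover more area, so aggregations touch fewer cells
for resolution in (5, 7, 9):
    cell = h3.latlng_to_cell(lat, lng, resolution)
    print(resolution, cell)

# Neighboring cells within one ring, handy for fast proximity queries
print(h3.grid_disk(h3.latlng_to_cell(lat, lng, 9), 1))
```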


Case Study: Data Monetization

Data Monetization Through Scalable AI: Practical Lessons Learned From The Pursuit Of Buzzwords


Ian Hargreaves, Data Science Fellow, ATB Financial


Let’s get buzzy!














This is a great picture to both set the tone and show off the amazing art that DALL-E can create. As one might expect, the definition of AI can get a bit long-winded, but it really just boils down to something that helps solve a problem.


AI systems can do all sorts of cool things and can be found in every industry. Check out all these applications that fall into categories of sense, think, and act.

NLP and speech to text are some of my favorites, which I have had the pleasure of working with in my career to help make compliance testing easier (I didn’t have to read through pages and pages of contracts or listen to customer calls anymore!).


Recommender systems were probably the masses’ first exposure to AI, via Netflix predicting what shows you would like to watch based on previously watched shows.


Robotic process automation might sound futuristic, but really it just means automating really boring processes that you have to do on a daily basis (no more Ctrl+C, Ctrl+V!).


Global revenues generated by AI systems continue to climb, and the pace is accelerating. By 2025, it is predicted to reach $32 billion. However, 55% of organizations that are actively investing in AI do not yet have a machine learning model in production (i.e. actually generating revenue). So what’s the deal?


It’s hard to manage stakeholder expectations for AI projects because it is a challenge to gain alignment on the problem we’re solving for and to create clarity around project outcomes. I had one experience where clients thought we were bringing in Rosey the Robot to clean offices when we proposed implementing RPA tools.


Ian shared The AI Canvas to help with these challenges by outlining the details of each AI project. This can then be shared with stakeholders to hopefully better gain alignment and understanding.



But how do we tell a clear story about the value that the AI product can create? Ian recommends getting creative. It’s one thing to show numbers of increased revenue, clients, or time saved. A fun way to show value was an example of total documents processed illustrated as a pile of papers next to a building. Now that’s a lot of value!



Day 2


Keynote: Data Metrics

The Data Art Of Good Goals


Vandana Mohan, Director Of Data Science & Engineering, Shopify


Always a great way to start the day with a talk about goals and quotes!


If you aim at nothing, you will hit it every time

Good goals are better than ‘something’. They should be:

  • Accessible: shared understanding and alignment

  • Actionable: sensitive to actual work in team’s control

  • Attainable: stretch sustainably


Vandana likes to look at goals like yoga, a good stretch!

Good goals = glue for effective orgs

It is important to note that goals can be applied to different levels of the organization, and each level’s goals are specific to that level:

  • Team / Individual: Planning & Prioritizing

  • Partners: Collaboration

  • Managers: Accountability


Creating goals is typically a 5 step process where you define the

What > Why > How > Measurement > Target


Vandana ended the session by sharing some critical DOs and DO-NOTs



Industry Expert: Data Mesh

How To Reach The Holy Grail Of Data Mesh


Martin Fiser, Head Of Professional Services, Keboola


Everyone’s talking about the data mesh, but what should we watch out for and where’s the real value?


There are 4 Principles of Data Mesh

  • Domain Ownership

  • Data as a product

  • Self-serve data infrastructure

  • Federated computational governance


How to put it into production and what should you watch out for?

  • Fragmentation: how many different domain specific stacks are you dealing with

  • Technical Debt: debt can accumulate over time

  • Human Factors: data contracts can create issues

  • FinOps: it is important to define the value of each product


It's particularly important to call out governance here because there are rules to the game. Some data governance tools are adding additional layers of complexity, while others (like Keboola) bake them in, including:

  • Data Access: access management is important to securing your data

  • Data Lineage: someone really should understand where the data comes from

  • Audit Trail: logging usage and changes to the data

  • Data Catalog: central ‘shop’ for data with usage tracking


Remember, data mesh is more of an organizational approach, not about the technology (🤙🏻 callback to David Kerner’s session).


Thought-Leader Panel: Driving Data Innovation

How To Take Your Big Data To The Next Level


Dr Fred Popowich, Scientific Director, Big Data, Simon Fraser University

Raymond Ng, Director Of Data Science Institute, Canada Research Chair In Data Science And Analytics, University Of British Columbia

Moderator- Mark Lowerison, Co-Founder And Chief Technology Officer, B153 Ltd


This panel was extra interesting as we got to hear perspectives from people in academia. One fun fact shared by Mark is that you can conduct research using Google Scholar, a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines.


Mark started the discussion by asking about key application areas for the adoption of AI and ML. Fred is particularly interested in the health industry and thinks that this will be a great place to provide value because people can relate to both personal and public data with the recent availability of pandemic related data. Raymond agrees and thinks that there are advancements with governments as they now realize the need for current and frequently updated data, again due to the pandemic.


Mark shared that there are a lot of people in academia because learning is their passion, but asked how other people are keeping up with upskilling. Raymond believes that a big part of the University’s role is to train and educate people already in the workforce by offering opportunities to study on weekends. He also emphasized the benefit of focusing on interdisciplinary fields and understanding how one field can relate to and benefit others. Fred thinks that people still view Universities as just a place to go to get a degree rather than a place to upskill. He feels that education and training should be more than just degree programs. He also finds the interdisciplinary topic interesting and agrees on its importance.


The last topic Mark wanted to discuss was around advancements in data sharing platforms, like Kaggle. Fred mentioned that hackathons are making it easier now for people to be able to use datasets and provide code through notebooks and share their work with others. Raymond said that we actually still have some issues with data sharing that we need to address. For instance, the pandemic data cannot be shared across provinces.


Mark closed the panel with a question that only NLP can quickly answer:

What do Winnie the Pooh, Chance the Rapper, and Alexander the Great have in common?

… 'THE' 😂


Case Study: Client Experience

Driving Client Experience Through Data & Analytics


Dylan Roth, Managing Director Customer Experience Metrics, ATB Financial


Customer Experience is the sum of all customer interactions with your brand. It’s about how the customer feels about the experience. Because of these two things, customer experience (CX) is hard to master: there are a lot of variables and not a lot of hard facts.


One misconception about CX is that it is just a white glove service to clients. In reality, CX is relevant for every organization, and how you frame it will help you succeed.


There are two questions you need to answer to succeed at CX.

  • How does CX connect to your strategy? (it absolutely needs to tie in with strategy)

  • How does CX drive value? (and of course, value = $)


Defining metrics and KPIs is helpful to outline what good looks like and how to prioritize. For example, a car dealership might focus on customer satisfaction and the overall car buying experience. When creating these metrics, it is critical to ensure that you are setting realistic targets. Additionally, if a metric is not performing well, make sure to help support and improve the metric.


Don’t kill the metric owner

Industry Expert Workshop: Data Governance And Democratization

Data Connects Everything: The Role Of Data Governance In Driving Data Democratization For Better Decision Making


Mark Kohout, Director, Governance, Adastra

Sudipta Chakraborty, Practice Lead, Data Management, Adastra


Data is the representation of the organization, and data culture is a shared understanding of the ways in which we work. The term data governance, however, is still opaque, and Mark addressed this with us here. He believes that if data connects everything, data governance should too.


Data Governance is an ongoing business function focused on ensuring data is trusted, understood, and usable. It provides the necessary people and process framework for systematically delivering business knowledge about your data, and for making decisions about it.


Data Governance Benefits:

  • Data is Secured

  • Data is well understood

  • Data is trusted

  • Data is rationalized and mastered 🏆


The rest of the session was a detailed view of some tooling and a demo from Sudipta. If you are interested in learning more, check out Adastra


Customer Retention

Predicting Churn To Improve Customer Retention


Rajat Paliwal, Lead ML Engineer, Save-On-Foods

Mirko Moeller, Senior Data Scientist, Save-On-Foods


I found this session particularly interesting because I can totally relate to the idea of being loyal to a particular grocery store 😀


Rajat and Mirko walked us through a somewhat technical presentation on how they were able to predict churn rate, which I found to be quite an interesting technique. According to Investopedia, the churn rate is the rate at which customers stop doing business with an entity. It is most commonly expressed as the percentage of service subscribers who discontinue their subscriptions within a given time period. The twist here is that grocery stores do not have subscriptions so there is no indicator on when someone has actually stopped shopping, you are basically just ghosted.


Before getting into how churn works here, let’s look at why we care about churn in the first place. Everyone likes a good ‘why’:

  • Figure out who is likely to stop doing business

  • Intervene before they leave

  • Targeted marketing/offers


To accomplish this, churn is viewed as a time to event. Instead of predicting a binary label (churned/not churned), they predict the time to the next event (event = in-store or online shopping). If the next event is predicted to be too far in the future, the customer is considered churned. They found this approach to be intuitive, flexible, and reusable. Here is an illustration to help:















And in case any Data Science nerds are interested in specifics, they used the Weibull Distribution 🤓
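
For fun, here is a toy sketch of my own (not Save-On-Foods’ actual model) of the thresholding idea: fit a Weibull distribution to a customer’s gaps between shopping trips, then flag churn when the expected next visit is too far out. The history and cutoff below are made up:

```python
import numpy as np
from scipy.stats import weibull_min

days_between_trips = np.array([6, 8, 7, 9, 14, 21, 30])  # made-up history

# Fit Weibull shape/scale to the inter-purchase gaps (location fixed at 0)
shape, _, scale = weibull_min.fit(days_between_trips, floc=0)

expected_gap = weibull_min.mean(shape, scale=scale)
CHURN_THRESHOLD_DAYS = 45  # assumed cutoff, tuned per business in practice

print(f"Expected days until next trip: {expected_gap:.1f}")
print("Considered churned?", expected_gap > CHURN_THRESHOLD_DAYS)
```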








Case Study: Data Catalog

Three Must-Have Artifacts For The Data Driven Organization


George Firican, Director Data Governance & Business Intelligence, University Of British Columbia


This session was one of my favorites because the topics George shares are always super relatable and easy to follow. I am glad that this was after lunch, because this one made me hungry 😋


Jumping to the end, every data driven organization needs to have these three artifacts:

  • Business Glossary

  • Data Dictionary

  • Data Catalog



Now for the ultimate question: Which of these is a sandwich?








If you have been in the data industry for a while, you might have guessed the answer…

it depends!


What?

  • It kind of depends on who is answering. I like to have open perspectives on the world and can believe that all are sandwiches, leave no food behind.

  • It really depends on the definition of a sandwich. According to the USDA, ‘A sandwich is a meat or poultry filling between two slices of bread, a bun, or a biscuit.’ There goes grilled cheese (yes, even though it has sandwich in the name)


This is why a business glossary is KEY: it is a document for all the business metadata, including definitions. The data dictionary is similar, but goes into more technical detail. And as for the data catalog, you can think of it as a way to gather all your different sandwiches into one space and query them all (think Amazon!).
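
To stick with the theme, here is a toy illustration of my own (not from George’s session) of how the same sandwich might appear across the three artifacts:

```python
# Business glossary: business-facing terms and agreed definitions
business_glossary_entry = {
    "term": "Sandwich",
    "definition": "A filling between two slices of bread, per the agreed business rule",
    "owner": "Culinary Standards Team",
}

# Data dictionary: the same concept, with technical detail about the column
data_dictionary_entry = {
    "column": "sandwich_type",
    "table": "menu_items",
    "data_type": "VARCHAR(50)",
    "nullable": False,
    "glossary_term": "Sandwich",  # links back to the business glossary
}

# Data catalog: where the data assets live and how to find them
data_catalog_entry = {
    "asset": "menu_items",
    "location": "warehouse.food.menu_items",
    "tags": ["lunch", "certified"],
}
```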


Here is a great representation of all three


To learn more from George, visit his LightsOnData page 💡



Case Study: NLP

Building Data Pipelines: Integrating Transforms Like BERT Into Your Analysis


Rob Davidson, Director Of Data Science, ICTC-CTIC


BERT stands for Bidirectional Encoder Representations from Transformers and is a machine learning technique for natural language processing (NLP) pre-training developed by Google in 2018. BERT was pretrained from unlabeled data extracted from the BooksCorpus with 800M words and English Wikipedia with 2,500M words. One fun fact here is that the BooksCorpus includes a lot of romance and mystery novels.


BERT is good at sequence-to-sequence based language tasks like text summarization, sentence prediction, conversational response generation, and question answering. BERT is also good at natural language understanding tasks which include word sense disambiguation and sentiment analysis (Scott’s favorite).

There are various domain-specific BERT models, a few of which Raymond shared during Day 1, including:

  • SciBERT (biomedical and computer science literature)

  • FinBERT (financial services literature)

  • BioBERT (biomedical literature)

  • ClinicalBERT (clinical notes)

  • CT-BERT (Clinical Trial data)

  • TourBERT (Tourism)

  • mBERT (corpora from multiple languages)

  • patentBERT (patents)


Rob then shared a couple fun case studies to highlight the value of BERT.


  • Perform sentiment analysis and use results to help identify mislabeled data (see the sketch after this list)

  • Perform text classification to assign job titles to job descriptions


If you want to learn more, check out Moving Toward an Inclusive Smart Economy for Canada



Fun Bits


All throughout the event, Scott shared some fun videos. Here is a fun compilation 😄


The Cuc-Burnett-Eyes was my favorite part!


Great Halloween series by DATAcated


The best children’s book out there!


99% buzzword free!


by Tiankai Feng


by Tiankai Feng




Happy Learning!!
