What a COVID-19 project taught me about Data Science and Designing Visualizations
When rows of Excel and trend charts of COVID-19 don’t make sense, what stories will they tell when we visualize them further? How do we turn visualizations into tangible experiences powered by data? Is data visualization data science? What are the challenges with COVID-19 PH data? This is the backstory, the questions we asked, and the lessons learned from viz.tracecovid.ph. My first data project!
I started writing this on the 3rd or 4th holiday since the quarantine was implemented. Over the past two months, a sizeable portion of my after-office hours has shifted from Netflix to DCTX.ph. DCTX is a volunteer group formed by the non-profit organization DevCon. It designs and develops scalable gov tech solutions for free in response to the pandemic, working closely with IATF through DICT & DOST.
The DCTX and TraceCOVID project started when the DOST Secretary called the DevCon founder to help with the government’s pandemic response. A few days later, the team was on a Zoom call planning the first project: a contact tracer for the Philippines.
The calls felt like an army assembly of highly capable, passionate, and privilege-mindful citizens who want to help the country and the government fight this virus. With our hearts, time after work, minds, and coding-designing machines, we wanted to help.
However, a few prototypes, designs, and days later, the project faced a roadblock. There were no protocols yet for data privacy, no standardized data sources, and government agencies were still mobilizing around new protocols. A few new priority DCTX projects also came in with direct requirements from IATF. The TraceCOVID team was stuck between the legal and health protocols on visualizing health data: something developers could easily build, but which carried privacy and ethical risks until those protocols were sorted out. A few more meetings later, resources had to pivot to a more urgent project that came directly from IATF, which later became the rapidpass.ph contactless checkpoint verification system.
The project pivoted from contact tracing to visualization instead and here’s the phase 1 the team worked hard to ship:
So in this post, let me share some reflections on viz.tracecovid.ph, a COVID-19 data visualization tool for the public and policymakers. These are random reflections on leading a design team in shipping a visualization project in response to these uncertain times.
1. Viz is part of data science.
Data visualization is data science. Wait. Is it? I had to recheck the visual Prof Erika shared. Yes, Data Viz or visualization is one of the fundamental aspects or parts of data science that involves communicating data. And working in advertising, the challenge of communicating data into stories that inspire or demand action excites me.
While TraceCovid.ph phase 1 zeroed in on visualization and didn’t have predictive models or machine learning (yet), it became clear to me that data viz is a critical part of data science: it evokes emotions, surfaces insights, and gives direction for data-informed decisions. I’d also been intrigued by complex data science and visualization projects after dropping by some of the MSDS program demos at AIM, so I knew I wanted the project to see the light of day even if about 80% of the resources were all hands on deck for the new RapidPass project. I’m glad we were able to launch after about 5 pivots!
2. Data viz helps make sense of Excel files data.
The first “tracing” visualization I found inspiring was Singapore’s https://co.vid19.sg/singapore/cases project, shown above. It clearly communicates the “common spaces” and “network” of cases. In a few seconds, and with minimal cognitive load, you see how a handful of cases started from the airport, a foreign destination, or a place of worship. But make no mistake, visualization is a science and an art. We later learned that this is called a network map, which categorizes cases by location or links to better identify commonalities. Colors help segment cases visually at a glance, while medical jargon has health protocols to follow and is still a challenge to translate. It felt like a fun puzzle we didn’t understand yet but were excited to solve, like the mapping boards you see with Professor Sergio Marquina, or the web of suspects in an espionage storyline. But new inspirations introduce new challenges.
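To make the network map idea concrete, here is a minimal sketch in Python of how cases can be grouped by a shared link before anything is drawn. The case IDs, the `link` field, and its values are made up for illustration; they are not the structure of any real data set.

```python
from collections import defaultdict

# Hypothetical case records: IDs and "link" values are invented for illustration.
cases = [
    {"id": "C1", "link": "Airport"},
    {"id": "C2", "link": "Place of worship"},
    {"id": "C3", "link": "Airport"},
    {"id": "C4", "link": "Place of worship"},
    {"id": "C5", "link": None},  # no known link yet
]


def group_by_link(records):
    """Map each shared link (a place or source) to the case IDs attached to it."""
    clusters = defaultdict(list)
    for record in records:
        key = record["link"] or "Unlinked"
        clusters[key].append(record["id"])
    return dict(clusters)


def to_edges(clusters):
    """Turn clusters into (link, case) pairs, the raw material of a network map."""
    return [(link, case_id) for link, ids in clusters.items() for case_id in ids]


clusters = group_by_link(cases)
edges = to_edges(clusters)

for link, ids in clusters.items():
    print(f"{link}: {', '.join(ids)}")
```

Each (link, case) pair is one line in the network. A charting library would take it from there, but the grouping step is the part that decides which commonalities a reader sees first.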
Should TraceCovid be laid over a map? Are confirmed patients fine with tagging their residential location? Where do we get the common trace data? What libraries and frameworks suit these kinds of visualizations? Do they work well and lightning-fast on mobile?
3. Data and viz projects present a different set of technical gaps.
The technical gaps became a major roadblock. Due to reprioritization to help the inter-agency IATF with checkpoint management and improve safety for armed personnel verifying checkpoint access, 90% of DCTX resources were reallocated to RapidPass. Trace had working prototypes and designs but less technical leadership support. While we had a capable progressive web application (PWA) team, the challenge was to find the right data source and translate Figma product designs into working front-end interfaces that would run smoothly on a mobile phone browser. We had to cut the exploration stage. Next, we had to ask:
Just because it looks beautiful doesn’t mean it’s practical and fast on mobile. Are there patterns that designers should keep in mind when designing? Is this even feasible based on available data?
Slack channel discussions turned into a quiet chamber. This being a virtual community, I’m sure junior to mid-level experts were thinking twice about taking the lead and handling the pressure. I had to reach out to data science communities to recruit volunteers. Quarantined nights and new designs passed by, yet no one could give definite answers on whether the designs we rendered were feasible, what patterns we should follow, and who could commit to building them. We felt like excited architects and planners who had no clue if the designs we were making were feasible, secure, and doable. We had an idea of what works and what excites, but no clue yet how to make it work end to end. We were in limbo.
4. Usability of data and visualizations.
Towards the end of the first quarantine month, Sir Jess from Cebu and Dwin from the GIS community stepped up to work on prototypes. Both are GIS professionals rather than professional developers, yet they painstakingly looked into all the available curated data sources and figured out whether the “epidemiological link” column was the “trace” we’d been trying to find. With about 3 to 4 updates a week, made after long days at work, concepts turned into screenshots rendered in a map layout. The team was excited despite knowing that the prototypes wouldn’t be enough to launch a beta.
A few more volunteers signed up for DCTX, and we skimmed the database for technical and management people who might be able to help move this forward. Then, with the help of the volunteer engagement team, PM volunteers came in to help. Jo, who works as a product manager at Canva.com in Australia, hopped on an onboarding call and asked questions the team had forgotten to cover. She reminded the team of some important questions as the prototypes turned into 70% ready interim designs.
Who will use this again? What value are we creating for them? Do we need a dark map when people are familiar with Google Maps? Isn’t this view overwhelming? Do we expect people to understand what PUI and PUMs mean?
When a team of hackers, GIS experts, and designers work together on a project, it can end up with a fancy technology looking for a problem to solve. That time, we were a group of volunteers with a peg and technology stack who wanted to ship something fast and help. And without a client to guide the direction yet, we realized it was a technology project looking for a problem to solve.
Despite the frustrations, we had to pause and remind ourselves of the basics of empathy in human-centered design. This allowed us to reframe the design, despite technical and data source limitations, around two main users: the public and policymakers. There wasn’t a major revamp; we just had to adjust the default view and the order of data sets according to who would use them. By that time, we had a map view that could trace the epidemiological links of confirmed cases. We were excited to be the only website visualizing those links. Little did we know that DOH would disable the “epidemiological links” a few weeks later.
5. The responsibility of data and visualizations
Another early challenge the team had to face was the ethical and legal risk of storing sensitive health data, whether reported by users or published by the government. The first contact tracer project was envisioned as the “Foursquare or Swarm” for crowdsourced self-reports, so the government could visualize, trace, and track at-risk cases by location. While the technology and design to execute it were never in question, some operational challenges came up.
If a user reports having symptoms, do we tag them as PUI or PUM? What protocols will be in place? Who will own and manage their data — DOH or IATF? How do we guarantee the accuracy and truthfulness of reports? If a mall or building has several reports and positive cases, what call to action should we provide users who have checked-in to that area? Will we be in trouble if we tag a location or establishment as high-risk area that people should avoid?
Eventually, these questions were answered, and new questions were raised as projects from other initiatives came about, like SafeSpace.ph, NoVID, and suddenly a lot more tracing platforms. Though not a data project, https://fightcovid.app/ is an example of the ethical health protocols to take note of. It was an initiative designed for scared individuals who wanted to see if they were at risk, and it aimed to decongest emergency services through a symptom checker and automated triaging recommendations. The team hacked and launched it in a few days, but they had to switch off the questionnaire after getting a call from DOH about the protocol and the implications of the recommendations.
But because of the FIGHTCOVID.app initiative, I believe DOH had to expedite the creation of local, standard guidelines for health questionnaires, which led to the v2 iteration of https://fightcovid.app/.
They now have 15 additional local dialects. Wow! Talk about inclusive scale and reach when technology is properly framed with stakeholders. The same lessons apply to visualizing any health-related data.
6. You can’t explore and interpret the data you don’t have
Data cleaning is probably one of the least glamorous and most strenuous parts of the project. Trash in, trash out. Scraping LGU posts and formatting them. I’ve been lurking in GeoViz and data science groups since my graduate school days, and the most common problem is the lack of digital, structured, and clean data.
Imagine building a bar graph or a pie chart of your annual expenses for a school project, but the rows given to you are in JPEG format and have to be encoded by hand, most of the records are on paper somewhere in your bag, some expenses are spelled out instead of written as numbers, and the transportation expenses are labeled differently: Transpo, Grab, Byahe, TE, and the like. Cleaning alone will take hours. This is how frustrating a data project can be when it lacks a structured data source.
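Here is a rough sketch in Python of what that cleaning looks like for the expenses example. The alias table, the tiny spelled-out lookup, and the sample rows are all assumptions made up for illustration; a real cleanup would need far more entries and far more patience.

```python
# Hypothetical alias table: every label variant maps to one canonical category.
CATEGORY_ALIASES = {
    "transpo": "Transportation",
    "grab": "Transportation",
    "byahe": "Transportation",
    "te": "Transportation",
}

# Tiny lookup for amounts that were spelled out; a real one needs many more entries.
SPELLED_AMOUNTS = {"fifty": 50.0, "one hundred": 100.0}


def clean_category(label):
    """Normalize a free-text label to a canonical category name."""
    key = label.strip().lower()
    return CATEGORY_ALIASES.get(key, label.strip().title())


def clean_amount(raw):
    """Return the amount as a float, or None when it needs manual review."""
    text = str(raw).strip().lower().replace(",", "")
    if text in SPELLED_AMOUNTS:
        return SPELLED_AMOUNTS[text]
    try:
        return float(text)
    except ValueError:
        return None


def total_by_category(rows):
    """Sum cleaned amounts per cleaned category, skipping unreadable amounts."""
    totals = {}
    for label, raw in rows:
        amount = clean_amount(raw)
        if amount is None:
            continue
        category = clean_category(label)
        totals[category] = totals.get(category, 0.0) + amount
    return totals


rows = [
    ("Transpo", "120"),
    ("Grab", "1,250.50"),
    ("byahe ", "fifty"),
    ("TE", "80"),
    ("food", "???"),  # unreadable amount, left out of the totals
]
totals = total_by_category(rows)
print(totals)
```

Only after this step does a bar graph of "Transportation" mean anything. Rows that fail to parse are skipped here for brevity; in practice they should be listed for someone to check by hand.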
As we kept figuring out the data source and resources to build TraceCovid, we bumped into DOH’s data stream through an FB tutorial and a GitHub code repository on visualizing local COVID-19 data. Sir Wilson referred to it as the ArcGIS source. ArcGIS, as I learned, is cloud-based GIS mapping software that connects people, locations, and data using an interactive map.
Finally! We just needed to write to DOH to become an official visualizer of their data, and that would boost the direction of the project. A team member reached out to friends in DOH who maintained the ArcGIS source, until we were told not to use it. Though it was publicly available via a URL, the guarantees weren’t in place for it to be the official source of “DOH data,” even if other projects were using it at that time.
That’s what the basics of obtaining COVID-19 data were like during the first few weeks, until DOH created a structured data drop schedule that lets projects and the public access raw data in comma-separated values (CSV) format, labeled almost consistently.
The ArcGIS URL now displays: {"error":{"code":499,"message":"Token Required","messageCode":"GWM_0003","details":["Token Required"]}}
Meanwhile, the public DOH data drop can be viewed and downloaded here http://bit.ly/DataDropArchives. Who would have thought a national government agency could create an open data workflow in 2020?
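Once data arrives as CSV, the first pass at exploring it can be very small. The sketch below uses Python’s standard `csv` module to count cases per region. The column names (`case_id`, `region`, `date_confirmed`) and the sample rows are hypothetical and are not the actual headers of the DOH data drop.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a downloaded CSV file.
SAMPLE_CSV = """case_id,region,date_confirmed
C1,NCR,2020-04-01
C2,Region VII,2020-04-01
C3,NCR,2020-04-02
C4,,2020-04-02
"""


def count_by_region(csv_file):
    """Count rows per region, bucketing blank values as 'Unknown'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        region = (row.get("region") or "").strip() or "Unknown"
        counts[region] += 1
    return counts


counts = count_by_region(io.StringIO(SAMPLE_CSV))
for region, n in counts.most_common():
    print(region, n)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be swapped for `open(path, newline="", encoding="utf-8")`. Keeping an explicit "Unknown" bucket matters because blank fields are exactly where "labeled almost consistently" shows up.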
7. GIS and Viz tools beyond Google Maps API.
Having been in digital production for 7 years with 200+ projects shipped, I was confident I could easily navigate a visualization project. In fact, I had attended a Google Maps API workshop for developers before and did a few projects that visualize promos and content on a map. This should be manageable. Or so I thought.
Apparently, designing data visualizations on a base map is a discipline and a vertical of its own. I’d heard about GIS projects used for disaster planning and mapping, but I’ve only come to appreciate them better through this project.
A geographic information system (GIS) is a system designed to capture, store, manipulate, analyze, manage, and present all types of geographical data. The keyword of this technology is Geography — this means that some portion of the data is spatial. In other words, data that is in some way referenced to locations on the earth.
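One common way to carry data that is “referenced to locations on the earth” is GeoJSON, a JSON format that many mapping tools can read. Below is a minimal Python sketch that turns rows into GeoJSON points. The place coordinates are approximate and the case counts are invented for illustration; this is not how viz.tracecovid.ph stored its data.

```python
import json


def to_feature(name, lon, lat, cases):
    """Build one GeoJSON Feature; GeoJSON orders coordinates as [longitude, latitude]."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError(f"coordinates out of range for {name}: {lon}, {lat}")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "cases": cases},
    }


# Approximate coordinates, made-up case counts.
rows = [
    ("Manila", 120.98, 14.60, 12),
    ("Cebu City", 123.89, 10.32, 7),
]

collection = {
    "type": "FeatureCollection",
    "features": [to_feature(*row) for row in rows],
}
print(json.dumps(collection, indent=2))
```

The range check exists because swapping latitude and longitude is an easy mistake to make, and for Philippine coordinates the swap pushes latitude past 90, so it gets caught immediately.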
8. No-code, no problem with tools like Flourish
I stumbled upon a comprehensive visualization on Facebook posted by Sir Wilson Chua. I first met Sir Wilson at an e-commerce talk at PICC. Since then, I’ve followed his projects, which turn out to include data science work ranging from forecasting dengue outbreaks based on geographic data of stagnant water in the Philippines, to monitoring air quality via a DIY tool, to independently monitoring internet speed in the Philippines.
As it turns out, there are tools that let you upload a data set and visualize it in exciting and insightful formats. If Tableau is the WordPress of data visualization, Flourish is probably the Squarespace or Wix.
Built by ex-journalists, Flourish.studio allows you to quickly turn your spreadsheets into stunning online charts, maps, and interactive stories. Next time I have a structured data set with thousands of rows, Flourish will be top of mind.
9. The leadership barriers to operationalizing Data Science projects
A few weeks on from the launch, the next challenge we have is getting local government unit (LGU) data for suspected and probable cases. For a time, the NCR data made it look like Manila had the lowest risk, but only because our team couldn’t yet find a public post to scrape and encode for Manila’s local data. The underlying problem is the lack of structured data from LGUs, since DOH only publishes confirmed cases.
This reminds me of a discussion in a GIS and data science group during the Taal Volcano evacuation in January. The argument was that a lack of on-ground data on evacuation sites hinders volunteer GIS designers and developers from visualizing it in their projects in real time and helping.
Why? The on-ground frontline teams are usually overwhelmed and focused on surviving. When I visited our home weeks after the Taal Volcano eruption, I brought my drone to take some photos and report on the situation. But the ashfall was still thick and muddy, aftershocks woke me up about 10 times while I was trying to sleep, and all I wanted to prioritize was rescuing my family, getting the most important documents, and evacuating. I didn’t care about documenting it with a drone anymore.
For the cases of LGUs, entering fields in a Google Sheet or answering a form after a long day in the field becomes the least of priorities if you expect it to be done as “add-on” tasks.
As I learned in management school, this is a challenge and a responsibility for leadership: to operationalize, plan resources, and allocate them. Questions like these shouldn’t be a burden for people on the ground or front-liners. Instead, they should be raised to higher-ups. If it’s a leadership priority, resources will be allocated.
Meanwhile, Sir Wilson also shared the infrastructure design above for DOH data, which visualizes how multiple sources of data have to be housed in boxes of infrastructure and made interoperable so they can talk to other platforms. Once the infrastructure is in place, service agreements and resources are the next challenges.
Here are some frameworks visualizing the role of infrastructure, software, hardware, and people-ware in making data projects successful:
10. Data integrity and governance.
Despite monumental efforts toward structured data transparency, the DOH data drops didn’t go without criticism. The DOH dashboard was redesigned into a Tableau data visualization embed, and we see more promising startups like Senti, ThinkingMachines, AIGov, and others working with the government. There was speculation about the data release being controlled to fit a preferred narrative, but I had stopped following all the notifications on daily case counts at that point. Then I was surprised when Stef Sy, founder of Thinking Machines, released a statement that made it to my feed:
I’ve been following Thinking Machines since they launched in the Philippines as a data science consultancy startup. The idea of high-speed, future-ready startups being able to work with the national government at scale on a life-or-death matter such as COVID-19 data transparency suddenly made me optimistic about a post-pandemic Philippines.
A big challenge for data projects? Governance.
So, what’s next? What now?
First, shout out to the amazing Viz.tracecovid.ph phase 1 team: Math, Jess, Dwin, Jo, James, Cassie, John, Vanessa, and the rest of the team. Proud to have shipped this with you all!
Next, from the lens of privilege, I am deeply grateful to have contributed my time and talent to DCTX.ph projects like viz.tracecovid.ph, and to have seen it through to launch. It was a tiring but meaningful escape from, and response to, the pandemic anxiety we all face.
This is my first data science project and easily one of the most meaningful projects I’ve worked on. Aside from feeling good, I’m confident the hands-on experience of learning by doing will come in handy for future enterprise projects and products I design.
Currently, I’m on leave as project co-lead for Trace v2 to focus on launching the 4th project of DCTX, but I’m excited about the tangible directions the project is taking. And while a lot of government efforts are yet to be questioned, I want to be optimistic about the kind of DCTX collaborations and partnerships that startups, companies, and individuals are offering to the government, and that are getting shipped.
Government projects that you could only wish for are becoming immediate case studies on the fly: a protected bike lane pilot on EDSA, QR technologies used for identity verification, experiments with emerging technologies like drones, best-in-class supply chain systems adopted by the government, and a commerce inflection point.
Hopefully, this is a glimpse of human-centric, data-inspired, and tech-enabled governance of the future Philippines made possible today.
For now, I need to get a new pair of glasses and off to another priority project. Para sa Bayan.
Hey, thanks for reading up to this point! What an article to write, hehe. Tracecovid.ph is heading into a major pivot for phase two. If you’re interested in helping and volunteering your talent, DCTX needs willing hearts, minds, and hands. Head on to http://dctx.ph and volunteer. Head on now and click that link: http://dctx.ph. See you in Slack?