This fall, as part of Scientists in Synagogues, Temple Shalom in Dallas, TX has offered a five-session program that has explored how Judaism guides us in using the power of data science, in everything from student admission to colleges to health care. This introductory talk was given on September 12, 2021 by Wayne Applebaum, Ph.D.
You know, the whole thing about science and religion is the difference a lot of times between what we can do, and what we should do. Data Science is no different from that. You could use it for good, you could use it for evil. What we hope to get into today is: what is data science? And how do you become a good consumer of it? We’re not going to get into formulas and stuff like that. What we’re going to get into is the practical, basic human knowledge that shapes data science, and in fact, any science.
So what is data science? Data science is simply using data to make decisions. And as such, it covers a lot of ground. And today’s session is going to be more of an introduction. And in the next sessions, Dr. Barry Lachman is going to talk about his work in the area of food insecurity, and we’ve got sessions on medicine, on how standardized testing should be used, and also data privacy. I’ve probably left one out. But we have four more sessions after this one, to take deeper dives into the area. What we hope will come out of this is some continuing discussion, not only on those areas, but on the areas that we haven’t had time to touch on.
Rabbi Jonathan Sacks, of blessed memory, wrote a book called “The Great Partnership.” And it’s about the partnership between God and science. And it’s a tremendous book – if you ever have some time, I advise you to read it. He also wrote, in 2016:
“In the beginning, God created heaven and earth. This isn’t, as people sometimes think, an early attempt at science. It is more poetry than prose. What it was originally was a rejection of myth, the idea that the universe is a place of clashing, capricious forces seen as gods. Instead, the Bible sees the universe as a place of law, governed by order, and seeks to create in us a sense of wonder, knowing how vast the universe is and how small and vulnerable we are.”
That wonder is heightened by discoveries like the Hubble Telescope, where an international team of astronomers discovered that there are 10 times as many galaxies as we previously thought – 2 trillion of them! Think about it. With sophisticated scientific knowledge, we didn’t even expect the existence of 90% of the galaxies that we now believe there are.
Meanwhile, in the opposite direction, physicists at the University of Manchester announced that they have begun the search for a new God Particle, the so-called Sterile Neutrino, that’s even harder to detect than the Higgs Boson, which may help us understand how the universe came into being, matter over antimatter, after the Big Bang.
This is extraordinary stuff, reminding us yet again that the more we know, the more we discover that we don’t know. Science and religion converge here. As Richard Dawkins put it, “The feeling of awe and wonder that science can give us is one of the highest experiences which the human psyche is capable of. All of us believers, agnostics, and atheists, surely agree on that.”
And Rabbi Sacks goes on to say, “What worries me, as I turn the pages of Genesis, is to read how quickly humans betrayed their roles as guardians of the natural universe, and turned instead to violence, murder and war. And what worries me, as I turn the pages of the daily news, is that we’re doing it again, as if we learned nothing from the intervening centuries. What science and religion should both be teaching us right now is humility and awe, knowing how small we are in the scheme of things, and how dangerous it is to destroy what it took almost an infinite time to create.”
That, I think, brings together our knowledge and our responsibility to use data science, or any science, responsibly. Albert Einstein had a famous quote. He said, “God does not play dice with the universe.” And later he retracted that quote, which is not so famous. He did say it, but he was referring to the random nature of some of what was involved in quantum physics. What he went on to say was that God created a set of laws of physics and things – which have rules under which things operate. Yes, there is a sense of randomness, but that randomness is really something that we know is free will, something that we can predict, if we have enough information.
What data science is, it’s a lot about making sense of things. Today, we’re kind of in an information blizzard. So I wrote, with regards to – since we’re a Reform congregation, I didn’t feel it was that bad to say – “find the pig?”
And I’ll tell you, I haven’t found it yet. But it makes the point. We have so much data coming at us, more data than ever before. And how do we create some understanding around this – something like this? How do we find the important part? And now you can find the pig easily.
But what data science does – and here’s an interesting thing. A number of years ago, Target (who we’re mostly familiar with) wanted to figure out if a customer was pregnant, even if they didn’t want us to know. Could you do that? So they went to their friendly neighborhood statistician, who started studying the problem. And he found out that if you saw that a woman, in July, bought a diaper bag or a purse that was large enough to double as a diaper bag, vitamins containing magnesium, zinc and assorted other substances, and cotton balls, and disinfectant wipes, and a number of other things, you could be 87% certain that she was due to give birth in August. That’s scarily precise.
And why did Target want to know this? Because they’re marketers. You know, if I send somebody there, if I send somebody coupons, and make them a customer, and show them I’m concerned about their unborn child, I can own that customer relationship for life. Then a bit of reality corrected, and this is kind of what I call the law of unforeseen consequences that plays a big role in what we do. They got a note, they got a call from an irate father, who said, “Are you encouraging my daughter to get pregnant? You’re sending her all this baby stuff. She’s only a teenager. How could you be – how could you do this?” About a week later, he called them up and said, “I owe you an apology. There are things going on in my own home that I was not aware of.” So is this good marketing or just plain creepy?
There are, really, three questions that you should ask when you consider data science, because if you don’t answer these questions correctly, then I will tell you that nothing you do mathematically will make any difference.
I want to tell you that data science, at least being a consumer of data science, is all very simple. Everybody believe me? There are a few questions that you need to ask. The first question is: what question am I trying to answer? What is the basic question? Because how I ask the question, and what the question is, is critical. For example, the 1948 presidential election featured the largest poll ever conducted by a private institution, and it was designed to predict the presidential election. And if you remember your history, there’s a great picture of Harry Truman holding up a copy of the Chicago Tribune that says “Dewey Defeats Truman.” That’s because they were certain that Dewey would win, based on this poll.
That was the question they were asking. The second thing you’ve got to ask, though, is, what is the acceptable answer, or the acceptable information, to answer this question? And this is where a lot of things go awry. The question they were asking is “Who’s going to win the presidential election?” The information that they were gathering – they had a bunch of pollsters who they gave quarters to “Call up 50 people a day and we’ll pay you,” or, “Get in touch with 50 respondents today.”
Well, there’s a lot of ways to get in touch with people. And in 1948, probably the easiest one was to call them on the phone. This, unfortunately, introduced bias to the data. If you think back to 1948, who had telephones? The telephone was more of a luxury than it is today. You know, it would be akin to saying “Call only landlines” today, which would just be silly. So they ended up with a non-representative sample. And although they answered their question, they failed on the second piece of the puzzle, which is “Get the information that is acceptable to you.”
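To make the phone-poll problem concrete, here is a minimal sketch of how a non-representative sample can point to the wrong winner. Every number in it is invented for illustration (none of it is 1948 data), and the function name `candidate_share` and the `reachable` flag are hypothetical labels, not anything from the original poll.

```python
def candidate_share(groups):
    """Overall support for a candidate, weighting each group by its population share."""
    total_weight = sum(g["population_share"] for g in groups)
    weighted_support = sum(g["population_share"] * g["support"] for g in groups)
    return weighted_support / total_weight


# Hypothetical numbers, made up purely for illustration (not 1948 data):
# phone owners are a minority and lean toward the candidate; everyone else leans away.
population = [
    {"name": "has telephone", "population_share": 0.40, "support": 0.60, "reachable": True},
    {"name": "no telephone", "population_share": 0.60, "support": 0.40, "reachable": False},
]

# What the whole electorate thinks.
true_share = candidate_share(population)

# What a pollster sees when only phone owners can be reached.
phone_poll_share = candidate_share([g for g in population if g["reachable"]])

print(f"Support in the whole population: {true_share:.0%}")
print(f"Support in a phone-only poll:    {phone_poll_share:.0%}")
```

With these made-up inputs, the phone-only poll shows the candidate comfortably ahead while the full population has them just under half. The arithmetic is correct in both cases; only the choice of who gets asked differs.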
Now, the third piece is: what information is available, and how do I collect it? And this is another big issue of data science today. Before the internet came into being, when I was a young tyke doing my graduate work, getting data was the toughest thing. You needed stuff to analyze. So if you had data, that was great. On the other hand, what we have now is too much data, data that’s not described well, data that might be biased. And it’s easy to get. And when I’ve worked with companies, I found that a lot of them wanted customer loyalty data. “Well, we don’t have that.” So they made up something that stood for customer loyalty, whether it was or not. And I’ve had meetings with the CEO of Colgate-Palmolive, where he’s told me, “Don’t believe that report, it’s garbage.” But we have to. So within the area of data science, we have to ask those three questions to validate what we do.
Now, you may think that “data science is good,” or “data science is evil.” It’s neither. But there are a lot of things that have been around for a while. Think about how Amazon and Disney are similar. One is an online seller, used to be an online bookseller, and the other runs amusement parks. And let’s just focus on the amusement park thing.
What Disney taught Amazon, and a lot of other people, was how to control the user experience. When you went to a Disney park, how did you get there? If you went to the one in Florida, you came across either by the monorail or the boat, but it put you in this same place – Small Town America. And what’s the first thing you smelled when you came off the boat? Popcorn. Doesn’t that popcorn smell good? And from then on, you’re in Disney’s hands, and they’ve shaped your experience.
Now, what does that have in common with Amazon? It turns out, a whole lot. Every time you click on something in Amazon, they’re learning something about you. Even if they don’t know who you are, they’re combining that with everything else they know about everyone else, and using that to shape your experience, to bring you one click closer to hitting the Checkout button. It’s all about taking that huge stream of information and creating the experience.
So, Amazon, Google, and a lot of other [companies], especially Facebook, make money from creating a user experience. You know those annoying little ads that follow you around, if you’ve gone on a trip to Amazon? It’s just good business. It costs a few tenths or hundredths of a cent to place those ads. And they’re not terribly effective, but they don’t have to be. All they have to be is effective enough for Amazon to turn a profit. And they do.
Let’s go one step further and look at the area of assumptions. And assumptions have a lot to do with the humanness of data science. Consider when you meet someone for the first time. How long does it take for you to size them up? Is it a minute? Is it an hour? Is it after a cup of coffee? Is it after playing a round of golf with them? Well, research has shown it takes about seven seconds. And that seven seconds is really shaped by a lot of your assumptions, biases, and things that happened to you in the past, whether they’re valid for that person or not.
So that’s how quickly we make up our minds. And I’m asking you to slow down and look at your assumptions. Suppose you go into a casino, and you stop at a roulette wheel and watch it come up red five times in a row. You decide to make a bet. And how do you bet? Do you bet red or do you bet black? If you understand statistics, and if you believe – and that’s the key word – that the roulette wheel is fair, it doesn’t matter which you bet: the next spin is about as likely to come up red as black. What is a 1-in-32 chance, ignoring the green pockets, is a fair wheel coming up red five times in a row.
But here’s the key, too. I ask, after observing what I observed: How can I make the assumption that it’s a fair wheel? I’ve observed five reds, I have not observed the black. How does the smart money bet? Well, the smart money bets that the wheel is somehow biased towards red. So given the information you have, you end up rejecting your assumption that it’s a fair wheel, and go on and bet the way you think it’s going to come up.
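The roulette reasoning can be sketched in a few lines. This is an illustration under stated assumptions, not a method given in the talk: it treats red and black as equally likely on a fair wheel (ignoring green pockets), and it uses a simple Bayesian update with two invented inputs, a 10% prior belief that the wheel is rigged and an 80% red rate for a rigged wheel. The function names are hypothetical.

```python
from fractions import Fraction


def prob_all_red(n_spins, p_red):
    """Chance that n_spins independent spins all come up red."""
    return p_red ** n_spins


def posterior_biased(n_reds, prior_biased, p_red_fair, p_red_biased):
    """Bayes' rule: belief that the wheel is biased after seeing n_reds reds in a row."""
    likelihood_fair = prob_all_red(n_reds, p_red_fair)
    likelihood_biased = prob_all_red(n_reds, p_red_biased)
    evidence = prior_biased * likelihood_biased + (1 - prior_biased) * likelihood_fair
    return prior_biased * likelihood_biased / evidence


# Simplification: a fair wheel gives red half the time (green pockets ignored).
p_fair = Fraction(1, 2)
five_reds_if_fair = prob_all_red(5, p_fair)
print(f"Five reds in a row on a fair wheel: {five_reds_if_fair}")  # 1/32

# Assumed for illustration only: before watching, we give a 10% chance that the
# wheel is rigged, and a rigged wheel comes up red 80% of the time.
prior = Fraction(1, 10)
posterior = posterior_biased(5, prior, p_fair, Fraction(4, 5))
print(f"Belief the wheel is biased, before: {float(prior):.0%}, after: {float(posterior):.0%}")
```

Under these assumptions, five reds move the belief that the wheel is biased from 10% to a little over half. How far it moves depends entirely on the assumed prior and the assumed bias, which is the point about examining assumptions.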
Looking at data gives us a chance to look at our biases, and how our biases shape our use of the data. You know, we have the area of gerrymandering. In gerrymandering, you make the assumption that a group of people is going to vote a certain way, and then you draw the lines to give yourself an advantage. The boundaries of the electoral districts, which we see, provide, given the numbers of people, a bias towards the party that’s in charge – let’s not go any further.
So the question that comes up is: “Knowing what we know about data, about people, and about assumptions, how does Judaism provide us with the guardrails to shape data science to be ethical?”
Harriet Bell provided me with a list of elements of data ethics, and it’s a pretty interesting list that I’ll invite you to consider now. And the elements of that, of those ethical guides, are:
Veracity: does it conform to the facts? Is it accurate? Is it a real view of the world? Then that opens up a can of worms, especially when we look at the various news feeds.
There’s also autonomy: freedom from external control or influence. Who is going to benefit from shaping the data this way?
Justice, righteousness, equitableness, or moral rigor – is it just to use the information this way?
Beneficence – the act of charity, doing good.
[Non]maleficence: first, you shall do no harm.
And fidelity: faithfulness to what you pledge to do.
And if we look at this structure, we can say: How does our knowledge of Judaism shape what we’ve learned – what we need to do? I’ll close with what I think is one more interesting example, and then hopefully our panel will have some comments. Given that we know that viruses mutate over time, and vaccines prevent a virus from finding a new host, if there are a large number of opportunities for a virus to mutate because a lot of people didn’t take the vaccine, think about what that does to the possibility of a virus mutating. The answer is: it shoots way up and makes it like, “Hey, that which doesn’t kill you doesn’t make you stronger. That which doesn’t kill you mutates and tries again.”
So, there you have it – at least some basics to data science. Thank you all for coming out today and missing the first half of whatever football game you were going to watch. And our next program will be in two weeks, and Barry Lachman will be leading it – I know enough about his research to know that it’s going to be really interesting. It doesn’t raise as many ethical questions, but it certainly shows us what we as Jews can be expected to do with data science. So thank you all for coming out, and hopefully you’ll judge and consume data better.
(This post is part of Sinai and Synapses’ project Scientists in Synagogues, a grass-roots program to offer Jews opportunities to explore the most interesting and pressing questions surrounding Judaism and science. As part of the program, Temple Shalom in Dallas, TX has been holding a series on Judaism and data science. This talk was adapted from a session recorded on September 12, 2021).