I’m Steven Pastor, but I’m also known as the Rabbi’s (Janine Jankovitz’s) husband. Hopefully today I can impart some knowledge to you all about artificial intelligence. I work at the Children’s Hospital of Philadelphia as a bioinformatics scientist. Essentially, in my job, I use computers and math to solve biological problems.
So, you’re probably wondering: why would you need a computer to solve a biological problem? Well, a lot of times, patients come to clinical attention, and we obtain some kind of data from them. But those data sets are quite large. Think of an Excel spreadsheet so large you couldn’t open it on your computer – gigabytes, terabytes, petabytes in size, with millions of columns and billions of rows. Because of that, no one person could possibly sift through those data manually. So we require a computer; I need automation to be able to prioritize what I’m looking for. That’s essentially what my role is. I’m an engineer and a scientist. I develop software, I use software, and occasionally I use artificial intelligence-based software. That’s basically my expertise.
The point of today is to give you all the requisite background, such that in future sessions of Scientists in Synagogues, you can ask or answer more pointed questions. I want you all to continually think about how this ties into your Judaism, but it doesn’t have to. Maybe you can just use it as a learning platform.
Actually, the title “artificial intelligence” is a misnomer – I gave you some clickbait to bring you in here today. So, what’s the first thing that comes to mind when you hear the term “artificial intelligence”?
I always think of Rosie from The Jetsons. I don’t know why. Ever since I was a kid, that’s been my model of artificial intelligence. Rosie’s a maid, but she can transform into a broom, and a dust pan, and all kinds of other things. I’m going to provide a pretty vague definition of artificial intelligence, but I want to bring it in a little more to something that’s understandable.
So I like to define it as – and I wrote it down to make sure I don’t get this wrong – “the intelligence of any machine.” Too often, we try to define intelligence, and put it on a pedestal, and make it such a positive attribute. But try not to do that. Artificial intelligence is the intelligence of any machine. So that means it can reason. It doesn’t have to, but it can. It can act, it can adapt, but most commonly, it’s used for a specific problem. Rosie’s specific problem is to be a maid; a self-driving car’s is to drive; and so on and so forth.
It may come as a surprise to you that artificial intelligence is actually several decades old as a discipline. In 1956, at Dartmouth, there was a seminal meeting between mathematicians and scientists at a multi-week conference where they brainstormed on everything artificial intelligence. And so that event is often thought of as where the academic discipline of artificial intelligence came about, so that people like me could study it at a university.
I’m going to focus on a subfield of artificial intelligence today called machine learning. And the reason why is because that’s the one that you all predominantly come into contact with – ChatGPT, or Alexa, or Netflix recommending you the next video to watch, or Google, when you see results, and other recommendation systems. So that stems from this field here – machine learning.
What these two circles essentially mean is that all of machine learning is artificial intelligence, but not all artificial intelligence is machine learning. You can think of it as a machine that’s learning by itself – that sounds grand. It does this from data; you input data, and from those data, it will learn. Why does it do this? It can identify patterns in the data. We do this every day, right? We identify patterns in what we’re looking at. We learn from it, and then we take that knowledge and we apply it to something new. Students also do it every day, all the time – they go to class, they read books, they learn, they take that experience, and they translate it into expertise. And that’s what a machine learning algorithm or model or software does. I will use those three terms interchangeably.
So today we’re going to answer three questions from that little blue circle.
One, how do we communicate with machine learning models and algorithms? What do data look like?
Two is perhaps the most important one. And that is: how does a machine learning model learn from data? That’s going to be a high-level overview – I can’t teach calculus.
And the third one is: how does a trained machine learning model make predictions on new data? We will go through the whole process, similar to students – when we give it new things, when we ask it to predict something new, how does it do that? What’s the workflow? And so today, you all will do that for me. You’re going to make a machine learning model on my computer today.
Let me delineate artificial intelligence from machine learning. So, machine learning is artificial intelligence, but to define artificial intelligence, let’s go back to Rosie. If I programmed Rosie explicitly to do absolutely everything – if I manually told her, “If you are asked to do something, do this thing” – and I, step by step, explicitly laid those processes out – she’s not learning from data. Those are explicit instructions. When you interface with Rosie, though, you would consider her to be artificial intelligence.
Pretend there’s a million of me who could program every single scenario you could ever think of. That would be called “hard-coding.” There are no patterns to be learned in that case. It’s just a response to a stimulus – essentially, to an input. And that would be an example of artificial intelligence that does not require machine learning. Machine learning, by definition, requires that you give the system data to learn from.
So, how do we communicate with a machine learning model? How do we communicate with software? What are data? What do they look like? You and I are communicating with one another now. You’re listening to what I’m saying, and you’re seeing my gesticulations – not that you should learn those, but in that way, you can understand and learn from me. So, how do we give that same kind of information to a machine learning model?
On the left, that’s a Rottweiler, and on the right is a King Charles Spaniel. Those are two very different breeds of dogs. What are the features that distinguish a Rottweiler from a King Charles Spaniel? Hair? Spots? Different ears – floppy long ears versus shorter ones?
So, essentially, a machine learning model does something no different from what you and I just did. We identified features that distinguish these two dog breeds from one another. So when we’re feeding data into a machine learning model, we can do the same thing. We can identify features that differentiate different aspects of our data from one another, and provide those features to the model. And I’ll show you what that looks like.
So you need some way to take external stimuli and translate them into data. Size will be the next column – large or small. Then color – black, brown, yellow – and then weight – 100 lbs, 8 lbs, 75 lbs. This is just a subset of features – we’re not going to provide every possible feature of a dog. It’s a very simplified, hypothetical example. So, similar to what we already did, we describe different features of the dogs.
Similarly, when you’re in a classroom setting, the teacher provides the answer to something so you can learn it later – and we can do that with a machine learning model, in a way. Here we have the answer. We know a Rottweiler is generally large in size, it’s usually black in color, and it’s typically around 100 pounds in weight. The Chihuahua is all brown and 8 lbs – but it can be yellow. Now, you can’t really give a machine learning model the information in this way – you have to provide numbers, and we have to translate it in a way that it can understand.
You can split a color up into however many classes it has. This is called one-hot encoding, if you want to know the term. For example, you can change the “color” column into three different columns: color-black, color-brown, and color-yellow. And so it just becomes zeros and ones again – it’s binary. The Rottweiler is black (1). The Chihuahua is not black (0). The Lab is not black, and so on. That’s one way you can do it. There are other ways, but that’s a pretty common one. And so, up to this point, you still have a matrix that’s nothing but ones and zeros, and I’ll show a much prettier version of it in a second.
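For the curious, here’s a minimal sketch of one-hot encoding in plain Python. The dogs and their feature values are the hypothetical ones from our example, not real data:

```python
# One-hot encoding, by hand: the "color" column becomes one 0/1 column per color.
dogs = [
    {"breed": "Rottweiler", "size": "large", "color": "black",  "weight": 100},
    {"breed": "Chihuahua",  "size": "small", "color": "brown",  "weight": 8},
    {"breed": "Labrador",   "size": "large", "color": "yellow", "weight": 75},
]

# Collect the distinct colors; each becomes its own column.
colors = sorted({d["color"] for d in dogs})  # ['black', 'brown', 'yellow']

def one_hot(dog):
    # 1 in the column matching this dog's color, 0 everywhere else.
    return [1 if dog["color"] == c else 0 for c in colors]

for d in dogs:
    print(d["breed"], one_hot(d))
```

Running this prints `Rottweiler [1, 0, 0]`, `Chihuahua [0, 1, 0]`, `Labrador [0, 0, 1]` – the same ones-and-zeros matrix described above.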
So the final product is more or less down here. It’s a much cleaner representation, and a little bit more realistic as to what the machine learning model would take in. The input data to a machine learning model is typically a matrix, like an Excel spreadsheet that you work in, or that Richard probably presents at board meetings. Usually, it’s some kind of matrix – columns that delineate the different features, and then numbers. That’s it.
So how does a machine learning model now take these data, which we have translated to numerical values, and learn from it, such that it can identify patterns from those data and use it to make predictions in the future or describe the data?
So, everyone’s favorite current 2023 topic? Housing prices. In general, the larger the house you have, ignoring any other features, the more expensive it is. Let’s say as you increase square footage, you increase the price of a house. You can go on Zillow.com and look at your neighbor’s house and see how much it costs. It doesn’t even have to be on the market, and usually you’ll find some kind of estimate – the “Zestimate”, or whatever they call it. So how do they estimate that? You already have an intuition for it – square footage, but also, does it have a pool? Does it have central A/C? Zillow can estimate it by using houses in the neighborhood, or similar houses. It takes those data and extrapolates the pattern from them, so it can predict the price of that house. All it’s doing is identifying a trend in the input data.
So I showed you the matrix before. It’s not as intuitive to us how we would identify a pattern in a bunch of ones and zeros, so instead I have plotted here, on the X-axis, the square footage, and on the Y-axis, the price of a house. Can a volunteer draw a line of best fit, and then tell me what that pattern means? You’re going to be the machine learning. The line means that a house of about the average square footage, like here, costs about the average amount of money, and as you increase the square footage, you increase the price of the house. A machine-learning model essentially aims to do something similar. You already have an intuition for it: it’s identifying trends – patterns – in the data.
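A line of best fit can also be computed, not just eyeballed. Here’s a tiny sketch in plain Python using ordinary least squares – one common way (not necessarily Zillow’s) to fit a line; the square footages and prices are made up:

```python
# Hypothetical training data: square footage vs. sale price.
sqft  = [1000, 1500, 2000, 2500, 3000]
price = [100_000, 160_000, 210_000, 240_000, 310_000]

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n

# Ordinary least squares: slope = covariance(x, y) / variance(x).
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price)) \
        / sum((x - mean_x) ** 2 for x in sqft)
intercept = mean_y - slope * mean_x

def predict(square_feet):
    # The "line of best fit": price = slope * sqft + intercept.
    return slope * square_feet + intercept

print(round(predict(1800)))  # estimate for an 1,800 sq ft house -> 184000
```

With these made-up numbers the fitted line works out to about $100 per square foot, so a new 1,800 sq ft house is estimated at $184,000 – exactly the “identify the trend, then apply it to something new” idea.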
One of the advantages to machine learning is you can have these thousands or millions or billions or trillions of features and identify patterns from them. I can’t draw a 10D graph here, but I can draw a 2D graph; it’s pretty intuitive. And that’s the crux of the machine learning model. It takes an input of data and it aims to identify a trend, a pattern, in the data.
The most common error in machine learning is overfitting: the model starts giving accurate predictions for the data you used to train it, but not for new data – so it won’t predict future values very well. Picture the graph with the trend line: an overfit model snakes through every training point exactly. At first it gets the right answers – lower square footage, lower price – but eventually it says something less intuitive, like a higher-square-footage house actually being pretty cheap. It is learning the training data too well, which causes it to perform poorly in the future. So, let’s pretend you have a house that is 6,000 square feet. The overfit model’s prediction will be skewed way downward, saying it’s only about $200,000. That’s probably not right – whereas the original, simple trend line would be a lot closer.
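Here’s a toy illustration of that failure in Python. The “overfit” model below simply memorizes its training examples – a stand-in for any model that fits the training data too closely – and the prices are invented:

```python
# Hypothetical training data: square footage -> sale price.
train = {1000: 100_000, 1500: 160_000, 2000: 210_000, 2500: 240_000}

def overfit_predict(sqft):
    # Perfect on the training data; for anything new, it parrots the
    # nearest memorized example instead of following the overall trend.
    nearest = min(train, key=lambda x: abs(x - sqft))
    return train[nearest]

def trend_predict(sqft):
    # The rough trend in the data: about $100 per square foot.
    return 100 * sqft

print(overfit_predict(2000))  # a training point: exactly right, 210000
print(overfit_predict(6000))  # a new 6,000 sq ft house: stuck at 240000
print(trend_predict(6000))    # the trend line: a far more plausible 600000
```

The memorizing model scores perfectly on the data it has seen, then badly underestimates the 6,000 sq ft house – the same underestimation described above.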
So that is a very important concept, and I want you all to think about it when you read your favorite media and hear about machine learning models and ChatGPT. What are the data it’s being trained on? And is it fitting those data well enough that if you ask it a new thing, it can predict it well enough?
But you do also need some semblance of bias. In machine learning, the model has to commit to a trend. It has to be “biased” enough that if it sees a low-square-footage house, it predicts a low price. But if it’s too biased, then it won’t be able to predict well in the future. Once you translate the data into a format the model can understand, it will find trends to fit those data. Just for now, realize that this is what it’s attempting to do: “find a trend in the data such that I can use it for something else in the future.”
So now let’s go to the last question. And for that, I need a lot of volunteers. I need four of you to come up here, because we are going to build our own machine learning model. It’s a very simple model. We’re going to spell out “YMCA.” We’re going to train on each student doing a pose – a Y, an M, a C or an A – on my webcam. The model will learn what a Y looks like, what an M looks like, what a C looks like, what an A looks like. Then someone will come up here and give another pose, and it’ll identify whether or not it’s a Y, an M, a C or an A. Not a very useful model, but what I’m attempting to show here is that we provide input data, it learns from those data, and then when we provide something new, it’ll identify that new thing.
When the identification fails, you can provide more data to the model. And that’s another important principle of machine learning. If you provide more data over time, that should increase its performance, or at least you hope it would. But you have to provide it accurately as well – if a pose you train on as a Y isn’t really close to a Y, the model is not going to perform well on Ys.
In the previous question, I talked about bias and variance, and that’s very important. You have to have input data that’s appropriate for each scenario, so the model can learn appropriately for next time. And if machine learning models are being used on people, think about the implications of that. You have to have representation for multiple different groups, or else it’s not going to perform as well. What if you have someone who’s not quite physically able to do an M very well, but it’s close enough? You have to consider that scenario as well. With these models being deployed all over the place, integrated into our phones, and increasingly used in our daily lives, one thing you can do, as an audience, is ask for data transparency. What are these models being trained on? Whose data are they using? If they’re using your data, did you consent to that? And so on.
If it’s being used only on certain groups and in certain ways, you have to deal with potentially extreme bias and variance. Facial analysis currently struggles with problems stemming from the training set. Think about the current wars – what kind of propaganda could be training the AI? Conversely, what about people who think anything is propaganda, when it’s potentially reality? We live in a world where things like faking a video can occur, so you have skeptics the other way as well. So it’s becoming very important that, as a society, we start to really understand what data sets are being used in some of these operations. I showed very simple, non-impactful things, but I’m trying to impress the point on you that you can take any of this information, apply it, and be critical of some of these things that you’re using or reading about in your daily lives.
What you’re seeing here is a typical machine learning workflow. Initially, you translate the data into a language the model can understand – the matrix, the numbers; those are the things that go into the model. It then learns from the data and provides the best fit, or at least you hope it provides the best fit, and then you test that on new information. If one of the students came up here and did an “S” or something, it would probably fail on that. Before you ever show it to the world, you can go through this process ad infinitum, giving it new data, massaging the data, and so on and so forth. A lot of my job is spent in this little circle here – actually, over on the left side (“data translated and input into model”).
In the previous example, we provided the answers in the data set – we gave the prices of the houses along with the square footage. That way, when Zillow sees a new house, it can use the trend to price it: in this example, if a house is 1,000 sq. ft., it would be $100,000; if a house is 2,000 square feet, it’ll price it at $200,000.
So, you’re wanting it to predict that part. Now, in this training data, again, we provided the answers in this column. We can also fit data that do not have the answers, so that it’s still learning from data, but we don’t know what the prices are. When we have the answers – the labels – that’s called supervised machine learning, because as a user, we’re telling it, “This is the answer. Learn it based on this.”
But there’s another type of machine learning called unsupervised machine learning. An example is clustering. What you see up here at the very top is a news article, and on the bottom, 1, 2, 3, 4, are recommendations you get because you read this article.
The top article is “Giant panda gives birth to rare twin cubs at Japan’s oldest zoo.” We know there’s not a person at Google somewhere reading every single article and putting it into a category. The AI isn’t doing that, either – it isn’t saying “this article is about a giant panda.” It doesn’t have answers. Instead, it looks for patterns in the data another way – for example, words. How many other articles contain “giant panda”?
Also, if you look at the list, all of the articles mention Japan – though with a little nuance, because not every article here literally contains the word “Japan.” The model can infer patterns as well. The commonalities even get more specific: they’re all about Tokyo’s Ueno Zoo. It can take the words that appear in these other articles and recognize that they’re talking about the same thing, so it can rope in an article even when the wording changes from “Japan” to “Tokyo.” This is an example of unsupervised machine learning. All these recommendations come about because it’s comparing the words in each article to the article you originally read.
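As a rough sketch of that kind of word-overlap comparison, here’s a toy recommender in plain Python. Only the first headline is from the example above; the other three are invented for illustration, and real systems use far richer representations than shared headline words:

```python
# Toy unsupervised recommendation: rank articles by word overlap
# (Jaccard similarity) with the article the reader just opened.
headlines = [
    "Giant panda gives birth to rare twin cubs at Japan's oldest zoo",
    "Twin panda cubs make public debut at Tokyo's Ueno Zoo",
    "Japan celebrates rare giant panda birth",
    "Stock markets rally on tech earnings",
]

def words(title):
    # Lowercase and strip surrounding punctuation to compare words fairly.
    return {w.strip(".,'\"").lower() for w in title.split()}

def similarity(a, b):
    # Jaccard similarity: shared words divided by total distinct words.
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)

query = headlines[0]  # the article the reader opened
ranked = sorted(headlines[1:], key=lambda h: similarity(query, h), reverse=True)
print(ranked[0])  # the panda stories outrank the unrelated market story
```

No labels, no categories – the panda articles float to the top purely because they share words with the article that was read.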
It might even pull out an article that has nothing to do with it and could be 12 years old – for example, maybe another panda some other time in Japan gave birth to twin cubs. I don’t know how common that occurrence actually is, but the machine could do that. And that’s because, in this example, it’s explicitly looking at the words in the articles’ titles.
You can also see that Google appears to be inferring that Tokyo’s Ueno is “Japan’s oldest zoo.” That could be an inference it already has in its systems, or it could be doing some semi-supervised learning, or both. Maybe someone is supplying some of the features by hand, while also allowing it to cluster on its own. Machine learning is not intelligent enough to know when it’s getting conflicting information, and that’s a problem, too. It’s up to whoever is generating and translating the data to do so in a way that the machine can use.
With this approach to facts, how computers understand truth and pass it on to posterity becomes a concern. When conflicting histories are entered into a computer database, how does it know what’s right and what’s not? How are future generations going to know what exactly is true? That’s the work the human has to do. The machine can only take you so far, until we get actual artificial intelligence.
It’s also about transparency in data. I actually want to challenge you all: you hear a common trope where a lot of people say “I have nothing to hide.” Everyone here probably has nothing to hide, or very little. But if you’re using a social media website, you’re the product – they’re using your data, and you can imagine some of the implications of that. So, do you want someone like that telling you what a historical event was, or what an actual event was? It doesn’t matter to Facebook, unfortunately.
So, how is machine learning used in ChatGPT? ChatGPT is what’s called a large language model. Think of it as a very elaborate chatbot: you type in a question, and it outputs an answer. You’re not talking to a sentient being. It’s nothing like that. All it’s trying to do is predict what comes next. It breaks text into pieces called tokens – this is “tokenizing” – and that’s why I showed the clustering example: it takes the words on your screen and tries to infer what’s going to happen next. If I say, “I’m getting in my car to drive to the…”, most people will then say “store,” or something like that. It’s inferring that I’m getting into my car and driving somewhere, and that’s essentially part of what it’s doing.
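To make “predict what comes next” concrete, here’s a drastically simplified next-word predictor in Python: it just counts which word follows which in a tiny made-up corpus. Real large language models work on subword tokens with vastly more context and learned weights, but the core objective is the same:

```python
from collections import Counter, defaultdict

# A toy "training corpus" (invented sentences).
corpus = (
    "i am getting in my car to drive to the store . "
    "she got in her car to drive to the store . "
    "we drive to the beach ."
).split()

# Count, for every word, which words follow it and how often (bigrams).
followers = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    followers[word][nxt] += 1

def predict_next(word):
    # Predict the most frequent follower seen in training.
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # prints "store" (seen twice, vs. "beach" once)
```

Given “…drive to the”, the model predicts “store” simply because that continuation was most common in its training data – the same intuition as the “driving to the store” example, scaled down enormously.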
Basically, ChatGPT is trained on, well, the internet. It’s taking input from things like Wikipedia and Reddit and other social media websites. We’ve already asked it a bunch of questions about things. Could you imagine talking with something that’s trained on some of those websites? There’s probably going to be some implicit bias, and it’s obviously going to get some things wrong. And it’s a large language model, so it can’t do some things very well. It will fail at math. It will fail at some very basic things.
I showed a couple of different models, and I think ChatGPT has three models in it that use some supervised learning and some semi-supervised learning. There were even some human reviewers – you should read about that on Wikipedia, by the way, and look it up – who sift through some of the more problematic content. So, if you try to ask it something way more problematic – and I’m not going to say it out loud because I don’t want to get in trouble – it will tell you that it can’t do that. But then, of course, you can gaslight it too, if you want.
Rabbi Janine Jankovitz tried to stump ChatGPT with a question about kitniyot: “Create a meal plan for Passover for a Jewish person from Spain who converted by Ashkenazi rabbi.” The reason she asked this specifically is that Ashkenazi Jews (Jews from Eastern Europe) and Sephardic Jews (Jews from the Spanish regions) have different rules about what food is allowed on Passover. If you are Sephardic, you are more likely to have rice on your table on Passover, whereas an Ashkenazi person would not. Then you have to take the conversion into account: if you are from Spain, you would expect the answer to be that you are Sephardic. However, converts take on the culture of the community they convert with, so someone who converted with an Ashkenazi rabbi follows Ashkenazi custom. So the question about kitniyot was supposed to be a trick question about whether or not ChatGPT would offer a person rice to eat on Passover if they’re from Spain. And the answer is that it shouldn’t, because if they’re Ashkenazi, then they wouldn’t eat rice.
Here’s an example of where the question about human intelligence and machine learning arises. You might ask, “Wait a minute – quinoa, is that allowed?” You might go to Google and look up the Ashkenazi rules regarding quinoa on Passover (the Ashkenazi ruling is that you can eat it), or you could ask a knowledgeable Jew, maybe the Rabbi, whether or not this is correct. So, it can be a really helpful tool, but just like any tool on the Internet, you need to fact-check it. It’s not going to get everything 100% correct. And starting from a place of knowledge, you can use it to help you simplify things. For example, if you asked ChatGPT for help with saying something in a tactful way, you would need a prompt to start with, and you would have the intelligence already to read the result and say, “Yeah, this is tactful.”
I’m an older millennial. So for me, my first however-many years as a student, I had no internet. I had to go to the library, and use the card catalog, and so on and so forth. And then suddenly one day I had access to the internet and could do whatever I wanted – well, it wasn’t that simple, but this is pretty similar. Right now, it’s a tool. It has flaws and produces logical fallacies. It’s not perfect. But it’s likely that this is something that’s going to be integrated more and more into our daily lives – too late, it already is. Therefore, those who can use it now will probably have more of an advantage later.
One example of a use case I came up with was gaining muscle through weight training. I work out, and I know a decent bit about weightlifting and creating a nutritional plan. So I said, “Act as a personal trainer. I want you to provide a meal plan for someone to gain muscle, and give me a specific meal plan for the whole week.” I gave it a bunch of other specifics. Rather than giving me a meal plan right away, it responded, “Okay, I can do that. However, I need to ask you a few questions about yourself. I need to know how much you weigh, how often you work out, what’s your preferred method of working out…” It’s a little bit different, in that it has a higher specificity. Rather than going to a search engine and clicking links and filtering out nonsense and noise, this is usually a little more specific. You can interact with the AI rather than with a search engine.
So the communication is a little more tailored, as opposed to a Google search. A Google search is just talking at Google, and it’s not personalized, whereas this is more of a communicative tool. You can have a conversation with it. You can ask it about something, and then you can ask it to refine those results, and so on. One thing you can do is use the “act as” strategy. If you’re a cantor, you could tell ChatGPT, “You’re a cantor, and this is your knowledge. This is what you know. I want you to provide me information about this particular subject, and only in this way. Don’t give any details. Just give me the answer.” And then it becomes an expert. So, in a way, it can mimic the expertise of that profession to an extent.
So, hopefully today was informative for you all. There are a lot of things that we did not cover, of course. The ethical implications are obvious – that’s probably what drove a lot of you here today. But does anything in science, AI, tech, math, whatever, have any kind of influence on your practice of Judaism? For me it does, because I use it, but for you all it might not. Think about some of those things.
The whole point of today was to at least give you enough background knowledge for if you want to read another article in the New York Times or somewhere else in the media that’s talking about ChatGPT, or ethical implications of artificial intelligence, or putting your kids’ pictures on the internet and how that affects you or how other people can search it – things like that. So try to weave in what happens in your life or what you read about, and think about how that affects your Jewish practice, if at all.
(This post is part of Sinai and Synapses’ project Scientists in Synagogues, a grass-roots program to offer Jews opportunities to explore the most interesting and pressing questions surrounding Judaism and science. Steven Pastor is a Bioinformatics Scientist at Children’s Hospital of Philadelphia and a PhD candidate at Drexel University; Janine Jankovitz is Rabbi at Congregation Beth El-Ner Tamid in Broomall, PA. This post summarizes a talk given at that congregation on October 30, to which they both contributed – in addition to some audience members with incisive questions).