Episode 6A: Groundhog's day, Dynamic Networks and Data's Holy Grail - Part 1
Updated: Nov 21, 2021
What's the rumpus :) I'm Asaf Shapira and this is NETfrix.
In the previous episodes, most of the networks we've covered were static networks.
By static, I mean that we viewed only a snapshot of the network, meaning networks in a given time frame. But it is clear to all that networks are for the most part - dynamic.
So why deal with static networks, if they are dynamic?
That's a great question. Especially if you think about it, the potential applications of cracking network dynamics are endless, making it the "holy grail" in the data science world. Why's that?
In popular culture, data science is perceived to be the answer to everything. Maybe not now, but soon. Data will govern us and restore order to our chaotic world. Artificial Intelligence will drive our cars safely, doctors will be replaced by empathic androids and every day will be a good shuffle day on Spotify. We believe all this will be achieved once data science unravels the patterns in the data that we just know are out there. If we could only lay a hand on those elusive patterns, then we could also detect deviations from these patterns. So not only would we be able to better understand ourselves and our data.
We could foresee the future.
As we've mentioned before, our world is a networked world and much of our data is a network or can be understood as a network. Thus, a network model serves as a map of reality and can help us navigate in it. And just as reality is dynamic, so is the network.
And as we now all know, the network follows universal laws. So, if networks follow laws, and networks model reality, then reality follows laws too.
So, in order to crack open the secrets of reality, the universe and everything, we need to crack dynamic networks.
Just imagine the possible applications:
A pandemic is a dynamic network, in this case, a contact network. When Covid25 hits, we would know where and how it would spread and so we could eradicate it early on.
In the field of cyber, if we could identify and label patterns and anomalies, we could eliminate any threats to our network.
By tracking social media, we could anticipate the next trend and make a fortune. Hell, we could take stock exchange data, model it as a network, and know when to go all in and when to take the money and run.
In neurology, if we recognize the patterns in the dynamic neural network of the brain, it will allow us not only to detect mental diseases but also to "read" the brain.
Think about it for a moment. By labeling the patterns in our brain's network we could read the mind.
In short – If we crack dynamic networks, then the sky is the limit.
No wonder the study of dynamic networks is a hot topic. So why did we bother with static networks at all?
Again, a great question, but first, for those who haven't heard episodes 2 to 5, I highly recommend going back to them. Also, I recommend checking your intuitions at the door and coming with an open mind. I can tell from personal experience that all the intuitions I had about data and networks were wrong, until I realized how much I did not understand.
I have often warned that networks are not intuitive. In this episode we're going to take it to the extreme. Not necessarily because it's complicated, but because it challenges a lot of things we take for granted.
So, for those who stayed - respect.
Okay, so to answer the excellent question we asked earlier, we need to first define what a dynamic network is.
What makes the network dynamic is of course the concept of time.
For example, the consequences of time in networks can be reflected in new nodes that appear or disappear in our network, or in the creation of new edges or disappearance of old ones. Of course, the weights of the edges can also change over time.
These changes in the basic components from which the network is built change the entire network: the formation of new nodes can give rise to new communities and the disappearance of edges can dismantle existing ones. This way the entire network can expand or densify over time.
We've already mentioned in episode 2 the changes in the network topology when we discussed the "small world" paradox. It states that as the network gets bigger, its diameter tends to get smaller. But "time" is a dimension that we have not expanded on before, so we'll need to take a second to differentiate between two concepts of time on networks and their applications:
The first concept we'll discuss is "time as a feature", a label or an attribute of an edge or node. It means that in this case, time serves as just another metadata we have about the components in the network.
We use this feature constantly in network analysis, even if we aren't aware of it.
For example, when we conduct a network analysis in 2020 of an electricity-network in Scandinavia, we'll probably look at only the current (pun intended) nodes and edges, meaning nodes and edges that are time labeled as "2020".
The time attribute of the edges and nodes can be used to limit the data set. And indeed, many studies of what are supposed to be dynamic networks convert the dynamic network to a static network in a defined time period (time window or "snapshot").
Another application of the concept of "time as a feature" of nodes and edges is in the construction of the network itself. This means using time as an edge between nodes to convert our sequential dataset to a network.
In classic network datasets this is redundant. For example, in the case of an email network. What makes it a classic network dataset is the fact that the source and target are labeled clearly. In an email dataset, we have a clear email address as a source or sender and another email address as the target or recipient. Those addresses can be easily translated to two nodes with an edge between them.
But what about the neural network in the brain? At present, brain researchers have a limited capability to identify the target of a message sent from a neuron. In fact, they cannot even commit that it is a message. What they do know is that at some point Neuron X was active. They call this a message, because they've noticed that immediately after neuron X was active, neuron Y started to be active as well. That is, they found a time activity correlation that results from the assumption of causality:
if X is active and then Y is active, it means that X affects Y and they share a directed connection.
This method is called time-series correlation. We plot the activity of the various nodes on a graph, and if we find correlation between the activities, we connect those activities with an edge. The strength or weight of the edge can be the number of times the two nodes were correlated.
But why go to all that trouble of constructing a graph from our dataset? That's because when our dataset is a network, we can apply all the network's laws we've encountered. We can cluster it, find its centers of gravity and use them to tell a story about our data.
We can also use it to check our hypothesis: For example, if we're not sure about our correlation's threshold, we can define the weight of the edge as the strength of the correlation. This way we get a clearer and more robust view of what an edge should look like when we look at our newly created network as a whole.
This application can also help us to detect if there are unknown connections between the subjects of our research.
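As a rough illustration of the construction described above, here is a minimal sketch that turns activity time series into a weighted network. It makes several assumptions that are not from the episode: the node names and activity values are made up, the correlation used is plain Pearson correlation measured at the same time step (so the edges are undirected and any time lag or direction is ignored), and the helper names `pearson` and `correlation_network` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series is treated as correlating with nothing
    return cov / sqrt(var_x * var_y)

def correlation_network(series, threshold=0.0):
    """Return {(a, b): weight} for each pair whose |correlation| exceeds threshold."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        weight = pearson(series[a], series[b])
        if abs(weight) > threshold:
            edges[(a, b)] = weight  # edge weight = strength of the correlation
    return edges

# Made-up activity traces (1 = active, 0 = silent) for three hypothetical nodes.
activity = {
    "X": [0, 1, 0, 1, 1, 0, 1, 0],
    "Y": [0, 1, 0, 1, 1, 0, 1, 0],  # active whenever X is active
    "Z": [1, 0, 0, 1, 0, 1, 1, 0],  # unrelated to X and Y
}
network = correlation_network(activity, threshold=0.5)
```

Keeping the correlation itself as the weight, rather than a yes/no edge, means the threshold can be revisited later without rebuilding the network from the raw data.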
But be careful!
We should note that correlation is not causation: The fact that two events are adjacent, doesn't necessarily mean that one caused the other.
Though there's a neat correlation between the number of lemons imported from Mexico to the States and the rate of road accidents, I guess no one in their right mind would suggest that Mexican lemons cause road accidents, except perhaps Rush Limbaugh.
More about the subject of correlation & causation can be found in Judea Pearl's book "The Book of Why", whose only shortcoming is that its end is quite far from its beginning.
So, how can we know we're on to something when we look at our correlation network?
That's what I like about networks. They're part of the solution, and not the problem.
Instead of checking each correlation individually, we can make use of community detection, which we talked about in episode 5.
Since communities are homophilic by nature, we expect the nodes in the community to share the same characteristics. Nodes whose labels deviate from the shared interest of the community can be pointed at as anomalies and the correlations that put them there should be further investigated. To demonstrate it, let's use the stock exchange as an example. We can convert stock price time series to a network by creating an edge between stocks that go up or down in the same time frame.
To detect anomalies, we'll apply community detection on our newly formed network, and let's say we've found a community of oil companies that includes a food manufacturing company. This looks like an anomaly, so we should look into it.
An in-depth investigation could reveal if it's just a random correlation between oil and that specific food company or maybe that the food company's consumers should be more aware of its ingredients.
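The stock example above can be sketched in a few lines. This is only an illustration under loud assumptions: the tickers, sectors and edges are invented, and connected components are used as a crude stand-in for a real community detection algorithm, which would be needed on any realistic network. The helper names `communities` and `label_anomalies` are hypothetical.

```python
from collections import Counter, defaultdict

def communities(edges):
    """Group nodes into connected components (a crude stand-in for community detection)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over everything reachable from start
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups

def label_anomalies(groups, labels):
    """Return nodes whose label differs from the most common label in their community."""
    anomalies = []
    for group in groups:
        majority, _ = Counter(labels[n] for n in group).most_common(1)[0]
        anomalies.extend(sorted(n for n in group if labels[n] != majority))
    return anomalies

# Invented edges: two stocks are linked if they moved together in the same time frame.
edges = [("OIL_A", "OIL_B"), ("OIL_B", "OIL_C"), ("OIL_C", "FOOD_A"), ("TECH_A", "TECH_B")]
sector = {"OIL_A": "oil", "OIL_B": "oil", "OIL_C": "oil",
          "FOOD_A": "food", "TECH_A": "tech", "TECH_B": "tech"}

groups = communities(edges)
anomalies = label_anomalies(groups, sector)
```

Here the food company ends up inside the oil community and is the only node flagged, which marks it as the one worth investigating by hand.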
So, the first concept of time we've covered is time as an attribute of nodes and edges.
The second and more challenging concept of time in networks, and the one we'll address in the rest of this episode, is time as a dimension of the network itself or the network dynamics.
There are many methods in network science to tackle network dynamics. What these methods try to achieve is a model that represents the dynamic reality while trading between simplicity on the one hand and accuracy on the other. This is much like the scaling of maps regarding the actual terrain.
Here we will just briefly mention the three most common methods and after that ...after that I promise some fireworks.
The first and probably most intuitive method or approach is the continuous method.
This method simply allows visual follow-up of changes in the network and it's up to the analyst to come up with insights. It's much like clicking the "play" button and watching the network grow or shrink. On the one hand, this model doesn't miss a heartbeat of the network dynamics, making it very accurate. On the other, when there are many changes in the network, visualization has a hard time highlighting and tracking them.
The second method is the time windows method, in which we divide the dynamic network into static segments (or bins) of hours, days or weeks as we see fit. Then we compare those static segments or windows to look for continuity or changes. This method "flattens" the dynamics in each frame, causing it to miss some network dynamics on the way. But on the plus side, the static nature of each time window keeps the model simple and best of all, it allows us to apply the many algorithms from the field of static networks to these time windows.
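A minimal sketch of the time windows idea, assuming the data arrives as (time, source, target) records with integer timestamps and is not empty. The records, the 24-hour window and the choice of Jaccard similarity as the way to compare consecutive windows are all illustrative assumptions, and the helper names `snapshots` and `jaccard` are hypothetical.

```python
from collections import defaultdict

def snapshots(timed_edges, window):
    """Bin (time, source, target) records into static edge sets, one per window."""
    bins = defaultdict(set)
    for t, a, b in timed_edges:
        bins[t // window].add(frozenset((a, b)))  # undirected edge
    # Windows with no activity come out as empty sets.
    return [bins[i] for i in range(min(bins), max(bins) + 1)]

def jaccard(first, second):
    """Share of edges common to both snapshots (1.0 = identical, 0.0 = disjoint)."""
    union = first | second
    return len(first & second) / len(union) if union else 1.0

# Invented contact records; time is in hours, so window=24 gives daily snapshots.
records = [
    (1, "a", "b"), (5, "b", "c"),    # day 0
    (26, "a", "b"), (30, "b", "c"),  # day 1: same edges as day 0
    (50, "a", "b"), (60, "c", "d"),  # day 2: one edge replaced
]
frames = snapshots(records, window=24)
similarities = [jaccard(x, y) for x, y in zip(frames, frames[1:])]
```

Note how the order of events inside each day is lost once the records are binned, which is the "flattening" trade-off mentioned above.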
The third method is the time series method. This method doesn't view the network as a whole, but rather as a collection of features and measures that change over time. The output of this method is graphs in the shape of time series, much like the output of a polygraph, whose needle moves with each change in the metric. This approach also requires segmenting the network into time windows or frames, but its advantage is that the output is simple to visualize – it's just a line in a graph. So, no matter how small or numerous those frames would be, anomalies or changes should be easily detected.
That's why this method is commonly used to detect anomalies or change points in the network, based on the metrics we chose to monitor in the graph.
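To make the time series method concrete, here is a small sketch that reduces each window to a single number and flags the windows that stand out. The assumptions are mine, not the episode's: the metric is simply the edge count, a change is flagged when it lies more than two standard deviations from the mean, and the windows are synthetic star graphs built only so the example runs. The helper names `metric_series` and `flag_changes` are hypothetical.

```python
from statistics import mean, stdev

def metric_series(windows, metric=len):
    """Reduce each window (a list of edges) to one number; the default metric is edge count."""
    return [metric(edges) for edges in windows]

def flag_changes(values, z_threshold=2.0):
    """Indices of windows whose metric lies more than z_threshold standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # a perfectly flat line has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]

# Synthetic windows: each one is a star graph with the given number of edges.
sizes = [10, 11, 9, 10, 12, 10, 11, 30, 10, 9]
windows = [[("hub", f"n{j}") for j in range(size)] for size in sizes]

edge_counts = metric_series(windows)
flagged = flag_changes(edge_counts)
```

Any other per-window measure (density, number of communities, average degree) could be passed in as `metric`; the detection step stays the same because it only ever sees a line of numbers.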
The aspiration behind these methods is that by modeling reality, it should be easier to detect patterns, anomalies, and changes in the data.
There is only one problem with this approach, and in fact in many of the methods for identifying patterns or making predictions:
They're not so good at their job.
(I promised fireworks)
"Hold on, wait just a second, that's an extreme statement.
What do you mean they're not working? Can't we identify patterns in the data? What about all the promises about machine learning and AI that will identify patterns in data and give predictions? Forget promises - proven successes! If it's true, then why do we pay data scientists?"
Ok, so I admit - this is really a bit of an extreme statement. I promise to moderate it and clarify later on the terms in which we can state this, but in the meantime, I'll take advantage of the momentum as long as you are riled up. The point is that the challenge these methods face is not in identifying the patterns in the routine.
The problem is in the routine.
Or in the words of the wonder child to Neo in the 1st Matrix movie: There is no routine.
And now enters the groundhog.
The movie "Groundhog Day" was voted one of the ten greatest comedies of all time. The leading character, played by Bill Murray, is depicted as a rather disgusting man who has been cursed: he is doomed to get up every day on the same date and time and experience exactly the same 24 hours over and over again. This means meeting the same people, having the same conversations over and over at the same time. Forever. The movie, by the way, is much more interesting than it sounds.
First of all, the movie is highly recommended. But putting this aside, why does the movie's plot sound so imaginary? After all, aren't we all afflicted by the same curse we call routine? Seemingly, we all get up at the same time, do the same things, meet or talk to the same people, go home and go to sleep at the same time. Well, it might seem this way but it's not. I guess some of us have fixed appointments on our schedules. Mine, for example, was that every Friday I would stand on the same street corner at the same time waiting for the school bus to drop off my kid. It was the same place, same time. Every week.
One Friday, while I was waiting, as I always do, at the corner, I saw a friend of mine pass by. I was surprised to see him. He stopped and we talked a bit, and after he had left, I started thinking: Why don't I always meet him here? And then I started thinking, why don't I recognize all the people who pass me by when I'm standing on the corner? After all, if everyone's life is routine and if I'm in the same place, at the same time, I should have been able to predict in advance the people I see or at least recognize them. I recognized a few, such as the salesman from the nearby kiosk, or the beggar from our street known by his nickname "Good Omens", but I didn't recognize the vast majority of pedestrians' faces, though I should have, because life is supposed to be a routine made of boring patterns. Right?
To understand why I've failed to notice the same people on the street, we need to understand two things. The first is that they were probably not the same people. The second is we need to understand complex systems.
And there's no false advertising here. It is a bit complicated.
But we shall overcome (:
Dynamic networks are actually complex systems and therefore the challenge we confront is also complex. But what makes them so complicated? Just as the phrase goes, "no man is an island", so too the nodes in the network connect to and affect each other. Attempting to find a node's routine is like tracking the routine of a drop of water in a river. Like the drop of water, the node too is affected by its immediate surroundings, which are also affected by the more distant parts of its environment and so on. As with water, these interactions create "ripples" in the network.
If this reminds you of the popular term "butterfly effect" from chaos theory, this is no coincidence. The butterfly effect is a parable from the field of weather forecasting in which the flap of a butterfly's wings on one side of the globe can result in a hurricane on the other side, referring to a chain of micro-events that produce a macro, non-linear phenomenon.
It is this effect that produces chaos in the network.
Chaos is a charged word. When you call something "chaotic", it comes with a long tail of meanings and implications, and it's one heavy tail... Perhaps this is why people in the field of network science don't tend to use this word but prefer the term "complex systems" which is a definition that is inclusive to the chaotic aspects of the system, but at the same time sounds less mystical.
So, let's humor them - what is a "complex system"?
A complex system is a "dynamic, non-linear and chaotic system". What does it mean? So, let's unpack this sentence, and start with the word "system":
A system is a collection of components that interact with each other and form a whole that is greater than the sum of its parts.
A nonlinear system is a system in which the whole does not change in proportion to its parts: the interactions between the components can turn a small change into a disproportionately large effect.
And by "dynamic" we mean that the system changes over time.
So now, we're left with the word "chaotic".
Why is it that if the network is a complex system, it is also chaotic?
To understand why, we'll explain a bit about what a chaotic system is. We'll then demonstrate it on the field of meteorology or weather forecasting and lastly, we'll show how it relates to networks. So, let's begin by stating that a chaotic system has several characteristics or conditions that it must meet to be considered chaotic, and those are:
Sensitivity to initial conditions
It doesn't act at random
It will usually be fractal
And it has strange attractors
And now, let's go through these conditions by using them in a sentence:
A chaotic system is a non-linear dynamic system that is sensitive to initial conditions. This means that even the slightest change in the early stages of the system will lead to very different results. This also applies to an intervention in the system. Even the slightest intervention could lead to unexpected results (or the "butterfly effect"). In addition, the system is deterministic, meaning that given the same initial conditions, we'd get 2 identical systems. And to emphasize: In order to get 2 identical systems, the systems need to meet the same initial conditions with absolute precision. Even if the initial conditions are very very very similar but not precisely the same, we'd get 2 different systems. Now, since we know our chaotic system is deterministic, then the dynamics in the system cannot be random and they must obey laws. This is a really non-trivial statement so we will dedicate a minute to it.
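Both halves of this statement, determinism and sensitivity to initial conditions, can be seen in a few lines of code. The sketch below uses the logistic map, a textbook example of a chaotic system that is not mentioned in the episode and is swapped in here only because it is tiny. The starting value 0.2, the nudge of 1e-10 and the 60 steps are arbitrary choices, and the helper name `logistic_trajectory` is hypothetical.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), a deterministic map that behaves chaotically at r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

first = logistic_trajectory(0.2, 60)
same = logistic_trajectory(0.2, 60)            # exactly the same initial condition
nudged = logistic_trajectory(0.2 + 1e-10, 60)  # differs only in the tenth decimal place

# Distance between the original and the nudged system at every step.
gaps = [abs(a - b) for a, b in zip(first, nudged)]

print("identical runs match:", first == same)
print("gap at step 0:", gaps[0])
print("largest gap within 60 steps:", max(gaps))
```

The two identical runs agree at every step, while the nudged run starts out indistinguishable and then drifts until it has nothing in common with the original. No randomness is involved anywhere in the code.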
Take for example the number Pi. Pi is a special number obtained by dividing the circumference of a circle by its diameter. Pi starts with 3.14 and then an infinite sequence of digits, that is, it's not like the regular numbers we know and love.
The digits go on to infinity, with no pattern or repetition, but they are not random. That is, there is a mathematical law behind them, and that is the law of division. By the way, since the initial conditions will always be the same in the calculation of Pi - that is, the ratio will always be the same between the circumference and the diameter - we will always get the same result, that is, it's a deterministic system.
But note that knowing the past, meaning the first few digits, doesn't help us predict the future, meaning the next digits.
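To see that the digits really do follow a law, here is a sketch that generates them one by one. It uses Gibbons' unbounded spigot algorithm, which is one known way to produce the decimal digits of Pi with exact integer arithmetic; the episode does not name any algorithm, so this choice is an assumption made for illustration.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of Pi one at a time (Gibbons' unbounded spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and shift the state by one decimal place.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough information yet: fold in one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )

first_digits = list(islice(pi_digits(), 10))
again = list(islice(pi_digits(), 10))  # same rule, same start, same digits
```

The generator has no shortcut: to reach a later digit it has to work through the state built up by all the earlier ones, even though every step is fully determined.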
In everyday life, when we say something is chaotic, it usually refers to something we believe acts in a random or disorderly fashion and doesn't obey any law, or simply put: our children.
By the way, the historical and linguistic reasons we use the word "chaos" the way we do are a fascinating story in themselves, but it's a topic we'll unfortunately have to leave aside for now.
Anyhow, "chaos" in the scientific sense bears a different meaning. A chaotic system can be very orderly, just unexpected.
If we go back to our definition for chaotic systems, we're left with two unattended conditions. The first is "strange attractors" which we'll save for later. The second, which we'll address now, is that chaotic systems tend to be fractal by nature.
By fractal we mean that the components of the system share the same characteristics as the system as a whole. This is also true of the sub-components of the components and so on. At any resolution at which we view the chaotic system, we'll encounter the same structure and the same rules.
Time for a quick recap 😊 So, a complex system is a dynamic system, sensitive to initial conditions, whose changes are non-linear. It doesn't act at random because it follows deterministic laws that are reflected at every resolution of the system, that is, the system is fractal.
Why is this definition important? It's important because if we define a system as chaotic, this definition carries with it a curse. And the curse is that predictions on chaotic systems only go so far or in other words, there's a "glass ceiling" to the ability to make predictions about the behavior of chaotic systems. For example, we can't predict what the million-and-first digit of Pi will be without solving the first million. BTW, there is some prediction that can be made about Pi but it's out of the scope of this discussion, meaning it's too complicated for me to explain, so here's a link. So, what can we talk about? We can talk about the weather.
The weather's eco-system is perhaps the most famous and researched chaotic system and equally famous are its limits on prediction. As we all know, wrong predictions about the weather were the bread & butter of comedy for more than a century, from Jerome K. Jerome in "Three Men in a Boat" to Steve Martin in "L.A. Story". I haven't researched comedy, but I guess these days we find these kinds of jokes less funny because weather forecasts have improved greatly since then. But the field's inherent limitations are still there, making it a good example of the principles of chaos. Later on we'll apply those same principles to network science. So, as we all know, weather is not random and it follows physical laws (things like "hot air rises", "cold air goes down" and so on), and these laws are in effect at every resolution at which we view the system, meaning the eco-system is fractal.
But knowing the laws still doesn't allow for a long-term prediction even though the weather is deterministic. Seemingly, a deterministic system should be predictable by its very nature, but this is true only when we possess full knowledge of its exact initial conditions.
The difficulty stems mainly from the inability to accurately reproduce the initial conditions of the system for the prediction models. Even the slightest deviation could lead to wrong predictions since the effects on the system accumulate in a non-linear fashion (hence the "butterfly effect" parable). Because this level of accuracy is beyond us, there's a need for constant calibration of the parameters in the models used by the forecasters to "keep the system in check". Weather forecasting is an ancient problem that humans have invested heavily in over the years. But despite the great strides made in the number, resolution and sophistication of sensors, the vast amounts of data, the deep familiarity with the physical laws of the system and the vast knowledge acquired, it is still not possible to give accurate weather forecasts for more than a few days.
Because of the sensitivity needed, forecasters have estimated that even if we could monitor each molecule and feed its data to a model, and hope it will compute in time, we'd still hit a "glass ceiling" of a two-week prediction, max.
Full disclosure - The presenter is not a forecaster, so I rely on Edward Lorenz, who is the father of chaos theory in the field of weather. His story and his discoveries are amazing but at the end of the day we are gathered here to talk about networks, so for those interested I refer you to James Gleick's book "Chaos: Making a New Science". I have to say though that while reading Gleick's book, I thought to myself that this is the least edited book I've ever read. This feeling changed when I read his other books. But if you believe you can overcome this, then you're in for a mind-blowing experience. But what if Lorenz is outdated? Haven't we said that the field of meteorology has advanced greatly in the past years? We need something more up to date in order to convince ourselves, so here we'll turn to the ECMWF, which is the European Centre for Medium-Range Weather Forecasts.
And they concur with Lorenz.
There are several reasons why I hold the ECMWF in high esteem. First of all, they have a long acronym. Second, they have data. Lots of data. About 233 terabytes (TB) of data are collected by them every day. Also, the ECMWF uses billions of variables in its models. I don't use a billion variables, do you?
On top of this, their hurricane model has proven to be better than the American model. Which is a bit weird when you think about it, because there are no hurricanes in Europe. We will get back to the ECMWF later in the context of machine learning but in the meantime, let's agree that you don't need a complex model to figure out that in wintertime the temperatures will be lower than in summer. It's a cyclic behavior of the system and we know it from kindergarten. So, isn't that a valid forecast? Am I suggesting that you were lied to as children? Don't be paranoid. Your kindergarten teacher can still be trusted.
The reason this is a legitimate prediction is that chaotic systems have a feature that we have neglected so far, and it is the "strange attractors". A strange attractor is the parameter which the chaotic system revolves around, sort of like a gravity field. This means that the system has boundaries, and they are the result of the attraction. The system can move around the attractor, but it does so in a chaotic way, meaning it doesn't repeat itself or follow a pattern, but it also doesn't move at random. A random behavior is defined as a choice between pre-determined results, where every choice has an equal chance of being picked.
For example, the throw of a perfect die is random. You can have 6 results, but you can never know in advance what the result will be. However, it can be said that after many throws, the average result will be 3.5 even though this is a result we will never get on a die. In a way, this average is kind of like a strange attractor. It allows us to say that the results will be around 3.5, so that we can estimate the system boundaries to be around 1-6. But here it gets a bit more complicated, because in a chaotic system, there can be more than one strange attractor and the system can move between them unexpectedly.
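The 3.5 figure for a six-sided die can be checked both exactly and by simulation. The sketch below is only an illustration: the number of throws and the random seed are arbitrary choices, and the helper names `expected_value` and `simulate` are hypothetical.

```python
import random
from fractions import Fraction

def expected_value(sides):
    """Exact average of a fair die: every face from 1 to sides is equally likely."""
    return Fraction(sum(range(1, sides + 1)), sides)

def simulate(sides, throws, seed=0):
    """Throw a fair die many times; return the observed mean, minimum and maximum."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    results = [rng.randint(1, sides) for _ in range(throws)]
    return sum(results) / throws, min(results), max(results)

exact = expected_value(6)                      # 7/2, i.e. 3.5
observed, lowest, highest = simulate(6, 100_000)

print("exact average:", float(exact))
print("observed average:", observed)
print("observed range:", lowest, "to", highest)
```

No single throw ever lands on 3.5, yet the long-run average settles close to it and every result stays inside the 1 to 6 boundaries. That is the sense in which the average acts as an anchor for an otherwise unpredictable sequence.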
Now let's separate the men from the boys and imagine we have a set of D&D dice which as every man knows, contains 6 dice:
A 4-sided die, a 6-sided die, an 8-sided die, a 10-sided die, a 12-sided die (which for some reason, is never used), and a 20-sided die.