Episode 6B: Groundhog Day, Dynamic Networks and Data's Holy Grail – Part 2
Updated: Feb 23
What's the rumpus, :) I'm Asaf Shapira and this is NETfrix.
In the first part of this episode we learned about methods to model dynamic networks and covered the basics of chaotic systems. We've shown how chaos limits our ability to predict the future because chaos knows no routine. In a chaotic system, what happened in the past doesn't reflect on what will happen in the future. We've seen it in the weather's ecosystem and now, I think we're ready to apply these principles to networks, and not just any ol' networks but temporal real-world networks.
A temporal network is a network in which the nodes, edges or both have an expiration date, making the network dynamic. For example, in a phone network, the duration of the edge is the same as the duration of the call, and therefore we get edges appearing and disappearing, thus creating network dynamics. Before trying to apply the chaos principles to networks, let's recall from part 1 what these principles are. So, in order to define a network as a chaotic system, we need it to meet the following criteria:
Sensitivity to initial conditions
It shouldn't be random
It would usually be fractal
And it has strange attractors
In the context of determinism and sensitivity to initial conditions, it seems impossible to prove. To test this, we would have to recreate the network from start to finish, beginning with the exact initial conditions we started with, and this is not feasible. For example, in the case of Facebook, how can we recreate exactly the same initial conditions that Mark Zuckerberg had in order to demonstrate that in the same situation the exact same network would have been created?
The absurdity of this claim is the basis of the movie we've mentioned in part 1 - "Groundhog Day" - which depicts exactly such a scenario: a character who gets up every morning, at the exact same time, to the same initial conditions.
Though it's a really great movie, it can't be admitted as evidence so, we'll have to leave this question in the hands of philosophers and scholars of quantum theory in the hope that they'll like Bill Murray, and what's not to like?
As for the sensitivity to initial conditions, I am willing to go out on a limb here and say that empirically, no two networks are identical (they may be similar, but not identical), as even the smallest impact can significantly change the network. For example, even a single edge formed between distant nodes in a network will significantly change the structure of the network, as we have learned from the Small World Law.
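To make the single-edge claim a bit more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions: a toy ring of 100 nodes (not a real-world network), with hypothetical helper names `ring` and `avg_path_length`. It measures the average shortest-path length before and after adding one shortcut between the two most distant nodes.

```python
from collections import deque

def ring(n):
    """Toy ring network: node i is linked to i-1 and i+1 (mod n)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def avg_path_length(adj):
    """Mean shortest-path length over all ordered pairs, using BFS from each node."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

n = 100
before = ring(n)
after = ring(n)
# One new edge between the two most distant nodes on the ring.
after[0].add(n // 2)
after[n // 2].add(0)

apl_before = avg_path_length(before)
apl_after = avg_path_length(after)
print(f"average path length before: {apl_before:.2f}, after: {apl_after:.2f}")
```

One edge out of a hundred is enough to shorten the average distance between all the nodes, which is the small-world effect in miniature.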
So, if we agree that networks are sensitive to initial conditions, and leave determinism aside for a moment, then let's move on to the issue of randomness.
As we have already mentioned in episodes 2 and 3, networks are not random (as previously thought) but operate according to laws. We've mentioned the Power Law, of course, but also the congregation of nodes into communities, the small world law and other laws which I hope to share with you in future episodes. So we can conclude that networks are not random. Another feature of a chaotic system that we've mentioned is fractality. The chaotic system is supposed to be comprised of smaller components that share the characteristics of the system, at any resolution. Sounds familiar? In episode 5 we compared the network communities to the structure of a cauliflower. Just as the structure of the cauliflower is fractal, that is, a cauliflower is made up of many small cauliflowers (and I urge you again to check it on your next visit to the vegetable section in the supermarket), so too the network consists of many small sub-graphs that are the communities. And the laws that govern the network govern them as well.
And now, everything we've talked about in the last episodes comes together.
In the episode about the Power Law we talked about Barabási, the famous network researcher, who discovered that real-world networks are scale-free. "Scale-free" means that we'll see the same characteristics of the network at each scale we view it. And this is what fractals are all about:
The network is made up of communities which in turn are made up of smaller communities and so on. The network, the communities and the sub-communities all maintain the Power Law. Each fractal of the network will have a few central nodes and a long tail of nodes with few edges.
So, we're left with the "strange attractors", the last feature we need to find in networks in order to define them as chaotic. In Episode 2, on the subject of the small world phenomenon, we talked about the basic structure of networks, which is the connected components or "islands in the network". We've stated that a network is comprised of a Giant Connected Component, or GCC, and a long tail of small connected components. In the process of the network's formation, the GCC is quickly formed by the congregation of nodes and edges. After a while, the nodes might break away from it or return to it, but it will stay there. Its composition can change over time, but it's always there – pulling towards it the newly formed nodes and edges and even the renegade small connected components. And so, the GCC plays the part of the network's strange attractor.
Because the network is fractal, this phenomenon will also appear in the sub-components of the network, meaning the communities. Communities are a result of nodes that pull each other closer because of homophily, meaning shared attributes. But as we know, not all the nodes play the same role in the network. Some have more appeal than others. These hubs are the nodes that score high on centrality measures, and they keep the community together, pulling nodes and edges toward them. So in each community, we have a strange attractor in the form of a hub or hubs, thanks to the Power Law. These hubs could change, the same way as strange attractors, but the new ones will keep the same role.
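For those who want to see the "islands" for themselves, here is a small sketch that splits an edge list into connected components and picks out the largest one. The edge list is made up for the example and the function name `connected_components` is my own; it uses a plain union-find rather than any particular graph library.

```python
def connected_components(edges):
    """Return the components (as sets of nodes), largest first, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two islands

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical toy network: one big island and two tiny ones.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "e"),
         ("x", "y"), ("p", "q")]
components = connected_components(edges)
gcc = components[0]
print("GCC:", sorted(gcc), "| component sizes:", [len(c) for c in components])
```

Running the same function on each time window of a temporal network is one simple way to watch the GCC keep its place while its members come and go.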
OK, I believe it's time for a quick summary:
Networks are, in fact, chaotic systems. A chaotic system has a scientific definition and networks meet its criteria.
Like in the parable about the butterfly wings, even small events in the network, such as petting a pangolin or eating bat soup, might start a chain reaction that will cause a non-linear change in the network. We've learned that chaotic systems are not random, and they follow laws and we've shown that this is also true for networks. Networks follow physical laws such as the Power Law and more. As with many chaotic systems, networks are also fractal, meaning they are composed of sub-components that share the same attributes as the system or the network as a whole. And last, as in chaotic systems, networks also have strange attractors, mainly network hubs, that act as centers of gravity in the network. They keep the network from falling apart and produce boundaries, which in our case, keep in check the components or communities.
OK, so let's say we've established that networks are chaotic. So, what's the bottom line? What does it mean that a real network is chaotic? It means that by the very definition of the network as chaotic, then like weather forecasts, here too there is a glass ceiling for the ability to make a prediction on the network or find its routine or patterns. (Mic drop)
"Hold on – wait a second!
What are we talking about here? Does this mean that there is no routine in networks? No patterns? How can it be? I didn't sign up for this!"
Hey now, let's take a minute to cool off here, although it should come as no surprise that like many other insights about networks, this insight is clearly not intuitive. To compensate those who made a free choice to stay and read, let's go back and huddle for a moment around the warm fireplace of intuition.
Routine is an intuitive concept: people get up in the morning, some brush their teeth, they go to work, meet the same people in the office, talk to the same people, go home and go to bed. What is it if not a routine or pattern?
After all, in practice, we can give a pretty good prediction that whoever leaves the house in the morning is probably commuting to the same work place, especially if that is what this person had done the day before and the day before that. Seemingly, it's a classic case where knowing the past does help us to predict the future, meaning "past" = "future".
Sometimes we even feel so buried in this routine, that once a year we feel we have to go on vacation in order to break it.
And let's add to this intuition that though the weather is a complex or chaotic system, there are still people whose profession is to forecast the weather, and they do a pretty good job. So, how come we can't predict or find patterns in weather or networks?
What's happened to the "yes we can" attitude? In order to resolve the contradiction, we need to differentiate between the behavior of the system on the micro level and on the macro level. On the macro level, the weather indeed acts in predictable cycles that bound it: the seasons, like day and night, are cyclic and less chaotic. The same goes for networks. On a macro level, the network too tends to be cyclic or seasonal. For example, we expect to see more activity in the network during the day than at night. But the smaller components in the network will act in a chaotic manner.
To explain it, we'll quote Heraclitus, the Greek philosopher who said, "You cannot step into the same river twice."
When you think about it, it sort of makes sense because the water is different, the fish have moved on etc., but on the other hand, it's also clear to all of us that it's still the same river.
The "fish" and "water" in this example are the nodes and edges in the network. They are the micro image of the network and they act in a chaotic fashion.
Unlike these small components, the "river" symbolizes the network on a macro level, and it does behave in a cyclic manner and so we can make good predictions about it.
The problem with predictions based on cyclic behavior is that they tend to be trivial: people usually do get up in the morning for work and return in the evening and in the case of holidays, they are less likely to go to work. We all know this.
But what is less trivial is that while people are at work, for example, their interactions differ from day to day. They might interact with new nodes and might let some old edges drop. It sounds unintuitive but it can be tested empirically: if you test a variety of temporal networks by comparing different time windows in each network, you'll find that the percentage of overlap of nodes and edges between the different time windows decreases significantly over time.
This is super important because this means that for the most part, there's no pattern or routine regarding the components of the network in the different time windows. If there were, we should have gotten a big overlap, because routine should have preserved the same nodes and edges. Please don't hold me accountable if I get the exact percentage wrong, but if, for example, we compare two time windows of the same network, say, the network on a Monday and the network on a Tuesday, we might find an overlap of about 40% of the nodes and about 20% of the edges. Roughly.
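If you'd like to run this test on your own data, here is a minimal sketch of the comparison. Note the assumptions: I measure overlap as the Jaccard index (shared items divided by all items seen in either window), which is only one possible definition, edges are treated as undirected, and the two tiny "Monday" and "Tuesday" edge lists are invented for the example.

```python
def jaccard(a, b):
    """Share of items that appear in both sets out of all items seen in either."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def nodes_of(edges):
    return {node for edge in edges for node in edge}

def undirected(edges):
    # frozenset makes ("ann", "bob") and ("bob", "ann") the same edge
    return {frozenset(edge) for edge in edges}

def window_overlap(edges_a, edges_b):
    """Return (node overlap, edge overlap) between two time windows."""
    node_overlap = jaccard(nodes_of(edges_a), nodes_of(edges_b))
    edge_overlap = jaccard(undirected(edges_a), undirected(edges_b))
    return node_overlap, edge_overlap

# Hypothetical toy windows of the same network.
monday = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("dan", "eve")]
tuesday = [("bob", "ann"), ("bob", "dan"), ("eve", "fay"), ("fay", "gil")]

node_overlap, edge_overlap = window_overlap(monday, tuesday)
print(f"node overlap: {node_overlap:.0%}, edge overlap: {edge_overlap:.0%}")
```

Even in this toy case, most of the people show up on both days while most of the connections between them do not, which is the gap between nodes and edges described above.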
The immediate implication is that most of the network has changed, not only in its composition (meaning the nodes) but also in its structure (meaning the edges). A change on the order of ~80%! That is, even though some of the nodes remained in the network, the structure of the network has been overwhelmingly changed because of the changes in edges or connections between the nodes. And over time, the changes in the network will only get bigger and bigger. But maybe comparing consecutive time windows is like comparing apples to oranges?
Maybe on a Monday everyone returns to their workplace and does some catching up, and on Tuesday it's back to business?
Well, even if we compare Mondays to Mondays, or entire weeks to entire weeks, and so on, we will still get a similar result – a big percentage of changes in the network which only gets bigger. So, a dynamic network is not defined by routine. It's defined by constant change. Networks are chaos. Sounds weird?
When I feel weird, I turn to Milgram. Milgram, we'll recall, was a famous social psychologist who did all kinds of weird and groundbreaking experiments. Some we've mentioned in previous episodes. The one we'll focus on is the experiment he conducted in the 1970s called "the familiar stranger."
Milgram defined familiar strangers as people we meet regularly and whose faces we recognize, but with whom we've never held a conversation.
Milgram conducted the experiment at a train station. He took a picture of the people waiting at the station and showed it to them, asking them to mark the ones they recognize from the previous times they'd waited in the station, but who they had never spoken to, that is, familiar strangers. He pointed out that about 90% of the people identified at least one person. Pretty amazing stuff.
We had already learned to be suspicious of Milgram's findings but luckily for us, a similar experiment, conducted in 2004, yielded fairly similar results, in which about 78% of those surveyed identified at least one person.
Although you might have guessed it by now, I feel it's necessary to mention that the results of the experiments were of course Power Law distributed. There were a few people who recognized many faces, and the majority recognized only one person.
The interesting part of the story is that the authors of the experiments chose to see the glass half full. They were astonished to find that a fairly large percentage had identified at least one person and thus confirmed the concept of "familiar strangers".
But a more sober look at the experiment will find that actually, and probably contrary to intuition, most people do not know most of the people around them. Moreover, we can see it happens even in places that are allegedly characterized by routine (like public transportation stations). So to complete the story from part 1, here probably lies the reason why I've recognized only two people on the street while waiting for my kid's school bus, even though I've stood in the same place at the same time every week. If we go back to the river quote, then the "fish" and the "water" (the nodes and edges, or in the case of Milgram, the people at the train station) don't preserve a routine but on the contrary, most of them are pawns in a chaotic system.
This phenomenon or the lack of routine can be found in several academic papers that mention that a significant portion of the nodes and edges disappear and appear between time windows. But most of these papers dismiss these findings by attributing these changes in the network to the "noise" or "instability" of the network or the algorithm they've used.
That is why, in my opinion, this important finding has passed a bit under the data science radar.
This is not surprising because we've already discovered in episode 2 that scientists prefer their cup half full. Sometimes great discoveries can be found, ironically, in the "noise" of the experiment, meaning in the results we don't care for. In this case, the noise is dominant and is also chaotic. But in contrast, if we look at our river on a macro level, we'll see a much more stable picture. If we monitor the network's activity, we'll usually see an obvious pattern of high activity in the daytime and a decrease in activity in the nighttime. It's a cyclic behavior we can predict, much like saying it's hotter in the summertime than in winter.
And in contrast to the small number of papers that mention the noise in the data, on the subject of these cyclic patterns we can find lots of academic papers.
So, to sum this up: Remember when we talked about our subjective feeling of living in a suffocating routine? So much so that once in a while we need a vacation to break the pattern?
So, contrary to our intuition, when we take a good look at the fine details of our daily activity, they differ from day to day. This stands out when we compare it to the plot of "Groundhog Day", where every day is a perfect copy of the day before. Ironically, a broader search for patterns in our life will reveal that it's exactly our annual vacation to escape routine that will come up as our annual routine and an obvious pattern. The realization that networks are chaotic carries with it a long tail of implications. Some of them are critical to machine learning. So, let's get into machine learning and chaos.
When passing criticism, a common way to go about it is by applying the "shoot sandwich" technique. You start and end with pleasantries but in the middle, well, you can guess what you can stuff there… So, the same goes here for machine learning and chaos. And let's start with the pleasantries: machine learning is great when we deal with data whose essence is static by nature. When I say static I mean that "past data = future data", or that what was true in the past will also be true in the future. The reason for it is that many machine learning algorithms need a ground truth, that is, examples from the past, and enough of them, to see if they got their answers right. For example, the ability to identify cats:
A cat is a cat in the past and in the future. There are many kinds of cats and many angles from which to look at them, but without sounding too philosophical, a cat's essence is static by nature. It means a cat will look the same tomorrow as it looks today. There's no non-linear change in the cat on a day-to-day basis, unless we decapitate it, which we shouldn't do no matter what we're trying to prove. In order to label cats, machine learning requires training, which requires time and a lot of data (for example, millions of pictures of cats which serve as the ground truth).
Therefore, a large enough dataset of cat pictures from the past will enable machine learning to label cats in the future.
But the problem with applying machine learning on a dynamic network is that networks are chaotic. The past doesn't help much to predict the future. Especially when we can't produce a large enough dataset that will cover all the possibilities of anomalies or possible changes that have been and will be. When you think about it, will we ever have such a dataset? After all, the very essence of change is that it is different from what happened in the past. So how can we find something that has not happened before, based on the past?
For a quick sanity check, let's go back to my favorite weather research institute, the ECMWF, and discuss the experiment they conducted in the field of predictive machine learning on a chaotic system.
The experiment included several machine learning methods in order to predict a relatively simple chaotic system in the field of weather.
The output of the experiment was up-and-down graphs that followed the system's equations. One of the graphs represented the ground truth, that is, the real behavior of the chaotic system, and the other graphs represented the predictions made by the machine learning models. When the graphs started to run, it first looked promising - the machine learning graphs and the ground-truth graph went up and down together, showing that the machine learning models gave good predictions in the first phase. But from then on, any correlation between the graphs vanished. When the ground-truth graph went down the others went up and so on. The chaotic system quickly took a whole new course and the machine learning algorithms lost track of it in a short span of time.
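You can get a feel for this "together at first, then lost" behavior with a few lines of Python. To be clear, this is not the system or the models used in the experiment, just a stand-in I chose for illustration: the logistic map, a textbook one-line chaotic system. The "forecast" here is simply the same equation started from a value that is off by one part in a billion, and the names are my own.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), a simple chaotic system when r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

truth = logistic_trajectory(0.2, 60)
# A "perfect model" with a tiny error in the initial condition.
forecast = logistic_trajectory(0.2 + 1e-9, 60)

gaps = [abs(a - b) for a, b in zip(truth, forecast)]
for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.2e}")
```

The gap stays tiny for the first steps and then grows until the two graphs have nothing to do with each other, even though nothing random was ever added. That is the glass ceiling in miniature.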
In the paper describing the experiment, there is also a lengthy discussion about the data on which the machine is supposed to be trained. How, for example, can a machine predict an ice age in the summer, if this has never happened? But, as promised, we'll seal our sandwich on a good note, by saying that machine learning can probably be used to make forecast models more efficient, if not more accurate. Since these models have become more and more complex over the years, I'm sure they would appreciate the help. So what about predictions regarding the smaller components in networks? Should we give up? Never! But we need to adjust our aims. If we aim a bit lower, then we might find some good predictions we can make on the dynamics of nodes and edges. The most famous is perhaps link prediction. In link prediction we attempt to predict edges that will connect previously unconnected nodes. We can find applications for such algorithms in recommendation engines, for example in suggestions of potential friends or products to online users.
A simple example of such a prediction is a network that contains only three nodes, in which only one node is connected to the other two. In such a simple network, it can be reasonable to assume that the two unconnected nodes will eventually connect, completing a triangle, thus following the logic that a good friend of my good friend will probably end up as my friend too.
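The triangle logic can be sketched as a very simple scoring rule: for every pair of nodes that are not yet connected, count how many friends they share. This "common neighbors" score is just one basic heuristic that I'm using for illustration, not a full link prediction engine, and the three-node network and the function name are hypothetical.

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Score each unconnected pair of nodes by the number of neighbors they share."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already connected, nothing to predict
        shared = adj[u] & adj[v]
        if shared:
            scores[(u, v)] = len(shared)
    return scores

# The three-node network from the text: "a" is connected to both "b" and "c".
adj = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
}
scores = common_neighbor_scores(adj)
print(scores)  # the only candidate edge is the one that closes the triangle
```

In a bigger network, the pairs with the highest scores would be the first candidates for a "people you may know" suggestion.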
Network laws, like the Communities Law, can help us in these kinds of predictions. The law states that the strongest connections will be within the community. This happens as the result of the homophilic trait of communities or, more simply put, "Birds of a feather flock together." So, it can be deduced that links will usually be created between nodes that share the same friends or share the same community. But in any network, we'll find exceptional edges, meaning those weak connections that connect communities and create the big changes in network topology. Those are the edges that create the "small world" effect in networks, and those links are probably almost impossible to predict, though they are the most influential on the network's topology as a whole. For example, let's say we monitor a social network. A casual meeting of two people on the street, which we do not monitor, can result in a friendship between two users in our monitored network that we couldn't have predicted. The reason we're upset is that this newborn link changed our network significantly and might start a chain reaction that could lead to even more changes we cannot predict. So maybe the solution should be to increase monitoring. Maybe if we add more dimensions to our monitored network we'll get closer to depicting reality. We can include not only the edges of our social network but also edges from other networks we surveil, and add more labels to the nodes, like hobbies and place of residence. This way, we convert our prediction problem to a classification problem, meaning we ask the algorithm to find similar nodes even if they do not share a connection. The conversion of the problem from prediction to classification makes it applicable to machine learning, which thrives on classification. What we'll get as a client of such an engine is something like: users on Amazon who are similar to you in some way tend to buy product X, and therefore we will recommend it to you.
But since we cannot monitor everything (and if you don't believe me, ask Alexa), then it would probably never be perfect, but maybe it's good enough, mainly because in most cases the price of a mistake is low.
In weather forecasting, the price of a mistake can be higher, and though a lot of data is gathered all the time from a lot of sensors that are getting better over time, we can still give only good short-term predictions. The glass ceiling is still there. So, where's the magic that will break it? As for now, there's no such thing. At least nothing that's perfect, and this brings us to network models. Full disclosure – I don't like network models.
I like it when things are absolute and crystal clear, and taking a modeling approach condemns us to a life sentence of words like "approximately". But in the field of physics, "approximately" is the equivalent of a gold medal, or so I was told. Or as the famous quote goes: "All models are wrong, but some are useful". This approach is expressed, for example, by Barabási (the famous network researcher we have talked about many times) and Dr. Baruch Barzel, from Bar Ilan University. They try to model the flow in dynamic networks while taking into account the topology of the network, which is usually more static by nature. This notion is very similar to navigation apps that try to predict the best route by comparing the static roads to the dynamic traffic.
So, is that it? Should we just abandon our holy grail's crusade?
So first of all, I'm against giving up so quickly.
Besides, understanding the network's laws allows us to better understand the past and present and this is already a significant achievement. We should also recall that network science is an emerging field so there's plenty of room for new and exciting discoveries. And here are just two leads that might help us in our quest. The first lead is communities. When we talked about the network as a river, we mentioned the water and the river itself, but we didn't mention the currents. In this analogy, the communities are the currents in the river. Though they may change, they are more stable than the ever-changing details, and might give us predictions at a better resolution than the river-as-a-whole can give us.
The study of community detection in dynamic networks is ongoing and though it had its ups and downs, it might have the potential to make a meaningful breakthrough in this field.
The second lead calls for some inter-disciplinary study. This means that maybe, if we don't shy away from a bit of chaos, we can apply chaos theory to networks in a more inter-disciplinary fashion than we do at present.
Chaos theory has applications in the real world. We won't detail them all but give as an example the study of fluid dynamics, or more precisely – chaotic mixing.
If we take a bucket full of white paint, and pour a drop of red paint into it, the process of diffusion will make the red paint disperse evenly in the bucket and we'll eventually get an even color of pink.
But doing it this way is as slow and boring as watching paint dry.
We can be more proactive and mix the bucket in a uniform motion to get a faster dispersion.
But if we apply chaotic mixing, the color will disperse even faster.
So maybe it is possible to make a projection from the field of chaos to the field of networks? For example, in diffusing messages in a network.
If we upload our message to the network and wait for it to disperse organically to the entire network, we will probably have to wait for quite a long time.
If we convey the message randomly, the chances of a successful dispersion are also not so great.
But if we convey the message by taking advantage of the network's strange attractors, meaning its centers of gravity, our chances should be much higher. I hope we'll see more collaborations between these fields, but as we've already mentioned, the word "chaos" has given way in academia to the concept of a complex system, which is a bit of a shame, because between us, who wants to hear about something complex?
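As a back-of-the-envelope illustration of why the starting point matters, here is a small sketch that counts how many nodes a message can reach within two hops, once when it starts at a hub and once when it starts at a peripheral node. Everything here is an assumption made for the example: the toy "hub and spokes" network, the hop-limited reach as a crude stand-in for real diffusion, and the function name `reach_within`.

```python
from collections import deque

def reach_within(adj, seed, hops):
    """Number of nodes (excluding the seed) reachable within `hops` steps, via BFS."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist) - 1

# Hypothetical toy network: one hub, five mid-level nodes, three leaves each.
adj = {"hub": set()}
for i in range(5):
    mid = f"m{i}"
    adj[mid] = {"hub"}
    adj["hub"].add(mid)
    for j in range(3):
        leaf = f"m{i}_leaf{j}"
        adj[leaf] = {mid}
        adj[mid].add(leaf)

from_hub = reach_within(adj, "hub", 2)
from_leaf = reach_within(adj, "m0_leaf0", 2)
print(f"reached within 2 hops: from hub = {from_hub}, from leaf = {from_leaf}")
```

In this toy case the hub covers the whole network in two hops, while the same message starting at a leaf barely leaves its own neighborhood.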
Did you enjoy, and want to share? Have you suffered and you do not want to suffer alone?
 For example in (Sekara, Stopczynski, & Lehmann, 2016) they point to such "attractors" (called "cores") in a proximity network that allow for better tracking of the network.
 For an example of the difficulty in applying similar logic to the detection of changes see (Fish & Caceres, 2017).