Almost any relationship can be represented as a graph, and networks are no exception. Keep in mind that network theory is a part of graph theory where nodes have attributes.
Today we will try to find out whether we live in an information bubble created by the same authors running different channels, and we will build a graph to visualize it.
This project uses the telethon [1] Python 3 module to interact with Telegram, NetworkX [2] to build the graph and PlantUML [3] for visualization.
First, we need to decide which type of graph to use: NetworkX provides several of them [4].
NetworkX class | Type | Self-loops allowed | Parallel edges allowed |
---|---|---|---|
Graph | undirected | ✔️ | 🚫 |
DiGraph | directed | ✔️ | 🚫 |
MultiGraph | undirected | ✔️ | ✔️ |
MultiDiGraph | directed | ✔️ | ✔️ |
The quote [5] from Wikipedia about self-loops:
In graph theory, a loop (also called a self-loop or a “buckle”) is an edge that connects a vertex to itself.
The quote [6] from Wikipedia about parallel edges:
In graph theory, multiple edges (also called parallel edges or a multi-edge), are two or more edges that are incident to the same two vertices.
Our graph starts from the user itself (let’s call it the “root node”), and the rest of the graph is built around this node. No parallel edges are expected and the graph can be undirected, so the simple Graph type is enough.
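To see what the table means in practice, here is a minimal sketch (the node names "a" and "b" are made up for illustration) that compares how Graph and MultiGraph treat a repeated edge and a self-loop:

```python
import networkx as nx

# A simple undirected graph: repeated edges collapse into one.
simple = nx.Graph()
simple.add_edge("a", "b")
simple.add_edge("b", "a")  # same undirected edge, nothing new is stored
simple.add_edge("a", "a")  # a self-loop is allowed

# A multigraph keeps every parallel edge.
multi = nx.MultiGraph()
multi.add_edge("a", "b")
multi.add_edge("b", "a")

print(simple.number_of_edges())        # edges stored in the simple graph
print(multi.number_of_edges())         # edges stored in the multigraph
print(nx.number_of_selfloops(simple))  # self-loops in the simple graph
```

This prints 2, 2 and 1: the simple graph holds one a-b edge plus the self-loop, while the multigraph holds both parallel a-b edges. Since a contact is either mentioned by a channel or not, one edge per pair is all we need.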
To make the graph bigger, I subscribed to several random channels, grouped by topic, found on tgstat.com [7]. The main idea, though, is to connect channels and chats to each other through the common contacts listed in their descriptions. For example, here is the Reddit channel information:
Usernames starting with @ become the other nodes of our graph, and each of them can be connected to multiple channels and chats. As a result, we get a graph similar to this one:
Before running the following code, we need to get an API ID and API hash [8]. We start with the root node, named after the first name in the Telegram client’s profile and given a color attribute:
#!/usr/bin/env python3
from telethon import TelegramClient, sync
import os
import networkx as nx
# Main configuration
config = {
    'telegram': {
        'api_id': '123456',  # Client's API ID
        'api_hash': 'bdaf62f128aeaa9a65b67a479d9ff413',  # Client's API hash
        'phone_id': '+31201234567'  # Client's phone
    }
}
if __name__ == '__main__':
    # Create Client and sign in
    client = TelegramClient(os.path.basename(__file__),
                            config['telegram']['api_id'],
                            config['telegram']['api_hash'])
    # Create Graph
    graph = nx.Graph()
    # Start Telegram client
    client.start(config['telegram']['phone_id'])
    client_name = client.get_me().first_name
    # Add Client as root node
    graph.add_node(client_name, color='gold')
    # Print all nodes from graph with their attributes
    print(graph.nodes(data=True))
Executing it:
~> python3 telegram_graph.py
[('Username', {'color': 'gold'})]
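Hard-coding credentials is fine for a demo, but for a script you intend to share it may be safer to read them from the environment. A minimal sketch, assuming the hypothetical variable names TG_API_ID, TG_API_HASH and TG_PHONE (they are not defined by Telethon or by the script above):

```python
import os


def load_telegram_config(env=os.environ):
    """Build the 'telegram' part of the config from environment variables.

    TG_API_ID, TG_API_HASH and TG_PHONE are hypothetical names chosen
    for this sketch; use whatever fits your setup.
    """
    names = ("TG_API_ID", "TG_API_HASH", "TG_PHONE")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_id": int(env["TG_API_ID"]),  # the API ID is numeric
        "api_hash": env["TG_API_HASH"],
        "phone_id": env["TG_PHONE"],
    }
```

The returned dictionary has the same keys as config['telegram'], so it could be dropped in as a replacement for the hard-coded values.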
Now it is time to collect all the chats and channels of our client. We can do that with the client.get_dialogs() [9] function. Below I filter the items down to the Reddit channel as an example:
#!/usr/bin/env python3
from telethon import TelegramClient, sync
import os
import networkx as nx
# Main configuration
config = {
    'telegram': {
        'api_id': '123456',  # Client's API ID
        'api_hash': 'bdaf62f128aeaa9a65b67a479d9ff413',  # Client's API hash
        'phone_id': '+31201234567'  # Client's phone
    }
}
if __name__ == '__main__':
    # Create Client and sign in
    client = TelegramClient(os.path.basename(__file__),
                            config['telegram']['api_id'],
                            config['telegram']['api_hash'])
    # Create Graph
    graph = nx.Graph()
    # Start Telegram client
    client.start(config['telegram']['phone_id'])
    client_name = client.get_me().first_name
    # Get all dialogs for current client and filter them on the Reddit channel
    reddit_channel = [dialog for dialog in client.get_dialogs() if dialog.name == 'Reddit']
    # Add Client as root node
    graph.add_node(client_name, color='gold')
    # Print all nodes from graph with their attributes
    print(graph.nodes(data=True))
    # Print reddit_channel list containing only one element
    print(*reddit_channel)
Output:
[('Username', {'color': 'gold'})]
Dialog(name='Reddit', date=datetime.datetime(2020, 4, 8, 19, 0, 1, tzinfo=datetime.timezone.utc), draft=<telethon.tl.custom.draft.Draft object at 0x7f7ebcc0dcd0>, message=Message(id=8856, to_id=PeerChannel(channel_id=1236920376), date=datetime.datetime(2020, 4, 8, 19, 0, 1, tzinfo=datetime.timezone.utc), message='r/ #funny\nСпортзал закрыт? Импровизируй, адаптируйся!', out=False, mentioned=False, media_unread=False, silent=True, post=True, from_scheduled=False, legacy=False, from_id=None, fwd_from=None, via_bot_id=None, reply_to_msg_id=None, media=MessageMediaDocument(document=Document(id=5220052781397706607, access_hash=5929123456789142977, file_reference=b'\x02I\xb9\xe88\x00\x00"\x98^\x8e3j\xc7F\x00iQ\xdc\xe6\xeb~\x02nR\xf2\xde\xf0(', date=datetime.datetime(2020, 4, 8, 18, 0, 27, tzinfo=datetime.timezone.utc), mime_type='video/mp4', size=666658, dc_id=2, attributes=[DocumentAttributeVideo(duration=11, w=640, h=800, round_message=False, supports_streaming=True), DocumentAttributeFilename(file_name='r_gifs.mp4'), DocumentAttributeAnimated()], thumbs=[PhotoStrippedSize(type='i', bytes=b'\x01( xC\x96\xe3\xbd5\xd7\x08\xdcv\xab\x8a~f\xfa\xd3.?\xd51>\xdf\xce\x86\x08\xca\x05\xb7\xe0\x83\x8f\xa5H\x05-*\xf4\xac\xca!\x8e\xf6v\x97\x1b\xfe\xf5K$\xb2I\x13\x06rT\xf5\xaa\x11\x1d\xb2\x02jl\x85b\xbb\xb0\xc4\xfeub\x13\xccd\x18\x1d)\xbel\xa4`\x90A\xf4\xa6\xf9\xb9\xedMn{\x0f\xc2\x80\x1e",\xa1\x87z\x9bnB\x96_\x99z\x11\xde\x8a*[-"%\x8b#\x9a_$\x03\xedE\x15<\xcc,\x7f'), PhotoSize(type='m', location=FileLocationToBeDeprecated(volume_id=200020400599, local_id=30505), w=256, h=320, size=15312)]), ttl_seconds=None), reply_markup=None, entities=[MessageEntityHashtag(offset=3, length=6)], views=25322, edit_date=None, post_author=None, grouped_id=None), entity=Channel(id=1236920376, title='Reddit', photo=ChatPhoto(photo_small=FileLocationToBeDeprecated(volume_id=247538747, local_id=28490), photo_big=FileLocationToBeDeprecated(volume_id=247538747, local_id=28492), dc_id=2), 
date=datetime.datetime(2019, 8, 7, 19, 24, 51, tzinfo=datetime.timezone.utc), version=0, creator=False, left=False, broadcast=True, verified=False, megagroup=False, restricted=False, signatures=False, min=False, scam=False, has_link=False, has_geo=False, access_hash=643357336673217314, username='Reddit', restriction_reason=None, admin_rights=None, banned_rights=None, default_banned_rights=None, participants_count=214146))
But wait, there is nothing in the output about the contacts we saw above. Another request, GetFullChannelRequest [10], can help us solve this:
#!/usr/bin/env python3
from telethon import TelegramClient, sync
from telethon.tl.types import Channel
from telethon.tl.functions.channels import GetFullChannelRequest
import os
import networkx as nx
# Main configuration
config = {
    'telegram': {
        'api_id': '123456',  # Client's API ID
        'api_hash': 'bdaf62f128aeaa9a65b67a479d9ff413',  # Client's API hash
        'phone_id': '+31201234567'  # Client's phone
    }
}
if __name__ == '__main__':
    # Create Client and sign in
    client = TelegramClient(os.path.basename(__file__),
                            config['telegram']['api_id'],
                            config['telegram']['api_hash'])
    # Create Graph
    graph = nx.Graph()
    # Start Telegram client
    client.start(config['telegram']['phone_id'])
    client_name = client.get_me().first_name
    # Get all dialogs for current client and filter them on the Reddit channel, get its full info
    reddit_channel = [dialog for dialog in client.get_dialogs() if dialog.name == 'Reddit']
    reddit_channel_fullinfo = client(GetFullChannelRequest(channel=reddit_channel[0]))
    # Add Client as root node
    graph.add_node(client_name, color='gold')
    # Print all nodes from graph with their attributes
    print(graph.nodes(data=True))
    # Print reddit_channel_fullinfo
    print(reddit_channel_fullinfo)
Output:
[('Username', {'color': 'gold'})]
ChatFull(full_chat=ChannelFull(id=1236920376, about='По рекламе @lulzzsecurity \nПрислать новость @redditgbot', read_inbox_max_id=8857, read_outbox_max_id=0, unread_count=0, chat_photo=Photo(id=515918679006882259, access_hash=4998218436240959217, file_reference=b'\x00^\x8eP\x91\xce\xd7jl\xe5f\xb7\xa7H\x05\x0b\x01\xea9\x88#', date=datetime.datetime(2019, 12, 21, 13, 31, 49, tzinfo=datetime.timezone.utc), sizes=[PhotoSize(type='a', location=FileLocationToBeDeprecated(volume_id=247538747, local_id=28490), w=160, h=160, size=10863), PhotoSize(type='b', location=FileLocationToBeDeprecated(volume_id=247538747, local_id=28491), w=320, h=320, size=25424), PhotoSize(type='c', location=FileLocationToBeDeprecated(volume_id=247538747, local_id=28492), w=640, h=640, size=52851)], dc_id=2, has_stickers=False), notify_settings=PeerNotifySettings(show_previews=None, silent=None, mute_until=datetime.datetime(2038, 1, 19, 3, 14, 7, tzinfo=datetime.timezone.utc), sound=None), exported_invite=ChatInviteEmpty(), bot_info=[], pts=44738, can_view_participants=False, can_set_username=False, can_set_stickers=False, hidden_prehistory=False, can_view_stats=False, can_set_location=False, participants_count=214142, admins_count=None, kicked_count=None, banned_count=None, online_count=None, migrated_from_chat_id=None, migrated_from_max_id=None, pinned_msg_id=None, stickerset=None, available_min_id=None, folder_id=1, linked_chat_id=None, location=None), chats=[Channel(id=1236920376, title='Reddit', photo=ChatPhoto(photo_small=FileLocationToBeDeprecated(volume_id=247538747, local_id=28490), photo_big=FileLocationToBeDeprecated(volume_id=247538747, local_id=28492), dc_id=2), date=datetime.datetime(2019, 8, 7, 19, 24, 51, tzinfo=datetime.timezone.utc), version=0, creator=False, left=False, broadcast=True, verified=False, megagroup=False, restricted=False, signatures=False, min=False, scam=False, has_link=False, has_geo=False, access_hash=643357336673217314, username='Reddit', 
restriction_reason=None, admin_rights=None, banned_rights=None, default_banned_rights=None, participants_count=None)], users=[])
Now that the contacts are in place, we can create new nodes from the chats, channels and the contacts found in the full info, and then link them by edges. Every node type gets its own color attribute (you remember that, right?):
#!/usr/bin/env python3
from telethon import TelegramClient, sync
from telethon.tl.types import Channel
from telethon.tl.functions.channels import GetFullChannelRequest
from textwrap import wrap
import os
import re
import networkx as nx
# Main configuration
config = {
    'telegram': {
        'api_id': '123456',  # Client's API ID
        'api_hash': 'bdaf62f128aeaa9a65b67a479d9ff413',  # Client's API hash
        'phone_id': '+31201234567'  # Client's phone
    },
    'graph': {
        'channels_ignore': [],  # Channels to ignore
        'color': {  # Colors for nodes
            'client': 'gold',
            'channel': 'technology',
            'user': 'lavender'
        },
        'wordwrap_length': 15
    }
}
if __name__ == '__main__':
    # Create Client object and sign in
    client = TelegramClient(os.path.basename(__file__),
                            config['telegram']['api_id'],
                            config['telegram']['api_hash'])
    # Create Graph object
    graph = nx.Graph()
    client.start(config['telegram']['phone_id'])
    client_name = client.get_me().first_name
    # Add Client as root node
    graph.add_node(client_name, color=config['graph']['color']['client'])
    # For each not ignored channel in list of dialogs
    for channel in [dialog.entity for dialog in client.get_dialogs()
                    if isinstance(dialog.entity, Channel) and
                    dialog.entity.id not in config['graph']['channels_ignore']]:
        # Get full information for a channel and word wrap its name
        channel_full_info = client(GetFullChannelRequest(channel=channel))
        channel_name = '\\n'.join(wrap(channel.title, config['graph']['wordwrap_length']))
        # Add channel ID as node with attributes 'title' and 'color', link it to the root node
        graph.add_node(channel.id, title=channel_name, color=config['graph']['color']['channel'])
        graph.add_edge(client_name, channel.id)
        # For each contact in full information ([A-Za-z], not [A-z], which also matches [ \ ] ^ and `)
        for contact_name in re.findall("@([A-Za-z0-9_]{1,100})", channel_full_info.full_chat.about):
            # Add contact as node with attribute and link to the channel node
            graph.add_node(contact_name, color=config['graph']['color']['user'])
            graph.add_edge(contact_name, channel.id)
    # Print all nodes from graph with their attributes
    print(graph.nodes(data=True))
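The two text-processing steps of the loop can be tried without a Telegram connection. The sketch below reuses the Reddit channel description from the output above and an assumed channel title; the helper names extract_contacts and wrap_title are mine, not part of the script. Note that a character range written as A-z would also match the characters [ \ ] ^ and the backtick, which sit between Z and a in ASCII, so the sketch spells the class out as A-Za-z0-9_:

```python
import re
from textwrap import wrap

# Usernames: an @ followed by letters, digits or underscores.
USERNAME_RE = re.compile(r"@([A-Za-z0-9_]{1,100})")


def extract_contacts(about):
    """Return the usernames (without @) mentioned in a channel description."""
    return USERNAME_RE.findall(about or "")


def wrap_title(title, width=15):
    """Wrap a title and join the pieces with a literal backslash-n,
    which is what ends up inside the PlantUML label."""
    return "\\n".join(wrap(title, width))


about = "По рекламе @lulzzsecurity \nПрислать новость @redditgbot"
print(extract_contacts(about))         # contacts found in the description
print(wrap_title("Коронавирус LIVE"))  # title split into two label lines
```

The first call prints ['lulzzsecurity', 'redditgbot'], and the second prints Коронавирус\nLIVE with a literal backslash and n, because the 16-character title does not fit into 15 columns.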
Below you can find the final script [11] that builds our PlantUML file.
#!/usr/bin/env python3
from telethon import TelegramClient, sync
from telethon.tl.types import Channel
from telethon.tl.functions.channels import GetFullChannelRequest
from textwrap import wrap
import os
import re
import networkx as nx
# Main configuration
config = {
    'telegram': {
        'api_id': '123456',  # Client's API ID
        'api_hash': 'bdaf62f128aeaa9a65b67a479d9ff413',  # Client's API hash
        'phone_id': '+31201234567'  # Client's phone
    },
    'graph': {
        'channels_ignore': [],  # Channels to ignore
        'color': {  # Colors for nodes (https://plantuml.com/en/color for more information) [12]
            'client': 'gold',
            'channel': 'technology',
            'user': 'lavender'
        },
        'title': 'Telegram channels relationships',  # Graph title
        'wordwrap_length': 15
    }
}
if __name__ == '__main__':
    # Create Client object and sign in
    client = TelegramClient(os.path.basename(__file__),
                            config['telegram']['api_id'],
                            config['telegram']['api_hash'])
    # Create Graph object
    graph = nx.Graph()
    client.start(config['telegram']['phone_id'])
    client_name = client.get_me().first_name
    # Add Client as root node
    graph.add_node(client_name, color=config['graph']['color']['client'])
    # For each not ignored channel in list of dialogs
    for channel in [dialog.entity for dialog in client.get_dialogs()
                    if isinstance(dialog.entity, Channel) and
                    dialog.entity.id not in config['graph']['channels_ignore']]:
        # Get full information for a channel and word wrap its name
        channel_full_info = client(GetFullChannelRequest(channel=channel))
        channel_name = '\\n'.join(wrap(channel.title, config['graph']['wordwrap_length']))
        # Add channel ID as node with attributes 'title' and 'color', link it to the root node
        graph.add_node(channel.id, title=channel_name, color=config['graph']['color']['channel'])
        graph.add_edge(client_name, channel.id)
        # For each contact in full information ([A-Za-z], not [A-z], which also matches [ \ ] ^ and `)
        for contact_name in re.findall("@([A-Za-z0-9_]{1,100})", channel_full_info.full_chat.about):
            # Add contact as node with attribute and link to the channel node
            graph.add_node(contact_name, color=config['graph']['color']['user'])
            graph.add_edge(contact_name, channel.id)
    # Create PlantUML file object; the with block closes the file when done
    with open("{}_telegram_graph.plantuml".format(client_name), 'w') as plantuml_file:
        # Write PlantUML header with graph title
        plantuml_file.write("@startuml\ntitle {}\nleft to right direction\n".format(config['graph']['title']))
        # For each node in graph
        for node in graph.nodes(data=True):
            # The node is channel if it has 'title' attribute
            if 'title' in node[1]:
                plantuml_file.write('frame {} as "{}" #{}\n'.format(node[0], node[1]['title'], node[1]['color']))
            # Otherwise, the node is contact
            else:
                plantuml_file.write('usecase {0} as "@{0}" #{1}\n'.format(node[0], node[1]['color']))
        # Link the nodes with each other by edges
        for edge in graph.edges():
            plantuml_file.write('{} 0--# {}\n'.format(edge[0], edge[1]))
        # Write PlantUML footer
        plantuml_file.write("@enduml")
When execution completes, the script creates a file named after the first name in the Telegram client’s profile, followed by _telegram_graph.plantuml. Example content:
@startuml
title Telegram channels relationships
left to right direction
usecase Username as "@Username" #gold
frame 1035713458 as "ntwrk" #technology
usecase zhenyatsk as "@zhenyatsk" #lavender
usecase mxssl as "@mxssl" #lavender
usecase darwinggl as "@darwinggl" #lavender
frame 1280552026 as "Коронавирус\nLIVE" #technology
usecase pr_virus as "@pr_virus" #lavender
usecase virusologbot as "@virusologbot" #lavender
frame 1378813139 as "Baza" #technology
usecase mogutin as "@mogutin" #lavender
usecase bazanewsbot as "@bazanewsbot" #lavender
frame 1236920376 as "Reddit" #technology
usecase lulzzsecurity as "@lulzzsecurity" #lavender
usecase redditgbot as "@redditgbot" #lavender
...
Username 0--# 1035713458
Username 0--# 1280552026
Username 0--# 1378813139
Username 0--# 1236920376
...
1035713458 0--# zhenyatsk
1035713458 0--# mxssl
1035713458 0--# darwinggl
1280552026 0--# pr_virus
1280552026 0--# virusologbot
pr_virus 0--# 1470273900
virusologbot 0--# 1470273900
1378813139 0--# mogutin
1378813139 0--# bazanewsbot
1236920376 0--# lulzzsecurity
1236920376 0--# redditgbot
...
@enduml
To tell the truth, at first I wanted to build the visualization with the classic matplotlib.pyplot [13] library. But after several attempts to draw a well-formed graph, I changed my mind and picked PlantUML [3] instead. It supports many topologies and use cases, and it can output a graph in many image formats, such as PNG, SVG and LaTeX.
PlantUML limits image width and height to 4096, but there is a PLANTUML_LIMIT_SIZE environment variable that we can set to override this limit when launching PlantUML. Let’s draw the graph we got in the previous section; by default PlantUML generates a PNG image:
~> ls -1
telegram_graph.py
~> python3 telegram_graph.py
Please enter the code you received: 12345
Signed in successfully as Username
~> ls -1
Username_telegram_graph.plantuml
telegram_graph.py
telegram_graph.py.session
~> java -DPLANTUML_LIMIT_SIZE=8192 -Djava.awt.headless=true -jar /path/to/plantuml.jar ./Username_telegram_graph.plantuml
~> ls -1
Username_telegram_graph.plantuml
Username_telegram_graph.png
telegram_graph.py
telegram_graph.py.session
Now you can open Username_telegram_graph.png with your favorite image viewer to see how many channels link to the same contacts. In my case, there were a few such channels:
A close-up of one part:
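The same question can be answered without reading the picture. A minimal sketch, assuming the edges come as (node, node) pairs like the ones graph.edges() yields in the script, that channel IDs are integers and contacts are strings (as in the script), and using a small sample taken from the example file above; the function name shared_contacts is hypothetical:

```python
from collections import defaultdict


def shared_contacts(edges, root):
    """Map each contact linked to more than one channel to its channel IDs."""
    channels_by_contact = defaultdict(set)
    for left, right in edges:
        # An undirected edge may list the contact on either side.
        for contact, channel in ((left, right), (right, left)):
            if isinstance(contact, str) and contact != root and isinstance(channel, int):
                channels_by_contact[contact].add(channel)
    return {contact: sorted(channels)
            for contact, channels in channels_by_contact.items()
            if len(channels) > 1}


# Sample edges copied from the example PlantUML file
edges = [
    ("Username", 1280552026),
    ("Username", 1236920376),
    (1280552026, "pr_virus"),
    (1280552026, "virusologbot"),
    ("pr_virus", 1470273900),
    ("virusologbot", 1470273900),
    (1236920376, "lulzzsecurity"),
    (1236920376, "redditgbot"),
]
print(shared_contacts(edges, root="Username"))
```

For this sample only pr_virus and virusologbot are reported, each linked to channels 1280552026 and 1470273900.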
As we learned in the previous article, writing code is not always necessary. Some cases can be solved in the CLI alone: curl [14] and jq [15] do the magic. For example, let us find out how many channels use the Social Energy [16] agency (we can also do this here [17]):
$ ( echo -e '@startuml\nleft to right direction\nusecase social_energy as "@social_energy" #lavender'; curl -sX POST https://tgstat.com/channels/list --data 'offset=0&period=yesterday&country=global&language=global&verified=0&price[vp]=0&search=@social_energy' | jq '.items.list[] | "frame \(.id) as \"\(.title)\" #technology\nsocial_energy -->> \(.id)"' -r | sort -V; echo "@enduml" ) | java -DPLANTUML_LIMIT_SIZE=8192 -jar /path/to/plantuml.jar -pipe > social_energy_telegram_graph.png
Let’s go through it step by step. Create a PlantUML header and the contact node:
$ echo -e '@startuml\nleft to right direction\nusecase social_energy as "@social_energy" #lavender';
@startuml
left to right direction
usecase social_energy as "@social_energy" #lavender
Request the list of channels for the @Social_Energy contact from tgstat.com [7] and sort the output lines:
curl -sX POST https://tgstat.com/channels/list --data 'offset=0&period=yesterday&country=global&language=global&verified=0&price[vp]=0&search=@social_energy' | jq '.items.list[] | "frame \(.id) as \"\(.title)\" #technology\nsocial_energy -->> \(.id)"' -r | sort -V
frame 55002 as "Женский Гороскоп" #technology
frame 56218 as "Интересные факты" #technology
frame 62891 as "Факты | Наука | Фильмы" #technology
...
social_energy -->> 55002
social_energy -->> 56218
social_energy -->> 62891
...
Print Plantuml footer:
$ echo "@enduml"
@enduml
All this text is piped to PlantUML with the -pipe option, and the result is written to the social_energy_telegram_graph.png file.
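For readers less comfortable with jq, the transformation in the middle of the pipeline can be sketched in Python. The JSON shape used below (an items.list array whose entries have id and title fields) is inferred from the jq filter above, not from tgstat.com documentation; the sample payload is made up from the output shown earlier and trimmed to those two fields, and the function name is hypothetical:

```python
import json


def tgstat_to_plantuml(payload, contact="social_energy"):
    """Turn a channel-list JSON response into PlantUML source text."""
    channels = sorted(json.loads(payload)["items"]["list"], key=lambda c: c["id"])
    lines = [
        "@startuml",
        "left to right direction",
        'usecase {0} as "@{0}" #lavender'.format(contact),
    ]
    # One frame per channel first, then one edge per channel
    lines += ['frame {} as "{}" #technology'.format(c["id"], c["title"]) for c in channels]
    lines += ["{} -->> {}".format(contact, c["id"]) for c in channels]
    lines.append("@enduml")
    return "\n".join(lines)


sample = json.dumps({"items": {"list": [
    {"id": 56218, "title": "Интересные факты"},
    {"id": 55002, "title": "Женский Гороскоп"},
]}})
print(tgstat_to_plantuml(sample))
```

The returned text has the same layout as the shell pipeline's output: header, contact node, frames, edges, footer. It could be written to a file or piped to PlantUML in the same way.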
1. Python3 telethon library
2. Python NetworkX library for complex networks
3. PlantUML tool allowing to create diagrams from a plain text language
4. NetworkX graph types
5. Loop meaning in graph theory
6. Multi-edge meaning in graph theory
7. Telegram Analytics
8. Telegram API ID and API Hash
9. Telethon: Entities
10. Telethon: Get full channel info
11. telegram_graph.py at GitHub
12. PlantUML supported colors list
13. Python Matplotlib library
14. Command line tool for transferring data with URLs
15. Lightweight and flexible command-line JSON processor
16. Social Energy platform
17. Social Energy prices