library(gganimate)
library(ggplot2)
library(reshape2)
Git repositories for the extra packages referenced:
https://github.com/thomasp85/gganimate
https://bcable.net/x/Rproj/shared
source("../../shared/load_recurse.R")
source("shared/load_varlog.R")
source("shared/parse_rawsplit.R")
source("shared/cleanup_logs.R")
source("shared/country_code_cleanup.R")
source("shared/geoip.R")
source("shared/heatmap_prep.R")
source("shared/turn_to_animation.R")
source("shared/world_mapper.R")
site_name <- "bcable.net"
source("../../shared/paths.R")
theme_heatmap <- function(){
theme_bw() %+replace% theme(
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
legend.margin = margin(0, 0, 0, 0, "cm"),
legend.spacing = unit(0, "cm"),
panel.border = element_blank(),
panel.grid = element_blank(),
panel.spacing = unit(0, "cm"),
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
plot.title = element_text(
face="bold", size=22,
margin=margin(0.5, 0.5, 0.5, 0.5, "cm"), debug=FALSE
)
)
}
theme_simple <- function(){
theme_bw() %+replace% theme(
axis.text.x = element_text(angle=90, size=15)
)
}
Geolocation based on IP address should not be taken as an entirely accurate indicator of the source of traffic or of the attacks conducted. There are many reasons for this, including (but not limited to):
Large quantities of traffic, especially attack traffic, will use a VPN or the Tor network (or some reasonable facsimile) to mask the origin of the traffic, which in turn changes the apparent location of origin. Usually, an attacker will also intentionally want the traffic to appear to come from somewhere with a weaker legal jurisdiction or a lesser ability to police traffic, or from a well-known source of malicious attacks such as China or Russia.
For instance, the following log entry was generated by me against my own server while sitting at my desk in the United States, but it gets geolocated as Russia because of how the packet was sent. This sort of masking is trivial to perform, even by a nine-year-old on a cellphone.
httpd_data[grep("/from/russia/with/logs", httpd_data$Request), c("Request", "Response.Code", "Country.Code")]
## Request Response.Code Country.Code
## 1 GET /from/russia/with/logs HTTP/1.1 404 RU
Some locations, such as Silicon Valley or China, will have a higher concentration of virtual servers than others. This can lead to larger quantities of vulnerable virtual machines and servers in those regions, and therefore to more attack traffic appearing to originate from them.
It is possible that, due to address assignment for governmental intelligence purposes or for other economic or political reasons, a nation could re-allocate address space and forge its identity in a way similar to NAT (network address translation). It could also funnel traffic through VPN technologies on behalf of another nation.
Because most of these agreements are made in private, and because most geolocation and WHOIS records are based on self-reporting, it is impossible to know with 100% certainty the true nature of geographic address assignment.
This geolocation uses the rgeolocate package available on CRAN, along with the internal country database shipped with it. There could be an error in the shipped database, there could be an error in the lookup code, etc. Bugs happen. I have no reason to believe that these packages perform any deliberately false geolocation, however.
Despite these weaknesses, looking at this sort of data can still be quite fun, interesting, and potentially enlightening. Generalized conclusions should not be drawn from this data or the maps herein. You have been warned.
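The geoip() helper sourced above (shared/geoip.R) isn't reproduced in this post. The following is only a minimal sketch of the kind of lookup it presumably wraps, using the country database bundled with rgeolocate; treat the helper name and behavior as assumptions, not the actual shared code.
# Minimal sketch of a country-code lookup using rgeolocate's bundled
# GeoLite2 country database (an assumption about what geoip() does).
library(rgeolocate)
geoip_sketch <- function(ips, fields="country_code"){
  db <- system.file("extdata", "GeoLite2-Country.mmdb", package="rgeolocate")
  maxmind(ips, db, fields)
}
# geoip_sketch(c("8.8.8.8", "192.0.2.1"))$country_code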
messages_records <- load_varlog(path_syslog, "messages")
messages_records <- raw_populate(messages_records)
messages_records <- cleanup_syslog(messages_records)
ipt_data <- cleanup_iptables(messages_records)
messages_records$Raw.Split <- NA
ipt_data$Raw.Split <- NA
Date Min: 2018-10-28 03:09:03
Date Max: 2019-09-22 03:25:01
secure_records <- load_varlog(path_syslog, "secure")
secure_records <- raw_populate(secure_records)
secure_records <- cleanup_syslog(secure_records)
secure_records$Raw.Split <- NA
Date Min: 2018-10-28 04:03:23
Date Max: 2019-09-14 17:14:25
Checking the “POSSIBLE BREAK-IN ATTEMPT!” messages, I found they all appear to be innocuous enough (usually me logging in successfully five seconds later, so a typo in my password or some such).
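That check isn't reproduced here; a minimal sketch of it, reusing the same secure_records$Raw column used just below, would look something like this:
# Pull the "POSSIBLE BREAK-IN ATTEMPT" lines for manual review
# (a sketch of the check described above, not the original code).
break_in <- secure_records$Raw[
  grepl("POSSIBLE BREAK-IN ATTEMPT", secure_records$Raw, fixed=TRUE)
]
length(break_in)
The “Bad protocol version identification” entries, however, are more interesting: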
sub(
"([0-9][0-9]:[0-9][0-9]:[0-9][0-9]) [^ ]+ ", "\\1 [REDACTED] ",
secure_records$Raw[grepl("Bad protocol", secure_records$Raw)]
)
## [1] "Dec 12 14:58:28 [REDACTED] sshd[3809]: Bad protocol version identification '\\003' from [IPREDACTED] port 863"
## [2] "Dec 12 14:58:28 [REDACTED] sshd[3809]: Bad protocol version identification '\\003' from [IPREDACTED] port 863"
## [3] "Dec 12 14:58:28 [REDACTED] sshd[3809]: Bad protocol version identification '\\003' from [IPREDACTED] port 863"
## [4] "Feb 8 01:06:05 [REDACTED] sshd[27407]: Bad protocol version identification '\\003' from [IPREDACTED] port 46318"
## [5] "Feb 8 01:06:09 [REDACTED] sshd[27408]: Bad protocol version identification '\\003' from [IPREDACTED] port 53422"
## [6] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"
## [7] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"
## [8] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"
## [9] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"
## [10] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"
## [11] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"
## [12] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"
## [13] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"
## [14] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"
## [15] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"
## [16] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"
## [17] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"
## [18] "Aug 20 10:36:43 [REDACTED] sshd[2671]: Bad protocol version identification '\\003' from [IPREDACTED] port 3255"
## [19] "Aug 20 10:36:43 [REDACTED] sshd[2672]: Bad protocol version identification '\\003' from [IPREDACTED] port 18769"
## [20] "Aug 20 10:36:43 [REDACTED] sshd[2671]: Bad protocol version identification '\\003' from [IPREDACTED] port 3255"
## [21] "Aug 20 10:36:43 [REDACTED] sshd[2672]: Bad protocol version identification '\\003' from [IPREDACTED] port 18769"
## [22] "Sep 12 11:30:01 [REDACTED] sshd[6444]: Bad protocol version identification '\\003' from [IPREDACTED] port 58895"
## [23] "Sep 14 17:14:25 [REDACTED] sshd[9926]: Bad protocol version identification '\\003' from [IPREDACTED] port 341"
## [24] "Sep 12 11:30:01 [REDACTED] sshd[6444]: Bad protocol version identification '\\003' from [IPREDACTED] port 58895"
## [25] "Sep 14 17:14:25 [REDACTED] sshd[9926]: Bad protocol version identification '\\003' from [IPREDACTED] port 341"
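The per-IP breakdown behind the next statement isn't shown above; a rough sketch of how to get it from the same messages (assuming the source IP is the token between “from” and “port”, and keeping the IPs out of the rendered output):
# Tally "Bad protocol version identification" events per source IP.
bad_proto <- secure_records$Raw[grepl("Bad protocol", secure_records$Raw)]
bad_ips <- sub(".* from ([^ ]+) port.*", "\\1", bad_proto)
# sort(table(bad_ips), decreasing=TRUE)  # not printed, to keep the IPs redacted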
All IPs appear to be specific hosts in Germany, Russia, and Bulgaria. My message to the Bulgarians: “NODNOL 871 SELIM? Thankski Verski Muchski Budski!”
What was odd is that, after looking at the WHOIS information for the Bulgarian IP address, the physical address and name were extremely specific. It gave a specific apartment number, name, etc., that was easily pulled up on Google Street View. Lots of satellite dishes on the side of the apartment complex! Nice enough city, though. Maybe a slight bit crowded. Very creepy that this can be done today, huh? I'm literally looking at the apartment and surrounding city of someone who likely sent a payload at my server. All of this with PUBLIC tools and PUBLIC information. Technology must be destroyed. This goes to show how sensitive an IP address can be, and why I tend to redact them when publishing things like this (even though he's probably being a naughty boy, I don't know the context of what actually occurred).
This also confirms my suspicion that you should never expose your actual IP address, and should send all traffic through a VPN connection you trust. ALL traffic. And ALL of the traffic going over that VPN should be encrypted all the way to the destination, in case the VPN provider turns out to be sketchy.
WHOIS data can be too specific sometimes. This gets into a weird area with GDPR, too, since the US has sided with keeping this information public, while the EU sides with masking WHOIS information. It might be an interesting factoid to throw into that debate, but who cares about politics anyway? It's just domesticated primates flinging poo at each other. Facts rarely enter the debate, and when they do, ideology destroys their purpose. The only way to keep yourself private is to take your privacy into your own hands: don't create data to begin with if you can help it, or mask it well. Better to treat the internet as a place even more public than the outdoors.
I'll probably end up using the Rwhois package I made to dig through these IPs next.
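Rwhois's interface isn't shown in this post, so here is only a stand-in sketch that shells out to the system whois client (assuming one is installed) rather than using that package:
# Placeholder sketch: a one-off WHOIS lookup via the system whois client,
# not the Rwhois package itself.
whois_raw <- function(ip){
  paste(system2("whois", ip, stdout=TRUE), collapse="\n")
}
# cat(whois_raw("192.0.2.1"))  # RFC 5737 documentation address, not a real target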
Also, another disclaimer: the physical address discovered could be inaccurate or incomplete. I didn't investigate whether it was a Tor node, and there's no way for me to know whether it's a VPN this guy runs for his friends or for a private set of clients, or any of a variety of other circumstances.
Unrumble.
ipt_data$Country.Code <- geoip(ipt_data$IP.Source, "country_code")$country_code
ipt_country_df <- country_code_cleanup(ipt_data$Country.Code)
# Keep countries whose count exceeds the 21st-largest count,
# i.e. (ties aside) the top 20.
ipt_top20 <- ipt_country_df[ipt_country_df$Count >
tail(head(sort(ipt_country_df$Count, decreasing=TRUE), n=21), n=1),
]
ipt_data$Date.NoTime <- as.POSIXlt(strftime(ipt_data$Date, format="%Y-%m-%d"))
ipt_data$Count <- rep(1, nrow(ipt_data))
agg_country_time <- aggregate(
Count ~ Country.Code + as.factor(Date.NoTime),
data=ipt_data, FUN=sum
)
agg_country_time <- country_code_merge(agg_country_time)
names(agg_country_time) <- c(
"Country.Code", "Date", "Count", "Latitude", "Longitude", "Country.Name"
)
agg_country_time$Date <- as.POSIXlt(agg_country_time$Date)
agg_country_time_top20 <- agg_country_time[
agg_country_time$Country.Name %in% unique(ipt_top20$Country),
]
g <- world_mapper(ipt_country_df)
g <- g + labs(
title=paste0(
site_name, ": IPTables: INPUT Table Packet Drops", collapse=""
),
fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g
g <- ggplot(ipt_top20, aes(x=Country, y=Count))
g <- g + geom_bar(stat="identity")
g <- g + theme_simple()
g
agg_country_time_top20$Date <- as.POSIXct(agg_country_time_top20$Date)
g <- ggplot(agg_country_time_top20, aes(
x=Date, y=Count, group=Country.Name, colour=Country.Name)
)
g <- g + geom_line() + coord_cartesian(ylim=c(0,10000))
g <- g + theme_simple()
g
agg_dst_ports <- aggregate(
Hostname ~ Destination.Port, data=ipt_data, FUN=length
)
names(agg_dst_ports) <- c("Value", "Count")
agg_dst_ports$Value <- as.numeric(as.character(agg_dst_ports$Value))
non_ephemeral_ports <- heatmap_prep(
agg_dst_ports[agg_dst_ports$Value < 1024,], 1024, 32
)
names(non_ephemeral_ports) <- c("Destination.Port", "Scale", "X", "Y")
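heatmap_prep() is another shared helper whose source isn't included here. Judging from the columns it returns, the layout step presumably packs ports 0-1023 onto a 32x32 grid and carries the count along as the fill scale; the following is only a sketch of that idea under those assumptions, not the helper's actual code:
# Sketch of a heatmap layout step: one tile per port 0-1023, arranged
# row by row on a 32x32 grid, with missing ports filled in as zero.
grid_layout_sketch <- function(df, max_value=1024, width=32){
  full <- merge(data.frame(Value=seq(0, max_value - 1)), df, by="Value", all.x=TRUE)
  full$Count[is.na(full$Count)] <- 0
  data.frame(
    Destination.Port = full$Value,
    Scale = full$Count,
    X = as.factor(full$Value %% width),
    Y = as.factor(full$Value %/% width)
  )
}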
non_ephemeral_graph <- function(data, post_title=""){
g <- ggplot(data, aes(x=X, y=Y, fill=Scale, label=Destination.Port))
g <- g + geom_tile() + geom_text()
g <- g + labs(
title=paste0(site_name,
": IPTables Filtered Non-Ephemeral Destination Ports",
post_title, collapse=""
), x="", y=""
)
g <- g + theme_heatmap()
g <- g + scale_fill_continuous(
low="#500000", high="#E00000", guide="colorbar"
)
g <- g + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
g
}
non_ephemeral_graph(non_ephemeral_ports)
Truncated at 1000 for visual purposes.
non_ephemeral_ports$Scale[non_ephemeral_ports$Scale > 1000] <- 1000
non_ephemeral_graph(non_ephemeral_ports, " (truncated)")
common_ports <- head(agg_dst_ports[order(-agg_dst_ports$Count),], n=256)
common_ports <- common_ports[order(common_ports$Value),]
common_ports <- heatmap_prep(common_ports)
names(common_ports) <- c("Destination.Port", "Scale", "X", "Y")
common_ports_graph <- function(data, post_title=""){
g <- ggplot(data, aes(x=X, y=Y, fill=Scale, label=Destination.Port))
g <- g + geom_tile() + geom_text()
g <- g + labs(
title=paste0(site_name,
": IPTables Top 256 Commonly Filtered Destination Ports",
post_title, collapse=""
), x="", y=""
)
g <- g + theme_heatmap()
g <- g + scale_fill_continuous(
low="#500000", high="#E00000", guide="colorbar"
)
g <- g + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
g
}
g <- ggplot(common_ports, aes(x=as.factor(Destination.Port), y=Scale))
g <- g + geom_bar(stat="identity")
g <- g + labs(
title=paste0(site_name,
": IPTables Filtered Destination Ports Barchart", collapse=""
), x="Port Number (0-65535)", y=""
)
g <- g + theme_simple() %+replace% theme(axis.text.x=element_blank())
g
common_ports_graph(common_ports)
Truncated at 1000 for visual purposes.
common_ports$Scale[common_ports$Scale > 1000] <- 1000
common_ports_graph(common_ports, " (truncated)")
Next, attacks going after (or scans of) the most commonly attacked or used ports.
ipt_country_22 <- country_code_cleanup(
ipt_data$Country.Code[ipt_data$Destination.Port == 22]
)
g <- world_mapper(ipt_country_22)
g <- g + labs(
title=paste0(
site_name, ": IPTables: INPUT Table Packet Drops (Port 22: ssh)",
collapse=""
),
fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g
Why are people still using telnet? :(
ipt_country_23 <- country_code_cleanup(
ipt_data$Country.Code[ipt_data$Destination.Port == 23]
)
g <- world_mapper(ipt_country_23)
g <- g + labs(
title=paste0(
site_name, ": IPTables: INPUT Table Packet Drops (Port 23: telnet)",
collapse=""
),
fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g
Yucky.
ipt_country_445 <- country_code_cleanup(
ipt_data$Country.Code[ipt_data$Destination.Port == 445]
)
g <- world_mapper(ipt_country_445)
g <- g + labs(
title=paste0(
site_name,
": IPTables: INPUT Table Packet Drops (Port 445: microsoft-ds)",
collapse=""
),
fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g
A port scan is detected if any specific IP address attempts to connect to more than 50 unique destination ports. Under normal usage of my resources, this should never happen. One-off connections to random, unused ports are excluded by this detection (for instance, an incorrect IP address configured somewhere). No legitimate client should be hitting more than 50 unique destination ports.
Two detection mechanisms are used in this code. One detects on a per-day basis, to catch whoever is spamming the server (such as `nmap -T insane`); the other works on a long-term basis, catching connections spread over multiple days from the same IP but to unique destination ports (such as `nmap -T paranoid`).
# Unique destination ports per source IP per day (the `nmap -T insane`-like case)
agg_ip_port_date <- aggregate(
Destination.Port ~ IP.Source + Country.Code + as.factor(Date.NoTime),
data=ipt_data, FUN=function(x){ length(unique(x)) }
)
names(agg_ip_port_date) <- c(
"IP.Source", "Country.Code", "Date", "Count"
)
agg_ip_port_date$Count <- as.numeric(as.character(agg_ip_port_date$Count))
# Unique destination ports per source IP across the whole period
# (the `nmap -T paranoid`-like case)
agg_ip_port <- aggregate(
Destination.Port ~ IP.Source + Country.Code,
data=ipt_data, FUN=function(x){ length(unique(x)) }
)
names(agg_ip_port) <- c("IP.Source", "Country.Code", "Unique.Ports")
agg_ip_port$Unique.Ports <- as.numeric(as.character(agg_ip_port$Unique.Ports))
agg_unique_ip <- aggregate(
IP.Source ~ Country.Code,
data=agg_ip_port_date[agg_ip_port_date$Count > 50,], FUN=length
)
unique_ip_map_insane <- country_code_merge(agg_unique_ip)
names(unique_ip_map_insane) <- c("Country.Code", "Count", "X", "Y", "Country")
agg_unique_ip_paranoid <- aggregate(
IP.Source ~ Country.Code,
data=agg_ip_port[agg_ip_port$Unique.Ports > 50,], FUN=length
)
unique_ip_map_paranoid <- country_code_merge(agg_unique_ip_paranoid)
names(unique_ip_map_paranoid) <- c("Country.Code", "Count", "X", "Y", "Country")
`nmap -T insane` port scans:
nrow(agg_ip_port_date[agg_ip_port_date$Count > 50,])
## [1] 609
`nmap -T paranoid` port scans:
nrow(agg_ip_port[agg_ip_port$Unique.Ports > 50,])
## [1] 799
Top `nmap -T insane` scan dates:
agg_ip_port_date$Date[agg_ip_port_date$Count > 3000]
## [1] 2019-02-02 2019-02-03 2019-03-01 2019-03-02 2019-03-03 2019-04-01
## [7] 2019-04-05 2019-04-08
## 330 Levels: 2018-10-28 2018-10-29 2018-10-30 2018-10-31 ... 2019-09-22
g <- world_mapper(unique_ip_map_insane)
g <- g + labs(
title=paste0(site_name,
": IPTables: Detected Port Scans (`nmap -T insane`-like)",
collapse=""
),
fill="Unique IPs", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g
g <- world_mapper(unique_ip_map_paranoid)
g <- g + labs(
title=paste0(site_name,
": IPTables: Detected Port Scans (`nmap -T paranoid`-like)",
collapse=""
),
fill="Unique IPs", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g
agg_dst_ports_time <- aggregate(
Hostname ~ Destination.Port + as.factor(Date.NoTime),
data=ipt_data, FUN=length
)
names(agg_dst_ports_time) <- c("Value", "Date", "Count")
agg_dst_ports_time$Date <- as.POSIXlt(agg_dst_ports_time$Date)
agg_dst_ports_time$Value <- as.numeric(as.character(agg_dst_ports_time$Value))
agg_dst_ports_time$Count[agg_dst_ports_time$Count > 1000] <- 1000
ipt_map_data <- ipt_data[!is.na(ipt_data$Country.Code),]
anim_geoip <- turn_to_animation(ipt_map_data)
anim_geoip$Count[anim_geoip$Count > 1000] <- 1000
anim_ports <- turn_to_animation(
agg_dst_ports_time[agg_dst_ports_time$Value < 1024,], "Value", "Count"
)
names(anim_ports) <- c("Animate.Time", "Value", "Count")
anim_ports$Value <- as.numeric(as.character(anim_ports$Value))
anim_ports$Count[is.na(anim_ports$Count)] <- 0
anim_ports <- anim_ports[
(!is.na(anim_ports$Animate.Time) & !is.na(anim_ports$Value)),
]
names(anim_ports) <- c("Animate.Time", "Destination.Port", "Value")
anim_ports$Animate.Time <- as.POSIXct(anim_ports$Animate.Time)
anim_ports_org <- heatmap_prep(
anim_ports, 1024, 32,
date.field="Animate.Time", merge.field="Destination.Port",
value.ordering=TRUE
)
names(anim_ports_org) <- c(
"Animate.Time", "Destination.Port", "Scale", "X", "Y"
)
anim_ports_org$Scale <- as.numeric(as.character(anim_ports_org$Scale))
anim_ports_org$Animate.Time <- as.character(strptime(
anim_ports_org$Animate.Time, format="%Y-%m-%d"
))
common_anim_ports_lbls <- head(
agg_dst_ports$Value[order(-agg_dst_ports$Count)], n=256
)
common_anim_ports_lbls <- common_anim_ports_lbls[order(common_anim_ports_lbls)]
common_anim_ports <- turn_to_animation(
agg_dst_ports_time[agg_dst_ports_time$Value %in% common_anim_ports_lbls,],
"Value", "Count"
)
names(common_anim_ports) <- c("Animate.Time", "Destination.Port", "Value")
common_anim_ports$Animate.Time <- as.POSIXct(common_anim_ports$Animate.Time)
common_anim_ports_org <- heatmap_prep(
common_anim_ports[
common_anim_ports$Destination.Port %in% common_anim_ports_lbls,
], 256, 16,
date.field="Animate.Time", merge.field="Destination.Port",
date.ordering=TRUE, expand.values=common_anim_ports_lbls
)
names(common_anim_ports_org) <- c(
"Animate.Time", "Destination.Port", "Scale", "X", "Y"
)
# Add the current frame's date as a corner label, then step one frame
# per Animate.Time value via transition_manual().
graph_to_animation <- function(g, x=Inf, y=Inf){
g <- g + geom_label(
aes(x=x, y=y, label=Animate.Time),
vjust="inward", hjust="inward",
colour="#808080", fill="#FFFFFF", label.size=0
)
g <- g + transition_manual(Animate.Time)
g
}
g <- world_mapper(anim_geoip)
g <- g + labs(
title=paste0(
site_name, ": IPTables: INPUT Table Packet Drops GeoIP Lookup",
collapse=""
),
fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g <- graph_to_animation(g)
# One frame per day of data, rendered at 5 frames per second.
options(
gganimate.fps=5,
gganimate.nframes=length(levels(as.factor(anim_geoip$Animate.Time)))
)
g
g <- non_ephemeral_graph(anim_ports_org)
g <- graph_to_animation(g, y=-32.5)
options(
gganimate.fps=5,
gganimate.nframes=length(levels(as.factor(anim_ports_org$Animate.Time)))
)
g