Traffic Journal ::004:: IPTables Analysis (Updated) (2019 Full)

Libraries

library(gganimate)
library(ggplot2)
library(reshape2)

Git repositories for extra packages reference:

https://github.com/thomasp85/gganimate

Local Sourcing

https://bcable.net/x/Rproj/shared

source("shared/load_recurse.R")
source("shared/load_varlog.R")
source("shared/parse_rawsplit.R")

source("shared/cleanup_logs.R")
source("shared/country_code_cleanup.R")
source("shared/fill_zeroes.R")
source("shared/geoip.R")
source("shared/heatmap_prep.R")
source("shared/themes.R")
source("shared/turn_to_animation.R")
source("shared/world_mapper.R")

Config

site_name <- "bcable.net"
path_syslog <- "./appel"
year_filt <- 2019
source("shared/paths.R")

Boilerplate GeoIP Disclaimer

Geolocation based on IP address is not to be taken as entirely accurate as to the source of traffic or attacks conducted. There are many reasons for this, which include (but are not limited to):

Proxies, VPNs, and Tor

Large quantities of traffic, especially attack-based traffic, will use a VPN or the Tor network (or some reasonable facsimile) to mask the origin of the traffic. This in turn changes the apparent location of origin. Usually, an attacker will also intentionally want the traffic to appear to come from somewhere with weaker legal jurisdiction, less ability to police traffic, or a well-known reputation as a source of malicious attacks, such as China or Russia.

For instance, the following log entry was generated by myself against my servers while sitting at my desk in the United States, but it gets geolocated as Russia because of how the packet was sent. This sort of masking is trivial to perform, even by a nine year old on a cellphone.

httpd_data[grep("/from/russia/with/logs", httpd_data$Request), c("Request", "Response.Code", "Country.Code")]

##                               Request Response.Code Country.Code
## 1 GET /from/russia/with/logs HTTP/1.1           404           RU

Vulnerable Servers and Botnets

Some locations will have a higher distribution of virtual servers than others, such as Silicon Valley or China. This can lead to larger quantities of vulnerable virtual machines and servers in those regions, and distort the resulting aggregate data.

Government Interference

It is possible that, for governmental intelligence purposes or other economic or political reasons, a nation could re-allocate address space and forge the apparent origin of traffic, much as NAT (network address translation) does. A nation could also funnel traffic through VPN technologies on behalf of another nation.

Because most of these agreements are made in private, and because most geolocation and WHOIS records are based on self-reporting, it is impossible to know the true nature of geographic address assignment with 100% certainty.

Weaknesses or errors in MaxMind or rgeolocate package

This geolocation uses the rgeolocate package available on CRAN, along with the internal country database shipped with it. There could be an error in the shipped database, an error in the lookup code, and so on. Bugs happen. I have no reason to believe that these packages are performing any false geolocation, however.

Final Note

Despite these weaknesses, this doesn't change the fact that looking at this sort of data can be quite fun and interesting, and potentially enlightening. Generalized conclusions should not be made from this data or the maps herein. You have been warned.

Load Syslog Files

messages_records <- load_varlog(path_syslog, "messages")
messages_records <- raw_populate(messages_records)
messages_records <- cleanup_syslog(messages_records)
secure_records <- load_varlog(path_syslog, "secure")
secure_records <- raw_populate(secure_records)
secure_records <- cleanup_syslog(secure_records)
secure_records$Raw.Split <- NA
# Keep only records from year_filt (POSIXlt stores the year as years since 1900)
messages_records <- messages_records[
    messages_records$Date$year == year_filt - 1900,
]
secure_records <- secure_records[
    secure_records$Date$year == year_filt - 1900,
]
ipt_data <- cleanup_iptables(messages_records)
messages_records$Raw.Split <- NA
ipt_data$Raw.Split <- NA
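
The year filter above subtracts 1900 because the `year` field of a `POSIXlt` object stores years since 1900. A minimal base R sketch of that filter, using made-up timestamps rather than the real logs:

```r
# Made-up timestamps for illustration only
dates <- as.POSIXlt(
    c("2018-12-31 23:59:59", "2019-01-01 00:00:01", "2019-12-31 23:59:57"),
    tz="UTC"
)
year_filt <- 2019

# POSIXlt$year counts years since 1900, so 2019 is stored as 119
dates$year

# Keep only the entries from the requested year
kept <- dates[dates$year == year_filt - 1900]
format(kept, "%Y-%m-%d %H:%M:%S")
```

Comparing against `year_filt - 1900` avoids formatting every timestamp as a string just to test its year.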

Messages Records Stats

Records: 1974599
Date Min: 2019-01-01 00:00:01
Date Max: 2019-12-31 23:59:57

Secure Records Stats

Records: 875
Date Min: 2019-01-02 06:10:33
Date Max: 2019-12-18 16:14:45

Interesting Logs in “secure”

Checking “POSSIBLE BREAK-IN ATTEMPT!” messages, they all appear to be innocuous enough (usually me logging in successfully 5 seconds later, so a typo in my password or somesuch). However, the following is interesting:

sub(
    "([0-9][0-9]:[0-9][0-9]:[0-9][0-9]) [^ ]+ ", "\\1 [REDACTED] ",
    secure_records$Raw[grepl("Bad protocol", secure_records$Raw)]
)
##  [1] "Feb 8 01:06:05 [REDACTED] sshd[27407]: Bad protocol version identification '\\003' from [IPREDACTED] port 46318"
##  [2] "Feb 8 01:06:09 [REDACTED] sshd[27408]: Bad protocol version identification '\\003' from [IPREDACTED] port 53422"
##  [3] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"  
##  [4] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"  
##  [5] "Feb 12 06:41:24 [REDACTED] sshd[1511]: Bad protocol version identification '\\003' from [IPREDACTED] port 489"  
##  [6] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"      
##  [7] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"     
##  [8] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"    
##  [9] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"      
## [10] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"     
## [11] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"    
## [12] "Mar 4 19:44:40 [REDACTED] sshd[772]: Bad protocol version identification '\\003' from [IPREDACTED] port 156"      
## [13] "Mar 6 04:56:50 [REDACTED] sshd[2936]: Bad protocol version identification '\\003' from [IPREDACTED] port 185"     
## [14] "Mar 10 00:10:25 [REDACTED] sshd[8761]: Bad protocol version identification '\\003' from [IPREDACTED] port 285"    
## [15] "Aug 20 10:36:43 [REDACTED] sshd[2671]: Bad protocol version identification '\\003' from [IPREDACTED] port 3255" 
## [16] "Aug 20 10:36:43 [REDACTED] sshd[2672]: Bad protocol version identification '\\003' from [IPREDACTED] port 18769"
## [17] "Aug 20 10:36:43 [REDACTED] sshd[2671]: Bad protocol version identification '\\003' from [IPREDACTED] port 3255" 
## [18] "Aug 20 10:36:43 [REDACTED] sshd[2672]: Bad protocol version identification '\\003' from [IPREDACTED] port 18769"
## [19] "Sep 12 11:30:01 [REDACTED] sshd[6444]: Bad protocol version identification '\\003' from [IPREDACTED] port 58895" 
## [20] "Sep 14 17:14:25 [REDACTED] sshd[9926]: Bad protocol version identification '\\003' from [IPREDACTED] port 341"    
## [21] "Sep 12 11:30:01 [REDACTED] sshd[6444]: Bad protocol version identification '\\003' from [IPREDACTED] port 58895" 
## [22] "Sep 14 17:14:25 [REDACTED] sshd[9926]: Bad protocol version identification '\\003' from [IPREDACTED] port 341"
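
As a rough illustration of the redaction above, here is the same `sub()` pattern applied to a made-up log line. The hostname and IP address are placeholders, and the IP masking step is an assumption: the source does not show how `[IPREDACTED]` was produced.

```r
# Made-up log line; the hostname and the documentation-range IP are placeholders
raw <- "Feb 8 01:06:05 examplehost sshd[27407]: Bad protocol version identification '\\003' from 192.0.2.10 port 46318"

# Replace the field that follows the HH:MM:SS timestamp (the hostname)
redacted <- sub(
    "([0-9][0-9]:[0-9][0-9]:[0-9][0-9]) [^ ]+ ", "\\1 [REDACTED] ", raw
)

# Assumed extra step: mask anything shaped like a dotted-quad IPv4 address
redacted <- gsub("([0-9]{1,3}\\.){3}[0-9]{1,3}", "[IPREDACTED]", redacted)
redacted
```

Anchoring the hostname match on the timestamp keeps the pattern from touching other space-delimited fields later in the line.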

All IPs appear to belong to specific hosts in Germany, Russia, and Bulgaria. My message to the Bulgarians: “NODNOL 871 SELIM? Thankski Verski Muchski Budski!”

What was odd is that after looking at the WHOIS information for the Bulgarian IP address, the physical address and name are extremely specific. It gave a specific apartment number, name, etc., that was easily pulled up on Google Street View. Lots of satellite dishes on the side of the apartment complex! Nice enough city, though. Maybe a slight bit crowded. Very creepy that this can be done today, huh? I'm literally looking at the apartment and surrounding city for someone who likely sent a payload at my server. All of this with PUBLIC tools and PUBLIC information. Technology must be destroyed. This kind of goes to show how sensitive an IP address can be, and why I tend to redact these when publishing things like this (even though he's probably being a naughty boy, I do not know the context of what actually occurred).

This also confirms my suspicions that you should never use your actual IP address, and should instead send all traffic through a VPN connection you trust. ALL traffic. And ALL traffic going over that should be encrypted all the way to the destination in case the VPN provider turns out to be sketchy.

WHOIS data can be too specific sometimes. This gets into a weird area with GDPR, too, since the US has sided with this information being public, and the EU has sided with masking WHOIS information. Might be an interesting factoid to throw into the debate, but who cares about politics anyway? It's just domesticated primates flinging poo at each other. Facts rarely enter the debate, and when they do ideology destroys their purpose. The only way to keep yourself private is to take your privacy into your own hands and not create data to begin with if you can help it, or mask it well. Better to treat the internet as a more public place than the out of doors.

I'll probably end up using the Rwhois package I made to dig through these IPs next.

Also, another disclaimer. The physical address discovered could be inaccurate or incomplete. I didn't investigate to see if it was a Tor node and there's no way for me to know if it's a VPN this guy runs for his friends or a private collection of clients, or a variety of other circumstances.

Unrumble.

Build Data Frames

Country Code

ipt_data$Country.Code <- geoip(ipt_data$IP.Source, "country_code")$country_code
ipt_country_df <- country_code_cleanup(ipt_data$Country.Code)
ipt_top10_threshold <- tail(
    head(sort(ipt_country_df$Count, decreasing=TRUE), n=11), n=1
)
ipt_top20_threshold <- tail(
    head(sort(ipt_country_df$Count, decreasing=TRUE), n=21), n=1
)
ipt_top10 <- ipt_country_df[ipt_country_df$Count > ipt_top10_threshold,]
ipt_top20 <- ipt_country_df[ipt_country_df$Count > ipt_top20_threshold,]
ipt_data$Protocol.Clean <- as.character(ipt_data$Protocol)
ipt_data$Protocol.Clean[
    !(ipt_data$Protocol.Clean %in% c("ICMP", "TCP", "UDP"))
] <- "Other"
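
The top 10 and top 20 selections above use the 11th and 21st largest counts as exclusive cutoffs. A small sketch of the same idea on made-up counts (the column names mirror `ipt_country_df` as an assumption):

```r
# Made-up counts; the column names mirror ipt_country_df as an assumption
country_df <- data.frame(
    Country=c("AA", "BB", "CC", "DD", "EE"),
    Count=c(50, 40, 30, 20, 10)
)
top_n <- 3

# The (top_n + 1)-th largest count acts as an exclusive cutoff
threshold <- tail(
    head(sort(country_df$Count, decreasing=TRUE), n=top_n + 1), n=1
)
top <- country_df[country_df$Count > threshold,]
top$Country
```

One property of this approach: if the last included count ties with the cutoff, the strict `>` drops the tied rows, so fewer than `top_n` rows can come back.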

ipt_data$Date.NoTime <- as.POSIXlt(strftime(ipt_data$Date, format="%Y-%m-%d"))

ipt_data$Count <- rep(1, nrow(ipt_data))
agg_proto_counts <- aggregate(
    Count ~ Protocol.Clean + as.factor(Date.NoTime), data=ipt_data, FUN=sum
)
names(agg_proto_counts) <- c("Protocol", "Date", "Count")
agg_proto_counts$Date <- as.POSIXct(agg_proto_counts$Date)
agg_proto_counts$Protocol <- as.factor(agg_proto_counts$Protocol)
agg_proto_counts <- fill_zeroes(agg_proto_counts, by="Protocol")
order_df <- aggregate(
    Count ~ Protocol, data=agg_proto_counts, FUN=sum
)
agg_proto_counts$Protocol <- factor(
    agg_proto_counts$Protocol, levels=order_df[[1]][order(order_df[[2]])]
)
agg_country_time <- aggregate(
    Count ~ Country.Code + as.factor(Date.NoTime),
    data=ipt_data, FUN=sum
)
agg_country_time <- country_code_merge(agg_country_time)
names(agg_country_time) <- c(
    "Country.Code", "Date", "Count", "Latitude", "Longitude", "Country.Name"
)
agg_country_time$Date <- as.POSIXct(agg_country_time$Date)
agg_country_time_top10 <- agg_country_time[
    agg_country_time$Country.Name %in% unique(ipt_top10$Country),
]
agg_dst_ports <- aggregate(
    Hostname ~ Destination.Port, data=ipt_data, FUN=length
)
names(agg_dst_ports) <- c("Destination.Port", "Count")
agg_dst_ports$Destination.Port <- as.numeric(as.character(
    agg_dst_ports$Destination.Port
))
agg_dst_ports_time <- aggregate(
    Hostname ~ Destination.Port + as.factor(Date.NoTime),
    data=ipt_data, FUN=length
)
names(agg_dst_ports_time) <- c("Destination.Port", "Date", "Count")
agg_dst_ports_time$Date <- as.POSIXct(agg_dst_ports_time$Date)
agg_dst_ports_time$Destination.Port <- as.factor(
    as.character(agg_dst_ports_time$Destination.Port)
)
common_dst_ports <- head(
    agg_dst_ports$Destination.Port[order(-agg_dst_ports$Count)], n=10
)
common_dst_ports_time <- agg_dst_ports_time[
    as.numeric(as.character(agg_dst_ports_time$Destination.Port)) %in%
    as.numeric(as.character(common_dst_ports)),
]
order_df <- aggregate(
    Count ~ Destination.Port, data=common_dst_ports_time, FUN=sum
)
common_dst_ports_time$Destination.Port <- factor(
    common_dst_ports_time$Destination.Port,
    levels=order_df[[1]][order(order_df[[2]])]
)
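# NOTE: the negative index below drops the element at the position equal to
# the largest port number; when that position exceeds the number of rows,
# nothing is dropped and the result matches common_dst_ports.
top10_dst_ports <- head(
    agg_dst_ports$Destination.Port[
        order(-agg_dst_ports$Count[
            -max(as.numeric(as.character(agg_dst_ports$Destination.Port)))
        ])
    ], n=10
)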

top10_dst_ports_time <- agg_dst_ports_time[
    as.numeric(as.character(agg_dst_ports_time$Destination.Port)) %in%
    top10_dst_ports,
]
order_df <- aggregate(
    Count ~ Destination.Port, data=top10_dst_ports_time, FUN=sum
)
top10_dst_ports_time$Destination.Port <- factor(
    top10_dst_ports_time$Destination.Port,
    levels=order_df[[1]][order(order_df[[2]])]
)
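
The aggregation pattern used throughout this section (sum a per-packet `Count` by group, then order the factor levels by total) can be sketched on a tiny made-up data set. The data below is illustrative only and is not taken from the logs:

```r
# Made-up dropped-packet records, one row per packet
packets <- data.frame(
    Protocol=c("TCP", "TCP", "UDP", "TCP", "ICMP", "UDP"),
    Date=c("2019-01-01", "2019-01-01", "2019-01-01",
        "2019-01-02", "2019-01-02", "2019-01-02"),
    Count=1
)

# Sum the per-packet counts into one row per protocol and day
daily <- aggregate(Count ~ Protocol + Date, data=packets, FUN=sum)

# Order the factor levels from least to most total packets
totals <- aggregate(Count ~ Protocol, data=daily, FUN=sum)
daily$Protocol <- factor(
    daily$Protocol, levels=totals[[1]][order(totals[[2]])]
)
levels(daily$Protocol)
```

Note that a protocol with no packets on a given day (ICMP on the first day here) simply has no row after `aggregate()`, which is presumably the gap that `fill_zeroes` closes before plotting.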

Graphs

IPTables INPUT Table Packet Drops

g <- world_mapper(ipt_country_df)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops", collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_ipt_geo

IPTables Packet Type Timeline

g <- ggplot(agg_proto_counts, aes(x=Date, y=Count, colour=Protocol))
g <- g + geom_line()
g <- g + theme_simple()
g <- g + scale_colour_brewer(palette="Paired")
g <- g + labs(x="", y="Dropped Packets",
    title=paste0(site_name,
        ": IPTables DROPs on Public IP by Protocol", collapse=""
    )
)
g

plot of chunk timeline_agg_proto_counts_line

g <- ggplot(agg_proto_counts, aes(x=Date, y=Count, fill=Protocol))
g <- g + geom_area()
g <- g + theme_simple()
g <- g + scale_fill_brewer(palette="Paired")
g <- g + labs(x="", y="Dropped Packets",
    title=paste0(site_name,
        ": IPTables DROPs on Public IP by Protocol", collapse=""
    )
)
g

plot of chunk timeline_agg_proto_counts_area

IPTables Top 20 Country Barchart

g <- ggplot(ipt_top20, aes(x=Country, y=Count/1000))
g <- g + geom_bar(stat="identity")
g <- g + labs(
    title=paste0(site_name, ": IPTables: INPUT DROPs by Top 20 Countries"),
    y="Count (thousands)"
)
g <- g + theme_simple()
g

plot of chunk barchart_country

IPTables Top 10 Country Timeline

g <- ggplot(agg_country_time_top10, aes(
    x=Date, y=Count, group=Country.Name, colour=Country.Name)
)
g <- g + geom_line() + coord_cartesian(ylim=c(0,10000))
g <- g + labs(
    title=paste0(site_name, ": IPTables: INPUT DROPs by Top 10 Countries")
)
g <- g + theme_simple()
g <- g + scale_colour_brewer(palette="Paired")
g

plot of chunk timeline_country

order_df <- aggregate(
    Count ~ Country.Code, data=agg_country_time_top10, FUN=sum
)
agg_country_time_top10$Country.Code <- factor(
    agg_country_time_top10$Country.Code,
    levels=order_df[[1]][order(order_df[[2]])]
)
g <- ggplot(agg_country_time_top10,
    aes(x=Date, y=Count, fill=Country.Name)
)
g <- g + geom_area() + coord_cartesian(ylim=c(0,10000))
g <- g + labs(
    title=paste0(site_name, ": IPTables: INPUT DROPs by Top 10 Countries")
)
g <- g + theme_simple()
g <- g + scale_fill_brewer(palette="Paired")
g

plot of chunk timeline_area_country

Ports

Timelines

g <- ggplot(common_dst_ports_time, aes(x=Date, y=Count, colour=Destination.Port))
g <- g + geom_line()
g <- g + theme_simple()
g <- g + scale_colour_brewer(palette="Paired")
g <- g + labs(x="", y="Dropped Packets",
    title=paste0(site_name,
        ": IPTables DROPs on Public IP by Top 10 Common Ports", collapse=""
    )
)
g

plot of chunk timeline_ports_line

g <- ggplot(common_dst_ports_time, aes(x=Date, y=Count, fill=Destination.Port))
g <- g + geom_area()
g <- g + theme_simple()
g <- g + scale_fill_brewer(palette="Paired")
g <- g + labs(x="", y="Dropped Packets",
    title=paste0(site_name,
        ": IPTables DROPs on Public IP by Top 10 Common Ports", collapse=""
    )
)
g

plot of chunk timeline_ports_area

g <- ggplot(top10_dst_ports_time, aes(x=Date, y=Count, colour=Destination.Port))
g <- g + geom_line()
g <- g + theme_simple()
g <- g + scale_colour_brewer(palette="Paired")
g <- g + labs(x="", y="Dropped Packets",
    title=paste0(site_name,
        ": IPTables DROPs on Public IP by Top 10 Max Frequency Common Ports",
        collapse=""
    )
)
g

plot of chunk plot_top10_ports_line

g <- ggplot(top10_dst_ports_time, aes(x=Date, y=Count, fill=Destination.Port))
g <- g + geom_area()
g <- g + theme_simple()
g <- g + scale_fill_brewer(palette="Paired")
g <- g + labs(x="", y="Dropped Packets",
    title=paste0(site_name,
        ": IPTables DROPs on Public IP by Top 10 Max Frequency Common Ports",
        collapse=""
    )
)
g

plot of chunk timeline_top10_ports_area

Non-Ephemeral Tile

agg_dst_ports$Destination.Port <- as.numeric(as.character(
    agg_dst_ports$Destination.Port
))
non_ephemeral_ports <- heatmap_prep(
    agg_dst_ports[agg_dst_ports$Destination.Port < 1024,], 1024, 32,
    merge.field="Destination.Port"
)
names(non_ephemeral_ports) <- c("Destination.Port", "Scale", "X", "Y")
non_ephemeral_graph <- function(data, post_title=""){
    g <- ggplot(data, aes(x=X, y=Y, fill=Scale, label=Destination.Port))
    g <- g + geom_tile() + geom_text()
    g <- g + labs(
        title=paste0(site_name,
            ": IPTables Filtered Non-Ephemeral Destination Ports",
            post_title, collapse=""
        ), x="", y=""
    )
    g <- g + theme_heatmap()
    g <- g + scale_fill_continuous(
        low="#500000", high="#E00000", guide="colorbar"
    )
    g <- g + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
    g
}
non_ephemeral_graph(non_ephemeral_ports)

plot of chunk graph_non_ephemeral_ports
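
The internals of `heatmap_prep` are not shown here, so the following is only a guess at the kind of layout it produces: a hypothetical `port_grid` helper that places the 1024 ports below 1024 on a grid 32 tiles wide. The real function may order or orient the tiles differently.

```r
# Hypothetical helper, not the real heatmap_prep from shared/heatmap_prep.R
port_grid <- function(ports, width=32){
    data.frame(
        Destination.Port=ports,
        X=ports %% width,       # column within the row
        Y=-(ports %/% width)    # rows grow downward (assumed orientation)
    )
}

grid <- port_grid(0:1023)
grid[grid$Destination.Port %in% c(22, 23, 445),]
```

With this layout every port gets its own tile, and neighbouring port numbers sit next to each other, which makes clusters of scanned ports easy to spot.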

Truncated at 1000 for visual purposes.

non_ephemeral_ports$Scale[non_ephemeral_ports$Scale > 1000] <- 1000
non_ephemeral_graph(non_ephemeral_ports, " (truncated)")

plot of chunk graph_non_ephemeral_ports_trunc

Filtered Destination Ports

common_ports <- head(agg_dst_ports[order(-agg_dst_ports$Count),], n=256)
common_ports <- common_ports[order(common_ports$Destination.Port),]
common_ports <- heatmap_prep(common_ports)
names(common_ports) <- c("Destination.Port", "Scale", "X", "Y")
common_ports$Destination.Port <- as.factor(common_ports$Destination.Port)
common_ports_graph <- function(data, post_title=""){
    g <- ggplot(data, aes(x=X, y=Y, fill=Scale, label=Destination.Port))
    g <- g + geom_tile() + geom_text()
    g <- g + labs(
        title=paste0(site_name,
            ": IPTables Top 256 Commonly Filtered Destination Ports",
            post_title, collapse=""
        ), x="", y=""
    )
    g <- g + theme_heatmap()
    g <- g + scale_fill_continuous(
        low="#500000", high="#E00000", guide="colorbar"
    )
    g <- g + scale_x_discrete(expand=c(0,0)) + scale_y_discrete(expand=c(0,0))
    g
}
g <- ggplot(common_ports, aes(x=Destination.Port, y=Scale))
g <- g + geom_bar(stat="identity")
g <- g + labs(
    title=paste0(site_name,
        ": IPTables Filtered Destination Ports Barchart", collapse=""
    ), x="Port Number (0-65535)", y=""
)
g <- g + theme_simple() %+replace% theme(axis.text.x=element_blank())
g

plot of chunk bar_common_ports

Commonly Filtered Destination Ports Tile

common_ports_graph(common_ports)

plot of chunk graph_common_ports

Truncated at 1000 for visual purposes.

common_ports$Scale[common_ports$Scale > 1000] <- 1000
common_ports_graph(common_ports, " (truncated)")

plot of chunk graph_common_ports_trunc

Graphs for Common Ports

Attacks going after/scanning most commonly attacked or used ports.

22: ssh

ipt_country_22 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 22]
)
g <- world_mapper(ipt_country_22)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops (Port 22: ssh)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_ssh_map

23: telnet

Why are people still using telnet? :(

ipt_country_23 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 23]
)
g <- world_mapper(ipt_country_23)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops (Port 23: telnet)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_telnet_map

445: microsoft-ds

Yucky.

ipt_country_445 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 445]
)
g <- world_mapper(ipt_country_445)
g <- g + labs(
    title=paste0(
        site_name,
        ": IPTables: INPUT Table Packet Drops (Port 445: microsoft-ds)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_microsoftds_map

3389: rdesktop/rdp

Due to the alleged rise in RDP attacks after the COVID-19 mass migration to staying at home, this will be interesting to watch over the next couple of years.

Mind you, I follow the Larry David approach, and don't know why anyone would leave their home in the first place. There's nothing but trouble out there, nothing is gained by leaving your home, so why do it? Why were people doing it for so long, and why are they squandering this amazing opportunity to have an excuse to block out the rest of the stupid world full of domesticated primates flinging poo at each other?

Read a book. Watch TV. Play some video games. Calm the fuck down you sheeple. Also, stop trying to break into my house. It won't end well for either one of us, think “No Country for Old Men” or “Enemy of the State”.

ipt_country_3389 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 3389]
)
g <- world_mapper(ipt_country_3389)
g <- g + labs(
    title=paste0(
        site_name,
        ": IPTables: INPUT Table Packet Drops (Port 3389: rdp/rdesktop)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_rdp_map

37215: Huawei Administration Port

Huawei has been in the news for quite a few security vulnerabilities, and I've noticed that some of my other servers are getting blasted on this port, which is a Huawei administration port, so it would be interesting to see where that traffic is coming from here.

Most likely a Chinese Communist Party // Government run company, Huawei could potentially be just dumping insecure product, then attacking those vulnerabilities. Based on where the attacks are coming from, this doesn't really say much, but it is interesting nonetheless.

If it's from China, it could be actually coming from Chinese hackers and/or government agents. Or US/Russia/criminals using a Chinese VPN or proxy to throw off detection systems.

If it's from elsewhere, it could just be from where it says it is, or China routing it from another country.

As I said in the disclaimer above, none of these country codes are very reliable for many reasons. It could just be that Huawei is an incompetent company at designing secure routing equipment.

Regardless of the truth, there is no reason to use anything developed by Huawei.

ipt_country_37215 <- country_code_cleanup(
    ipt_data$Country.Code[ipt_data$Destination.Port == 37215]
)
g <- world_mapper(ipt_country_37215)
g <- g + labs(
    title=paste0(
        site_name,
        ": IPTables: INPUT Table Packet Drops (Port 37215: Huawei Admin)",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk graph_huawei_map

Examining Specific Data

Detecting Port Scans

A port scan is detected if any single IP address attempts to connect to more than 50 unique destination ports. Under normal usage of my resources, this never occurs. One-off connections to random unused ports (for instance, from an incorrect IP address configured somewhere) are excluded from this detection. No legitimate resource should be using more than 50 unique destination ports.

Two detection mechanisms are used in this code. One detects on a per-day basis, to see who is spamming the server (such as: nmap -T insane). The other detects on a long-term basis, catching connections over multiple days from the same IP to unique destination ports (such as: nmap -T paranoid).
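
The two views can be sketched on made-up data before running them against the real logs. The addresses below come from documentation ranges and the column names mirror `ipt_data` as an assumption. One source hits 60 ports in a single day, another spreads 60 ports over three days, and a third touches only two ports:

```r
# Made-up drop records; column names mirror ipt_data as an assumption
drops <- data.frame(
    IP.Source=c(rep("198.51.100.7", 60), rep("203.0.113.9", 60),
        rep("192.0.2.33", 3)),
    Date=c(rep("2019-03-01", 60),
        rep(c("2019-03-01", "2019-03-02", "2019-03-03"), each=20),
        rep("2019-03-01", 3)),
    Destination.Port=c(1:60, 1001:1060, 22, 22, 80)
)
count_unique <- function(x){ length(unique(x)) }

# Per-day view: unique ports per source IP per day (fast scans)
per_day <- aggregate(
    Destination.Port ~ IP.Source + Date, data=drops, FUN=count_unique
)
fast <- unique(per_day$IP.Source[per_day$Destination.Port > 50])

# Long-term view: unique ports per source IP over the whole period (slow scans)
overall <- aggregate(Destination.Port ~ IP.Source, data=drops, FUN=count_unique)
slow <- overall$IP.Source[overall$Destination.Port > 50]

as.character(fast)
as.character(slow)
```

The slow scanner never exceeds 20 ports on any single day, so only the long-term view catches it, while the fast scanner shows up in both.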

Aggregation

agg_ip_port_date <- aggregate(
    Destination.Port ~ IP.Source + Country.Code + as.factor(Date.NoTime),
    data=ipt_data, FUN=function(x){ length(unique(x)); }
)
names(agg_ip_port_date) <- c(
    "IP.Source", "Country.Code", "Date", "Count"
)
agg_ip_port_date$Count <- as.numeric(as.character(agg_ip_port_date$Count))
agg_ip_port <- aggregate(
    Destination.Port ~ IP.Source + Country.Code,
    data=ipt_data, FUN=function(x){ length(unique(x)); }
)
names(agg_ip_port) <- c("IP.Source", "Country.Code", "Unique.Ports")
agg_ip_port$Unique.Ports <- as.numeric(as.character(agg_ip_port$Unique.Ports))
agg_unique_ip <- aggregate(
    IP.Source ~ Country.Code,
    data=agg_ip_port_date[agg_ip_port_date$Count > 50,], FUN=length
)
unique_ip_map_insane <- country_code_merge(agg_unique_ip)
names(unique_ip_map_insane) <- c("Country.Code", "Count", "X", "Y", "Country")
agg_unique_ip_paranoid <- aggregate(
    IP.Source ~ Country.Code,
    data=agg_ip_port[agg_ip_port$Unique.Ports > 50,], FUN=length
)
unique_ip_map_paranoid <- country_code_merge(agg_unique_ip_paranoid)
names(unique_ip_map_paranoid) <- c("Country.Code", "Count", "X", "Y", "Country")

Raw Data

nmap -T insane port scans:

nrow(agg_ip_port_date[agg_ip_port_date$Count > 50,])
## [1] 989

nmap -T paranoid port scans:

nrow(agg_ip_port[agg_ip_port$Unique.Ports > 50,])
## [1] 890

Top nmap -T insane scan dates:

agg_ip_port_date$Date[agg_ip_port_date$Count > 3000]
## [1] 2019-03-01 2019-03-02 2019-03-03 2019-04-01 2019-04-05 2019-04-08
## 365 Levels: 2019-01-01 2019-01-02 2019-01-03 2019-01-04 ... 2019-12-31

Mapping Detected Port Scans

g <- world_mapper(unique_ip_map_insane)
g <- g + labs(
    title=paste0(site_name,
        ": IPTables: Detected Port Scans (`nmap -T insane`-like)",
        collapse=""
    ),
    fill="Unique IPs", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk map_port_scans

Mapping Long-Term Detected Port Scans

g <- world_mapper(unique_ip_map_paranoid)
g <- g + labs(
    title=paste0(site_name,
        ": IPTables: Detected Port Scans (`nmap -T paranoid`-like)",
        collapse=""
    ),
    fill="Unique IPs", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g

plot of chunk map_port_scans_paranoid

Animations

Data Processing

ipt_map_data <- ipt_data[!is.na(ipt_data$Country.Code),]
anim_geoip <- turn_to_animation(ipt_map_data)
anim_geoip$Count[anim_geoip$Count > 1000] <- 1000
agg_dst_ports_time_trunc <- agg_dst_ports_time
agg_dst_ports_time_trunc$Count[agg_dst_ports_time_trunc$Count > 1000] <- 1000
agg_dst_ports_time_trunc$Destination.Port <- as.numeric(as.character(
    agg_dst_ports_time_trunc$Destination.Port
))

anim_ports <- turn_to_animation(
    agg_dst_ports_time_trunc[
        agg_dst_ports_time_trunc$Destination.Port < 1024,
    ], "Destination.Port", "Count"
)
names(anim_ports) <- c("Animate.Time", "Value", "Count")

anim_ports$Value <- as.numeric(as.character(anim_ports$Value))

anim_ports$Count[is.na(anim_ports$Count)] <- 0
anim_ports <- anim_ports[
    (!is.na(anim_ports$Animate.Time) & !is.na(anim_ports$Value)),
]
names(anim_ports) <- c("Animate.Time", "Destination.Port", "Value")
anim_ports$Animate.Time <- as.POSIXct(anim_ports$Animate.Time)
anim_ports$Destination.Port <- as.numeric(as.character(
    anim_ports$Destination.Port
))
anim_ports_org <- heatmap_prep(
    anim_ports, 1024, 32,
    date.field="Animate.Time", merge.field="Destination.Port",
    value.ordering=TRUE
)

names(anim_ports_org) <- c(
    "Animate.Time", "Destination.Port", "Scale", "X", "Y"
)

anim_ports_org$Scale <- as.numeric(as.character(anim_ports_org$Scale))
anim_ports_org$Animate.Time <- as.character(strptime(
    anim_ports_org$Animate.Time, format="%Y-%m-%d"
))
common_anim_ports_lbls <- head(
    agg_dst_ports$Destination.Port[order(-agg_dst_ports$Count)], n=256
)
common_anim_ports_lbls <- common_anim_ports_lbls[order(common_anim_ports_lbls)]

common_anim_ports <- turn_to_animation(
    agg_dst_ports_time_trunc[
        agg_dst_ports_time_trunc$Destination.Port %in% common_anim_ports_lbls,
    ], "Destination.Port", "Count"
)
names(common_anim_ports) <- c("Animate.Time", "Destination.Port", "Value")
common_anim_ports$Animate.Time <- as.POSIXct(common_anim_ports$Animate.Time)
common_anim_ports_org <- heatmap_prep(
    common_anim_ports[
        common_anim_ports$Destination.Port %in% common_anim_ports_lbls,
    ], 256, 16,
    date.field="Animate.Time", merge.field="Destination.Port",
    date.ordering=TRUE, expand.values=common_anim_ports_lbls
)
names(common_anim_ports_org) <- c(
    "Animate.Time", "Destination.Port", "Scale", "X", "Y"
)

Quick Animate Function Wrapper

graph_to_animation <- function(g, x=Inf, y=Inf){
    g <- g + geom_label(
        aes(x=x, y=y, label=Animate.Time),
        vjust="inward", hjust="inward",
        colour="#808080", fill="#FFFFFF", label.size=0
    )
    g <- g + transition_manual(Animate.Time)
    g
}

IPTables INPUT Table Packet Drops

g <- world_mapper(anim_geoip)
g <- g + labs(
    title=paste0(
        site_name, ": IPTables: INPUT Table Packet Drops GeoIP Lookup",
        collapse=""
    ),
    fill="Dropped Packets", x="", y=""
)
g <- g + scale_fill_continuous(low="#300000", high="#E00000", guide="colorbar")
g <- graph_to_animation(g)
options(
    gganimate.fps=5,
    gganimate.nframes=length(levels(as.factor(anim_geoip$Animate.Time)))
)
g

plot of chunk anim_iptables_render

Non-Ephemeral Tile

g <- non_ephemeral_graph(anim_ports_org)
g <- graph_to_animation(g, y=-32.5)
options(
    gganimate.fps=5,
    gganimate.nframes=length(levels(as.factor(anim_ports_org$Animate.Time)))
)
g

plot of chunk anim_non_ephemeral_ports_render

Common Ports Animation

g <- common_ports_graph(common_anim_ports_org)
g <- graph_to_animation(g, y=-16)
anim_save(
    "images/analysis/syslog_iptables_world-2019/anim_common_ports_render.gif", g, fps=5,
    nframes=length(levels(as.factor(common_anim_ports_org$Animate.Time))),
    width=864, height=720
)