SEO & The Yandex Code Leak
In the digital world, search engine optimization (SEO) remains a hot topic that is fiercely debated, and its secrets jealously guarded. Whether your goal is to generate fresh leads or boost traffic to your site, you need to know the latest rules and guidelines touted by the leading search engines and aggregators. But what happens when the unthinkable happens, and you find yourself looking at leaked code and technical data from the fourth biggest search engine in the world?
In this week’s episode, Dwight and Gary discuss the impact of the Yandex code leak and what it means for those planning for a busy 2023 in the SEO space.
SEO Remains A Puzzle To Solve In 2023
While some things never change, SEO tends to pull a few surprises at least once a year. These come in the form of a Google update or two that tweak the rules on what is considered valuable to a site or the launch of a new feature that can help promote different content and landing pages.
And barely a month into the year, something that no one could plan for has happened: an alleged leak of Yandex code. For those unfamiliar with it, Yandex is a Russian search engine that functions in much the same way as Google. It’s not a platform you or your clients will be looking to dominate in 2023, but the insights found by sifting through the leaked list of ranking factors raise interesting questions.
For those in the SEO industry, it’s easy to be swamped by the sheer volume of data, features, and new updates that arrive annually. And the more that gets added over time, the more diluted our own opinions can become on core SEO subjects. We risk losing sight of what has worked in the past and, more worryingly, the core aspects of what makes for a winning SEO strategy.
This latest SEO event is an excellent chance for us all to look through vast piles of data, but this time, we might want to take a step back and get a refresher on the basics of what makes a modern search platform tick.
Interested in learning more about Search Engine Optimization and how best to improve your rankings on Google? Hiring an agency to help put together a winning strategy may be the right choice for you! For more information about our copywriting services, social media maintenance, and SEO maintenance, contact PA online or call us at (248) 582-9210 to get started.
The full transcript from Episode 35 can be found below:
Been such a very, very long time since we’ve done any type of podcasting. Lack of class? Laziness? I don’t know. Number of different things. But in the midst of it all, we’ve done a lot of changes here at the agency. One of the things that we were lucky enough to do is during open interviewing and scouring throughout the whole entire England market, we came across a new really special SEO person. His name is Gary Jones, and he joins us here today on the podcast. Hello, Gary.
Yeah. Hi. Thanks for… Cheers for hiring me, Dwight.
Thanks for coming on board. Just for clarification, though, Gary is stateside. He is a former gent from the UK. Obviously, you can tell with the accent. But why don’t you give everybody a little bit of background, so at least our customers and listeners know a little bit about where you come from. It’ll be made pretty clear why we wanted Gary to be a part of this team. Gary, tell us a little bit about who you are and what your background is inside of 30 seconds. Go.
Okay, yeah. I’ve been in the United States since 2015. I have worked in a couple of different fields, mostly related to media and media writing. A lot of that was spent towards SEO and the importance of getting content to basically rank near the top pages of Google. I’ve worked with some big and some small companies. I’m always hungry to learn as much as I possibly can about the SEO industries, the secrets you can find out, and the little tips and tricks you can pass on to help people grow.
Now, one of the things that drew me to you during the hunt back in the fall was the wide variety of experience you did have working on a number of large brands but also in the gaming space, a little bit in the entertainment space. I think one of the last stints you had was with a company whose site had millions and millions of page views on a daily, weekly basis, even if not per month. You were looking at articles for their performance on different long tail keyword phrases to modify older entry point pages so that you could work CPMs to have higher return on investment. Am I reciting it the right way?
Yeah, sure. It was definitely something similar to what you’re describing, and it was always an ongoing challenge because it was always such a competitive marketplace and market space in which you find yourself. Some companies are in niches that, say, don’t move that often, and some companies are in niches that change every single day and have piles and piles of new content appearing online within hours.
I think that’s where it drew my attention, because you’re inside the space so much, myself and you are, we were in parallels there for a bit, and it’s like you forget how small your world really is and how big it possibly can be. A lot of our customers, they don’t understand. They come to us with a set of KPIs and it just might be a handful of those, but you don’t understand by going through and analyzing a lot of your traffic, site conditions, where there’s low-hanging fruit, how there’s other opportunity for revenue in a number of different ways. That could be downstream, but you could look at that in a lot of different ways.
Researching traffic that’s coming to pages on your site, if it’s blog or it’s content, and if you have any type of CPM advertising around there or offers or it’s lead-gen based, if you reoptimize those on a regular basis, you might be able to flex your muscles and become a little bit more efficient and definitely show an ROI for that investment. I thought that was pretty cool and a lot of people don’t think about going that granular in a lot of ways, but I guess when you have a big site and you can hire someone just to do that as a full-time gig, that works, right?
Yes, for sure. Some places will always have unique challenges, but you can have someone with a lot of traffic, but if the traffic isn’t something that they can sell on, then it can be a problem. If you’re a European based site and most of your traffic is coming from the United States, what do you do with that problem? Do you look to migrate away from that, or do you look to grow that and move closer to that market and try and set yourself up further in that area? These are just some of the questions that can just pop up from working on all kinds of different sites with different clients.
Yeah. The reason to reunite, to reinvigorate the podcast and get into this a little bit more. We have a special thing that’s going to be happening with the agency in the next couple months forthcoming, but our podcasts and helpful content and dialogue, getting it out to our clients and target audience or whoever’s out there that enjoys hearing about this stuff. We’re going to be doing this on a regular basis, going to be happening weekly moving forward. Over the weekend, actually on Friday, sent a Slack over to Gary on Friday night and said, huh, look what I just heard about that was pretty interesting. It was definitely an eyebrow raiser for me. In the midst of some serious news that was going on about police brutality and other things, this came across my Twitter feed, which is Yandex had a leak that basically, a portion of their code that was submitted to a Git repository was hacked over the summer in ’22 and was released on some substack sites yesterday. I’m sorry, last week, I believe on Wednesday or Thursday that it was confirmed, and it was about 43 gigs of information.
But the important part about this is that it basically is supposedly, here’s air quotes, it was giving out a lot of their search ranking factors for how their algorithm works. Now, for people listening, Yandex is essentially the Google of Russia. That is their number one search engine over in that part of the world. Definitely not going to trump out what Google is doing, but it definitely came after Google and it’s indexing and it’s, again, the number one search engine over in that portion of the world in parts of Europe. That’s a big deal. It’s definitely managing a lot of requests every single second over in that region and it’s been around for over a decade or so. It’s got some legitimacy behind this here. Was digging into this and a lot of the things that were here and how does this come into play for a simpleton or for a general person out there?
You got a small business. Might be something hyper localized. You have a website. You’re putting some money into ads, into Meta and you’re spending a couple thousand dollars a month to work on a strategy on your website and of course you do want organic traffic coming into your site, so why would this even matter to you? I think that’s the thing Gary and I wanted to summarize for you guys in about 10 or 15 minutes, is just point out a number of things that helped to solidify things that we knew and where we can find parallels how this matches to what Google does.
Now, again, this is speculative. I believe it has been verified that this was actual code of theirs, a version of the code. Of course, there’s going to be scuttlebutt around this and stuff. However, it was verified that this code did go out and it is not Google. Let’s talk about some of the similarities and some of the things that happened. Gary, do you want to hit on some of these things or do you want to go back and forth in regards to it? Where do you want to start?
Yeah, sure. I would just start by saying that in a data-driven industry such as ours, we have to go very long lengths of time with very little information on some of the more technical sides of things. We’ve gone through phases where we’ve got a lot of information from Google when they’ve been making major updates and we’ve gone through phases when Google hasn’t given us any information or very little information for some updates that have seemed to have caused some major changes.
That was a thing, too, is it was only three years ago that actually, somewhere right around the time that Danny Sullivan joined as the head of spam after Matt Cutts and they had a vacancy over there for a couple of years, but that’s when Google actually did, I think it was the your health, your life, those updates, when they formulated E-A-T as basically part of their rater guidelines. But that’s when they started actually announcing that they were doing updates and that they were verifying those via Twitter. I just wanted to add in there so a lot of people know, that we haven’t always known when Google was going to be doing updates or when there were going to be major shuffles that went on, or we called them the Google dance for the longest time since the mid-2000s, that’s what they were known as. Now, it’s a regular, ongoing daily thing with some major core updates to specific parts of the algorithm that are announced and done on a monthly basis now. But continue on, Gary.
Yeah. I think that recently, we’ve gone back to a point where we do get more information now. I think in 2022, we got quite a lot of updates on those Google updates, like the helpful content, things like that. Sometimes we’re still missing things out, but the one side that we don’t know much about is the technical side. We don’t get to look at Google’s code for things like that. A lot of it can be good guessing, basically, on how to move more forward. I think what has happened with Yandex, it’s a good point in time where we can look at it, and while we can’t say that this is going to be point for point with Google, I think it’s a great time for people to just use what’s coming out of it as a refresher and check out those ranking factors.
Because like I say, they might not be word for word with Google or anything like that, but there might be things in there that get you thinking about things that you might have forgotten about, ranking factors that you already knew about Google or they’ve talked about. You might go, well, wait a minute, let’s go back and see if those things are still, what’s the latest for something in that area for Google? That’s what I’ve been looking at. I’ve been looking through this list of ranking factors for Yandex and pointing out the things that maybe have slipped my mind or they’ve slipped down the pecking order in my mind and I’m using this as a good time to really look back and go, well, how can I take that information and apply it to the people I’m working with now and the clients I’m working with now?
Yeah. There’s the value factor when you’re looking for a partner or you’re looking for a service that’s going to be working along with your team or they’re going to be doing the work solely at your appointment. How many points of reference are they going back to, and how have they taken in something like this objectively, looking at it and taking it into consideration or using it as a refresher, like you’re saying there. Search Engine Land, of course, was one of the first places to come out and had a little bit more of a deeper article in regards to it. I just want to run through some of the things here as we do a lot of scanning. Everybody does now, and time is limited so you can’t just read everything and soak 80 hours into something like this over a weekend. But what I thought was interesting is there was a number of points of ranking factors and they went ahead and recited a lot of what they were and then looked at commonalities or things that were just too basic, and then reconfigured it.
1900 search factors of how a website or webpage is ranked specifically. What they recited in regards to that is a number of things were what you expected to see. They have their own variation of PageRank, which essentially, was branded by Larry Page, one of the founders of Google. Back in the day, it was really a couple factors is what ranked webpages, because everything before was an indexing directory. AltaVista, maba.com, Netscape, Lycos. Search engines like those were in a big way trumped out by Google because of the fact that they looked at the PageRank, the number of links coming back to pages inside of a website and what was the anchor text of those links that were pointed into what pages. And then as they crawled the web, they calculated those and so on and so forth.
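To make the link-counting idea above concrete, here is a minimal sketch of the textbook PageRank power iteration in Python. The four-page link graph, the 0.85 damping factor, and the function name are assumptions chosen for illustration; this is not code from the Yandex leak, and it is not how Google or Yandex run the calculation in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links out to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site: nothing links to "orphan".
example_links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(example_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with the most inbound links comes out on top, and the page nobody links to is left with only the baseline share. Anchor text, which the discussion above also mentions, is not modeled here.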
It grew from obviously, a large point there. Text relevancy is a big deal also with Yandex. Some other things, like Gary mentioned, you don’t think about a lot of ways, too, and this is one of the things that turned me onto him as an applicant back in the fall was some of this creative thinking of content age and freshness. How often are you going back into some of your cornerstone pages of your website and looking at traffic and looking how trends change and how do you jump on the things like that, and how often are those being updated and becoming fresh? Some other things, which a lot of people do forget about, is yeah, I put a website together. Largely, the biggest hindrance is going to be assets, is going to be the content images, a pending white paper, PDFs, other things like that, and how those are going to be built so they’re going to be searched and viewed by an end user from your website.
User behavior signals that are coming from a search index. Let’s say we’re talking about Premiership soccer and Crystal Palace and we’re doing some searches on Google and it displays different results to us. There are some paid ads above it. Then you have your organic results. I go to the first result and it happens to be for the Premier League and the second result might be for the club’s website. I click on the second link and I clearly don’t find what I’m looking for so I quickly bounce out of there and then I go back to the first link. Essentially, those actions are recorded in somewhere that we know of, like Google, or it’s speculated and very highly speculated that those types of things come into the fray of how they’re calculating how good your website is.
If people are bouncing from going through a click on an organic search and then going back and clicking on something else or lack thereof, they actually click through to your website and they stay there for quite a while. Those things are all giving you ranking factors whether you like it or not. Another one that was pretty important was host reliability. This all goes back to the old joke is we have customers that will come in and it might be on the smaller business market, but they host with GoDaddy or Tucows or some of those large conglomerates, but they bought the 8.99 domain registration and they’re paying 4.99 a month for hosting on a shared host. That host reliability and how much traffic that it can take and its performance is definitely going to play a factor into how you rank specifically.
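The click-and-bounce behavior described above can be sketched as a simple calculation. The Python below is a hypothetical illustration only: the click log, the 10-second threshold, and the function name are all assumptions, and nothing here reflects how Yandex or Google actually record or weigh these signals.

```python
from collections import defaultdict


def quick_return_rate(clicks, threshold_seconds=10):
    """Share of clicks per result where the visitor left almost immediately.

    `clicks` is a list of (result_url, dwell_seconds) tuples, where
    dwell_seconds is how long the visitor stayed before returning to
    the results page.
    """
    totals = defaultdict(int)
    quick = defaultdict(int)
    for url, dwell in clicks:
        totals[url] += 1
        if dwell < threshold_seconds:
            quick[url] += 1
    return {url: quick[url] / totals[url] for url in totals}


# Hypothetical click log for one search query.
sample_clicks = [
    ("club-site", 4),
    ("league-site", 95),
    ("club-site", 6),
    ("club-site", 120),
    ("league-site", 8),
]
rates = quick_return_rate(sample_clicks)
for url, rate in rates.items():
    print(f"{url}: {rate:.0%} of clicks bounced back quickly")
```

A result that most visitors abandon within seconds would score high on this measure, which is the kind of pattern the conversation speculates a search engine might read as a poor match for the query.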
Let’s see. Some of the other ranking factors that they’re finding, what was surprising was the number of unique visitors, percentage of organic traffic, and the average domain ranking across queries. That goes back into some arguable points of age of domain registration, longevity of domain registration. Some other factors that I thought were pretty interesting is I’ve always been, what’s the right word for it? I’ve always been a bugger for having underscores in page URLs or having too many dashes. That seems to be a factor as well. Also numbers in URLs for pages and other things, which is a big deal. Let’s see. Being penalized. Let’s see. A lot of people are highlighting. Our interpretation is this website is penalized and PageRank is reduced to zero. This is in line with the long-standing theory that if you receive a penalty in Yandex, recovery is a lot harder. Penalizations, that seems to be a big thing. That’s some of the information that was summarized inside of there. What are some other things you’ve seen, Gary?
There’s some things that are super basic, but there are also things that I just think are quite interesting as well just to think about and look at from a different perspective. Things like the percentage of direct traffic is a ranking factor and the idea that if your site is just pulling organic traffic and it’s not getting people who are just coming straight to the site because they know it, that can be seen as suspicious to a search engine.
I find that incredibly interesting because it feels like some of these factors that are in place are making it much more difficult to be, say, creating a new website or if you are a new business, and it’s rare these days to say you’re a business and you don’t have a website or your website is ancient and then you’re looking to reinvigorate or change that area and you create a site and you invest in the SEO and you’re doing well with the organic traffic. It’s a strange feeling that there’s a counterbalance where it’s just like, well, if you’re not getting direct visitors, then I find that strange.
But I understand why. I can see the logic behind that because there are so many spam websites out there that just simply copy and paste other people’s content and then look to beat them on the ranking pages and so that’s a way of counterbalancing that strategy people have. But it’s interesting as well, just think about it and just look at that and think, oh, I see. It also feeds into how SEO needs to work with a larger marketing strategy sometimes when you learn those.
Oh, yeah. That’s definitely for sure. It definitely needs a strategy behind it. But you got to arm yourself with someone that can help walk you along in the steps that are appropriate and also be able to explain to you what you’re up against potentially and how to get there. Because we’re definitely still tainted in an industry at large where SEO now is more common more than it was a dozen plus years ago. Did a lot of education back in those days. But now the value of what those are and something that is being optimized or being fed. Hiring someone to do SEO for your site is like yeah, I stopped and got the kids food. Well, you went to McDonald’s, versus you did get some good chicken and brought it home and cooked it appropriately with some fresh vegetables.
That’s definitely getting the kids food, but there’s variance in the type of food that’s being delivered and the quality that’s going to have and some obviously, negative impact that could be seen from those as well. Running back through some of these things, URL construction seemed to matter, trailing slashes were seen as negative, and numbers in the URL could also be seen as negative. Again, seen as, not necessarily definitive. Positives, if you’re containing a corresponding country or city geoidentifier to the user, that could be beneficial. I’m in Detroit, so if we had detroitcars.com, obviously, it’s going to be a little bit more relevant than a brand name or something that’s like a Carvana. Might have a little bit more trouble based off of that for people that are looking for corresponding keyword phrases. Whether the domain has a semantic relation to the query is a factor, and URL length is also a factor.
When you get into those longer terms, like crystalpalacewannabefootballer.co.uk, that might not be as advantageous a URL as some other simpler, easier URLs would be. I keep bringing up these football references, obviously, because I’m a football fan and Gary is like my segue over the pond and what’s going on over there. I’m stuck in a country where when a lot of people hear football, they think about how bad the Lions are doing or the Super Bowl that’s going to be going on where that is definitely not what I’m talking about.
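The URL traits mentioned above (underscores, dashes, numbers, trailing slashes, length, and a city name in the domain) are easy to check for on your own pages. The Python sketch below pulls them out of a URL using the standard library. The function name, the feature names, and the sample URL path are assumptions for illustration; it only reports the traits and does not attempt to reproduce any search engine's weighting.

```python
from urllib.parse import urlsplit


def url_signals(url, geo_terms=()):
    """Report simple URL traits of the kind discussed as possible ranking signals."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path
    return {
        "length": len(url),
        "has_underscore": "_" in path,
        "dash_count": path.count("-"),
        "has_digits": any(ch.isdigit() for ch in host + path),
        # A bare "/" is just the site root, so it is not counted as a trailing slash.
        "trailing_slash": len(path) > 1 and path.endswith("/"),
        "geo_in_host": any(term.lower() in host for term in geo_terms),
    }


# Hypothetical page on the detroitcars.com example domain.
signals = url_signals(
    "https://detroitcars.com/used_cars/2023-models/",
    geo_terms=["detroit"],
)
for name, value in signals.items():
    print(f"{name}: {value}")
```

Running a check like this across a sitemap is one way to spot pages whose URLs carry several of the traits the leak reportedly treats as negative, before deciding whether a restructure and redirects are worth the effort.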
But I enjoy talking a little bit about things that happen on the pitch and stuff. Predicting the numbers of products on a page where they use a DSSM to look up the URL and the page title to determine if a webpage has one product or multiple products listed on it. DSSM product prediction probabilities, also utilizing the URL and the title. That seems to be a factor in regards to it. Obviously, they have quality scoring that’s going on. We found, or not we, but others have found essentially a handful of factors related to medical, financial, and legal topics, which was also very interesting. Also TikTok. It’s not clear that it’s essentially implemented as far as traffic coming in or they’re visible or there’s things going on in TikTok that’s playing into the SEO.
Host reliability, we already talked about. Let’s see. Visits to individual URLs. Visitors that went there over a period of time, how long they’re spent, average time spent, audience data, which looks like it’s coming from other third parties that they’re meshing that with. The depth of how long someone goes into a site and where they click through. And then summarizing this up here, too, age of links and factors of query relevancy in titles and in text of titles. Now, we go back to when we’re building out sites, it’s really important to, I use the metaphor of the book. The book is your company on the web and how do we structure with the title and then chapters and sub-chapters. And then we look at parent, child and the grandchild relevancy for navigational components and which of those should be spoken more towards the audience at hand that we’re targeting for your users.
And then how do we mesh those together with what socially, people would expect to find or how they can look for certain things and expect to find them by the terms you want to use? Or do we link those back to, or not link, but utilize more keyword related frames? Over-optimization versus generalized optimization for your audience all seems to be factors in regards to them. But links, another component that it does impact overall search ranking and that’s still a big deal. This is Yandex in Russia and it’s not necessarily everything to do with Google, but I think there’s a lot of factors here that’s going to keep them very much well-aligned in things that we have to think about in regards to when we’re doing stuff and putting together marketing strategies and outlining tactical plans on how you’re going to be getting an ad.
Yeah, for sure.
With that being said, yes, summary, Gary, what do we learn? What do we need to do and what do we need to tell clients in regards to what they should be thinking about?
I think going back to an earlier point you made, I think this information is great. It’s really informative. I think it’s useful in a way that just helps everyone kind of reboot what maybe they were thinking about and look at their strategies in a different light. But it can also be a big pile of data that you don’t really understand and you don’t really want to understand as well. I think it can maybe answer questions for you, but it also emphasizes the point of being able to go to someone, reach out to someone and say, hey, I just saw this online and it’s saying that bookmarks is a ranking factor now for websites. You go to an SEO person or you can go to whoever you’re working with and they can tell you and that they can give you an opinion on, well, is that worth the time and energy to increase people bookmarking your website?
If having a link to Wikipedia is a massive ranking factor and it’s something you care about, then it’s something that we’ll care about as well, but we can also give you an honest opinion on how easy something like that is to achieve. I think something like this is a great way for us to refresh our ideas and also be able to maybe answer questions more thoroughly and with a better understanding of what’s happening on the technical side of things. Obviously, this information is speculative, but it might help us when people have questions, and we might be able to give more, less vague answers, if that makes sense.
No, absolutely. Lastly, we have to remind everybody that Yandex, Google, they are companies and they do have shareholders and their intellectual property. When you go there and you type in and do a search, you’re giving in to all the terms of service and how that is utilized. Same with businesses when you’re submitting your sites there and it’s crawled and indexed. It’s really about how you’re showing up. If not, they have programs such as ads. I’m not sure what’s over on the Yandex side. I’m sure it’s very similar that you can utilize to be very, very targeted for keywords and where to show up where you want your users to see you, of course for a fee, if that’s going to be per click or that’s going to be on a CPM basis.
Well, that’s another edition of our new podcast. It’s the 301 Redirect has been always the name, and I’m understanding that that is up for challenge and that’s going to be changed coming up very shortly with a couple other changes that are going on. But I hit that a little prematurely there, Gary. But thank you for coming on and taking time out in the afternoon to give everybody a bunch of your sharp knowledge.
Always a pleasure.
See everybody soon.
Primary Image Source: Envato Elements