In this episode of Opinionated SEO Opinions™, we are joined by Lazarina Stoy, the queen of Machine Learning for SEO.
Lazarina helps us shed light on how machine learning models can be used for SEO tasks, how to choose the best model for a particular use case, how AI and ML are different, and more. Give this exciting episode a listen and submit your SEO questions to get them featured on our next episode with a shout-out.
This episode contains special shoutouts to Danny Richman, Greg Bernhardt, Francis Angelo Reyes, and Andrea Volpini for their contributions to the industry on machine learning and AI.
Don’t forget to follow TGDC on LinkedIn and Twitter!
Begüm Kaya 00:07
Hello, everyone. Welcome to this episode of Opinionated SEO Opinions™ where we're going to discuss machine learning and SEO with the one and only Lazarina Stoy. Welcome.
Tory Gray 00:16
Thank you for joining us.
Lazarina Stoy 00:19
Thank you. So happy to be here. Thank you for having me.
Tory Gray 00:23
Can you tell us a little bit about your background? And how you came into machine learning, perhaps?
Lazarina Stoy 00:28
Yeah, of course. So first about me, I am Lazarina Stoy, I am the SEO and Data Science Manager at an agency called Intrepid Digital. The agency focuses on large-scale enterprise companies. So machine learning is really, really good for that, and data science in general.
So how I came on to machine learning was when I was studying marketing in university, I got really worried because I was working on social media at the time. And—this is kind of a long winded story, but it will make sense...
Sam Torres 01:05
I like it.
Lazarina Stoy 01:07
So I was, yeah, I was working in social media, and I thought that, you know, the big companies are going to take over my job, and I have to literally move to any other side of digital marketing. And that's how I got really into like, automation, and how—you know—like Facebook, and Google does automations on their end, and I did a dissertation about the ways that artificial intelligence (is what I called it at the time.)
I know like everyone that knows a little bit about machine learning knows that it's NOT that, this is a fancy term that you put on a deck for c-suite executives, of course, but I put it on my dissertation. So I started learning a little bit more about that and I got super, super fascinated about how those kinds of fields (artificial intelligence, machine learning and data science) can be implemented in marketing, and not only in social media, but all areas of marketing. And then I got my Master's, specializing in Information Management, NLP, and Machine Learning. And actually, that's how I kind of started with machine learning. And then, when I went into SEO, I thought that this was the perfect field to apply what I was learning, because obviously, like, there's so much opportunity to analyze text and like do entity recognition work and all that fun stuff that relates to machine learning.
Tory Gray 02:34
Awesome. I mean, at the risk of jumping into the questions, I would love to hear your breakdown of the difference between machine learning and artificial intelligence.
Lazarina Stoy 02:43
Yeah, sure. So for me, I'm definitely not an expert on this topic. There are definitely a lot more expert and authoritative sources. But for me, the difference is that machine learning is only a subset of artificial intelligence.
Artificial Intelligence includes everything that is kind of a robust, thorough end-to-end system. So when we talk about things like, let's say, Hanson Robotics building a robot: that robot operates on, like, different machine learning components, kind of like systems communicating with one another. And that kind of makes the robot do complex movements, let's say. But if, you know, we're just talking about machine learning, that is just like mathematical-statistical models combined with like programming logic. And it's just as simple as that, at least based on my understanding. I'm sure there will be people that are a lot more knowledgeable that will have maybe a slightly different opinion. But it's essentially kind of a simplified version that I think is a lot more manageable for day-to-day people to understand. So that's my take.
Sam Torres 03:57
You know, I really wish I had known you when RankBrain came out, because I feel like you probably would have done a really great job of explaining what it means and why we should care.
Lazarina Stoy 04:09
Yeah well, I don't know. Maybe? I think, I don't know, I wasn't around. To be honest. I'm quite new to the SEO industry in general. So I wasn't around at the time to know what, you know, what kind of educational work was happening at the time, but I would love to kind of know what you felt was missing, without like ruffling some feathers.
Sam Torres 04:33
Oh, I think honestly, it was just, you kind of already tackled it by saying like, people put AI as a label. And it's not really quite an AI because that usually implies a level of sophistication that maybe isn't in machine learning, right? Not to say that it can't have it, but machine learning is still a very closed set of data and types of decisions, whereas Artificial Intelligence, I would say true AI, you're gonna have like an open field, right? Anything's really going to be possible. And that's, that's what makes it really cool and also terrifying.
Lazarina Stoy 05:09
It's also really far away.
Sam Torres 05:11
Yes, that too!
Lazarina Stoy 05:14
Because it's, one cannot exist without the other, right? Like, if you have not poorly functioning but very, like narrow tasks that you can complete with—successfully complete—with machine learning, then you cannot really have like true AI, I would say.
Sam Torres 05:29
Yeah, I agree. RankBrain was just a change—and Tory, feel free to interject if I'm misquoting anything—I think maybe it was like, was it like 2016? Or 2017? When that happened? Maybe? But yeah, it's basically they just put machine learning into the algorithm itself, which I think what became really exciting about that, is that that now meant, there's no one person at Google who could even understand the algorithm. Or why it does what it does sometimes. So I find that really fascinating.
Maybe that should scare me a little bit, but actually I like it. But I also like machine learning, which is why I'm really excited to have you today. Yeah, I think there was just a lot of like, because words like artificial intelligence were being thrown around when RankBrain was introduced, there was just a lot of miseducation, misconception about "what does that mean for search results?" And I'd say even just, like, understanding of what machine learning means was much less than what it is today for the SEO industry. So yeah, I just kind of wish I had known you then, so that you could explain.
Because I remember trying to understand myself and trying to get on board and then train my teams that I had, because I was at an agency at the time. And it was still just like, at some point, you just have to accept that this concept is over your head. And let's move on! And I love that I feel like now as an industry, and as a profession, we're getting to a place where it's like, "No, I'm not going to move on". This can be contextualized into—into understandable measures, and like, let's talk about it and really understand what's going on. And then also love, we're going to talk about "how do you then turn that around" and use it to your own advantage to make your work that much better? So? But yes, I just wish you'd been there. Because the way you just explained AI versus machine learning. I loved it, it felt very, very tangible. So thank you.
Lazarina Stoy 07:39
Yay! Happy times. Probably at the time, I had no idea what artificial intelligence was anyway, so.... don't worry.
Sam Torres 07:48
Neither did I. I thought I did. I didn't.
Lazarina Stoy 07:51
I feel like the more you know, that's why I'm a little bit cautious with like, you know, trying to define how much you know about a certain topic and machine learning and SEO are very similar in the sense that the more you know about it, the more you realize you don't know that much about it. So there's always so many things that you can learn, and so many areas that you can become better at. And even if you are an expert in a certain field of SEO, and even if you're the best expert in that field, there's always like 15 other fields that you probably know nothing about. Or if you know something about it, you're not an expert in them. So machine learning is kind of like that, which it's, it's really good that both of the fields are like that. And we have the possibility to kind of bridge the gaps between the two.
Sam Torres 08:36
It makes it fun, right? Our job is never boring.
Tory Gray 08:38
Yeah, yep. As the reigning queen of machine learning SEO, oh, my goodness, think of what more you're gonna learn and teach us through the years as you get deeper and deeper and deeper into it.
Lazarina Stoy 08:49
So yeah, I'd love to, I'd love to. I hope I can keep up with things. That would be great.
Begüm Kaya 08:56
And the thing is, like, for example, I also experienced this when we first met at Brighton SEO in April 2022. I was very amazed by how humble, kind and collaborative and like—teaching and giving you are—even though you have so much in your pocket. So how do you do that Lazarina? And how can we help people in our industry not to be jerks?
Lazarina Stoy 09:20
Thank you so much, first of all. I try my best. I kind of go in the belief that, you know, data science, or at least the data science or machine learning that I know, doesn't have to be difficult. And I've said this before, and I'll say it again, I do like being lazy. Like if there's something that I can do without like a super huge amount of effort, I would love to do that. And you know, if I'm able to find a way to do this, I would also like to think about the other people that might benefit from this type of project, for—from this type of maybe script or tool or whatever else it may be. And I'd like to kind of enable them to do that if they—if they kind of encountered that product. So in my mind, it's always the case of information is not a source of power, unless it's shared.
Tory Gray 09:37
The application of a thing—in the new context—matters. And it is obviously meaningful. I mean, we've been experimenting in that ourselves, like keyword clustering, for example. And so based on what you were just saying, I'm curious if you've noticed any trends, because there's so many models you can follow, right? And I've seen and had shared many different ways to cluster that cluster in very different ways.
Lazarina Stoy 10:10
Like there's no point in you knowing something and you know keeping it to yourself, just because, you know? And especially in this industry, I wouldn't – and I know that a lot of people would agree with this, I wouldn't be where I am, if there weren't so many resources available online on how to do certain things. So I always kind of wanted to be part of that wave of people that are, you know, super open, collaborative, and, you know, sharing their knowledge, instead of you know, saying how good I am at something and never actually doing it.
With machine learning, I definitely have a little bit of – maybe a little bit of – hypocrisy in that manner, because I do try to share as much as possible, but because the work that I'm doing is never super groundbreaking, I'm very cautious at sharing certain scripts, because I always think like, this is something that you can find online. So hence why I'm a little bit more focused on maybe trying to show the process of how I got there, maybe how it can be implemented in a certain task that I'm doing, like an internal linking audit, or a SERP analysis or whatever it may be. But I can definitely say I don't feel like I'm doing something groundbreaking, because you know, everything that I'm working based off, has been created in the field of machine learning, like maybe 10, or 15 years ago. So it's always like, "Oh, I just discovered this thing. And it's been around for a ton of time. Like, Why? Why are people not using this type of thing?"
Tory Gray 11:57
I'm curious if you've noticed any trends in ways in which are there certain models that work best? Like for certain industries, or keyword types, or stages of the customer funnel? Or like, how would you go about choosing the right model? Or do you just run a bunch of models and see what happens?
Lazarina Stoy 12:18
That's a good question. Sometimes, yes. But I 100% agree with what you're saying, like so many models to choose, and especially for every single task, and keyword clustering has definitely been one of the ones that have really experienced a boom in the past few months. I think Lee Foot is the one to blame for this—his awesome scripts and apps and everything.
The way that I would go about choosing a model would be first to understand how it evaluates the similarity between certain keywords. So essentially, I've talked about this before, but just like a quick recap: when you have a certain task, and you're trying to see whether a machine learning model can help you with this task, it would be good to kind of understand what you're trying to achieve.
So using the example keyword clustering, you would like to see what keywords are similar to one another, and then kind of find that Core term that defines that cluster, let's say or that similarity, right? So which one is the most definitive out of all of those for that group of keywords? So knowing that, you immediately think like, okay, so how is that similarity evaluated? And there are different methods to do this.
Initially, earlier models did this by simply counting character changes. So let's say for example, you have the keyword "shoe" and the keyword "shoe lace". How many changes of characters does it take to go from the one keyword to the other one, right? And this is what is called Fuzzy Matching, a type of modeling. And whilst it works, you know, it can have very bad results sometimes, because like, you can go from shoe to shoe lace, but you can equally go from shoe to something totally different that has a second word with, like, four letters, right?
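To make the character-change idea concrete, here is a minimal sketch of one common fuzzy-matching measure, the Levenshtein edit distance. It is only an illustration of the concept described above, not the method used by any particular tool mentioned in this episode, and the example keywords other than "shoe" and "shoe lace" are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


# Both pairs need five inserted characters, so they score identically,
# even though only one pair is a sensible match for a shoe lace cluster.
print(edit_distance("shoe", "shoe lace"))
print(edit_distance("shoe", "shoe rack"))
# A single substitution links two unrelated words.
print(edit_distance("shoe", "show"))
```

The equal scores for "shoe lace" and "shoe rack" show why a pure character-based score can group keywords that are not semantically related.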
So from that point of view, that similarity score that you get might not be as relevant. And you know, you might get results where you think like, "hmm, I'm not really sure that that is semantically related". So that's where, you know, later models like BERT, for instance, come in. I think GPT-3 also does this, but I might be wrong on this. But BERT for sure has this kind of capability to assess the semantics of the word.
So: what it means; is it a synonym; is it related; what does the vector representation look like; how close are these words to one another? And that's how it bases the decision to kind of cluster these words together. So from that point of view, I would always go with a model that is a little bit more sophisticated. But I completely understand that if you are just learning about keyword clustering, there's no way that you would have this insight, right? Every time I would say: just do a little bit of research on what that task involves, like, what do the models actually do?
And then a little bit of research of what this particular concept means. Like, if it's a distance-based model, what does that even mean? And if it's a semantics model, like, what does that mean?
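As a rough illustration of what "how close are these words to one another" means for a semantics-based model, the sketch below compares keywords by the cosine similarity of their vectors. The three-dimensional vectors are hypothetical values invented for this example; a real model such as BERT produces vectors with hundreds of dimensions, and nothing here calls an actual model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, made up for illustration only.
embeddings = {
    "shoe": [0.9, 0.1, 0.0],
    "sneaker": [0.8, 0.2, 0.1],
    "show": [0.1, 0.9, 0.2],
}


def most_similar(keyword, vectors):
    """Return the other keyword whose vector is closest to the given one."""
    target = vectors[keyword]
    others = [k for k in vectors if k != keyword]
    return max(others, key=lambda k: cosine_similarity(target, vectors[k]))


print(most_similar("shoe", embeddings))
```

With these made-up vectors, "shoe" lands next to "sneaker" rather than "show", which is the opposite of what a character-based score would suggest.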
Tory Gray 15:18
Yeah, I mean, just in general, like the–the model of really understanding the inputs, so you can understand better the outputs. And I think sometimes it's a challenge, because you don't always get out what you expected. But then you can go back to the: "Okay, well, we can test more than one. And we can see what the results are." And, you know, that's something I hope to get to over time–really understanding, you know, of the more advanced models, what you'd apply in, like—is it about differences in the customer funnel? Or like, because different keyword stages, right, have different intents? So maybe some are better at some places than others? So that's something I think we're going to be working to experiment in over time.
Lazarina Stoy 16:02
Yeah, yeah. 100%. I think also, when it comes to experimentation, something that helps a lot would be...... and that's kind of like backwards thinking, especially for people that are like myself, I'm not very good at coding, not even a little bit to be honest with you! I'm like, I'm outrageously poor at coding. I know this for myself, for sure. And when–when you kind of have that challenge, I always try to improve. But at the same time, we all work in an agency, we all know, kind of like what–what that looks like in terms of day to day.
And because there are so many successful examples of people doing that (Danny Richman is one that comes to mind, especially using GPT-3 in Google Sheets), you can really kind of copy the pattern of the code and then just replace the model, because there are so many available models that you can literally channel in a couple of lines of code, right? So if coding is kind of the blocker in terms of what's stopping you from experiment[ing], it's not really something that you should be that afraid of, because like you have Apps Script, you have Jupyter Notebooks, you have a ton of resources on GitHub and Stack Overflow. And like, it's just a matter of, you know, trying to understand the logic behind it, instead of, you know, getting stuck in an error or maybe like not knowing where to start from.
Tory Gray 17:53
Totally. Tinkering with it, little bits over time goes a long way.
Lazarina Stoy 17:58
Yeah, I think so that 1% rule, right? 1% better every day, it's a lot better than you know, trying to, you know, be 100% from day one to day two, right? Like, that never happens. So yeah, it's a learning process. For sure. It's definitely a learning process for me.
Begüm Kaya 18:14
I mean, obviously, I really recommend taking a look at your "The Beginner's Guide to Machine Learning for SEOs" to get started around these. And most of the things that you mentioned here are kind of there too. But here are some questions to intrigue people into, like using machine learning for SEO.
Begüm Kaya 18:31
So the first one is from Lidia Infante. She wants to know some ways to combine machine learning and the human brain to find schema types that are right for a website and implement them. How do we do that?
Lazarina Stoy 18:45
Yeah, that's actually... I feel from all of the questions, something that I was very fascinated by was that there were a lot of them related to schema. So I'm really interested in this because that's a project that I've been working on and trying to kind of test around with. But the long and short answer is that yes, it's possible.
But there are a lot of buts. Like with anything when it comes to data science or trying to build any sort of program to do something for you, you can make it as easy or as complex as you'd like. So for this particular task, I was trying to think like, "what would be the easiest way to do this?" And I started thinking like, "why aren't CMS providers providing a service like that?" Because it's a simple case of, for me personally, like, "if else" statements, right? Like if you're writing a page or publishing a page, and you have a "how to" in the title and all of your H1s are step one, step two, then the CMS could potentially recommend, "hey, would you like to add a how to schema?" Or if you have the word recipe more than 10 times, then you can have a recipe schema. If you have product or a related semantic variation of the word product, like product description, product size guide, or whatever else, that same kind of tool, or let's say the CMS, could recommend that type of schema.
But I think the challenge here and why it's a little bit hard to build, especially as an external tool, I think as an internal would be a little bit easier because they have the capabilities for this. But as an external tool, you would have to then automatically generate the schema. And then with more complex schema types, like how to schema for instance, there's a lot of potential for error. So there, there would be a lot of investment needed in order to kind of get this right at a massive scale and kind of roll this out as a product.
Tory Gray 20:45
And like individualized fields in the CMS to structure that.
Lazarina Stoy 20:50
Absolutely. And even if it's just something as simple as, you know, adding a pop-up to say, "Would you like to add this type of schema?" then the easiest thing you can do is kind of link to Google's guide on how to implement it. But then would that be enough for the user? Most likely not, because if we're talking about people that are using this kind of service, but they're not SEO professionals, then it will be quite hard to do that.
But something that you can absolutely do, kind of like as a workaround, the lazy mode of this entire thing, would be to scrape the content of the website, depending on, of course, the size, or at least try to do a crawl with the certain kind of keywords that would potentially describe or indicate that a certain type of schema could be implemented on that kind of page. And by running that crawl or exporting the content from the website, you will be able to identify maybe pages that would be suitable for that kind of work. So maybe, if you have How To pages on your website, run a crawl and try to find those kinds of words.
Then run another crawl to see if you have different question-based keywords. So let's say for instance, if you have, on a certain page, more than five times the words "how", "why", "which", words that are kind of very typical for informational intent, right? So this could indicate that maybe FAQ schema could be added to this page. So it's kind of like working backwards, when you know that a solution like this doesn't exist.
And for this particular one, I definitely think that machine learning would not be the answer here; it would just be kind of like a program that runs on if else statements. I'm definitely sure that you can implement machine learning for this, but it might just be an over-complication. You know, having in mind that at the end, you will have to also find a way to automatically generate the schema as well. But it can be done with machine learning.
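The "if else" approach described in this answer can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the inputs (title, headings, body text from a crawl export), and the exact question-word list are hypothetical, while the thresholds (more than 10 mentions of "recipe", more than five question words) follow the numbers mentioned in the conversation.

```python
import re

# Assumed list of question words signalling informational intent.
QUESTION_WORDS = ("how", "why", "which")


def suggest_schema(title, headings, body):
    """Return schema types a page might be a candidate for, using plain rules."""
    suggestions = []
    words = re.findall(r"[a-z']+", body.lower())

    # "how to" in the title plus step-style headings suggests HowTo markup.
    has_steps = any(h.lower().startswith("step") for h in headings)
    if "how to" in title.lower() and has_steps:
        suggestions.append("HowTo")

    # The word "recipe" appearing more than 10 times suggests Recipe markup.
    if words.count("recipe") > 10:
        suggestions.append("Recipe")

    # More than five question words suggests FAQ markup.
    if sum(words.count(w) for w in QUESTION_WORDS) > 5:
        suggestions.append("FAQPage")

    return suggestions


print(suggest_schema(
    "How to tie a shoe lace",
    ["Step 1: Make a loop", "Step 2: Wrap and pull through"],
    "Cross the laces and pull them tight.",
))
```

The output is only a list of candidates for a person to review; generating valid markup for each candidate is a separate step.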
Tory Gray 22:46
Yeah, I think it'd be something like an SEO tool could implement—a Screaming Frog, or anything that's really just crawling your website that says, "Hey, I've identified that you have a series of pages that have recipe in the URL, could this be recipe schema?" And it could like, you know, it's an opportunity section, it could identify that or most companies can use organizational schema, or you have product pages, because you're an eCommerce site, like I can tell it's literally in the URL, or many times on the page that it's this like, and it wouldn't just be like definitive, because you know, the implementation of the schema is one thing, but the identification of the opportunities is also obviously a thing.
Lazarina Stoy 23:29
It was a very loaded question. Right, like, Lidia, really beautiful. Yeah, not only identification, but implementation as well. I love it. Yeah. I wish!
Sam Torres 23:44
I do agree, it would be... it's probably not machine learning, because we do quite a bit of like, structured data audits. So either it's just auditing what they already have, or finding opportunities for different structured data. We also will typically add structured data types that maybe aren't in Google's guidelines, but are part of schema.org. And there's been times we've been ahead of the curve, and like, it got accepted. And it's like, "oh, we already have it!" Um... So there's definitely been wins there.
Plus, I just love structured data, because it's just, it's such a good way of saying like, "this is the stuff that's important." Like, "this is what I need you to know", right? There's so many ways to go about how to do that. But certainly, on the implementation, it's like, at that point, you almost need to like have... I would think like you say, like a pop up with like, "Hey, do you want to add this?" And then I would envision like, if they say yes, then it opens up another pop-up, and it's like, "Okay, here's the fields to enter the information," and it automatically contains the syntax.
But updating that sounds like it could be – keeping it up to date – I should say, sounds like it could be a bit of a bear.
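The idea of a form whose fields are turned into markup automatically can be sketched as below. This is a minimal example, assuming a hypothetical helper name and only three fields (a name and a list of step texts); it uses the schema.org HowTo and HowToStep types and does not check any search engine's required or recommended properties.

```python
import json


def build_howto_jsonld(name, steps):
    """Build a schema.org HowTo object from a name and a list of step texts."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": position, "text": text}
            for position, text in enumerate(steps, start=1)
        ],
    }


markup = build_howto_jsonld(
    "How to tie a shoe lace",
    ["Make a loop.", "Wrap the other lace around and pull through."],
)
# The JSON string would go inside a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```

Keeping the field-to-property mapping in one small function is also where the maintenance burden mentioned above would sit whenever properties are added or become required.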
Tory Gray 24:55
Especially as new fields are added, right? Or they've become required– something that was once recommended is now required or is recommended... and now, it helps you stand out from that and like adding all the new things all the time.
Sam Torres 25:10
Yeah, so I could see maybe that would be the way to do it, but I agree with what you said about how that would be much easier to do internally. Right? For an internal system with maybe some customized CMS logic. I am really curious, as you know, I think, is it—oh I never remember their name, but the big schema company.
Begüm Kaya 25:32
The Schema App?
Sam Torres 25:34
Schema App? Is that it? Is that really what it's called? Yeah. To learn how they do—do more of what they do behind the scenes, because then obviously, they offer all the implementation methods. So anyways...
Begüm Kaya 25:48
Next up, we have a question from Anett Pohl. She is looking for an easy way to use machine learning for redirect mapping and how can we do it?
Lazarina Stoy 25:57
Yes, so I'm actually going to refer you here to something that I touched upon earlier on the call, which is Fuzzy Matching. And we already explained the logic of this. So kind of like the distance between the certain things. Now, this is an application where fuzzy matching would be very good because especially with redirect mapping, you would need to identify whether there is similarity in the URLs. And also, if you are also moving the URLs, you might need logic that also reads the page as it's doing that.
Explore: Ultimate End-to-End Guide to Fuzzy Matching For SEOs (with Google Sheets template) - a guide from Lazarina
Lazarina Stoy 26:30
So I'm going to refer you here to the scripts and tools of a couple of people. The first one is Greg Bernhardt, if you are interested in Python, machine learning, any sort of scripts, tests, tools, anything– like he is amazing with that. So he has a tool specifically for redirect mapping. And he has also a tool to evaluate content similarity. So both of these things can be done with fuzzy matching.
Lazarina Stoy 26:56
And then there's another tool with literally the same purpose by Francis Angelo Reyes. And I will send the links after the podcast and you know, everyone will be able to access them. So that's essentially how that will be done using machine learning. You just audit in bulk, and then it spits out a similarity score. So you can review the ones that are not similar or not identified as similar.
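A bulk redirect-mapping pass of the kind described here can be approximated with Python's standard library. This sketch is not the tool by Greg Bernhardt or Francis Angelo Reyes; it uses `difflib.SequenceMatcher` as a stand-in similarity measure, and the URLs, the function name, and the 0.5 review threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def map_redirects(old_urls, new_urls, threshold=0.5):
    """Pair each old URL with its most similar new URL and flag weak matches."""
    rows = []
    for old in old_urls:
        # Score every candidate and keep the highest-scoring one.
        score, target = max(
            (SequenceMatcher(None, old, new).ratio(), new) for new in new_urls
        )
        rows.append({
            "old": old,
            "new": target,
            "score": round(score, 2),
            "needs_review": score < threshold,  # low score: check by hand
        })
    return rows


# Hypothetical URL lists from an old and a new site structure.
old_urls = ["/blog/seo-tips-2021", "/contact", "/zzz"]
new_urls = ["/articles/seo-tips", "/contact", "/products/shoes"]

for row in map_redirects(old_urls, new_urls):
    print(row)
```

As in the description above, the score is a prompt for review rather than a final answer: rows flagged `needs_review` are the ones to map manually.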
Begüm Kaya 27:30
We haven't done it we have it. Yeah, I'm loving this, like educating marketers about topic modeling algorithms and everything. Just, it's amazing.
Begüm Kaya 27:40
So on the content side, like there are lots of cases you covered, from forming intent-first information architecture to auto-generating meta tags and everything. So we have another content-related question, which is on machine learning for more content SEO: if there is a way we can utilize machine learning to identify and answer questions to generate FAQ questions.
Lazarina Stoy 28:06
Yes, there is a way. I said like, lots of questions in relation to schema. That's very fascinating. I definitely did not expect that. So thank you so much for the question. I don't remember who asked this question. Maybe Begüm, you'll be able to tell me.
Begüm Kaya 28:25
It was Anett. Yeah. It was Anett again.
Lazarina Stoy 28:28
Well, thank you, Anett. I actually did a lot of research for this question, because I was really fascinated by it as well.
So let me introduce you to Haystack. So Haystack has a question-answering kind of capability. And it's kind of like an NLP library or tool that you can use. And there's a ton of code out there. So I'm not going to go through any of the technical steps of how to do it, I'm just going to give a very generic overview of what the steps would look like.
As much as I understand the question this would be about: you have a webpage, and you want to ask certain questions for this webpage and see whether it can answer them. So in order to do this, you need to, first of all, obviously, import your code and install all the different libraries, I will attach a tutorial for this whenever and however, we can do that.
But essentially, you are using a model that does information extraction with a feature of question answering and that is something that Haystack does. So in order to do this, you essentially pre-process your pages or the content on the pages. This can also be used for things like tables and a PDF and you can convert them to, you know, having them pre-processed, convert them to text, and then convert them to vector representations or word embeddings. Essentially, you have to put your text in a format that is machine-readable and that is suitable for doing machine learning tasks.
And then once you have that you essentially build a pipeline. Which, again, sounds extremely complex, but it's something that you do with a single line of code, no super fancy stuff happening there. And after that, you just input the questions that you would like... kind of the content to answer. And it puts out the response based on kind of processing the text that is available. This kind of work typically works better for questions that are quite straightforward.
So let's say for example, you have an export of Google Analytics, and you want to answer specific questions based on data contained in the cells; it might be better suited for that kind of work. So you would like to ask "how many sessions did we receive in the week of this, and this, like two weeks ago", and if you have that information somewhere on the text or in the table, you would be able to get that response very accurately.
When it comes to kind of reformatting and restructuring the text, I would be a little bit more interested in knowing specifically how Haystack does that, whether it is via summarizing the text on the page, or via extracting certain sentences and building the response based on these sentences. Because these will be two very different things. When it comes to kind of paraphrasing what it means, it might not be suitable for every industry, because there might be industries where the syntax, the format, is very, very important. And it's important for that to be kept as it is. And when it comes to like maybe industries that are a little bit more laid back, like recipe blogs, or I don't—I don't want to diss on certain industries, but there's definitely some that are not as straightforward as others.
And you know, you have a lot of more leeway to experiment with paraphrasing and machine learning when it comes to content. And in those kinds of industries, it would be good to kind of experiment with that a little bit more freely, let's say. So yeah, so long story short– Haystack, super easy. I checked the code. I tested it. I think it's like maybe seven or eight lines of code, altogether. But it's just a matter of how you want to customize it afterwards. And you know, how you make it work based on the certain page or industry or, you know, website that you're analyzing, so will it work for Anett's case? Maybe it will, maybe it won't? I don't know. Don't hold me accountable!
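The difference between extracting sentences and paraphrasing can be made concrete with a toy extractive example. To be clear, this is not Haystack and shows none of its API; it is a plain-Python stand-in that returns the page sentence sharing the most words with the question, and the sample page text and function names are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def answer(question, document):
    """Return the sentence that shares the most tokens with the question."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    question_counts = Counter(tokenize(question))

    def overlap(sentence):
        sentence_counts = Counter(tokenize(sentence))
        # Counter intersection keeps the minimum count of each shared token.
        return sum((question_counts & sentence_counts).values())

    return max(sentences, key=overlap)


page_text = (
    "Our store opened in 2015. "
    "We ship orders within two business days. "
    "Returns are accepted for 30 days."
)
print(answer("How long do you take to ship orders?", page_text))
```

Because the answer is copied verbatim from the page, the original wording is preserved, which matters in the industries mentioned above where syntax and format must be kept as they are.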
Tory Gray 32:36
I would think like... how advanced the subject matter would be—it would matter a lot. Like if it's 101 content, then there's going to be a lot of content on the web to choose from, or it's going to be more straightforward. But if it requires a lot of context-setting or a lot of nuance, that's always going to be harder for a model too.
Sam Torres 32:53
I also love that you bring up—and I think this is something... so actually, Tory and I, when we were at Women In Tech Fest, and we were talking about this a little bit with Lily—of how much curation after machine learning is important. And like you need to review the results. Don't accept things like...
I think for us, at Gray Dot, the really powerful thing about machine learning is not necessarily that it's automatically spitting out all the answers, but that it's helping us figure out where to focus our time to do, like, custom review and curation. Like, I think back to a project we did. And you know, I feel like for most of my career, when it came to SEO keyword research, a set of maybe 8,000 to 10,000 keywords would be like the largest you would do at a time, right? And of course, you'd go a little bit cross-eyed.
There's a reason why the word furniture has no meaning to me anymore, right!
But because of models that actually, Lazarina, you sent to me (it was a huge help, so thank you), we were able to take a dataset of like 116,000 keywords, coupled with YouTube data of over 16,000 videos, and start to really look at and find trends and meaningful themes of content.
It's just... that's what's really exciting to me. It's just—it enables you to take these huge datasets that would have been literally impossible to really glean anything useful out of, and you'd have to chunk it up, and there might be things that you missed in the past. And I just feel like it's... I guess my question for you next would be: how much time do you spend building the models, testing the models versus like, alright, I have my results and now I'm going to custom curate and like modify it myself.
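As a rough illustration of what "finding themes" in a big keyword list can mean, here is a deliberately naive sketch that groups keywords by their most widespread term. It is not the models mentioned in this conversation (real topic modeling or embedding-based clustering is far more capable); the stopword list, the sample keywords, and the `group_by_dominant_term` helper are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Small, hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "the", "for", "to", "of", "in", "how", "best", "and"}


def tokens(keyword):
    """Lowercase a keyword and drop stopwords."""
    return [t for t in keyword.lower().split() if t not in STOPWORDS]


def group_by_dominant_term(keywords):
    """Put each keyword in the group named after its most widespread term."""
    # Count in how many keywords each term appears.
    term_counts = Counter(t for kw in keywords for t in set(tokens(kw)))
    groups = defaultdict(list)
    for kw in keywords:
        terms = tokens(kw)
        if not terms:
            groups["(no terms)"].append(kw)
            continue
        # Most widespread term wins; ties are broken alphabetically.
        label = min(terms, key=lambda t: (-term_counts[t], t))
        groups[label].append(kw)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "best office chair",
        "office chair for back pain",
        "standing desk",
        "how to assemble a standing desk",
        "desk lamp",
    ]
    for label, members in group_by_dominant_term(sample).items():
        print(label, members)
```

Even a crude grouping like this only narrows down where to look; the review and curation step described above still decides which groups are actually meaningful.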
Lazarina Stoy 34:52
Good question. So first of all, I agree 100% with what you're saying. There's another step before building the models and doing all of that, which is actually cleaning the data that you're going to use for your model, and before that even, you know, really defining your problem. I'm a huge advocate of that: if you don't have a defined problem, and a task that you want to achieve, you're never going to be able to find what model can help you to do that.
So I've been quite lucky in the past that when I have been experimenting with models, at least in the SEO work that I do, most of the time, I'm not building them from scratch. And I, as I said, like, I'm super spoiled in the fact that there are so many available pre-trained models, and so many of them are also part of like a tech giant's network, so Google, AWS, and everything in relation to that, like Microsoft Azure.
And when you have that kind of network, you can actually focus on, you know, defining what algorithm, maybe you would like to test, and then defining on what the process—that that algorithm does. And then after that focusing your entire attention into turning the output into something that you can use, or actually building a case for why machine learning is not useful for this type of problem. And then maybe revisiting that problem, a couple years down the line, or maybe like six months down the line.
Because there are certain problems where machine learning is not going to be something that I would immediately jump into. And there's like, good reason for that. Because maybe it's not- whilst you can do it with machine learning- like, would you want to deliver that to your client? Maybe not. Like, you definitely have to spend some time to evaluate the output. So from that point of view, I would definitely say like, evaluating, and you know, making sure that everything that happens is something that would be of the standard of maybe a person doing that—or better.
That's where it passes the line in terms of, okay, we can work with this, or maybe like, okay, let's scratch that. Let's find another way to do this. So yeah, I don't know if I answered your question. But I spend a lot of time in actually ensuring that the output is good enough, and it's something that can be used as, you know, a client deliverable. And you know, it is something that is actually bringing value.
And then another thing that I find myself spending a lot of time with is accessibility. Because you guys seem extremely lucky that all of you are fascinated with machine learning, and you know, you work together and you maybe do a little bit of coding and all of that fun stuff. But not all agencies are like that. And maybe not even all technical SEO departments are like that.
So I spend a lot of time also trying to figure out if that is something that maybe my teammates can use, without having that technical knowledge that I have. So maybe they're not as familiar with machine learning. They're not as familiar with maybe how to even use a Jupyter or Colab notebook, or how to use Apps Script and things like that. So trying to kind of put the code that I do in a way that is accessible to people, that's quite important for me. And if it’s as easy as possible, if I can find a script that already exists for that, then that is perfect. So having that accessibility for team members is quite important as well.
Tory Gray 38:34
It is, and I think it's also really important to set expectations well, because we do see people that come in and just want "scale, scale, scale!", and they expect to turn on a machine learning model and suddenly all your alt text everywhere is generated, or all your metadata everywhere is magically created. And, you know, what was a 20 to 60-hour project is suddenly an hour and you're done and you can move on. There needs to be a curation and quality step to it. And I mean, the other thing, if you have that expectation, is you can run this model, and then you can look at it and be like, "it's not perfect, throw it out," when really maybe it saved you eight hours. And so that's still net less time you have to spend as a human curating and fixing the details. It's still time savings, that's still valuable, and a better place to start.
Sam Torres 39:22
So much easier to edit than to create, right? Just so much easier.
Lazarina Stoy 39:28
Absolutely. And also because for me, and I know so many people from my team are exactly like that, when you have a task that you're just starting, you're just staring at a blank piece of paper or like a blank spreadsheet and, you know, you have like thousands of URLs to do something for.
And it's like, how do you even start? You know, just the fact that in your mind you're thinking, "Oh my god, I have to go through like 10,000 images and write the alt text from scratch." It's like, no, you don't have to. There's definitely a step in QA. And that's a very, very important step. And it's not only in alt text generation or meta descriptions.
I attended a talk by, sorry, Andrea Volpini. He talked about how he used GPT-3 to generate descriptions for products. And he also talked about how he QA'd the descriptions that GPT-3 generated. So there's always that step of QA, but, you know, alleviating the pressure from starting from scratch, I think it's something that machine learning helps with a lot. And Tory, 100%.
Like, if you, if you're working with stakeholders, and clients, and even teammates that really expect the machine learning model to be the end all be all, I always say that there's not a single model that will work, like out of the box, and you know, for everything like that... just doesn't exist. So if it's good at one thing, it might be good, for instance, for, you know, redirect mapping, identification, but is it going to be good for keyword clustering? Maybe not. Like, that's not—not how it works. So testing is important.
Begüm Kaya 41:07
And when we said that, I kind of thought about Jarvis because, you know, people have been going crazy about its capabilities and everything. But yeah, it removes that stress element from starting from scratch. Yes, but it still needs some sort of curation. So even though it saves time, you still need a human to be there and be focusing on the work.
Sam Torres 41:33
And I think that's... that's probably just true across the board with SEO, right? We should always be working on creating content and strategizing for audiences and for people. It'd be great if it was only for other robots, but, you know, people matter too!
Begüm Kaya 41:53
It's quite industry-dependent, I would say, would you agree? Like, the capabilities are very dependent on the industry.
Tory Gray 42:05
Capabilities of what?
Begüm Kaya 42:07
Capabilities of like machine learning and the content creation, like automatic content creation?
Sam Torres 42:14
Well, yes. And actually, I'm, I will say, I'm not a huge fan of the automatically generated like, AI content. And I will say I use some of them, like I'll have content, and I'm like, I just need to figure out another way to word it. I use that a lot. But yeah, AI-generated content. I just don't think it's useful.
Sam Torres 42:37
Right most of the time, because like, really, and we've talked about this, and we're getting a little off-topic. But when it comes to writing content, it's going to be meaningful if you're adding your own unique perspective, if you're framing it in a different way. And that's not something that an AI is going to recreate.
Explore this further in Ep. 4 of Opinionated SEO Opinions: Opinions About The Future of Content
Tory Gray 42:56
Because it's, by definition, pulling stuff from out there in the world and mashing it up. So it is, by definition, not original. Yeah, what unique value will you bring?
Sam Torres 43:08
Yeah, so I feel like it's the same as when using, you know, topic modeling or topic clustering models. It's really great at maybe helping you start with something and then edit from there. But I certainly... I would not post AI-generated content. But also, our blogs are already like 6,500 words. So clearly, we are the extreme and maybe people shouldn't always do exactly what we do.
Tory Gray 43:38
Well, I think it can be helpful as a starting point. So... even if you decide not to take it and use it, there's something about seeing something and reacting to it. So if you were paralyzed before and you didn't know where to start, once you read something, you're like, "Oh, this is crap and this is why." Now you have a point of view, and now you can get started. So even if that's the case, and you end up throwing out what was there, you still started with something. And that's still more than you had before. And that's valuable.
Sam Torres 44:07
Yeah. I agree.
Lazarina Stoy 44:09
An interesting discussion when it comes to AI-generated content is that there is a lot of work done in machine learning at the moment. And I'm not sure whether that is isolated to only the academic community, or maybe the companies are also doing that kind of work, but are not yet public about it. And the work is about identifying whether a certain content piece follows the same kind of tone of voice and, like, writer's style as other content published by the same author.
So essentially figuring out whether this aligns with the way that the author typically words things. So when you think about this kind of research being done, and that was something that I used to do in university, kind of as testing, you know, in the lab. The fact that this kind of work is happening in labs in universities means that there is definitely a way for AI-generated content to be identified. And that is about the patterns of what AI-generated content means, you know, going beyond those articles that are really generated based on certain patterns.
We're talking here about freely generated content, from content writers like Jarvis, for instance, and models like GPT-3. Let's say, for instance, that at one point there is a clear identification of content being written by a model like GPT-3 or GPT-J or whatever other model it may be.
From that point, you do not want to be on the side of SEO that just literally vanished from the internet from day one to day two. Right? So I completely agree, that wouldn't be something that I would include in my strategy without any QA, without any sort of editing. Using that as inspiration is good. You know, I think it can definitely alleviate writer's block, but basing your entire content strategy on this? I definitely do not want to be on the side of this when this happens. Because it's inevitable. It's kind of like, you know, the bad link-building practices, you know, black hat link building 10 years ago. You don't want to be on that side. So why do it again, right?
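The style-matching idea described above can be illustrated with a toy stylometry sketch: build a profile of how often an author uses common function words, then compare two texts with cosine similarity. This is an assumption-laden illustration, not the research referred to here and not a reliable detector of AI-generated content; the `FUNCTION_WORDS` list and both helper functions are made up for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, very short list of function words used as style markers.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in",
                  "that", "is", "it", "but", "so", "like"]


def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    known = "So it is like the thing that is in the data, and it is so useful."
    candidate = "Leverage synergies across verticals; maximize stakeholder value."
    print(cosine_similarity(style_profile(known), style_profile(candidate)))
```

A higher score only means the two texts use these particular words at similar rates; published authorship-attribution work relies on much richer features and far more text per author.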
Sam Torres 46:28
Yeah, I agree. And I think also, for me, one of the things when it comes to all of this, like machine learning or AI-generated content, however you want to phrase it... I'm not even sure if machine learning content is the right term for some of the tools. I'm not sure. Um, but there's also just so much bias in a lot of these models. And like, let's stop perpetuating the bias. If you just post it, then that's exactly what you're doing. Great.
Lazarina Stoy 47:05
If you think about it, there's no way that this bias would not trickle down into how these models are building the content, because they're inevitably being trained on historic content. And we've only just started seeing those kinds of societal changes and waves within the past, like, what, 20 years? So if you think about the amount of data that it took to train a model like GPT-3, like billions and billions of pages and, you know, historic work. You know, how would you even go about capturing that bias and working against it, if you're just pushing out thousands of content pieces a day, right? It's impossible. So completely agree.
Sam Torres 47:50
Well, we're gonna end on that happy note. Bias.
Lazarina Stoy 47:55
Bias in content.
Sam Torres 47:57
And yeah, the ethics of machine learning. Yay!!!
Lazarina Stoy 48:02
This can be a webinar on its own. Absolutely.
Sam Torres 48:05
Actually, I would love to do that. Because I find it absolutely fascinating. And I think it's also just one of those topics that the more we talk about it, the more we become aware of it. You know, it's just, the education has to come first before any real change. So. Let's do it!
Lazarina Stoy 48:22
Absolutely, I think both- Oh, sorry.
Begüm Kaya 48:25
No, no, no go ahead.
Lazarina Stoy 48:27
I was going to say, on the topic of ethics and machine learning, there's a very big discussion on how fast legislation is moving in relation to the developments in machine learning. And you know, even having the discussions about ethics is extremely delayed based on the pace of development, but then having legislation pay attention to what is happening is like a whole other story. So that is, like, decades away even.
Sam Torres 48:57
Yeah, that just reminds me here in the US with all the congressional hearings about Facebook, and it's just like, clearly, the legislators don't even know what the internet is. Yeah.
Tory Gray 49:11
Tubes, Sam. It's a series of tubes.
Sam Torres 49:14
Yeah, totally. Exactly. Yeah. 100%.
Lazarina Stoy 49:17
It's very difficult sometimes to kind of escape what you know, and what your reality is, and your role and your industry. And kind of, you know, go talk to someone that doesn't work in that industry, who doesn't understand the concepts that you understand, and try to persuade them, you know, of anything! It's really, really difficult. And then when you think about those people, their reality is totally different. And for them this is just, you know, another day. You know, they don't get super involved in those things as much as someone working in this industry would be. Yeah.
Sam Torres 49:40
Well, thank you so much for joining us today. This was fantastic.
Lazarina Stoy 50:00
Thank you for having me. It was great talking to you.
Tory Gray 50:03
It's so concrete. There's so many applications for people to start experimenting today, right? That's, that's wonderful.
Tory Gray 50:11
Thanks for joining us for another episode of Opinionated SEO Opinions™. We'd love for you to submit your questions. You can do that—go to TheGray.Company/Ask-Seo-Questions or I think just questions. We also set it up to make it nice and easy.
Tory Gray 50:27
Let us know what you want to know and we will start answering questions again. Thanks so much for your time.
Don’t forget to follow TGDC on LinkedIn, YouTube and Twitter, where we share cool documents, expert opinions and useful tips on SEO.