PL-300 – Section 27: Part 3 Level 5 – Text Functions

  • By
  • May 13, 2023
0 Comment

201. Text Searching

Now, in this section we’ll be looking at the various text functions. And we’ll start with FIND and SEARCH. They’re almost exactly the same as in Excel, but there is one particular hitch that you need to be aware of.

So, FIND, will find a particular phrase within another phrase. Or, maybe, a letter within a phrase. By default, starting at the first position. So, right at the beginning and if not starting wherever you want it to start.

Now, this is one sensitive, base one. M is base zero. So, for instance, “Good morning. Hi there.” The letter G in M is at position zero in DAX, it’s at position number one. So, if I do a FIND for G within there and I start at position number one, then it will give me number one. Now, the one is, indeed, optional, but I will tell you why it’s not really optional in just a few minutes.

So, if I change that to the letter H, we can see it says, “It’s at position number 15.” But if I change the H, the capital H from upper to lower case, then it won’t find it in the word hi. So, if I change this to hi, it will give me an error.

Now, errors in data models are not a good idea, and really you need to avoid them. So, that’s why there is this fourth argument. Value not found. And, please, can we have this in Excel? It’s really good, if anyone from Microsoft is listening. Because right now I have to say IFERROR, then give me whatever else, like a blank. But I don’t have to do that in Power BI. Instead, I just say, “Find the text ‘hi’ in whatever text it is starting at position one and if it’s not there than give me something else like the number zero.”

Now, as I said, this won’t find hi as a capital as a lower case H-I, because we have a capital H-I in here. But if you change this from the FIND function to the SEARCH function, then it will because SEARCH is case-insensitive. So, it doesn’t matter whether it is lower or upper case. However, it is still accent sensitive, so it will treat an E with an accent differently from an E without an accent. So, what I had to do with the FIND function before the SEARCH function was introduced in I think 2003 is say well I want the upper of this text, convert that all into upper case. And then I will look for hi in upper. Oh, I’ve got that the wrong way around. But I had to convert both arguments into upper case.

So, let’s have a look at it in Power BI. I’m in the subcategory table and what I want to do is look for the word bike in the English part of the subcategory name. So, a new column, because I want this to be case-insensitive. I don’t care if it’s a lower case bike or an upper case bike. I’m going to use the SEARCH function. So, I’m saying bike location equals search. And I’m searching within the English product subcategory name and that’s the second argument. What I’m looking for which I need to put first is bike. So, I’m searching for bike within the English product subcategory name.

Now, there are hard brackets around start position and NotFoundValue. So, because there are hard brackets, the computer is saying they are optional. So, there we go. Press enter and it’s going to give me the location of all of the word bikes. Except it doesn’t, it gives me an error throughout.

So, what’s going on? Well, it says that the search text could not be found. Okay, so let’s go look. Row one, yep there’s bike, row two yes, there’s bike, row three there it is, row four. Ah, the computer goes, “okay, can’t find the word bike “in here, what do I do?” Well, I will look at the SEARCH function for the NotFoundValue, there isn’t a NotFoundValue therefore I’m going to write an error.

Now, because there is an error in one row, this permeates the entirety of the column even those rows where it has been found. So, whilst the hard brackets indicate that this fourth argument is optional, it isn’t. Every time, specify a start position and every time specify a NotFoundValue, it could be zero or it could be blank or it could be something else. Because if it’s just one wrong row, you might find the entire column becomes an error. So, now, we’ve got the right answer. We have got bikes starting at position number 10, six, nine and everything else, but for couple down here are blank. So that’s how to use SEARCH and FIND, while SEARCH I think is a bit more useful because it’s case-insensitive.

Now, what if I wanted to change all of the bikes to mopeds. Well, I can do that using the SUBSTITUTE function. So, let’s go look at the SUBSTITUTE function; it replaces existing text with new text in a text string. And there are four arguments. The text in the words what might be altered. And then what is going to be altered, and what is going to replace it, and then an instant number.

So, let’s just have a look at this SUBSTITUTE function. So, I am going to substitute in the English product subcategory name the word bike, notice this is lower case. And I’m going to replace it with the word moped. And you notice the instance number is optional. So, let’s do that and we find absolutely nothing happened. Because SUBSTITUTE is case-sensitive. So, we will need a capital B and a Capital M. And now we’ve got our Mountain Mopeds, Road Mopeds and Touring Mopeds.

Now, if you wanted to capture both bike and moped. So, bike has a capital B and a lower case B. Then what we would need to have is two substitutes. One looking for the capital and one looking for the lower case.

Now, there is a fourth argument which is optional. It says, “Okay, I want you to find the first instance, the second instance, the fourth instance, and only replace that.” If you don’t put in an instance number, then it’s not going to just do the first instance, it’s going to do all of the times that the word bike appears in that particular string. Which might be what you want and is generally for me what I do want.

So, these are the text substitution and searching functions, so just remember SEARCH is case-insensitive and FIND is case-sensitive. But for both of these functions, I strongly recommend using all four arguments.

202. Text Extraction and Substitution

Text extraction can be done using some very familiar functions, left, mid, and right. They work exactly as they do in Excel, and as I’ve said in the previous video, DAX is base one.

So, the first character is character number one instead of M, where the first character is number zero. So, probably not that much of an introduction that’s needed. Left takes the left two characters, for instance. Mid takes from character position X for as many characters as you want. And right takes the right-hand most characters.

So, let’s take an example in Power BI. We’ve got in the fact internet sales, this sales order number, which starts with SO. Now, suppose, I wanted not the SO, but I wanted to extract the number, which is the last of five characters. Maybe, I want then to convert it into a number and so forth. So, I can do that using a right. So, my extracted text is at right of the sales order number. And it’s the right-hand most five characters. So, there’s the extracted text. But I could say, “Well, I want from the third position for five characters.” So that we should be using the mid function. So, I want from the third character for five characters. It gives me exactly the same result in this particular instance because I know that there are seven characters throughout.

Now, suppose, I didn’t know that there were seven characters throughout, and I wanted the first three digits. So, maybe, five or nine has this particular departmental meaning. So, what I could do is say well, I want from the third position for three characters. Perfectly possible, but suppose the SO that we got at the beginning needn’t just be two characters. Maybe, it could be one. Maybe, it could be five. So, what I’m trying to get is from the third character to the fifth character, but it could be a variable length. Suppose that I had elsewhere, this, and I still wanted these three characters.

Now, what we know about these characters is that they are five characters away from the end. So, here we have the length being seven, and we want to go back five. So, we can get the length of the entirety, which is seven, subtract five. But that won’t get us to where we want to be. We want to be not in the second position, but the third. And so we have to add one to that. And same here, the length is eight. We want to get to the fourth position, and so we go back five, but we still have to add one to get four. So, let’s see that in action. So, I want to take the length of the sales order, deduct five, but then add one.

Now, quite often, I do have the plus one separately because this five might be a variable number as well. It’s the plus one that’s really important. And then take three characters. So that is how it’d would work with a variable number of characters. And I just wanted to go from the end rather than from the beginning.

Now, suppose I wanted to replace the SO with, say, sales order. Well, there are many ways of doing that. So, the first way is to take from a particular position. So, it could be from the third position if we know that that’s when the characters end, when the letters end, and take five characters. But we might not know that it’s five characters. We might want a huge number, and we could say 999 characters, which is fine, as long as we know that this sales order number is not going to be greater than 1,001 characters. So, the 999 is some number way beyond what we needed.

Now, we could do it scientifically and say give me the sales order number length and minus two from that. So that is another way of doing it, more accurate way perhaps. But quite frankly, I just prefer using 999. It’s a lot simpler than using that. And then from there, we can add sales order right to the beginning and drag them together using an ampersand. So that’s one way of doing it.

Another way, well, we had a look in the previous video about substitute. So, I could substitute in sales order number, get rid of SO, and give me sales order. So, that is another way of doing it.

But what if instead of saying, well, I want to replace this very specific text, and it could be a not specific text. I want instead to replace the first two characters with sales order. And that’s when the replace comes in. So, this uses the same start number, number of characters, as a lot of these other functions do but then replaces it with new text.

So, the difference between replace and substitute is substitute, we know what text we are trying to replace. In the replace function, we know where it. So, that’s why I put it into the extract functions. So, I’m going to now replace fact internet sales. I’m going to replace from the first character onwards for two characters. I’m going to get rid of those and substitute or replace it with sales order. And that’s just yet one more alternative way of doing things.

So, just to give a further example, suppose I had the text, hi, my name is something, what’s yours. Well, I could do that by saying I want the first 20 characters and then say Philip and then continue from character number 22 for as many characters as I want. Alternatively, I could substitute the hash with Philip, or alternatively, I can replace what’s there at the 21st character, the one character, with the word Philip. All giving the same answer, but we’ve got different criteria.

203. Text Conversion

Now, we have previously done implicit conversions. For example, suppose I had the string, product key, and I joined to that, the field ProductKey, which is a whole number. So, ProductKey’s a number. The string, Product Key, is text, and the computer then combines the two together. It converts the number into a string so it can use the ampersand to concatenate them. Now, that’s an implicit conversion, it’s the computer deciding how to convert one field type into another. Now, this is to a number, and just to prove it’s a number, I’m just going to add 1,000 to it. And you can see it operates as a number before it gets converted. So that’s implicit conversions. I have relatively little say as to how it gets converted.

Now, for explicit conversions, if you need more control, there are three functions. The first one is FIXED. This rounds a number to a certain number of decimal places, and says, “Do you want commas?” And this third argument is a bit confusing because it’s actually asking, “Do you want no commas?” So yes, I want no commas, or no, I don’t want no commas. So, let’s just see how this works, so if I have the word FIXED, the function FIXED, I want to convert it to zero decimal places, and do I want no commas? It is true that I want no commas. So that gives me a bit more control over the conversion. Alternatively, I could say false, I don’t want no commas. In other words, I do want commas. And so this will give me one comma, 528.

Obviously, if you are in a non-English locale where you use a dot as a thousand separator, that’s what you’ll see.

VALUE goes the other way. It converts a number that is looking like a number but is actually text into a number. For example, let’s take this FIXED result which has a comma in it. I can change it into a number by putting it a VALUE around it. And now you can see that this type was text is now a decimal number.

But the function which has the most amount of convertibility is the FORMAT function. Now, you can FORMAT numbers given a certain string. And there are an awful lot of FORMATs, and they’re broken down into Pre-Defined and Custom. So, let’s have a look at the Pre-Defined first. So, we have the specifications of General Number, Currency, Fixed, Standard, Percent, Scientific, Yes No, True False, On Off. Doesn’t this sound rather familiar? Sounds a bit like the formatting that we’ve got here. Decimal numbers, whole numbers, percentage, true false. So, what this function is trying to do is literally convert into plain English. And if you are in a non-English locale, I’m afraid it is English. It doesn’t except French or German, for instance. However, if you are a bit of an Excel guru, then you’ll probably used to these sorts of numeric formats. And these formats are available in DAX. So, zero, displays a digit of zero, hash displays a digit or nothing, and so forth. And it’s even got these section dividers. Up to four different section dividers. And this is what you can use in Excel. So, let’s take the number minus 45. I can FORMAT this number, Custom. So, I can say if it’s positive, then give me this sort of formatting. If it’s negative, give me this sort of formatting. And so on. So, you can see, it’s a negative so it goes to the second of the sections, which are separated by semicolons.

Now, I’m using red just to actually highlight it. Colours don’t actually work that well in (mumbles). So, I would just stick to what you got here, all of the various symbols. Just remember, if you so choose, you can have a different formatting for positive values, for negative values, for zeros, and for null values.

Now, you have also got Pre-Defined formatting for date and time. And to show you why this is important, let’s just have a column called Ship Date and it has Ship Date is and the Ship Date column. This is my Ship Date string. And you can see I’ve got no control over how the date is displayed. Here’s it’s displayed in American Short Date format. However, having a look at the various Pre-Defined formats, we have General, Long and Medium Date, Short Date, Long, Medium, and Short Time. So, if I was to go back in here and format this as a Long Date, you can see it changes from the Short Date to Wednesday comma August eight comma 2007. Of course, if you’re an Excel guru, then you’ll know you can use Custom dates as well. So, we can have Ds for days or dates. W for weeks, M for months, Y for years, H is for hours, and N or M for minutes, and S for seconds.

Now, why I say N or M is because the computer is now intelligent enough to know that if you use an M immediately following a H, then you mean minutes, rather than months. It used to be that you had to use a capital M for months. It’s no longer the case. You can use lower case as well. The computer is intelligent enough to work out what you mean. So, let’s say I wanted this in British format as opposed to the American format. So date first, and then month, and then year. Notice I’m just using one of each. And you can see what we’ve got. 24, eight, 220. So, this Y by itself is not really working. Let’s have a look and see what a single Y means. It says, display the day of the year. So, we need at least two Ys to get the year. So eight, eight, or seven. Notice what happens if I add an extra character to each. We now have zero eight, zero eight, zero seven. And if I keep adding, we now have a different form of day and month expression. Wed, Aug, ‘07. So, let’s add the day in as well, the date. So Wed eight Aug, ‘07. And let’s put lots of things into four characters. And now we have Wednesday eight, August 2007.

So, hopefully you can see how powerful the FORMAT function is for converting numbers and dates into strings. And if you’re unfamiliar with the Pre-Defined or Custom numeric formats, then you’ve got the hyperlinks within the spreadsheet to have a read of the various values that you can use.

204. Other Functions

A quick look at all of the remaining text functions.

While BLANK is classified as a text function, it can also be used in numeric columns. It returns a blank, so it’s the equivalent of a null in other languages.

CODE returns the ASCII byte. So, for instance, a capital A would return usually a code of 65.

CONCATENATE is an equivalent of using the ampersand and a lot more unwieldy as you can see. But there’s also CONCATENATEX, which goes through an entire table or part of a table and joins everything together using a delimiter.

EXACT sees if two strings are the same, ignoring formatting differences.

LOWER and UPPER, notice that there isn’t a PROPER function like there is in some languages, converts text either entirely to lowercase or uppercase. So, there is no function for capitalising the first letter of each word.

REPT that repeats text lots of times.

Now, TRIM is actually, out of all of these functions, with the possible exception of BLANK, the one that I use the most. It removes excess spaces from the beginning of a string, excess spaces at the end of a string. So, if you get your data as a 20-character string, regardless of how many spaces there are in, how many characters are used in it, it removes any spaces from the end. So, if I had hello with 15 spaces at the end, after using TRIM it would just have five characters. And it also removes spaces from in between words, apart from single spaces. So, if you had a double space or triple space, it replaces those with one.

Now, other languages use functions such as LTRIM and RTRIM. You don’t have this in DAX, but for me, the single TRIM function, is a lot more powerful. So, these are the remaining text functions.

Comments
* The most recent comment are at the top

Interesting posts

IBM Certified Data Scientist: Building a Career in Data Science

In today’s digital age, data is the new oil, driving decision-making and innovation across industries. The role of a data scientist has become one of the most sought-after positions in the tech world. If you’re considering a career in data science, obtaining the IBM Certified Data Scientist certification can be a game-changer. This certification not… Read More »

How to Balance Work and Study While Preparing for IT Certification Exams

Balancing work and study while preparing for IT certification exams can feel like an uphill battle. Juggling a full-time job and intense study sessions requires careful planning, discipline, and creativity. The pressure of meeting job responsibilities while dedicating time and energy to study can be overwhelming. However, with the right strategies and mindset, you can… Read More »

10 Highest Paying IT Certifications

In the ever-evolving world of information technology, certifications are more than just a feather in your cap – they’re a ticket to higher salaries and advanced career opportunities. With the tech landscape constantly shifting, staying updated with the most lucrative and relevant certifications can set you apart in a competitive job market. Whether you’re aiming… Read More »

Strategies for ISACA Certified Information Systems Auditor (CISA) Exam

Are you ready to take your career in information systems auditing to the next level? The ISACA Certified Information Systems Auditor (CISA) exam is your ticket to becoming a recognized expert in the field. But let’s face it, preparing for this comprehensive and challenging exam can be daunting. Whether you’re a seasoned professional or just… Read More »

Preparing for Juniper Networks JNCIA-Junos Exam: Key Topics and Mock Exam Resources

So, you’ve decided to take the plunge and go for the Juniper Networks JNCIA-Junos certification, huh? Great choice! This certification serves as a robust foundation for anyone aiming to build a career in networking. However, preparing for the exam can be a daunting task. The good news is that this guide covers the key topics… Read More »

Mastering Microsoft Azure Fundamentals AZ-900: Essential Study Materials

Ever wondered how businesses run these days without giant server rooms? That’s the magic of cloud computing, and Microsoft Azure is a leading cloud platform. Thinking about a career in this exciting field? If so, mastering the Microsoft Certified: Azure Fundamentals certification through passing the AZ-900 exam is the perfect starting point for you. This… Read More »

img