By Madrid_datalady


2019-02-06 22:35:35 8 Comments

I am writing a program to count the number of uppercase and lowercase letters in a string. I came up with something that works, but as I am still a beginner I have a feeling writing the code this way is probably considered "clumsy."

Here is what I have:

stri = input("Give me a phrase:")
stri_up = 0
stri_lo = 0
for i in stri:
    if i.isupper():
        stri_up += 1
    if i.islower():
        stri_lo += 1
print("The number of uppercase letters in your phrase is:", stri_up)
print("The number of lowercase letters in your phrase is:", stri_lo)

Output:

Give me a phrase: tHe Sun is sHininG
The number of uppercase letters in your phrase is: 4
The number of lowercase letters in your phrase is: 11

I would like to learn how to write neat, beautiful code so I am wondering if there is a more efficient and elegant way to code this.

4 comments

@user192377 2019-02-07 17:14:30

You can approach this in a cleaner manner by using the filter function; for example:

stri = input("Give me a phrase:")
# filter() yields every character x in stri for which x.isupper() returns True
stri_up = filter(str.isupper, stri)
# filter() returns an iterator; to get its length, convert it to a list first
up_count = len(list(stri_up))
stri_lo = filter(str.islower, stri)
lo_count = len(list(stri_lo))
print("The number of uppercase letters in your phrase is:", up_count)
print("The number of lowercase letters in your phrase is:", lo_count)

As a note, this is a less efficient approach, since you iterate through the string twice in the filter calls, but it is a different way of approaching the problem, and it will hopefully introduce you to some more advanced Python techniques.

@Matt 2019-02-07 09:59:05

TLDR: Looks good! This is a perfectly reasonable solution for your problem. It's certainly not clumsy.

Optimisations: The optimisation ShadowRanger points out is faster, due to optimisations in the interpreter. I wouldn't worry about this at a beginner level (and not even at an experienced level really, unless it was critical to make every optimisation).

The optimisation of checking only isupper or islower that some have pointed out probably isn't valid. If your input is guaranteed to be only alphabetic characters A-Z or a-z, then you can assume that if it's not upper, it's lower. But this doesn't apply generally. '1' is neither lower nor upper, for example. If you checked only isupper and assumed the opposite on a False result, you would increment your 'lower' counter, and that wouldn't be correct.
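As a small illustration of that point (the phrase "Ab1!" and the variable names are just made up for this sketch), compare counting lowercase "by elimination" with checking islower explicitly:

```python
phrase = "Ab1!"

upper = sum(1 for char in phrase if char.isupper())
# Shortcut under discussion: assume everything that is not uppercase is lowercase.
lower_by_elimination = len(phrase) - upper
# Explicit check: only characters for which islower() is true.
lower_checked = sum(1 for char in phrase if char.islower())

print("1".isupper(), "1".islower())                # False False
print(upper, lower_by_elimination, lower_checked)  # 1 3 1
```

The digit and the exclamation mark end up in the "lowercase" total when you count by elimination, which is the error described above.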

Your code provides a correct solution and doesn't break when the user inputs an empty string or non-alphabetic characters, which is why I'd consider it good.

Possible next step: Since you say you're a beginner, I'd look up writing tests if you haven't already and learn a little about how to write good tests. Checking empty input and special characters would be an interesting start. Some terms to search would be edge-case
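A rough sketch of what such tests might look like, assuming the counting loop from the question is moved into a function (count_case is a hypothetical name, not something from the original code) so it can be called without typing input:

```python
def count_case(phrase):
    """Return (uppercase_count, lowercase_count) for phrase.

    Hypothetical wrapper around the loop from the question.
    """
    upper = lower = 0
    for char in phrase:
        if char.isupper():
            upper += 1
        if char.islower():
            lower += 1
    return upper, lower


def test_example_from_question():
    assert count_case("tHe Sun is sHininG") == (4, 11)


def test_empty_string():
    assert count_case("") == (0, 0)


def test_non_alphabetic_characters():
    # Digits, spaces and punctuation are neither upper nor lower.
    assert count_case("123 !?") == (0, 0)


# Run the tests directly; a test runner could also collect the test_* functions.
for test in (test_example_from_question, test_empty_string,
             test_non_alphabetic_characters):
    test()
print("all tests passed")
```

Plain assert statements keep the sketch short; the standard library's unittest module is another option once there are more cases.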

@Madrid_datalady 2019-02-07 11:11:22

Thank you, your comment warmed my heart and has very useful suggestions. :)

@JAD 2019-02-07 09:35:38

Small optimisation

If you know a character is an upper, you don't have to test for lower anymore:

stri = input("Give me a phrase:")
stri_up = 0
stri_lo = 0
for i in stri:
    if i.isupper():
        stri_up += 1
    elif i.islower():
        stri_lo += 1
print("The number of uppercase letters in your phrase is:", stri_up)
print("The number of lowercase letters in your phrase is:", stri_lo)

@Baldrickk 2019-02-08 09:25:14

what about punctuation?

@ShadowRanger 2019-02-07 00:09:11

Your code is mostly fine. I'd suggest more meaningful names for variables, e.g. i is typically a name for integer/index variables; since you're iterating over letters/characters, you might choose c, char, let, or letter. For stri, you might just name it phrase (that's what you asked for from the user after all). You get the idea. Make the names self-documenting.

Arguably you could make it look "prettier" by performing a single pass per test, replacing:

stri_up = 0
stri_lo = 0
for i in stri:
    if i.isupper():
        stri_up += 1
    if i.islower():
        stri_lo += 1

with:

stri_up = sum(1 for let in stri if let.isupper())
stri_lo = sum(1 for let in stri if let.islower())

That's in theory less efficient, since it has to traverse stri twice, while your original code only does it once, but in practice it's likely faster; on the CPython reference interpreter, sum is highly optimized for this case and avoids constructing a bunch of intermediate int objects while summing.

@200_success 2019-02-07 01:48:27

You can just do sum(c.isupper() for c in phrase), because booleans are treated as 0 or 1 when summing.

@ShadowRanger 2019-02-07 02:47:49

@200_success: True, but I'm using dirty knowledge here; the sum fast path only fires for int (PyLongObject at the C layer) exactly (no int subclasses accepted, including bool); yielding bool blocks that optimization (and involves a lot more yields from the genexpr that can be avoided). Plus, I consider it more obvious to actually sum integers conditionally; using bool for numeric value is perfectly legal, just a little more magical than necessary, given the minimal benefit.
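For anyone following along, a short sketch of the language-level facts behind this exchange. It shows only that bool is a subclass of int and that both spellings give the same count; whether the fast path applies is the comment's claim about CPython internals and is not demonstrated here:

```python
print(isinstance(True, int))  # True: bool is a subclass of int
print(type(True) is int)      # False: but it is not exactly int
print(True + True)            # 2

phrase = "tHe Sun is sHininG"
by_bool = sum(c.isupper() for c in phrase)      # sums True/False values
by_int = sum(1 for c in phrase if c.isupper())  # sums plain 1s, skipping non-matches
print(by_bool, by_int)                          # 4 4
```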

@ShadowRanger 2019-02-07 03:20:39

Just for comparison, a microbenchmark where stri/phrase is just one of each ASCII character (''.join(map(chr, range(128)))), takes 15.3 µs to complete on my computer using your code, vs. 10.5 µs for summing hardcoded 1s conditionally.
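A microbenchmark along these lines could be sketched with the standard timeit module. The function names and the repetition count below are assumptions made for this sketch, not the exact script behind the numbers above, and timings will differ between machines and Python versions:

```python
import timeit

# One of each ASCII character, as described in the comment above.
phrase = "".join(map(chr, range(128)))


def loop_count(text):
    """The explicit loop from the question."""
    upper = lower = 0
    for char in text:
        if char.isupper():
            upper += 1
        if char.islower():
            lower += 1
    return upper, lower


def sum_ones(text):
    """Sum hardcoded 1s conditionally, one pass per test."""
    return (sum(1 for char in text if char.isupper()),
            sum(1 for char in text if char.islower()))


def sum_bools(text):
    """Sum the boolean results directly."""
    return (sum(char.isupper() for char in text),
            sum(char.islower() for char in text))


# All three should agree before their timings are compared.
assert loop_count(phrase) == sum_ones(phrase) == sum_bools(phrase) == (26, 26)

repetitions = 2_000  # arbitrary; raise it for steadier numbers
for func in (loop_count, sum_ones, sum_bools):
    seconds = timeit.timeit(lambda: func(phrase), number=repetitions)
    print(f"{func.__name__}: {seconds / repetitions * 1e6:.2f} microseconds per call")
```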

@Baldrickk 2019-02-07 16:21:48

Your theory vs practice may be a little off - for short strings, it likely matters little anyway, but putting a long string through the function may very well cause it to invalidate the cache (you just know someone's going to try passing it the entire works of Shakespeare all at once). This would make the cache-friendly single pass much more efficient where it really counts. Probably... I really should profile this.

@Baldrickk 2019-02-07 16:29:21

@200_success "But", I thought, "wouldn't a lot of punctuation (e.g. ./@#~";:' etc.) cause that single line to be incorrect?" = 2 uppers and 109 lowers when it should be 70 lowers.

@ShadowRanger 2019-02-07 22:14:36

@Baldrickk: I modified the microbenchmark to run against the contents of Ubuntu's american-english-insane file repeated 10 times (len of 68753140). My sum was fastest by a small amount (for 10x case, 8.34 s), the OP's code close behind (8.48 s), and 200_success's rather further behind (11 s). The same pattern held for unrepeated american-english-insane, with the same margins. I suspect the cache doesn't matter; any system worth its salt can recognize sequential memory access and populate the cache ahead of time (Python is slow enough to give it time to do so).

@ShadowRanger 2019-02-07 22:15:51

Regardless, I was suggesting it mostly as cleaner looking code (it's shorter, and each line does exactly one obvious thing, no need for context to understand it); the mild speed boost doesn't really matter.

@Baldrickk 2019-02-08 09:24:37

@ShadowRanger thanks! Always good to know. I'm used to working with programs where that is a big thing.
