[SOLVED] Counting lowercase and uppercase letters in a string in Python

I am writing a program to count the number of uppercase and lowercase letters in a string. I came up with something that works, but as I am still a beginner I have a feeling writing the code this way is probably considered "clumsy."

Here is what I have:

``````stri = input("Give me a phrase:")
stri_up = 0
stri_lo = 0
for i in stri:
    if i.isupper():
        stri_up += 1
    if i.islower():
        stri_lo += 1
print("The number of uppercase letters in your phrase is:", stri_up)
print("The number of lowercase letters in your phrase is:", stri_lo)
``````

Output:

``````Give me a phrase: tHe Sun is sHininG
The number of uppercase letters in your phrase is: 4
The number of lowercase letters in your phrase is: 11
``````

I would like to learn how to write neat, beautiful code so I am wondering if there is a more efficient and elegant way to code this.

@Matt 2019-02-07 09:59:05

TLDR: Looks good! This is a perfectly reasonable solution for your problem. It's certainly not clumsy.

Optimisations: The optimisation ShadowRanger points out is faster because of optimisations inside the interpreter, but I wouldn't worry about this at a beginner level (or even at an experienced level, really, unless it was critical to squeeze out every optimisation).

The optimisation of checking only `isupper` or `islower` that some have pointed out probably isn't valid. If your input is guaranteed to be only alphabetic characters A-Z or a-z, then you can assume that if it's not upper, it's lower. But this doesn't apply generally. '1' is neither lower nor upper for example. Checking only `isupper` and assuming the opposite on a `False` result, you would increment your 'lower' counter and that wouldn't be correct.
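A minimal sketch of that pitfall. The function names `count_with_else` and `count_with_both_checks` are made up for illustration and do not appear in the question; the second one mirrors the two separate checks from the original code.

```python
def count_with_else(phrase):
    """Count (upper, lower), assuming anything not uppercase is lowercase."""
    upper = lower = 0
    for char in phrase:
        if char.isupper():
            upper += 1
        else:
            # Digits, spaces and punctuation all land here by mistake.
            lower += 1
    return upper, lower


def count_with_both_checks(phrase):
    """Count (upper, lower), testing each case explicitly."""
    upper = lower = 0
    for char in phrase:
        if char.isupper():
            upper += 1
        if char.islower():
            lower += 1
    return upper, lower


sample = "Ab1 !"
print(count_with_else(sample))         # (1, 4): '1', ' ' and '!' counted as lowercase
print(count_with_both_checks(sample))  # (1, 1)
```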

Your code provides a correct solution and doesn't break when the user inputs an empty string or non-alphabetic characters, which is why I'd consider it good.

Possible next step: Since you say you're a beginner, I'd look up writing tests if you haven't already and learn a little about how to write good tests. Checking empty input and special characters would be an interesting start. Some terms to search would be edge-case
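As a rough starting point, the counting logic could be moved into a function so it can be checked without typing input by hand. This is only a sketch: `count_case` and `test_count_case` are hypothetical names, and plain `assert` statements stand in for a real test framework.

```python
def count_case(phrase):
    """Return (uppercase_count, lowercase_count) for phrase."""
    upper = 0
    lower = 0
    for char in phrase:
        if char.isupper():
            upper += 1
        if char.islower():
            lower += 1
    return upper, lower


def test_count_case():
    # Empty input: nothing to count.
    assert count_case("") == (0, 0)
    # Digits, spaces and punctuation are neither upper nor lower.
    assert count_case("1234 !?") == (0, 0)
    # The phrase from the question's sample output.
    assert count_case("tHe Sun is sHininG") == (4, 11)


test_count_case()
print("all checks passed")
```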

Thank you, your comment warmed my heart and has very useful suggestions. :)

@user192377 2019-02-07 17:14:30

You can approach this in a cleaner manner by using the filter function; for example:

``````stri = input("Give me a phrase:")
# filter() yields every character x in stri for which x.isupper() is True
stri_up = filter(str.isupper, stri)
# filter() returns an iterator; convert it to a list first to get a length
up_count = len(list(stri_up))
stri_lo = filter(str.islower, stri)
lo_count = len(list(stri_lo))
print("The number of uppercase letters in your phrase is:", up_count)
print("The number of lowercase letters in your phrase is:", lo_count)
``````

As a note, this is a less efficient approach, since the two filter calls iterate through the string twice, but it is a different way of approaching the problem and hopefully introduces you to some more advanced Python techniques.
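To illustrate why the conversion to a list is needed, here is a small sketch with a hardcoded phrase standing in for the user's input. It shows that a `filter` object has no length and is used up after one pass; the variable names are for illustration only.

```python
phrase = "tHe Sun is sHininG"  # stand-in for input("Give me a phrase:")

uppers = filter(str.isupper, phrase)  # lazy iterator, nothing is counted yet

try:
    len(uppers)
except TypeError:
    print("a filter object has no len()")

up_count = len(list(uppers))     # consumes the iterator
second_pass = len(list(uppers))  # the iterator is already exhausted
print(up_count, second_pass)     # 4 0

# Counting without building an intermediate list is also possible.
lo_count = sum(1 for _ in filter(str.islower, phrase))
print(lo_count)                  # 11
```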

Small optimisation

If you know a character is uppercase, you don't have to test for lowercase anymore:

``````stri = input("Give me a phrase:")
stri_up = 0
stri_lo = 0
for i in stri:
    if i.isupper():
        stri_up += 1
    elif i.islower():
        stri_lo += 1
print("The number of uppercase letters in your phrase is:", stri_up)
print("The number of lowercase letters in your phrase is:", stri_lo)
``````

@Baldrickk 2019-02-08 09:25:14

Your code is mostly fine. I'd suggest more meaningful names for variables, e.g. `i` is typically a name for integer/index variables; since you're iterating over letters/characters, you might choose `c`, `char`, `let`, or `letter`. For `stri`, you might just name it `phrase` (that's what you asked for from the user after all). You get the idea. Make the names self-documenting.

Arguably you could make it look "prettier" by performing a single pass per test, replacing:

``````stri_up = 0
stri_lo = 0
for i in stri:
    if i.isupper():
        stri_up += 1
    if i.islower():
        stri_lo += 1
``````

with:

``````stri_up = sum(1 for let in stri if let.isupper())
stri_lo = sum(1 for let in stri if let.islower())
``````

That's in theory less efficient, since it has to traverse `stri` twice, while your original code only does it once, but in practice it's likely faster; on the CPython reference interpreter, `sum` is highly optimized for this case and avoids constructing a bunch of intermediate `int` objects while summing.

@200_success 2019-02-07 01:48:27

You can just do `sum(c.isupper() for c in phrase)`, because booleans are treated as 0 or 1 when summing.
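A short sketch of why that works, using a hardcoded phrase rather than user input: `bool` is a subclass of `int` in Python, so `True` adds as 1 and `False` as 0. The variable names here are illustrative.

```python
phrase = "tHe Sun is sHininG"

# bool is a subclass of int, so True behaves like 1 in arithmetic.
print(isinstance(True, int))  # True
print(True + True)            # 2

# Summing the booleans directly...
upper_from_bools = sum(c.isupper() for c in phrase)
# ...gives the same count as summing 1 for each matching character.
upper_from_ones = sum(1 for c in phrase if c.isupper())
print(upper_from_bools, upper_from_ones)  # 4 4
```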

@200_success: True, but I'm using dirty knowledge here; the `sum` fast path only fires for `int` (`PyLongObject` at the C layer) exactly (no `int` subclasses accepted, including `bool`); yielding `bool` blocks that optimization (and involves a lot more yields from the genexpr that can be avoided). Plus, I consider it more obvious to actually sum integers conditionally; using `bool` for its numeric value is perfectly legal, just a little more magical than necessary, given the minimal benefit.

Just for comparison, a microbenchmark where `stri`/`phrase` is just one of each ASCII character (`''.join(map(chr, range(128)))`), takes 15.3 µs to complete on my computer using your code, vs. 10.5 µs for summing hardcoded `1`s conditionally.
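A comparison along these lines could be reproduced with the standard `timeit` module. This is a sketch, not the benchmark actually used above: the function names and the repeat count are assumptions, and the timings will differ by machine and Python version.

```python
import timeit

# One of each ASCII character, as described in the comment above.
phrase = ''.join(map(chr, range(128)))


def count_with_bools(s):
    # Sums the bool results directly.
    return sum(c.isupper() for c in s), sum(c.islower() for c in s)


def count_with_ones(s):
    # Sums a hardcoded 1 for each matching character.
    return (sum(1 for c in s if c.isupper()),
            sum(1 for c in s if c.islower()))


REPEATS = 2_000  # arbitrary choice; raise it for steadier numbers

for func in (count_with_bools, count_with_ones):
    seconds = timeit.timeit(lambda: func(phrase), number=REPEATS)
    print(f"{func.__name__}: {seconds / REPEATS * 1e6:.2f} microseconds per call")
```

Both functions return the same counts, so any difference in the printed numbers comes from how the summing is done, not from what is counted.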

@Baldrickk 2019-02-07 16:21:48

Your theory vs. practice may be a little off. For short strings it likely matters little anyway, but putting a long string through the function may very well invalidate the cache (you just know someone's going to try passing it the entire works of Shakespeare all at once). This would make the cache-friendly single pass much more efficient where it really counts. Probably... I really should profile this.

@Baldrickk 2019-02-07 16:29:21

@200_success `"But", I thought, "wouldn't a lot of punctiation (e.g. ./@#~";:' etc.) cause that single line to be incorrect?"` = 2 uppers and 109 lowers when it should be 70 lowers.

@Baldrickk: I modified the microbenchmark to run against the contents of Ubuntu's `american-english-insane` file repeated 10 times (`len` of 68753140). My `sum` was fastest by a small amount (for the 10x case, 8.34 s), the OP's code close behind (8.48 s), and 200_success's rather further behind (11 s). The same pattern held for unrepeated `american-english-insane`, with the same margins. I suspect the cache doesn't matter; any system worth its salt can recognize sequential memory access and populate the cache ahead of time (Python is slow enough to give it time to do so).

Regardless, I was suggesting it mostly as cleaner looking code (it's shorter, and each line does exactly one obvious thing, no need for context to understand it); the mild speed boost doesn't really matter.

@Baldrickk 2019-02-08 09:24:37

@ShadowRanger thanks! Always good to know. I'm used to working with programs where that is a big thing.
