Then can you explain how complex it is so we have a better idea?
I really don't understand why just because it's 3D, it has to be difficult.
Sure, you can't have Ace Editor and suddenly manage 3D objects, but I doubt anyone is expecting that either.
Unity is not difficult. Takes some time to learn the concepts and play with the tutorials, but it cuts down development time significantly. The interface is pretty nice as well, just like how RM's interface is pretty nice.
Some people also bring up 3D models. The fact that 3D assets are much harder to create than 2D assets has nothing to do with how difficult the software itself is. I can't make a decent 2D map, nor can I create any decent-looking 2D sprites, but that doesn't matter because I can just pay someone who does have the skills to do it. How difficult a piece of software is to use is not dictated by the user's own talents, and it would be especially questionable to take users with no particular talent in one field and argue that the software is too difficult because it cannot cater to people with no skills.
I'll take a stab at it as someone who has made 3D games before.
First of all, you need to make the 3D model, typically in a sculpting tool such as ZBrush. This is a highly specialized skill and it takes a very experienced professional to produce a model that looks good. Heck, even many AAA games struggle to make faces that aren't disturbing-looking.
Secondly, you need to retopologize that high poly model, which is another highly specialized skill and in studios is often done by a separate person altogether (a technical artist). After that, you need to create the diffuse map, and, depending on the character's design, possibly also a specular, reflection, or emission map. At this stage, the static character is done.
Now you need to animate it. Nowadays, this is most often done with mocap, which means renting out a mocap studio, hiring mocap actors, and having animators touch up the final product. This is another long and laborious task that requires trained professionals.
Rinse and repeat for every character.
Now let's put it in the game. 3D animations are a lot more complex than 2D ones. You have to deal with blending, cross fading, clipping, etc. Not to mention physics. Oh yeah, and when you're working with 3 dimensions, you're not talking about simple Cartesian math anymore; you need to deal with quaternions, a four-component extension of the complex numbers that you have to use in order to compose 3D rotations.
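To give a rough idea of what that math looks like, here is a minimal Python sketch of composing two rotations with quaternions and cross-fading between two orientations. Everything in it is my own illustration: the function names (quat_mul, rotate, slerp and so on) are made up for this post and are not any engine's API, and I'm assuming unit quaternions stored as (w, x, y, z) tuples in a right-handed coordinate system.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_mul(a, b):
    """Hamilton product a*b. As a rotation, b is applied first, then a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_conj(q):
    """Conjugate; for a unit quaternion this is the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q via q * (0, v) * conj(q)."""
    p = (0.0, v[0], v[1], v[2])
    _, x, y, z = quat_mul(quat_mul(q, p), quat_conj(q))
    return (x, y, z)

def slerp(a, b, t):
    """Spherical interpolation between unit quaternions a and b, t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    if dot < 0.0:
        # q and -q are the same rotation; flip b to take the shorter arc.
        b = tuple(-x for x in b)
        dot = -dot
    if dot > 0.9995:
        # Nearly identical orientations: fall back to normalized lerp.
        out = tuple(x + t * (y - x) for x, y in zip(a, b))
        n = math.sqrt(sum(x * x for x in out))
        return tuple(x / n for x in out)
    theta = math.acos(dot)
    wa = math.sin((1.0 - t) * theta) / math.sin(theta)
    wb = math.sin(t * theta) / math.sin(theta)
    return tuple(wa * x + wb * y for x, y in zip(a, b))

# Turn 90 degrees about Z, then 90 degrees about X.
q_z = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
q_x = quat_from_axis_angle((1.0, 0.0, 0.0), math.pi / 2)
combined = quat_mul(q_x, q_z)  # right-hand factor is applied first
tip = rotate(combined, (1.0, 0.0, 0.0))

# Halfway through a cross-fade from "no rotation" to the Z turn.
halfway = slerp((1.0, 0.0, 0.0, 0.0), q_z, 0.5)
```

Note that the order of multiplication matters: swapping q_x and q_z gives a different final orientation. That is one of the things that trips people up coming from 2D, where rotations are just angles you can add in any order.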
How many people here can perform linear algebra using imaginary numbers? Not many. How many people have the funds to rent a mocap studio to make their custom characters? Not many.
EDIT: On top of that, in two years, all your graphics look hideously out of date due to advances in the state of the art.