Raytracing
[Preamble]
Raytracing is one of the most fundamental aspects of 3D gaming. It provides a mechanism for environment interaction by both finding potential targets in the player’s view and in detecting what a player may have hit (Both with a gun shot and with their body.) This all requires that ray tracing be relatively fast. What you see in the above screen shots is a ray traced in to a heptoroid. The triangles shown on the mesh denote the 34 hit candidates. Hit candidates are triangles in the mesh that might be hit. I’ll explain below how we divide up the surface so that we don’t have to check each triangle individually.
[Intro]
Recently I had need of a moderately fast ray tracer. Fast being around 1ms. The only stipulation being that it couldn’t be part of an existing library. Why? Scalability and practice. I’m under certain restrictions in what I can and can’t use for work software. Some 3rd party libraries just aren’t allowed and so I had to reinvent the wheel. Yay!
Before I forget, here is a great site with an object intersection chart and links to code.
Step1.) Define a simple vector class
class vector3 { public: vector3() : x( 0.0f ), y( 0.0f ), z( 0.0f ) {}; vector3( real a_X, real a_Y, real a_Z ) : x( a_X ), y( a_Y ), z( a_Z ) {}; vector3( point3D_t p ) : x( p.x ), y( p.y ), z( p.z ) {}; vector3( const vector3 *v ) : x( v->x ), y( v->y ), z( v->z ) {}; void Set( real a_X, real a_Y, real a_Z ) { x = a_X; y = a_Y; z = a_Z; } void Normalize() { real l = 1.0f / Length(); x *= l; y *= l; z *= l; } void Invert(){x*=-1;y*=-1;z*=-1;} real Length() { return (real)sqrt( x * x + y * y + z * z ); } real SqrLength() { return x * x + y * y + z * z; } real Dot( vector3 a_V ) { return x * a_V.x + y * a_V.y + z * a_V.z; } void TwoPoints(point3D_t p1, point3D_t p2){ x = p2.x - p1.x; y = p2.y - p1.y; z = p2.z - p1.z; } vector3 Cross( vector3 b ) { return vector3( y * b.z - z * b.y, z * b.x - x * b.z, x * b.y - y * b.x ); } void operator += ( vector3& a_V ) { x += a_V.x; y += a_V.y; z += a_V.z; } void operator += ( vector3* a_V ) { x += a_V->x; y += a_V->y; z += a_V->z; } void operator -= ( vector3& a_V ) { x -= a_V.x; y -= a_V.y; z -= a_V.z; } void operator -= ( vector3* a_V ) { x -= a_V->x; y -= a_V->y; z -= a_V->z; } void operator *= ( real f ) { x *= f; y *= f; z *= f; } void operator *= ( vector3& a_V ) { x *= a_V.x; y *= a_V.y; z *= a_V.z; } void operator *= ( vector3* a_V ) { x *= a_V->x; y *= a_V->y; z *= a_V->z; } void operator = (const vector3 *a_v) {x = a_v->x; y = a_v->y; z = a_v->z;} vector3 operator- () const { return vector3( -x, -y, -z ); } friend vector3 operator + ( const vector3& v1, const vector3& v2 ) { return vector3( v1.x + v2.x, v1.y + v2.y, v1.z + v2.z ); } friend vector3 operator - ( const vector3& v1, const vector3& v2 ) { return vector3( v1.x - v2.x, v1.y - v2.y, v1.z - v2.z ); } friend vector3 operator + ( const vector3& v1, vector3* v2 ) { return vector3( v1.x + v2->x, v1.y + v2->y, v1.z + v2->z ); } friend vector3 operator - ( const vector3& v1, vector3* v2 ) { return vector3( v1.x - v2->x, v1.y - v2->y, v1.z - v2->z ); } friend vector3 operator * ( const vector3& v, real f ) { return vector3( v.x * f, v.y * f, v.z * f ); } friend vector3 operator * ( const vector3& v1, vector3& v2 ) { return vector3( v1.x * v2.x, v1.y * v2.y, v1.z * v2.z ); } friend vector3 operator * ( real f, const vector3& v ) { return vector3( v.x * f, v.y * f, v.z * f ); } union { struct { real x, y, z; }; struct { real r, g, b; }; struct { real cell[3]; }; }; };
*Ignore the “TwoPoints” function, it’s meant to support legacy 3d point representations. You may also notice I create a union in the vector class with x/y/z and r/g/b, this just made it easier for me to use as a color vector too.
Step 2:) Define a ray class
class Ray { public: Ray(const vector3 &o, const vector3 &d) { _origin = o; _dir = d; _invDir = vector3(1/d.x, 1/d.y, 1/d.z); sign[0] = (_invDir.x < 0); sign[1] = (_invDir.y < 0); sign[2] = (_invDir.z < 0); } Ray(){} void Set(const vector3 &o, const vector3 &d){ _origin = o; _dir = d; _invDir = vector3(1/d.x, 1/d.y, 1/d.z); sign[0] = (_invDir.x < 0); sign[1] = (_invDir.y < 0); sign[2] = (_invDir.z < 0); } void Set(){ _invDir = vector3(1/_dir.x, 1/_dir.y, 1/_dir.z); sign[0] = (_invDir.x < 0); sign[1] = (_invDir.y < 0); sign[2] = (_invDir.z < 0); } vector3 _origin; vector3 _dir; vector3 _invDir; int sign[3]; };
* Note the inverse direction (_invDir) and sign are precalculated. This will speed things up later. Also make note that there are no checks for division by zero but they are not without reason. The CSAIL ray/boundingbox intersection code makes use of a floating point property during division by zero that will aid in their actual algorithm. If you force these values to be zero or near zero it will cause problems later.
Step 3:) Define an axis aligned bounding box
inline int planeBoxOverlap(vector3 normal,float d, vector3 maxbox); //Axis aligned bounding box #define FINDMINMAX(x0,x1,x2,min,max) \ min = max = x0; \ if(x1max) max=x1;\ if(x2max) max=x2; int planeBoxOverlap(vector3 normal,float d, vector3 maxbox) { int q; vector3 vmin,vmax; for(q=0;q<=2;q++) { if(normal.cell[q]>0.0f) { vmin.cell[q]=-maxbox.cell[q]; vmax.cell[q]=maxbox.cell[q]; } else { vmin.cell[q]=maxbox.cell[q]; vmax.cell[q]=-maxbox.cell[q]; } } if(normal.Dot(vmin)+d>0.0f) return 0; if(normal.Dot(vmax)+d>=0.0f) return 1; return 0; } /*======================== X-tests ========================*/ #define AXISTEST_X01(a, b, fa, fb) \ p0 = a*v0.y - b*v0.z; \ p2 = a*v2.y - b*v2.z; \ if(p0 rad || max<-rad) return 0; #define AXISTEST_X2(a, b, fa, fb) \ p0 = a*v0.y - b*v0.z; \ p1 = a*v1.y - b*v1.z; \ if(p0 rad || max<-rad) return 0; /*======================== Y-tests ========================*/ #define AXISTEST_Y02(a, b, fa, fb) \ p0 = -a*v0.x + b*v0.z; \ p2 = -a*v2.x + b*v2.z; \ if(p0 rad || max<-rad) return 0; #define AXISTEST_Y1(a, b, fa, fb) \ p0 = -a*v0.x + b*v0.z; \ p1 = -a*v1.x + b*v1.z; \ if(p0 rad || max<-rad) return 0; /*======================== Z-tests ========================*/ #define AXISTEST_Z12(a, b, fa, fb) \ p1 = a*v1.x - b*v1.y; \ p2 = a*v2.x - b*v2.y; \ if(p2 rad || max<-rad) return 0; #define AXISTEST_Z0(a, b, fa, fb) \ p0 = a*v0.x - b*v0.y; \ p1 = a*v1.x - b*v1.y; \ if(p0 rad || max<-rad) return 0; class AABB { public: AABB() : m_Pos( vector3( 0, 0, 0 ) ), m_Size( vector3( 0, 0, 0 ) ), m_Cent(vector3(0,0,0) ) { m_HalfSize = vector3(m_Size.x/2,m_Size.y/2, m_Size.z/2); }; AABB( vector3& a_Pos, vector3& a_Size ) : m_Pos( a_Pos ), m_Size( a_Size ) { m_Cent = vector3(m_Pos.x + m_Size.x/2,m_Pos.y + m_Size.y/2,m_Pos.z + m_Size.z/2); m_HalfSize = vector3(m_Size.x/2,m_Size.y/2, m_Size.z/2); }; AABB( vector3& a_Pos, vector3& a_Size, vector3& a_Cent ) : m_Pos( a_Pos ), m_Size( a_Size ), m_Cent(a_Cent) { m_HalfSize = vector3(m_Size.x/2,m_Size.y/2, m_Size.z/2); }; AABB( const AABB *ptr ) : m_Pos( ptr->m_Pos ), m_Size( ptr->m_Size ), m_Cent(ptr->m_Cent) { m_HalfSize = vector3(m_Size.x/2,m_Size.y/2, m_Size.z/2); }; vector3 GetMax() { return vector3(m_Pos + m_Size); } vector3& GetMin() { return m_Pos; } vector3& GetPos() { return m_Pos; } vector3& GetSize() { return m_Size; } vector3& GetCent() { return m_Cent; } vector3 max(vector3 a, vector3 b){ if(a.SqrLength() > b.SqrLength())return a; else return b; } vector3 min(vector3 a, vector3 b){ if(a.SqrLength() < b.SqrLength())return a; else return b; } void Set(const vector3& pos, const vector3& size){ m_Pos = pos; m_Size = size; m_Cent = vector3(m_Pos.x + m_Size.x/2,m_Pos.y + m_Size.y/2,m_Pos.z + m_Size.z/2); m_HalfSize = vector3(size.x/2,size.y/2,size.z/2); } bool Intersect( AABB& b2 ) { vector3 v1 = b2.GetPos(), v2 = b2.GetPos() + b2.GetSize(); vector3 v3 = m_Pos, v4 = m_Pos + m_Size; return ((v4.x >= v1.x) && (v3.x <= v2.x) && // x-axis overlap (v4.y >= v1.y) && (v3.y <= v2.y) && // y-axis overlap (v4.z >= v1.z) && (v3.z <= v2.z)); // z-axis overlap } bool Intersect (Ray r, float t0, float t1){ float tmin, tmax, tymin, tymax, tzmin, tzmax; vector3 bounds[2]; bounds[0] = GetPos(); bounds[1] = GetMax(); tmin = (bounds[r.sign[0]].x - r._origin.x) * r._invDir.x; tmax = (bounds[1-r.sign[0]].x - r._origin.x) * r._invDir.x; tymin = (bounds[r.sign[1]].y - r._origin.y) * r._invDir.y; tymax = (bounds[1-r.sign[1]].y - r._origin.y) * r._invDir.y; if ( (tmin > tymax) || (tymin > tmax) ){ return false; } if (tymin > tmin) tmin = tymin; if (tymax < tmax) tmax = tymax; tzmin = (bounds[r.sign[2]].z - r._origin.z) * r._invDir.z; tzmax = (bounds[1-r.sign[2]].z - r._origin.z) * r._invDir.z; if ( (tmin > tzmax) || (tzmin > tmax) ){ return false; } if (tzmin > tmin) tmin = tzmin; if (tzmax < tmax) tmax = tzmax; if( (tmin <= t1) && (tmax >= t0)){ return true; } return false; } bool Contains( vector3 a_Pos ) { vector3 v1 = m_Pos, v2 = m_Pos + m_Size; return ((a_Pos.x >= v1.x) && (a_Pos.x <= v2.x) && (a_Pos.y >= v1.y) && (a_Pos.y <= v2.y) && (a_Pos.z >= v1.z) && (a_Pos.z <= v2.z)); } bool Contains( double ax, double ay, double az ) { vector3 v1 = m_Pos, v2 = m_Pos + m_Size; return ((ax >= v1.x) && (ax <= v2.x) && (ay >= v1.y) && (ay <= v2.y) && (az >= v1.z) && (az <= v2.z)); } bool ContainsFace( vector3 pt0,vector3 pt1,vector3 pt2) { vector3 axis,normal,e0,e1,e2,v0,v1,v2; float min, max, d, p0, p1, p2,rad,fex,fey,fez; v0 = pt0 - m_Cent; v1 = pt1 - m_Cent; v2 = pt2 - m_Cent; e0 = v1 - v0; e1 = v2 - v1; e2 = v0 - v2; fex = fabs(e0.x); fey = fabs(e0.y); fez = fabs(e0.z); AXISTEST_X01(e0.z, e0.y, fez, fey); AXISTEST_Y02(e0.z, e0.x, fez, fex); AXISTEST_Z12(e0.y, e0.x, fey, fex); fex = fabs(e1.x); fey = fabs(e1.y); fez = fabs(e1.z); AXISTEST_X01(e1.z, e1.y, fez, fey); AXISTEST_Y02(e1.z, e1.x, fez, fex); AXISTEST_Z0(e1.y, e1.x, fey, fex); fex = fabs(e2.x); fey = fabs(e2.y); fez = fabs(e2.z); AXISTEST_X2(e2.z, e2.y, fez, fey); AXISTEST_Y1(e2.z, e2.x, fez, fex); AXISTEST_Z12(e2.y, e2.x, fey, fex); /* test in X-direction */ FINDMINMAX(v0.x,v1.x,v2.x,min,max); if(min>m_HalfSize.x || max<-m_HalfSize.x) return false; /* test in Y-direction */ FINDMINMAX(v0.y,v1.y,v2.y,min,max); if(min>m_HalfSize.y || max<-m_HalfSize.y) return false; /* test in Z-direction */ FINDMINMAX(v0.z,v1.z,v2.z,min,max); if(min>m_HalfSize.z || max<-m_HalfSize.z) return false; /* Bullet 2: */ /* test if the box intersects the plane of the triangle */ /* compute plane equation of triangle: normal*x+d=0 */ normal = e0.Cross(e1); d=-1 * normal.Dot(v0); /* plane eq: normal.x+d=0 */ if(!planeBoxOverlap(normal,d,m_HalfSize)) return false; return true; } void operator =(const AABB *ptr){ m_Pos = ptr->m_Pos; m_Size = ptr->m_Size; m_Cent = ptr->m_Cent; m_HalfSize = vector3(m_Size.x/2,m_Size.y/2, m_Size.z/2); } double w() { return m_Size.x; } double h() { return m_Size.y; } double d() { return m_Size.z; } double x() { return m_Pos.x; } double y() { return m_Pos.y; } double z() { return m_Pos.z; } double cx() { return m_Cent.x;} double cy() { return m_Cent.y;} double cz() { return m_Cent.z;} private: vector3 m_Pos, m_Size, m_Cent, m_HalfSize; };
Ray/Boundingbox intersection code was pulled from a paper out of CSAIL: “An Efficient and Robust Ray–Box Intersection Algorithm” The Boundingbox/Triangle intersection code is from Moller.
Right so you’ve got a ray, a vector, and a bounding box, what’s next? Testing your ray against every triangle in a 3D object would be a waste of time and processing power. Our tests are usually on objects with 400k-500k faces(triangles.) A single ray trace would take forever if we checked each one. It’s much cheaper to cull unnecessary faces so that we are left with a small subset on which to do our tray/riangle intersection test. So how do we divide up our 3D object? How about using that nice boundingbox class we made above? Subdivide the 3D surface volumetrically using axis aligned bounding boxes. In other words we encapsulate our entire object in a single box and then subdivide that box down until some metric is met. Usually that the box contains less than N faces.
Subdivision can be done simply along the centroid of each box. Don’t let anyone tell you different, you don’t need the world’s most complicated heuristic to get a working solution. I’ve seen ray tracers that subdivided the surface based on no less than 10 metrics of comparison. Total build time for those algs. was 15 minutes on my machine. Compare this to the 12-13 seconds it takes mine to build. In their alg. each ray trace took 0.9ms, in mine it takes about 1.3ms. While their time per shot was less their overall time is longer for all of our needs. So their method will only be faster if we shoot ~1,278,333 shots.
The method I’m describing is called an OcTree(as you may have seen from the link above). I’ll try and post the code to my repository by this evening. An OcTree divides the surface up in to smaller and smaller cubes until some metric is met. Remember what I said about simple heuristics? You could use a complex KD-Tree that subdivides based on along the principle axis of each cube but why? Each OcTree node can be divided simply by create 8 cubes of exactly the same size using the node’s centroid. It’s a fast and efficient way to divide a 3D mesh. Just prior to this writing I actually realized a small mistake in my code.
Originally my OcTree node test was the number of vertices contained in each cube. This works well enough when your mesh is of sufficient density (>150k faces) however when you get large faces you end up with gaps in the surface. What happens is that 1 cube contains 1 point of a face and another cube contains the other 2 points. But between these 2 cubes could be a 3rd cube that, while it doesn’t encompass a triangle end point, still contains the edges of that triangle. This is where Moller’s Tri/Boundingbox intersection code comes in handy.
With the object cubed up the next step is to trace your ray. Tracing has 3 stages. The first stage tests the largest box around each object to see if it hit. Believe me when I say that ray/boundingbox intersection checks are faster than ray/triangle checks. The second stage, assuming stage 1 passed, is to trace the ray through each boundingbox. Any boundingbox the ray hits that contains faces is added to a hit list. The third and final stage is test your ray against all of the faces in the hit list. This is where the ray/triangle intersection code is done. By this point you should have anywhere between 10-100 faces depending on your bucketing scheme.
Stage 1:)
int TraceRayRoot(Ray *r){ OcTreeNode *root = _oct.GetRoot(); //Get root if(root == NULL){ printf(" - ERROR. Root not does not exist, have you built the OcTree?\n"); return 0; } //We only need a single hit detection //TODO: We've hard coded the intersection time between 0 and 1. We should really make this more dynamic or infinitely large. if(root->GetBB()->Intersect(*r,_iTIntervalMin,_iTIntervalMax))return 1; return 0; }
Simple yes?
Stage 2:)
/* TraceRay() Summary: Trace a ray and determine if it hits this object. Input: Ray *r - Pointer to a ray; vector3 *vPt - Pointer to our ray/face intersection point double *mu - Double for time(distance) of ray origin from intersection Return: Integer that represents face hit. (Index value.) */ int TraceRay(Ray *r, vector3 *vPt, double *mu, int method){ OcTreeNode *root = _oct.GetRoot(); //Get OcTree root node std::vector::iterator iter; //Vector iterator int nTrisHit = 0; //Number of triangles hit if(root == NULL){ printf(" - ERROR. Root not does not exist, have you built the OcTree?\n"); return -1; } //Clear last hitlist _nHits = 0; //# of faces boxes hit (TODO: Variable isn't clear, rename) _nBBHits = 0; _vHitList.clear(); //List of hit faces memset(_bUsed,0,_ply.n_face); //Clear the "used" array. //Recursively iterate through all nodes until we find ones with data. TraceRayStep(root,r); double ret; double min_r = 9999999; int min_idx = -1; int idx; //Parse hit list (hit list is a vector of indexes in to the _ply.face array) for(iter = _vHitList.begin(); iter != _vHitList.end(); iter++){ idx = (*iter); ret = Intersect_Ray_Face(r,&_ply.face[idx]); //Moller/Trumbore Method 2 (Works) if(ret>0){ //printf("Face[%d] was hit at: %f\n",idx,ret); //We only care about the smallest Time(Distance) value as that is the closest intersection to our ray. if(ret < min_r){ min_r = ret; min_idx = idx; } } } //Find our ray/face intersection point and move it along by TRACE_OFFSET amount to make sure we don't hit this face again. *vPt = r->_origin + (r->_dir * (min_r + TRACE_OFFSET)); *mu = min_r; return min_idx; } /* TraceRayStep() Summary: Recursive portion of TraceRay that interates through the OcTree structure locating viable nodes. A viable node is one containg both the ray and N > 0 data. Input: OcTreeNode *node - Pointer to an OcTree node Ray *r - Pointer to a ray; Return: None. Hit values are stored in the _vHitList vector; Authors Note: We tend to cheat a bit here. The OcTree buckets indices, not faces. As such it's quite possible a single face has 2 points in one bucket (or node) and 1 in another. In such a case this face might be counted twice. That's why we use the _bUsed array. Any face previously added to the list is avoided in future passes. For the cost of a small amt. of memory we avoid having to search for similar values at every iteration. */ void TraceRayStep(OcTreeNode *node, Ray *r){ //First check if this node is even worth a look if(node->IsLeaf() && node->IsEmpty()){ return; } //No intersection? Return from whence you came. if(!node->GetBB()->Intersect(*r,_iTIntervalMin,_iTIntervalMax)){ return; } //Only leaves(end nodes) that also contain data are checked for a hit; if(node->IsLeaf() && !node->IsEmpty()){ int *idx = node->GetIndexList(); //Pointer to the array of index values in the current node. Index is in to a vertex array for(int k=0;kGetSize();k++){ //Face not already in list, add it. if(!_bUsed[faces[k]]) { _bUsed[faces[k]] = true; _vHitList.push_back(faces[k]); _nHits++; } } _nBBHits++; //Increment number of bb hits. return; } //Parse children. for(int i = 0; i < 8 ; i++){ TraceRayStep(node->GetChild(i),r); } return; }
bUsed is a boolean array that keeps us from using the same face twice. _vHitList is an std::vector. It contains a list of each face that is a candidate for intersection.
Stage 3: Last but certainly not least is the actual ray/triangle intersection code, courtesy of Moller and Trumbore:)
double Intersect_Ray_Face(Ray *r, Face *f) { float u,v,t; vector3 edge1, edge2, normal; vector3 tvec,pvec,qvec; float det,inv_det; /* find vectors for two edges sharing vert0 */ edge1 = _ply.vtx[f->verts[0]] - _ply.vtx[f->verts[1]]; //Edge 1 edge2 = _ply.vtx[f->verts[0]] - _ply.vtx[f->verts[2]]; //Edge 2 /* begin calculating determinant - also used to calculate U parameter */ pvec = r->_dir.Cross(edge2); /* if determinant is near zero, ray lies in plane of triangle */ det = edge1.Dot(pvec); if (det == 0) return 0; /* calculate distance from vert0 to ray origin */ tvec = r->_origin - vector3(_ply.vtx[f->verts[0]].x,_ply.vtx[f->verts[0]].y,_ply.vtx[f->verts[0]].z); inv_det = 1.0f / det; qvec = tvec.Cross(edge1); if (det > EPSILON){ u = tvec.Dot(pvec); //if(bOutputIntersect)printf("a.) u: %f det: %f\n",u,det); if (u < 0.0 || u > det)return 0; /* calculate V parameter and test bounds */ v = r->_dir.Dot(qvec); //if(bOutputIntersect)printf("u: %f v: %f vu: %f det: %f\n",u,v,u+v,det); if (v < 0.0 || u + v > det)return 0; } else if(det < -EPSILON){ /* calculate U parameter and test bounds */ u = tvec.Dot(pvec); //if(bOutputIntersect)printf("b.) u: %f det: %f\n",u,det); if (u > 0.0 || u < det)return 0; /* calculate V parameter and test bounds */ v = r->_dir.Dot(qvec); //if(bOutputIntersect)printf("u: %f v: %f vu: %f det: %f\n",u,v,u+v,det); if (v > 0.0 || u + v < det)return 0; } else { return 0; /* ray is parallell to the plane of the triangle */ } t = edge2.Dot(qvec) * inv_det; u *= inv_det; v *= inv_det; return t; }
By now you’ve probably noticed the _ply variable. This is merely a struct that contains a list of vertices and faces for our 3D object, the one we also fed to OcTree. Ply is of type Object from my 3DIOt header files, also located in my repo. (Ply is a file type developed at Stanford.)
And that’s it in a nutshell. Again the code should be posted by this evening. If you’ve never used my site before please use the ‘SVN Access’ tab on the left side of this blog for instructions on how to access the repository.
Peace,
Dakota
Edit[9/29/08 10:27 est]: Small edit. If you decide to look at my OcTree code, follow the code for “SplitLinear.” Plain ol’ recursive “Split” can cause stack overflow on small bucket sizes. There might also be problems if you have numerous points at origin. The OcTree code builds up a boundingbox from all the data that split from the parent. A zero size bounding box may cause the code to enter an infinite loop. I’m fairly sure that problem has been fixed but be aware of it.
Edit2[10/2/08 21:03 est]: The OcTree code hasn’t been posted yet, only the precompiled .lib file and headers. My apologies I just haven’t had the wherewithal to pull the code from my other desktop which is a standalone machine, not on the net…. don’t ask. It’s like the computer in Mission Impossible with no outside lines only mine is much more lame.
Edit3[10/16/08 11:03 est]: OcTree code has been uploaded. Sorry for the delay.
Edit4[4/02/09 10:55 est]: I don’t know what happened to my code above. It used to be formatted correctly but wordpress seems to have removed the carriage returns. Sorry.
,



